Removing a substring using Python

I have already extracted some information from a forum. This is the raw string I have now:

string = 'i think mabe 124 + <font color="black"><font face="Times New Roman">but I don\'t have a big experience it just how I see it in my eyes <font color="green"><font face="Arial">fun stuff'

The parts I do not want are the substrings "<font color="black"><font face="Times New Roman">" and "<font color="green"><font face="Arial">". I want to keep the rest of the string, so the result should look like this:

resultString = "i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"

How can I do this? I actually used Beautiful Soup to extract the string above from a forum. Now I would prefer to use a regular expression to remove these parts.

2 solutions

#1

>>> import re
>>> re.sub('<.*?>', '', string)
"i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"

The re.sub function takes a regular expression and replaces every match in the string with the second parameter. In this case, we search for all tags ('<.*?>') and replace them with nothing ('').
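
As a minimal sketch of the same call wrapped in a reusable function: the name strip_tags and the precompiled TAG_PATTERN are illustrative assumptions, not part of the original answer. It only handles simple tags like the ones in the question and is not a general HTML parser.

```python
import re

# Hypothetical helper name; the pattern is the one from the answer above.
TAG_PATTERN = re.compile(r'<.*?>')


def strip_tags(text):
    """Return text with every <...> span replaced by the empty string."""
    return TAG_PATTERN.sub('', text)


raw = ('i think mabe 124 + <font color="black"><font face="Times New Roman">'
       'but I don\'t have a big experience it just how I see it in my eyes '
       '<font color="green"><font face="Arial">fun stuff')

cleaned = strip_tags(raw)
print(cleaned)
```

Compiling the pattern once is mostly a readability choice here; for a single call, re.sub with the pattern string behaves the same way.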

The ? after .* makes the quantifier non-greedy: each match stops at the first > it finds instead of extending to the last > in the string.
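
To see why the non-greedy form matters, here is a small sketch comparing the two patterns on a made-up sample string (the sample and variable names are assumptions for illustration only):

```python
import re

sample = 'a <b>bold</b> c'

# Greedy: .* runs to the LAST '>' so one match swallows the text between tags.
greedy_matches = re.findall(r'<.*>', sample)
greedy_result = re.sub(r'<.*>', '', sample)

# Non-greedy: .*? stops at the FIRST '>' so each tag is matched separately.
lazy_matches = re.findall(r'<.*?>', sample)
lazy_result = re.sub(r'<.*?>', '', sample)

print(greedy_matches, repr(greedy_result))
print(lazy_matches, repr(lazy_result))
```

With the greedy pattern the word between the tags is removed along with them, which is why the answer uses '<.*?>'.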

More about the re module.

#2

>>> import re
>>> st = " i think mabe 124 + <font color=\"black\"><font face=\"Times New Roman\">but I don't have a big experience it just how I see it in my eyes <font color=\"green\"><font face=\"Arial\">fun stuff"
>>> re.sub("<.*?>","",st)
" i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"
>>>
