I have already extracted some information from a forum. This is the raw string I have now:
string = 'i think mabe 124 + <font color="black"><font face="Times New Roman">but I don\'t have a big experience it just how I see it in my eyes <font color="green"><font face="Arial">fun stuff'
What I do not like are the substrings '<font color="black"><font face="Times New Roman">' and '<font color="green"><font face="Arial">'. I want to keep the rest of the string and drop only these parts, so the result should look like this:
resultString = "i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"
How can I do this? I actually used Beautiful Soup to extract the string above from a forum. Now I would prefer to use a regular expression to remove these parts.
2 solutions
#1
63
>>> import re
>>> re.sub('<.*?>', '', string)
"i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"
The re.sub function takes a regular expression and replaces every match in the string with the second parameter. In this case, we are searching for all tags ('<.*?>') and replacing them with nothing ('').
The ? after the * makes the match non-greedy in re: each match stops at the first > it finds instead of running on to the last > in the string.
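To see why the ? matters, here is a minimal sketch comparing the greedy and non-greedy patterns. The sample text is a shortened, made-up variant of the string in the question, and the variable names are only illustrative.

```python
import re

# Shortened, hypothetical variant of the forum string from the question.
text = 'mabe 124 + <font color="black">but I see it <font color="green">fun stuff'

# Non-greedy: each match ends at the first '>' after a '<',
# so only the tags themselves are removed.
non_greedy = re.sub(r'<.*?>', '', text)

# Greedy: a single match runs from the first '<' to the last '>',
# so the text between the two tags is removed as well.
greedy = re.sub(r'<.*>', '', text)

# A negated character class gives the same result as the non-greedy
# pattern here, because it can never run past a '>'.
negated = re.sub(r'<[^>]*>', '', text)

print(non_greedy)  # mabe 124 + but I see it fun stuff
print(greedy)      # mabe 124 + fun stuff
print(negated)     # mabe 124 + but I see it fun stuff
```

In the string from the question the two font tags in each pair sit right next to each other, but there is ordinary text between the first pair and the second pair, so the greedy pattern would delete that text too.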
More about the re module.
#2
11
>>> import re
>>> st = " i think mabe 124 + <font color=\"black\"><font face=\"Times New Roman\">but I don't have a big experience it just how I see it in my eyes <font color=\"green\"><font face=\"Arial\">fun stuff"
>>> re.sub("<.*?>","",st)
" i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"
>>>