This question already has an answer here:
- Reference - What does this regex mean? 1 answer
I am new to Python. I was going through a repository on GitHub, and I saw the following line of code to extract all URLs from a webpage. I understand regular expressions and capture groups, but I don't understand why there are extra double quotation marks enclosed within the single quotation marks.
links = re.findall('"((http|ftp)s?://.*?)"', html)
That is, how is it different from the following code?
links = re.findall('((http|ftp)s?://.*?)', html)
I experimented and saw that only the first one matches the URL syntax correctly, while the second one doesn't, but I don't understand why.
Any help is appreciated.
Thank you.
1 answer
#1
1
The double quotes are part of the regex. They ensure that the pattern only matches if it is actually surrounded by quotes; so foo bar http://whatever.com wouldn't match, but <a href="http://whatever.com"> will.
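To see the difference concretely, here is a small sketch that runs both patterns from the question against a made-up `html` string (the sample string and the variable names `quoted` and `unquoted` are assumptions for illustration only). It also suggests why the second pattern looks broken: with nothing after the lazy `.*?`, the regex engine is free to match as little as possible, which is nothing at all.

```python
import re

# Sample input (an assumption): one bare URL and one URL inside a quoted attribute.
html = 'foo bar http://whatever.com <a href="http://whatever.com">link</a>'

# With the quotes: the lazy .*? has to keep expanding until it reaches the closing ".
quoted = re.findall('"((http|ftp)s?://.*?)"', html)

# Without the quotes: nothing follows .*?, so it is satisfied with zero characters.
unquoted = re.findall('((http|ftp)s?://.*?)', html)

print(quoted)    # [('http://whatever.com', 'http')]
print(unquoted)  # [('http://', 'http'), ('http://', 'http')]
```

Each result is a tuple because the pattern has two capture groups; the first element is the outer group (the URL) and the second is the scheme.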
Note this is a really fragile way of doing things, though, since single quotes are also valid in HTML but wouldn't match the regex.
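As a rough illustration of that fragility, the sketch below feeds the same regex an attribute written with single quotes, and compares it with one possible alternative: reading attribute values through the standard library's `html.parser`. The sample `html` string, the `LinkCollector` class, and the choice to look only at `href` and `src` are all assumptions made for this example, not something taken from the repository in the question.

```python
import re
from html.parser import HTMLParser

# Sample input (an assumption): one single-quoted and one double-quoted attribute.
html = """<a href='http://single.example'>a</a> <a href="http://double.example">b</a>"""

# The regex from the question only sees the double-quoted URL.
regex_links = [m[0] for m in re.findall('"((http|ftp)s?://.*?)"', html)]


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects http(s)/ftp(s) URLs from href and src attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the quoting style is already stripped.
        for name, value in attrs:
            if name in ("href", "src") and value and re.match(r'(http|ftp)s?://', value):
                self.links.append(value)


collector = LinkCollector()
collector.feed(html)
parser_links = collector.links

print(regex_links)   # ['http://double.example']
print(parser_links)  # ['http://single.example', 'http://double.example']
```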