Double quotes inside single quotes in a re expression (Python) [duplicate]

This question already has an answer here:

Reference - What does this regex mean? 1 answer

I am new to Python. I was going through a repository on GitHub, and I saw the following line of code, which extracts all URLs from a webpage. I understand regular expressions and capture groups, but I don't understand why there are extra double quotation marks enclosed within the single quotation marks.

links = re.findall('"((http|ftp)s?://.*?)"', html)

That is, how is it different from the following code?

links = re.findall('((http|ftp)s?://.*?)', html)

I experimented and saw that only the first one matches URLs correctly, while the second one doesn't, but I don't understand why.

Any help is appreciated.

Thank you.

1 answer

#1

The double quotes are part of the regex. They ensure that the pattern only matches if it is actually surrounded by quotes; so foo bar http://whatever.com wouldn't match, but <a href="http://whatever.com"> will.
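To make the difference concrete, here is a minimal sketch that runs both patterns from the question. The sample `html` string is made up for illustration. It also shows why the second pattern looks wrong: with nothing after the lazy `.*?`, the regex engine is free to match zero characters, so each match stops right after `://`.

```python
import re

# Made-up sample input: one quoted URL in an attribute, one bare URL in text.
html = '<a href="http://whatever.com">link</a> foo bar http://other.example'

# Pattern from the question: the URL must sit between two double quotes,
# so the lazy .*? has to expand until it reaches the closing quote.
quoted = re.findall('"((http|ftp)s?://.*?)"', html)

# Same pattern without the quotes: nothing follows .*?, so it matches
# the empty string and every match ends right after "://".
unquoted = re.findall('((http|ftp)s?://.*?)', html)

# findall returns one tuple per match because the pattern has two groups.
print(quoted)    # [('http://whatever.com', 'http')]
print(unquoted)  # [('http://', 'http'), ('http://', 'http')]
```

The closing quote acts as the delimiter that tells the lazy quantifier where to stop; without it, the pattern finds where a URL starts but captures none of it.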

Note this is a really fragile way of doing things, though, since single quotes are also valid in HTML but wouldn't match the regex.
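One way to soften that particular weakness, sketched here as an assumption rather than as what the original repository does, is to capture the opening quote and require the same character to close the URL with a backreference. The sample markup and the `pattern` name are hypothetical.

```python
import re

# Hypothetical sample markup, not from the original question.
html = """<a href="http://a.example/x">A</a> <a href='https://b.example/y'>B</a>"""

# Group 1 captures the opening quote (single or double);
# \1 requires the very same quote character to close the URL.
# (?:...) keeps the scheme alternation from adding an extra capture group.
pattern = re.compile(r"""(["'])((?:http|ftp)s?://.*?)\1""")

# Group 2 holds the URL itself.
links = [m.group(2) for m in pattern.finditer(html)]
print(links)  # ['http://a.example/x', 'https://b.example/y']
```

This is still fragile: HTML also allows unquoted attribute values, which this pattern would miss, so a real HTML parser is likely the more robust choice for anything beyond a quick script.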
