Related question: How can I use regex to match a character (') when not following a specific character (?)?
I'm parsing a log with regex (the PHP PCRE library) and trying to extract a URL from it. The URL is enclosed in double quotes ("), but some of the requests also contain a double quote inside the URL. For example:
"https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\""
My first pattern was basically:
#\"([^\"]*)\"#
This worked well until I reached an entry like the one above, where the match was truncated and all I got was:
https://www.amh.net.au/online/dbSearch.php?t=all&q=\
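The truncation can be reproduced with a minimal sketch. It uses Python's `re` module purely for illustration (the question is about PHP PCRE, but this pattern behaves the same way in both), and the variable names are made up for the example. The class `[^"]` can never cross a double quote, so the match ends at the first quote inside the URL.

```python
import re

# The log fragment from the question; the raw string keeps the backslashes literal.
line = r'"https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\""'

# First attempt: capture everything between two double quotes,
# with no double quote allowed in between.
first_attempt = re.compile(r'"([^"]*)"')

# The capture stops right before the first embedded quote,
# so it ends with the backslash that was meant to escape it.
truncated = first_attempt.search(line).group(1)
print(truncated)
```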
After digging around, rediscovering the regex cheat sheets at http://addedbytes.com, and finding some more useful information at http://www.regular-expressions.info/lookaround.html, I tried the following lookbehind:
#"([(?<!\\)"]*)"#
But now all I get is "" followed by an empty string.
2 solutions
#1
1
The URLs in the logs would be URL-encoded, so they should not contain literal spaces. As such, the following pattern should work:
#\"([^ ]*)\"#
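As a rough sketch of how this pattern behaves, the snippet below runs it with Python's `re` module (used here only for illustration) against a made-up log line. The surrounding fields (`1.2.3.4` and `200`) and the variable names are assumptions, not taken from the question's log format. Because `[^ ]*` is greedy and only stops at a space, it runs to the end of the quoted field and then backtracks to the last double quote before the space.

```python
import re

# Hypothetical log line: the quoted URL from the question, surrounded by
# space-separated fields that are invented for this example.
line = r'1.2.3.4 "https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\"" 200'

# Answer #1's pattern: anything without a space between two double quotes.
no_spaces = re.compile(r'"([^ ]*)"')

# Embedded quotes are kept, since only a space can end the run.
url = no_spaces.search(line).group(1)
print(url)
```

This relies on the quoted value never containing a space; a quoted field with spaces in it (a user-agent string, for instance) would not be captured whole.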
#2
2
You placed your lookbehind inside a character class ([]), so it is not interpreted as a lookbehind at all: the class simply matches any one of those individual characters.
Basically, I think you want something like this:
#"((?:[^"]|(?<=\\)")*)"#
Be aware, though, that this is fooled by an escaped backslash right before the closing quote (\\"), for example.
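The lookbehind idea and its weak spot can be sketched as follows, again with Python's `re` module for illustration only. The lookbehind pattern here includes a `*` quantifier and a capturing group so that the whole URL is captured. The `tricky` input and the second pattern are not from the thread: `"((?:[^"\\]|\\.)*)"` is a commonly used alternative that consumes each backslash together with the character it escapes, shown here as one possible way around the `\\"` problem.

```python
import re

line = r'"https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\""'

# Lookbehind version: a double quote may appear inside the match
# only when the character right before it is a backslash.
lookbehind = re.compile(r'"((?:[^"]|(?<=\\)")*)"')
url = lookbehind.search(line).group(1)

# Made-up input where the backslash before the quote is itself escaped,
# so the first quoted string should end right after the two backslashes.
tricky = r'"C:\\" and "x"'

# The lookbehind only sees "a backslash precedes this quote" and runs on.
overrun = lookbehind.search(tricky).group(1)

# Alternative (an assumption, not from the thread): match either a plain
# character or a backslash plus the character it escapes.
escape_aware = re.compile(r'"((?:[^"\\]|\\.)*)"')
fixed = escape_aware.search(tricky).group(1)

print(url)
print(overrun)
print(fixed)
```

Both patterns capture the same URL from the question's example; they only differ on input such as `tricky`, where the escape-aware version stops at the real closing quote.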