How to match a string up to the first instance of a character that is not preceded by another specific character

Time: 2022-09-13 08:35:24

Related question: How can I use regex to match a character (') when not following a specific character (?)?

I'm parsing a log using regex (PHP PCRE library) and trying to extract a URL from it. The URL is enclosed in double quotes ("), but some of the requests also contain escaped double quotes (\"). For example:

"https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\""

My first pattern was basically:

#\"([^\"]*)\"#

This worked well until I reached an entry like the one above, where it truncated the match so that all I got was:

https://www.amh.net.au/online/dbSearch.php?t=all&q=\
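
The truncation can be reproduced with a minimal sketch. It uses Python's re module rather than PHP's PCRE functions (an assumption made for illustration; a negated character class behaves the same way in both engines), and the variable names are hypothetical:

```python
import re

# Sample value from the question: a quoted URL that contains \" sequences.
line = r'"https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\""'

# First pattern from the question, without the PHP "#" delimiters.
first_pattern = re.compile(r'"([^"]*)"')

# [^"]* stops at the very first double quote, which here is the one
# that follows the backslash, so the capture ends right after "q=\".
truncated = first_pattern.search(line).group(1)
print(truncated)
```

The negated class has no notion of escaping: any double quote ends the capture, whether or not a backslash precedes it.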

After digging around and rediscovering the regex cheat sheets at http://addedbytes.com, along with some more useful information at http://www.regular-expressions.info/lookaround.html, I tried the following lookbehind:

#"([(?<!\\)"]*)"#

But now all I get is "" and then an empty string.
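
A short sketch of why this happens, again in Python's re module as an illustrative stand-in for PCRE (the names are hypothetical). Inside square brackets, the text (?<!\\) is read as a set of literal characters, not as a lookbehind:

```python
import re

line = r'"https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\""'

# Inside [...] nothing is a lookbehind: this class matches any ONE of
# the characters ( ? < ! \ ) "
attempt = re.compile(r'"([(?<!\\)"]*)"')

match = attempt.search(line)
whole = match.group(0)     # the full match
captured = match.group(1)  # the capturing group
print(repr(whole), repr(captured))
```

The only place where a quote is followed solely by characters from that class and then another quote is the pair of quotes at the end of the line, so the full match is "" and the group is empty.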

2 Solutions

#1


1  

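The URLs in the logs would be URL-encoded, so they should not contain literal spaces. As such, the following pattern, which captures everything up to the last double quote before the first space, should work: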

#\"([^ ]*)\"#
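
As a rough check of this pattern, here is a sketch in Python's re module (used in place of PHP for illustration). The surrounding log line, with its leading "GET" and trailing "200", is a hypothetical format, since the question does not show a full log entry:

```python
import re

# Hypothetical log line: the quoted URL is followed by a space and more fields.
log_line = r'GET "https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\"" 200'

# [^ ]* greedily consumes everything up to the first space, then backtracks
# to the last double quote before that space.
no_space_pattern = re.compile(r'"([^ ]*)"')

url = no_space_pattern.search(log_line).group(1)
print(url)
```

This relies on the URL containing no literal spaces and on no other quoted field following it without a space in between.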

#2


2  

You placed your lookbehind inside a character class ([...]), so it is not interpreted as a lookbehind; the class just matches any one of those individual characters.
Basically, I think you'd like something like this:

#"((?:[^"]|(?<=\\)")*)"#

Be aware, though, that this would be fooled by \\" for example: an escaped backslash followed by a real closing quote.
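
The sketch below illustrates both points in Python's re module (a stand-in for PCRE, chosen for illustration). It uses a repeated, capturing form of the lookbehind alternation. The second pattern is a common alternative idiom, not something from this answer, and the "tricky" input is a made-up example:

```python
import re

# Lookbehind form: a quote counts as content only if a backslash precedes it.
lookbehind = re.compile(r'"((?:[^"]|(?<=\\)")*)"')

# Alternative idiom (an assumption, not from the answer): consume each
# backslash together with the character it escapes, so \\ is one unit.
escape_aware = re.compile(r'"((?:[^"\\]|\\.)*)"')

url_line = r'"https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\""'
# Hypothetical input: the first quoted value ends in an escaped backslash.
tricky = r'"C:\\" "next"'

url = lookbehind.search(url_line).group(1)
overrun = lookbehind.search(tricky).group(1)
bounded = escape_aware.search(tricky).group(1)

print(url)      # the full URL, including both \" sequences
print(overrun)  # runs past the real closing quote
print(bounded)  # stops at the real closing quote
```

On the tricky input, the lookbehind form treats the closing quote as escaped because a backslash sits directly before it, so the capture spills over to the next quote. The escape-aware form pairs the two backslashes first and therefore ends at the correct quote.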
