I looked through related questions before posting this, but I couldn't adapt any of the relevant answers to work with my method (I'm not good at regex).
Basically, here are my existing lines:
$code = preg_replace_callback( '/"(.*?)"/', array( &$this, '_getPHPString' ), $code );
$code = preg_replace_callback( "#'(.*?)'#", array( &$this, '_getPHPString' ), $code );
They both match strings contained between '' and "". I need the regex to ignore escaped quotes inside those strings, so data between '' will ignore \' and data between "" will ignore \".
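To make the problem concrete, here is a minimal sketch of how the lazy pattern from the question behaves when the string contains an escaped quote. The sample input is made up for illustration.

```php
<?php
// Hypothetical input. After PHP's single-quote unescaping, $code holds:
//   echo "say \"hi\" now";
$code = 'echo "say \\"hi\\" now";';

// The lazy pattern from the question stops at the first closing quote it
// sees, even when that quote is preceded by a backslash.
preg_match_all('/"(.*?)"/', $code, $matches);

// The single string literal is split into two wrong matches:
//   "say \"   and   " now"
print_r($matches[0]);
```

The match ends at the escaped quote instead of the real closing quote, which is exactly the behaviour the question wants to avoid.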
Any help would be greatly appreciated.
5 Solutions
#1
66
For most strings, you need to allow escaped anything (not just escaped quotes), e.g. you most likely need to allow escaped characters like "\n" and "\t" and, of course, the escaped escape: "\\".
This is a frequently asked question, and one which was solved (and optimized) long ago. Jeffrey Friedl covers this question in depth (as an example) in his classic work: Mastering Regular Expressions (3rd Edition). Here is the regex you are looking for:
Good:
"([^"\\]|\\.)*"
Version 1: Works correctly but is not terribly efficient.
Better:
"([^"\\]++|\\.)*"
or "((?>[^"\\]+)|\\.)*"
Version 2: More efficient if you have possessive quantifiers or atomic groups (see sin's correct answer, which uses the atomic group method).
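As a quick check that both Version 2 spellings are accepted by PHP's PCRE engine and agree with each other, here is a small sketch. The subject string is invented for the example, and the /s flag is borrowed from the PHP versions given further down in this answer.

```php
<?php
// Hypothetical subject. After unescaping it holds:
//   x = "a \"quoted\" word";
$subject = 'x = "a \\"quoted\\" word";';

// Version 2 with a possessive quantifier (++).
$possessive = '/"([^"\\\\]++|\\\\.)*"/s';
// Version 2 with an atomic group (?>...).
$atomic = '/"((?>[^"\\\\]+)|\\\\.)*"/s';

preg_match($possessive, $subject, $m1);
preg_match($atomic, $subject, $m2);

// Both variants match the whole literal, escaped quotes included.
var_dump($m1[0] === $m2[0]);
echo $m1[0], "\n";
```

Each backslash in the regex is doubled again inside the PHP single-quoted string, which is why the pattern strings contain four backslashes in a row.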
Best:
"[^"\\]*(?:\\.[^"\\]*)*"
Version 3: More efficient still. Implements Friedl's "unrolling-the-loop" technique. Does not require possessive quantifiers or atomic groups (i.e. this can be used in JavaScript and other less-featured regex engines).
Here are the recommended regexes in PHP syntax for both double and single quoted sub-strings:
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$re_sq = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/s";
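Here is one way these two patterns might be plugged into preg_replace_callback, along the lines of the code in the question. The question does not show what _getPHPString does, so the $stash closure and the <STRn> placeholder format below are assumptions made purely for illustration.

```php
<?php
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$re_sq = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/s";

// Hypothetical source text; a nowdoc keeps the backslashes literal.
$code = <<<'SRC'
a("x \" y", 'it\'s');
SRC;

// Hypothetical stand-in for _getPHPString: remember each literal and
// replace it with a numbered placeholder.
$found = [];
$stash = function (array $m) use (&$found) {
    $found[] = $m[0];
    return '<STR' . (count($found) - 1) . '>';
};

$code = preg_replace_callback($re_dq, $stash, $code);
$code = preg_replace_callback($re_sq, $stash, $code);

echo $code, "\n";   // a(<STR0>, <STR1>);
print_r($found);    // the two original literals, escapes intact
```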
#2
10
Try a regex like this:
'/"(\\\\[\\\\"]|[^\\\\"])*"/'
A (short) explanation:
"                # match a `"`
(                # open group 1
  \\\\[\\\\"]    # match either `\\` or `\"`
  |              # OR
  [^\\\\"]       # match any char other than `\` and `"`
)*               # close group 1, and repeat it zero or more times
"                # match a `"`
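The annotated breakdown can also be run more or less as written by using PCRE's x (extended) flag, which ignores whitespace and treats # as the start of a comment. This is only a sketch: the pattern is placed in a nowdoc so each regex backslash is written twice rather than four times, and the comments avoid the / character because it is the delimiter.

```php
<?php
$re = <<<'RE'
/
  "              # match a double quote
  (              # open group 1
    \\[\\"]      # match an escaped backslash or an escaped quote
    |            # OR
    [^\\"]       # match any char other than a backslash or a quote
  )*             # close group 1, and repeat it zero or more times
  "              # match a double quote
/x
RE;

// Same sample text as in the answer's snippet.
$text = 'abc "string \\\\ \\" literal" def';
preg_match($re, $text, $m);
echo $m[0], "\n";   // "string \\ \" literal"
```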
The following snippet:
<?php
$text = 'abc "string \\\\ \\" literal" def';
preg_match_all('/"(\\\\[\\\\"]|[^\\\\"])*"/', $text, $matches);
echo $text . "\n";
print_r($matches);
?>
produces:
abc "string \\ \" literal" def
Array
(
    [0] => Array
        (
            [0] => "string \\ \" literal"
        )

    [1] => Array
        (
            [0] => l
        )

)
as you can see on Ideone.
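Group 1 in the output above holds only the last character (l) because a repeated capturing group keeps just its final iteration. If the goal is to capture the contents between the quotes, one possible variation (not part of the original answer) is to make the repeated group non-capturing and wrap the whole repetition in a single capturing group:

```php
<?php
$text = 'abc "string \\\\ \\" literal" def';

// (?: ... ) is repeated without capturing; the outer ( ... ) captures
// everything between the opening and closing quote in one piece.
preg_match_all('/"((?:\\\\[\\\\"]|[^\\\\"])*)"/', $text, $matches);

echo $matches[0][0], "\n";   // "string \\ \" literal"
echo $matches[1][0], "\n";   // string \\ \" literal
```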
#3
1
This seems to be as fast as the unrolled loop, based on some cursory benchmarks, but is much easier to read and understand. It doesn't require any backtracking in the first place.
"[^"\\]*(\\.[^"\\]*)*"
#4
0
This has possibilities:
/"(?>(?:(?>[^"\\]+)|\\.)*)"/
/'(?>(?:(?>[^'\\]+)|\\.)*)'/
#5
0
This will leave the quotes outside the match:
(?<=['"])(.*?)(?=["'])
and use the global flag: /g will match all occurrences.
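PHP's preg functions have no /g flag; preg_match_all plays that role. Below is a small sketch of how this pattern behaves there, using a made-up input with one double-quoted and one single-quoted string.

```php
<?php
// Hypothetical input: say "hi" and 'bye'
$text = 'say "hi" and \'bye\'';

// The lookbehind and lookahead require a quote on each side without
// consuming it, so the quotes stay outside the match.
preg_match_all('/(?<=[\'"])(.*?)(?=["\'])/', $text, $matches);

print_r($matches[0]);
```

In this run the result has three entries: hi, bye, and also the text " and " that sits between the two literals, since it too has a quote on either side. The pattern also does not treat a backslash-escaped quote specially, so it may need further work for the case described in the question.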