I was looking for a solid article on when double escaping is necessary and when it is not, but I was not able to find anything. Perhaps I didn't look hard enough, because I'm sure there is an explanation out there somewhere, but let's just make it easy to find for the next guy who has this question!
Take for example the following regex patterns:
/\n/
/domain\.com/
/myfeet \$ your feet/
Nothing groundbreaking, right? OK, let's use those examples within the context of PHP's preg_match function:
$foo = preg_match("/\n/", $bar);
$foo = preg_match("/domain\.com/", $bar);
$foo = preg_match("/myfeet \$ your feet/", $bar);
To my understanding, a backslash in the context of a quoted string value escapes the following character, and the expression is being given via a quoted string value.
Wouldn't the above be like doing the following, and wouldn't this cause an error?
$foo = preg_match("/n/", $bar);
$foo = preg_match("/domain.com/", $bar);
$foo = preg_match("/myfeet $ your feet/", $bar);
Which is not what I want, right? Those expressions are not the same as the ones above.
Would I not have to write them double escaped like this?
$foo = preg_match("/\\n/", $bar);
$foo = preg_match("/domain\\.com/", $bar);
$foo = preg_match("/myfeet \\$ your feet/", $bar);
So that when PHP processes the string, it turns the escaped backslash into a single backslash, which is then left in when it's passed to the PCRE interpreter?
Or does PHP just magically know that I want to pass that backslash to the PCRE interpreter? I mean, how does it know I'm not trying to \" escape a quote that I want to use in my expression? Or are double backslashes only required when using an escaped quote? And for that matter, would you need to TRIPLE escape a quote, \\\", so that the quote is escaped and an escaped quote (\") is left over for PCRE?
What's the rule of thumb here?
I just did a test with PHP:
$bar = "asdfasdf a\"ONE\"sfda dsf adsf me & mine adsf asdf asfd ";
echo preg_match("/me \$ mine/", $bar);
echo "<br /><br />";
echo preg_match("/me \\$ mine/", $bar);
echo "<br /><br />";
echo preg_match("/a\"ONE\"/", $bar);
echo "<br /><br />";
echo preg_match("/a\\\"ONE\\\"/", $bar);
echo "<br /><br />";
Output:
0
1
1
1
So it looks like it somehow doesn't really matter for quotes, but for the dollar sign a double escape is required, as I thought.
5 Answers
#1 (7 votes)
Double quoted strings
When it comes to escaping inside double quotes, the rule is that PHP inspects the character(s) immediately following the backslash.
If the neighboring character is one of n, t, r, v, e, f, \, $ or a double quote, or if a numeric value follows it (the rules are listed on the Strings page of the PHP manual), the sequence is evaluated as the corresponding character (a control character for n, t, r, v, e and f) or as an ordinal (octal or hexadecimal) representation, respectively.
It's important to note that if an invalid escape sequence is given, the sequence is not evaluated and both the backslash and the character remain. This is different from some other languages, where an invalid escape sequence would cause an error instead.
E.g. "domain\.com" will be left as is.
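You can check both rules with a few comparisons. This is a minimal sketch that assumes a PHP 7+ CLI. The variable name $host is only for illustration.

```php
<?php
// Valid escape sequences are converted by PHP itself.
var_dump("\n" === chr(10));   // bool(true): a real newline byte
var_dump("\x41" === 'A');     // bool(true): hexadecimal ordinal
var_dump("\101" === 'A');     // bool(true): octal ordinal

// "\." is not a PHP escape sequence, so both characters are kept.
$host = "domain\.com";
var_dump($host === 'domain\.com'); // bool(true)
var_dump(strlen($host));           // int(11)
```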
Note that variables get expanded inside double quotes as well, e.g. "$var" needs to be escaped as "\$var".
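Here is a small sketch of the difference. It assumes a PHP 7+ CLI, and $var is a hypothetical variable.

```php
<?php
$var = 'something';

$expanded = "/me $var mine/";  // $var is interpolated
$literal  = "/me \$var mine/"; // \$ keeps the dollar sign literal

var_dump($expanded); // string(19) "/me something mine/"
var_dump($literal);  // string(14) "/me $var mine/"
```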
Single quotes strings
In single quoted strings the only escape sequences are \' and \\; any other backslash (followed by at least one character) gets printed as is, and no variables get substituted either. This is by far the most convenient feature of single quoted strings.
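Here is a minimal sketch of single-quote behaviour, assuming a PHP 7+ CLI. The variable names are illustrative only.

```php
<?php
// Single quotes: only \\ and \' are escape sequences.
$newline  = '\n';          // backslash followed by n: two characters
$oneSlash = '\\';          // one backslash
$quote    = '\'';          // one single quote
$kept     = 'domain\.com'; // backslash kept as is
$noVars   = '$var';        // no interpolation

var_dump(strlen($newline));         // int(2)
var_dump(strlen($oneSlash));        // int(1)
var_dump($quote === "'");           // bool(true)
var_dump($kept === "domain\\.com"); // bool(true)
var_dump(strlen($noVars));          // int(4)
```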
Regular expressions
For escaping regular expressions, it's best to leave the escaping to preg_quote():
$foo = preg_match('/' . preg_quote('mine & yours', '/') . '/', $bar);
This way you don't have to worry about which characters need to be escaped, so it works well for user input.
See also: preg_quote
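Here is a sketch of preg_quote() with a needle full of metacharacters. The needle and subject are hypothetical stand-ins for user input. The second argument makes preg_quote() escape the "/" delimiter as well.

```php
<?php
// Hypothetical user-supplied needle containing regex metacharacters.
$needle  = 'me $ mine (v1.0)';
$subject = 'asdf me $ mine (v1.0) asdf';

// preg_quote() adds the backslashes PCRE needs; no manual escaping.
$pattern = '/' . preg_quote($needle, '/') . '/';

var_dump($pattern);                      // string(23) "/me \$ mine \(v1\.0\)/"
var_dump(preg_match($pattern, $subject)); // int(1)
```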
Update
You added this test:
"/me \$ mine/"
This gets evaluated as "/me $ mine/"; but in PCRE the $ has a special meaning (it's an end-of-subject anchor).
"/me \\$ mine/"
This is evaluated as "/me \$ mine/", so the backslash is escaped for PHP itself, whereas the $ is escaped for PCRE. This only works by accident, btw: the $ is followed by a space, so PHP doesn't treat it as the start of a variable.
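Here is a sketch of both tests and of the "accident". It assumes a PHP 7+ CLI. $bar here is a hypothetical subject that really contains "me $ mine", unlike the subject in the question.

```php
<?php
// Hypothetical subject that really contains "me $ mine".
$bar = 'asdf me $ mine asdf';

$p1 = "/me \$ mine/";   // PHP yields /me $ mine/: PCRE sees an anchor
$p2 = "/me \\$ mine/";  // PHP yields /me \$ mine/: PCRE sees a literal $

var_dump(preg_match($p1, $bar)); // int(0)
var_dump(preg_match($p2, $bar)); // int(1)

// The "accident": \\$ only survives because a space follows the $.
$var = 'something';
var_dump("/me \\$var mine/");    // string(20) "/me \something mine/"
```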
$var = 'something';
"/me \\$var mine/"
This gets evaluated as "/me \something mine/", so you need to escape the $ again:
"/me \\\$var mine/"
#2 (1 vote)
Use single quotes. They prevent escape sequences from being processed (only \\ and \' are still recognized).
For example:
php > print "hi\n";
hi
php > print 'hi\n';
hi\nphp >
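Applied to patterns, single quotes mean you only escape for PCRE. This is a minimal sketch with a hypothetical subject, assuming a PHP 7+ CLI. A literal backslash still needs \\\\, because \\ is one of the two sequences PHP processes in single quotes.

```php
<?php
// Hypothetical subject string.
$subject = 'visit domain.com, then me $ mine';

var_dump(preg_match('/domain\.com/', $subject)); // int(1): PCRE gets \.
var_dump(preg_match('/me \$ mine/', $subject));  // int(1): PCRE gets \$
var_dump(preg_match('/a\\\\b/', 'a\b'));         // int(1): PCRE gets \\
```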
#3 (0 votes)
Whenever you have an invalid escape sequence, PHP actually leaves the characters literally in the string. From the documentation:
As in single quoted strings, escaping any other character will result in the backslash being printed too.
I.e. "\&" really is interpreted as "\&". There are not that many escape sequences, so in most cases you can probably get away with a single backslash. But for consistency, escaping the backslash might be the better choice.
As always: Know what you are doing :)
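The comparison below shows where a single backslash is harmless and where it isn't. This is a minimal sketch assuming a PHP 7+ CLI.

```php
<?php
// "\&" and "\d" are not PHP escapes, so one or two backslashes give the same string.
var_dump("\&" === "\\&");  // bool(true)
var_dump("\d" === "\\d");  // bool(true)

// "\t" IS a PHP escape: one backslash gives a raw TAB byte, two give \t for PCRE.
var_dump("\t" === "\\t");              // bool(false)
var_dump(strlen("\t"), strlen("\\t")); // int(1) int(2)
```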
#4 (0 votes)
OK, so I did some more testing and discovered the RULE OF THUMB: when encapsulating a PCRE in DOUBLE QUOTES, the following holds true:
$ - Requires double escape (\\$, or \\\$ when a name character follows it) because PHP will interpret it as the beginning of a variable if text immediately follows it. If PCRE receives it unescaped, it is treated as the end-of-subject anchor and the match will break.
\r\n\t\v - Special PHP string escapes, single escape required only.
[\^$.|?*+() - Special RegEx characters, require single escape only. Double escape does not seem to break expressions when used unnecessarily.
" - Quotes obviously have to be escaped because of the encapsulation, but only need to be escaped once.
\ - Searching for a backslash? With the expression in double quotes, this requires 3 escapes! \\\\ (four backslashes in total)
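The rules above can be checked against one hypothetical subject, in a minimal sketch assuming a PHP 7+ CLI. Each pattern is a double-quoted string, and the comments note what PCRE ends up receiving.

```php
<?php
// Hypothetical subject: price: $5.00<TAB>path: C:\dir "quoted"
$subject = "price: \$5.00\tpath: C:\\dir \"quoted\"";

// $ needs \\\$ (PCRE sees \$); the dot needs only \. (PCRE sees \.)
var_dump(preg_match("/\\\$5\.00/", $subject));  // int(1)
// \t is a PHP escape: PCRE receives a real TAB byte
var_dump(preg_match("/\t/", $subject));         // int(1)
// a double quote is escaped once, for PHP only
var_dump(preg_match("/\"quoted\"/", $subject)); // int(1)
// a literal backslash needs four: PCRE sees C:\\dir
var_dump(preg_match("/C:\\\\dir/", $subject));  // int(1)
```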
Anything I'm missing?
#5 (0 votes)
I'll start by saying that what I write below is not exactly what happens; for clarity, I'll simplify it.
Imagine that there are two evaluations happening when using regular expressions: the first done by PHP and the second done by PCRE, as if they were separate engines. And, to our bad luck, PHP AND PCRE EVALUATE THINGS IN DIFFERENT WAYS.
We have 3 "guys" here: 1) the USER; 2) PHP; and 3) PCRE.
The USER communicates with PHP by writing the CODE, which is exactly what you type in a code editor. PHP then evaluates this CODE and sends another bit of information to PCRE, which is different from what you typed in your CODE. PCRE then evaluates it and returns something to PHP, which evaluates this response and returns something to the USER.
I'll explain better in the example below, where I'm going to use the backslash ("\") to illustrate what's going on.
Assume this bit of CODE in a php file:
<?php
$sub = "A backslash \ in a string";
$pat1 = "#\#";
$pat2 = "#\\#";
$pat3 = "#\\\#";
$pat4 = "#\\\\#";
echo "sub: ".$sub;
echo "\n\n";
echo "pat1: ".$pat1;
echo "\n";
echo "pat2: ".$pat2;
echo "\n";
echo "pat3: ".$pat3;
echo "\n";
echo "pat4: ".$pat4;
?>
This will print:
sub: A backslash \ in a string

pat1: #\#
pat2: #\#
pat3: #\\#
pat4: #\\#
In this example there is no regular expression involved, so only PHP's evaluation of the code happens. PHP leaves a backslash as is if it doesn't precede a character that forms an escape sequence. That's why it prints the backslash correctly in $sub.
PHP evaluates $pat1 and $pat2 EXACTLY the same, because in $pat1 the backslash is left as is, and in $pat2 the first backslash escapes the second, resulting in a single backslash.
Now, in $pat3, the first backslash escapes the second, resulting in one backslash. Then PHP evaluates the third backslash and leaves it as is, because it isn't followed by anything special. The result is a double backslash.
Now someone could say "but now we have two backslashes again! Shouldn't the first one escape the second one again?!" The answer is "No". After PHP evaluates the first two backslashes into a single one, it doesn't look back, and moves on to evaluate what comes next.
At this point you already know what's going on with $pat4: the first backslash escapes the second and the third escapes the fourth, leaving two in the end.
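To see the PHP stage on its own, strlen() is enough, because no regex is involved yet. This is a minimal sketch assuming a PHP 7+ CLI, reusing the four literals from above.

```php
<?php
// Same four literals as above; only PHP's string parsing is involved here.
$pat1 = "#\#";
$pat2 = "#\\#";
$pat3 = "#\\\#";
$pat4 = "#\\\\#";

var_dump(strlen($pat1), strlen($pat2));     // int(3) int(3)
var_dump(strlen($pat3), strlen($pat4));     // int(4) int(4)
var_dump($pat1 === $pat2, $pat3 === $pat4); // bool(true) bool(true)
```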
Now that it's clear what PHP is doing to these strings, let's add some more code after the previous one.
if (preg_match($pat1, $sub)) echo "test1: true"; else echo "test1: false";
echo "\n";
if (preg_match($pat2, $sub)) echo "test2: true"; else echo "test2: false";
echo "\n";
if (preg_match($pat3, $sub)) echo "test3: true"; else echo "test3: false";
echo "\n";
if (preg_match($pat4, $sub)) echo "test4: true"; else echo "test4: false";
And the result is:
test1: false
test2: false
test3: true
test4: true
So, what's going on here is that PHP is not sending "what you typed" in the CODE directly to PCRE. Instead, PHP is sending what it has evaluated previously (which is exactly what we saw above).
For test1 and test2, even though we have written different patterns in the CODE for each test, PHP is sending the same pattern #\# to PCRE. The same thing happens for test3 and test4: PHP is sending #\\#. So, the results for test1 and test2 are the same, as well as for test3 and test4.
Now, what's going on when PCRE evaluates these patterns? PCRE doesn't act like PHP.
In test1 and test2, the pattern never really gets evaluated as a regex. Before handing the pattern to PCRE, PHP's preg functions look for the closing delimiter, skipping over any character preceded by a backslash. In #\# the second # is escaped, so no closing delimiter is found: preg_match() emits a warning about the missing ending delimiter and returns false, which the rest of the CODE (in this example, the if () statement) treats as "false".
In test3 and test4, things go as we now expect: PCRE evaluates the first backslash as escaping the second, resulting in a single literal backslash. That of course matches the $sub string, so preg_match() returns 1, which PHP evaluates as "true".
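The failing and working cases can be told apart by the return value. This is a minimal sketch assuming a PHP 7+ CLI. The @ operator only hides the warning for this demo.

```php
<?php
$sub = "A backslash \ in a string";

// PHP hands over #\# ; \# reads as an escaped delimiter, so no closing # exists.
$broken = @preg_match("#\#", $sub);
var_dump($broken);  // bool(false): failure, not "no match"

// PHP hands over #\\# ; PCRE reads \\ as one literal backslash.
$working = preg_match("#\\\\#", $sub);
var_dump($working); // int(1)
```

Using === false to distinguish a broken pattern from a pattern that simply didn't match (0) avoids the ambiguity of the plain if () in the test above.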
ANSWERING QUESTIONS
Some characters are special to PHP (e.g. n for NEW LINE, t for TAB).
Some characters are special to PCRE (e.g. . (dot) to match any character, s in \s to match whitespace).
And some characters are special to both (e.g. $: to PHP it is the beginning of a variable name, and to PCRE it asserts the end of the subject).
That's why you need to escape newlines just once, like this: \n. PHP will evaluate it as the REAL NEW LINE character and send that to PCRE.
For the dot, if you want to match that specific character, you should use \. and PHP will do nothing, because the dot isn't a special character to PHP in a string. Instead, it will send the pair as is to PCRE. PCRE will then "see" a backslash preceding a dot and understand that it should match that specific character. If you use a double escape, \\., PHP's first backslash will escape the second, leaving you with the same result.
And if you want to match a dollar sign in a string, then you should use \\\$. In PHP, the first backslash will escape the second one, leaving a single backslash. Then the third backslash will escape the dollar sign. In the end, the result is \$. This is what PCRE will receive. PCRE will see that backslash and understand that the dollar sign is not asserting end of subject, but the literal character.
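Each case above can be confirmed by comparing the double-quoted pattern with the exact string PCRE should receive. This is a minimal sketch assuming a PHP 7+ CLI. Single-quoted strings serve as the reference because PHP leaves \. and \$ alone there.

```php
<?php
var_dump("/\n/" === "/" . chr(10) . "/");       // bool(true): PHP makes a real newline
var_dump("/domain\.com/" === '/domain\.com/');  // bool(true): "\." passes through
var_dump("/domain\\.com/" === '/domain\.com/'); // bool(true): "\\." gives the same
var_dump("/\\\$5/" === '/\$5/');                // bool(true): PCRE receives \$5
```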
QUOTES
And now we come to quotes. The problem with them is that PHP evaluates a string in different ways, depending on the quotes used to surround it. See the Strings page of the PHP manual.
Everything I said up to this point is valid for double quotes. If you try '\n' in single quotes, PHP will keep that backslash as a literal one.
But if it is used in a regular expression, PCRE will get this string as is. And since \n is also special to PCRE, it will interpret it as a newline character, and BOOM, it "magically" matches a newline in a string. See the "Escape sequences" page of the PCRE section of the PHP manual.
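Here is a minimal sketch of that single-quote case, assuming a PHP 7+ CLI with an illustrative pattern and subject. PHP leaves the two characters \ and n alone, and PCRE then turns them into a newline.

```php
<?php
// PHP keeps backslash + "n" (two characters); PCRE reads \n as a newline.
$pattern = '/line1\nline2/';

var_dump(strlen('\n'));                         // int(2)
var_dump(preg_match($pattern, "line1\nline2")); // int(1)
```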
As I said in the beginning, things are not exactly as I tried to explain here, but I really hope it helps (and doesn't make it more confusing than it already is).