What does this regular expression do?

$pee = preg_replace( '|<p>|', "$1<p>", $pee );

This regular expression is from the WordPress source code (formatting.php, wpautop function). I'm not sure what it does; can anyone help?

Actually, I'm trying to port this function to Python. If anyone knows of an existing port, that would be much better, as I'm really bad with regex.

9 answers

#1

Does WordPress really call a variable "pee"?

I'm not sure what the $1 stands for (there are no parentheses in the first parameter?), so I don't think it actually does anything, but I could be wrong.

#2

The preg_replace() function, somewhat confusingly, allows you to use delimiters other than the standard "/" for regular expressions, so "|<p>|" would be a regular expression just matching "<p>" in the text.

However, I'm not clear on what the replacement parameter "$1<p>" would be doing, since there's no group to map to $1. As given, it seems to just replace a paragraph tag with an empty string followed by a paragraph tag, in effect doing nothing.
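
Python's re module takes the bare pattern with no delimiters, so porting a preg_* call starts by stripping them. A minimal sketch; strip_php_delimiters is a hypothetical helper that assumes the same single character opens and closes the pattern and that no modifiers follow it (trailing flags such as i or s would have to be mapped to re flags separately, and bracket-style delimiters like {...} are not handled):

```python
import re


def strip_php_delimiters(php_pattern: str) -> str:
    """Hypothetical helper: turn a PHP/PCRE pattern such as '|<p>|' or
    '/<p>/' into the bare pattern string that Python's re expects.

    Assumes the same single character opens and closes the pattern and
    that nothing (no modifiers) follows the closing delimiter.
    """
    delimiter = php_pattern[0]
    if len(php_pattern) < 2 or php_pattern[-1] != delimiter:
        raise ValueError("expected matching delimiters and no modifiers")
    return php_pattern[1:-1]


pattern = strip_php_delimiters('|<p>|')
print(pattern)                                      # <p>
print(re.findall(pattern, '<p>one</p><p>two</p>'))  # ['<p>', '<p>']
```

Note that "</p>" is not matched, since the pattern is the literal text "<p>".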

Does anyone with more in-depth knowledge of PHP quirks have a better analysis?

#3

...?

Actually, it looks like this takes the first <p> tag and prepends the previous regular expression's first match to it (since there's no match in this one).

However, this behavior seems bad, to say the least, as there's no guarantee that preg_* functions won't clobber $1 with their own values.

Edit: Judging from Jay's comment, this regex actually does nothing.

#4

The pipe symbols | in this case do not have the default meaning of "match this or that"; they are used as alternative delimiters for the pattern instead of the more common slashes /. This can make sense if you want to match / without having to escape each occurrence (e.g. /\/(.*)\/(.*)\// is not as readable as #/(.*)/(.*)/#). Using | instead seems quite counterproductive, though, since it is just another reserved character in patterns.
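
For comparison, a Python port has no delimiters at all, so a slash needs no escaping, and | inside the pattern string always means alternation. A small sketch (the sample strings are made up for illustration):

```python
import re

# No delimiters: '/' is an ordinary character and needs no backslash.
slashes = re.match(r'/(.*)/(.*)/', '/a/b/')
print(slashes.groups())  # ('a', 'b')

# Inside a Python pattern, '|' is always the alternation operator.
tags = re.findall(r'<p>|<br>', '<p>line one<br>line two')
print(tags)  # ['<p>', '<br>']
```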

Normally, $1 in the replacement string refers to the first group denoted by parentheses. For example, if you've got a pattern like

"(.*)<p>"

$0 would contain the whole match and $1 the part before the <p>.
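
Python's match objects use the same numbering: group 0 is the whole match and group 1 is the first parenthesized group. A sketch using the pattern above on a made-up input:

```python
import re

m = re.search(r'(.*)<p>', 'intro<p>body')
print(m.group(0))  # intro<p>
print(m.group(1))  # intro

# In a replacement string Python spells group 1 as \1 rather than $1.
print(re.sub(r'(.*)<p>', r'[\1]<p>', 'intro<p>body'))  # [intro]<p>body
```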

As the given regex does not declare any groups, and $1 is not a valid name for a variable (in PHP4) defined elsewhere, this call seems to replace any occurrences of <p> with <p>?

To be honest, now I'm also quite confused. Just a guess: is another pattern-matching function (preg_match and the like) called before the given line, so that $1 is "leaked" from there?

#5

I highly recommend the amazing RegexBuddy.

#6

I believe that line does nothing.

For what it's worth, this is the previous line, in which $1 is set:

$pee = preg_replace('!<p>([^<]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);

However, I don't think that's worth anything. In my testing, $1 does not keep its value from one preg_replace to the next, even if the next one doesn't set its own value for $1. Remember that PHP variable names cannot begin with a number (see http://php.net/language.variables), so $1 is not a PHP variable. It only means something within a single preg_replace, and in this case the rules of preg_replace suggest it doesn't mean anything.
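
The same independence holds in a Python port: each re.sub call has its own groups, and nothing from one call is visible to the next. A sketch porting the previous line and the line in question in sequence (port_two_lines is a hypothetical name, and this is not a complete wpautop port):

```python
import re


def port_two_lines(pee: str) -> str:
    """Hypothetical partial port of two consecutive wpautop lines."""
    # Previous line: this pattern defines groups 1 and 2, so \1 and \2
    # (Python's spelling of $1 and $2) have something to refer to.
    pee = re.sub(r'<p>([^<]+)\s*?(</(?:div|address|form)[^>]*>)',
                 r'<p>\1</p>\2', pee)
    # Line in question: no groups and nothing carried over from the call
    # above, so the $1 is dropped and <p> is simply replaced by <p>.
    pee = re.sub(r'<p>', '<p>', pee)
    return pee


print(port_two_lines('<p>hello</div>'))  # <p>hello</p></div>
```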

That said, autop being such a widely used function makes me doubt my own conclusion that this line does nothing, so I look forward to someone correcting me.

#7

The regex simply matches the literal text <p>. The choice to delimit the regex with the vertical bar instead of forward slashes is very unfortunate. It doesn't change what the code does, but it makes it harder for humans to read. (It also makes it impossible to use the alternation operator in the regex.)

$1 is not a valid variable name in PHP, so $1 is never interpolated in double-quoted strings; it gets passed to preg_replace unchanged. preg_replace parses the replacement string and replaces $1 with the contents of the first capturing group. If there is no capturing group, $1 is replaced with nothing.
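
This is also where a naive Python port goes wrong. Python's re does not recognize $1 at all, and its own \1 syntax raises an error when the pattern has no group 1, rather than substituting an empty string as described above for preg_replace. A sketch (the sample string is made up):

```python
import re

text = '<p>one</p><p>two</p>'

# Python does not understand PHP's "$1" syntax: it is copied literally.
literal = re.sub(r'<p>', '$1<p>', text)
print(literal)  # $1<p>one</p>$1<p>two</p>

# Python's own backreference syntax (\1) refers to group 1, but this
# pattern defines no groups, so re raises instead of inserting "".
try:
    re.sub(r'<p>', r'\1<p>', text)
    raised = False
except re.error as exc:
    raised = True
    print('re.error:', exc)
```

So a faithful Python port of the WordPress line simply drops the $1.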

Thus, this code does the same as:

$pee = preg_replace( '/<p>/', "<p>", $pee );

It's not correct to say that this does nothing. The search-and-replace still runs, slowing down your software and eating up memory for temporary copies of $pee.
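
The same is true of a Python port: the no-op replacement still scans the whole string and rewrites every match. re.subn makes that visible by returning the substitution count alongside the (unchanged) result. A sketch on a made-up input:

```python
import re

html = '<p>a</p><p>b</p>'

# Equivalent of preg_replace('/<p>/', "<p>", $pee): replace each <p> with <p>.
result, count = re.subn(r'<p>', '<p>', html)

print(result == html)  # True: the text is unchanged...
print(count)           # 2: ...but both <p> tags were found and replaced
```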

#8

I don't have much experience with regex and don't have a regex testing tool on me at the moment, but after doing some searching and looking at other WordPress source code and comments: is it possible this code removes duplicate paragraph tags and replaces them with a single set of tags?
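
One way to test this guess is to run an equivalent replacement on text that contains doubled tags. A Python sketch (the input is made up); under the reading given in the other answers, the doubled tags should come through untouched:

```python
import re

doubled = '<p><p>text</p></p>'
# Same replacement as the WordPress line once the empty $1 is dropped.
out = re.sub(r'<p>', '<p>', doubled)
print(out == doubled)  # True: the doubled <p> tags are still there
```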

#9

It replaces matches of the pattern "|<p>|" with the string "$1<p>".

The | in the pattern causes the regex engine to match either the part on the left side or the part on the right side.

I don't understand why it's used that way, because usually it's for something like "ta(b|p)e"...
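
For reference, this is how | behaves when it appears inside a pattern as alternation, sketched with Python's re (the sample words are made up). In the WordPress line, by contrast, other answers point out that the | characters are PHP delimiters, which Python patterns do not have:

```python
import re

words = ['tabe', 'tape', 'tace', 'tae']
# ta(b|p)e: "ta", then either "b" or "p", then "e".
matches = [w for w in words if re.fullmatch(r'ta(b|p)e', w)]
print(matches)  # ['tabe', 'tape']
```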

For the $1, I guess the variable $1 is defined in the PHP code and is substituted during the preg_replace, so if $1 = "test", the replacement will replace "<p>" with "test<p>". But I am not sure about the $1.

#1