Why is \%(\) faster than \(\) in Vim?

Time: 2021-07-05 20:56:16

I am confused by the docs:

\%(\) A pattern enclosed by escaped parentheses. Just like \(\), but without counting it as a sub-expression. This allows using more groups and it's a little bit faster.

Can someone explain the reason for the difference? Is it because of backtracking or something else?

2 answers

#1


11  

The 'a little bit faster' comment is accurate in that there is a little less bookkeeping to be done, but the emphasis is on 'little bit' rather than 'faster'. Basically, normally, the material matched by \(pattern\) has to be kept so that you can use \3 (for the appropriate number) to refer to it in the replacement. The % notation means that vim does not have to keep track of the match - so it is doing a little less work.


@SimpleQuestions asks:

What do you mean by "keep track of the match"? How does it affect speed?

You can use escaped parentheses to 'capture' parts of the matched pattern. For example, suppose we're playing with simple C function declarations - no pointers to functions or other sources of parentheses - then we might have a substitute command such as the following:

s@\<\([a-zA-Z_][a-zA-Z_0-9]*\)(\([^)]*\))@xyz_\1(int nargs) /* \2 */@

Given an input line such as:

int simple_function(int a, char *b, double c)

The output will be:

int xyz_simple_function(int nargs) /* int a, char *b, double c */

(Why might you want to do that? I'm imagining that I need to wrap the C function simple_function so that it can be called from a language compiled to C that uses a different interface convention - it is based on Informix 4GL, to be precise. I'm using it to get an example - not because you really need to know why it was a good change to make.)

Now, in the example, the \1 and \2 in the replacement text refer to the captured parts of the regular expression - the function name (a sequence of alphanumerics starting with an alphabetic character - counting underscore as 'alphabetic') and the function argument list (everything between the parentheses, but not including the parentheses).
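To poke at the two captures outside Vim, here is a rough Python equivalent of the substitute command. It is only a sketch: Python's `re` syntax differs from Vim's (plain parentheses capture, and `\b` stands in for `\<`), and the helper name `wrap_declaration` is made up for illustration.

```python
import re

# Rough Python translation of the Vim pattern (an assumption, not Vim itself):
# group 1 = function name, group 2 = argument list without the parentheses.
PATTERN = re.compile(r"\b([a-zA-Z_][a-zA-Z_0-9]*)\(([^)]*)\)")


def wrap_declaration(line: str) -> str:
    """Hypothetical helper that mirrors the :s command from the answer."""
    return PATTERN.sub(r"xyz_\1(int nargs) /* \2 */", line)


if __name__ == "__main__":
    line = "int simple_function(int a, char *b, double c)"
    match = PATTERN.search(line)
    print(match.group(1))          # the captured function name
    print(match.group(2))          # the captured argument list
    print(wrap_declaration(line))  # the rewritten declaration
```

Running it on the sample line reproduces the output shown earlier, with the function name coming from group 1 and the original arguments from group 2.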

If I'd used the \%(...\) notation around the function identifier, then \1 would refer to the argument list and there would be no \2. Because Vim would not have to keep track of one of the two captured parts of the regular expression, it has marginally less bookkeeping to do than if it had to keep track of both. But, as I said, the difference is tiny; you could probably never measure it in practice. The other half of the manual's remark, that it 'allows using more groups', follows from the fact that only nine captures (\1 to \9) are available: if you need to group parts of your regular expression but don't need to refer to them again, \%(...\) lets you write longer regular expressions without using up those numbers. However, by the time you have more than 9 remembered (captured) parts in a regular expression, your brain is usually doing gyrations and your fingers will make mistakes anyway - so the effort is not usually worth it. But that is, I think, the argument for using the \%(...\) notation. It corresponds to the Perl (PCRE) notation '(?:...)' for a non-capturing group.
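The renumbering is easy to see with the Perl-style '(?:...)' form in Python's `re` module. This is a sketch of the same idea, not Vim's engine; the pattern is a hand translation of the function-declaration regex, with `\b` standing in for Vim's `\<`.

```python
import re

# Function name wrapped in a non-capturing group (?:...), the Perl-style
# counterpart of Vim's \%(...\). The argument list becomes group 1.
NON_CAPTURING = re.compile(r"\b(?:[a-zA-Z_][a-zA-Z_0-9]*)\(([^)]*)\)")

line = "int simple_function(int a, char *b, double c)"
match = NON_CAPTURING.search(line)

print(NON_CAPTURING.groups)  # number of capturing groups in the pattern
print(match.group(1))        # the argument list, now the first capture

try:
    match.group(2)           # there is no second capture any more
except IndexError as exc:
    print("no group 2:", exc)
```

The pattern still matches the same text; only the numbering, and the amount of group state the engine has to remember, changes.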

#2


4  

I asked in #Vim whether \%(\) is faster because of backtracking. The user godlygeek answered:

No, it's faster because the thing that's matched doesn't need to be strdup'ed -- any unnecessary work is a bad thing for a syntax file.

He continued:

[The speed] depends on how big the string is. For 3 characters, it doesn't matter much, for 3000 it probably does. And keep in mind that it needs to be strdup'ed every time it matches.... including during backtracking... which means that even the 3 characters could be strdup'ed 1000 times over the course of matching a single regex. -- the syntax files are in $VIMRUNTIME/syntax
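To get a feel for how backtracking can multiply the copying, here is a toy Python model. It is emphatically not Vim's regex engine: it hard-codes a backtracking match of the pattern `(a*)b`, and it assumes, following the quoted remark, that the group text is copied on every attempt. The function name `count_group_copies` is invented for this sketch.

```python
from __future__ import annotations


def count_group_copies(text: str, capture: bool = True) -> tuple[bool, int]:
    """Toy backtracking matcher for (a*)b that counts copies of the group text.

    Returns (matched, copies). The copy on each attempt is an assumption that
    stands in for the strdup mentioned in the quote, not measured Vim behaviour.
    """
    copies = 0
    for start in range(len(text) + 1):
        # Greedy a*: extend as far as the run of 'a' goes.
        end = start
        while end < len(text) and text[end] == "a":
            end += 1
        # Try the longest candidate first, then give back one character at a time.
        for stop in range(end, start - 1, -1):
            if capture:
                _saved = text[start:stop]  # stands in for copying the group text
                copies += 1
            if stop < len(text) and text[stop] == "b":
                return True, copies
    return False, copies


if __name__ == "__main__":
    print(count_group_copies("aaab"))                   # matches on the first attempt
    print(count_group_copies("a" * 30))                 # never matches, many attempts
    print(count_group_copies("a" * 30, capture=False))  # same search, nothing copied
```

In this model a string of three `a` characters with no `b` already triggers 10 copies, and the count grows quadratically with the length of the run, while the non-capturing variant does the same search without copying anything.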
