Capturing groups vs. non-capturing groups

Time: 2022-11-09 22:33:32

I tried to compare the performance of capturing and non-capturing groups in a regex. The difference between the two turned out to be very small. Is this result normal?

[root@Sensor ~]# ll -h sample.log
-rw-r--r-- 1 root root 21M Oct 20 23:01 sample.log

[root@Sensor ~]# time grep -ciP '(get|post).*' sample.log
20000

real    0m0.083s
user    0m0.070s
sys     0m0.010s

[root@Sensor ~]# time grep -ciP '(?:get|post).*' sample.log
20000

real    0m0.083s
user    0m0.077s
sys     0m0.004s

2 solutions

#1



Typically, non-capturing groups perform better than capturing groups, because they require less memory allocation and do not make a copy of the group match. However, there are three important caveats:

  • The difference is typically very small for simple, short expressions with short matches.
  • Starting a program like grep itself takes a significant amount of time and memory, which may overwhelm any small improvement gained by using non-capturing groups.
  • Some languages implement capturing and non-capturing groups in the same way, so the latter give no performance improvement.
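One way to take process startup out of the measurement is to time the two patterns inside a single process. The following is a minimal sketch in Python rather than grep, so it measures Python's `re` engine and not PCRE. The synthetic `lines` list is a hypothetical stand-in for sample.log, and the helper name `count_matching_lines` is made up for illustration.

```python
import re
import timeit

# Hypothetical stand-in for sample.log: 20,000 synthetic request lines.
lines = [
    f"GET /index/{i}.html HTTP/1.1" if i % 2 else f"POST /api/{i} HTTP/1.1"
    for i in range(20_000)
]

# Same patterns as in the question, compiled case-insensitively (grep -i).
capturing = re.compile(r"(get|post)\s[^\s]+", re.IGNORECASE)
non_capturing = re.compile(r"(?:get|post)\s[^\s]+", re.IGNORECASE)


def count_matching_lines(pattern, text_lines):
    """Count the lines that contain a match, similar to grep -c."""
    return sum(1 for line in text_lines if pattern.search(line))


# Both patterns accept exactly the same lines.
count_cap = count_matching_lines(capturing, lines)
count_noncap = count_matching_lines(non_capturing, lines)

# Only the capturing pattern records the text of the group.
groups_cap = capturing.search(lines[0]).groups()
groups_noncap = non_capturing.search(lines[0]).groups()

# Repeat each measurement and keep the fastest run to reduce noise.
time_cap = min(timeit.repeat(
    lambda: count_matching_lines(capturing, lines), number=5, repeat=3))
time_noncap = min(timeit.repeat(
    lambda: count_matching_lines(non_capturing, lines), number=5, repeat=3))

print(f"matching lines: {count_cap} vs {count_noncap}")
print(f"groups: {groups_cap} vs {groups_noncap}")
print(f"capturing:     {time_cap:.4f}s")
print(f"non-capturing: {time_noncap:.4f}s")
```

The timings depend on the machine and the engine, so the sketch prints them instead of assuming which pattern wins.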

#2



When the pattern uses more capturing groups, the difference seems to be larger.

Thanks, everyone. :)

[root@Sensor ~]# time grep -ciP "(get|post)\s[^\s]+" sample.log
20000

real    0m0.057s
user    0m0.051s
sys     0m0.005s
[root@Sensor ~]# time grep -ciP "(?:get|post)\s[^\s]+" sample.log
20000

real    0m0.061s
user    0m0.053s
sys     0m0.006s
[root@Sensor ~]# time grep -ciP "(get|post)\s[^\s]+(get|post)" sample.log
1880

real    0m0.839s
user    0m0.833s
sys     0m0.005s
[root@Sensor ~]# time grep -ciP "(?:get|post)\s[^\s]+(?:get|post)" sample.log
1880

real    0m0.744s
user    0m0.741s
sys     0m0.003s
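To put the runs above in proportion, the relative difference can be worked out from the posted `real` times. This is a small sketch: the numbers are copied from the output above, the helper name `relative_saving` is made up for illustration, and each figure comes from a single run, so differences of a few milliseconds may well be noise.

```python
# `real` times in seconds, copied from the grep runs above.
runs = {
    "one group": {"capturing": 0.057, "non_capturing": 0.061},
    "two groups": {"capturing": 0.839, "non_capturing": 0.744},
}


def relative_saving(capturing, non_capturing):
    """Fraction of the capturing time saved by the non-capturing pattern."""
    return (capturing - non_capturing) / capturing


savings = {name: relative_saving(**times) for name, times in runs.items()}

for name, saving in savings.items():
    # A negative value means the non-capturing run was slower.
    print(f"{name}: {saving:+.1%}")
# one group: -7.0%
# two groups: +11.3%
```

With one group the non-capturing run was actually slightly slower, which suggests that measurement is dominated by noise. Only the two-group pattern shows a saving of about 11%.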
