为什么此代码会导致Chrome窒息?

时间:2021-01-03 20:16:28

I am trying to debug a problem in my app which I have narrowed down to a particular situation involving a regular expression which causes Chrome to choke! Trying the same code in Firefox works fine. Also if I reduce my 'sample' text to run the regex on it works too.

我正在尝试调试我的应用程序中的问题,我已经缩小到涉及正常表达式的特定情况,导致Chrome窒息!在Firefox中尝试相同的代码可以正常工作。此外,如果我减少我的'示例'文本以运行正则表达式也可以。

So what gives?

什么赋予了什么?

Here is the jsfiddle: http://jsfiddle.net/XWKRb/1/ (Which will fail to initialize at all because Chrome would choke if you are getting the same result as I am)

这是jsfiddle:http://jsfiddle.net/XWKRb/1/(这将无法初始化,因为如果你获得与我相同的结果,Chrome会窒息)

The code I have put in the jsfiddle is:

我在jsfiddle中输入的代码是:

var rgx = /^(\d+([,|;]?\d*))*$/;
var sample = '40162690,40162755,40162691,40168355,40168357,40162726,40162752,40162729,40428707 ,40162740,40162546';
alert("Test is "+rgx.test(sample));

Maybe there is a better way to write my regular expression to avoid the issue? The goal is the regular expression should catch a string of numbers which are separated by comma or semi-colon.

也许有更好的方法来编写我的正则表达式来避免这个问题?目标是正则表达式应该捕获由逗号或分号分隔的一串数字。

2 个解决方案

#1


13  

You have a classic case of catastrophic backtracking:

你有一个典型的灾难性回溯案例:

^(\d+([,|;]?\d*))*$
    ^      ^  ^  ^
    |      |  |  ---- zero or more repetitions of the group 
    |      |  ------- zero or more digits
    |      ---------- zero or one comma, pipe or semicolon
    ----------------- one or more digits

contains a repeated group which contains optional elements, one of which is repeated itself. Ignoring the separators for now, you have essentially the regex

包含一个包含可选元素的重复组,其中一个元素自身重复。暂时忽略分隔符,你基本上有正则表达式

^(\d+\d*)*$

That leads to an exponential number of permutations your regex has to check in the worst case.

这导致你的正则表达式在最坏的情况下必须检查的指数数量的排列。

As soon as another character besides the allowed characters is found in your string (like a space in your example), the regex must fail - but it takes the engine ages to figure this out. Some browsers detect such runaway regex matches, but Chrome appears to want to ride this out.

只要在你的字符串中找到了允许的字符之外的另一个字符(就像你的例子中的空格一样),正则表达式必须失败 - 但需要引擎年龄来计算出来。有些浏览器检测到这种失控的正则表达式匹配,但Chrome似乎想要将其解决。

To illustrate this, testing your regex in RegexBuddy shows the following:

为了说明这一点,在RegexBuddy中测试你的正则表达式显示如下:

Input             Steps to determine a non-match
1,1X                   23
12,21X                119
123,321X              723
1234,4321X          4,743
12345,54321X       31,991
123456,654321X    217,995
1234567,7654321X  attempt aborted after 1,000,000 steps

#2


4  

This pattern will work better:

这种模式会更好:

var rgx = /^\d+(?:[,;]\s*\d+)*$/;

#1


13  

You have a classic case of catastrophic backtracking:

你有一个典型的灾难性回溯案例:

^(\d+([,|;]?\d*))*$
    ^      ^  ^  ^
    |      |  |  ---- zero or more repetitions of the group 
    |      |  ------- zero or more digits
    |      ---------- zero or one comma, pipe or semicolon
    ----------------- one or more digits

contains a repeated group which contains optional elements, one of which is repeated itself. Ignoring the separators for now, you have essentially the regex

包含一个包含可选元素的重复组,其中一个元素自身重复。暂时忽略分隔符,你基本上有正则表达式

^(\d+\d*)*$

That leads to an exponential number of permutations your regex has to check in the worst case.

这导致你的正则表达式在最坏的情况下必须检查的指数数量的排列。

As soon as another character besides the allowed characters is found in your string (like a space in your example), the regex must fail - but it takes the engine ages to figure this out. Some browsers detect such runaway regex matches, but Chrome appears to want to ride this out.

只要在你的字符串中找到了允许的字符之外的另一个字符(就像你的例子中的空格一样),正则表达式必须失败 - 但需要引擎年龄来计算出来。有些浏览器检测到这种失控的正则表达式匹配,但Chrome似乎想要将其解决。

To illustrate this, testing your regex in RegexBuddy shows the following:

为了说明这一点,在RegexBuddy中测试你的正则表达式显示如下:

Input             Steps to determine a non-match
1,1X                   23
12,21X                119
123,321X              723
1234,4321X          4,743
12345,54321X       31,991
123456,654321X    217,995
1234567,7654321X  attempt aborted after 1,000,000 steps

#2


4  

This pattern will work better:

这种模式会更好:

var rgx = /^\d+(?:[,;]\s*\d+)*$/;