Regex reading from right to left

Time: 2022-08-22 11:33:22

I was looking for a short piece of code that inserts commas into a number, and eventually I came across this site.

The code:

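function addCommas(nStr)
{
    nStr += '';                               // coerce the input to a string
    var x = nStr.split('.');                  // split off any decimal part
    var x1 = x[0];                            // integer part
    var x2 = x.length > 1 ? '.' + x[1] : '';  // decimal part, including the '.'
    var rgx = /(\d+)(\d{3})/;
    // Insert one comma per pass until no run of 4+ digits remains
    while (rgx.test(x1)) {
        x1 = x1.replace(rgx, '$1' + ',' + '$2');
    }
    return x1 + x2;
}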

It works really well. Given this example number:

addCommas('83475934.89');  

it returns "83,475,934.89". But when I read the code, I expected it to return 8,3,4,7,5,934.89. This site explains that

\d+ in combination with \d{3} will match a group of 3 numbers preceded by any amount of numbers. This tricks the search into replacing from right to left.

And I get so confused. How does this code read from right to left? Also, what do $1 and $2 mean?

5 answers

#1

It matches from right to left because it uses greedy pattern matching. This means that it first finds all the digits (the \d+), then tries to find the \d{3}. In the number 2421567.56, for example, it would first match the digits up to the '.' (2421567), then work backwards so that the last 3 digits (567) match the next part of the regex. It does this in a loop, adding a comma between the $1 and $2 variables.
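
A minimal sketch to check the greedy behavior described above: `exec` shows the two groups of the first match, and the hypothetical helper `commaLoop` runs the same `while`/`replace` loop as `addCommas` with a caller-supplied regex. Swapping in the lazy `\d+?` (not part of the original code, shown only for contrast) produces the 8,3,4,7,5,934 grouping the question expected, which suggests greediness is what makes the result come out right.

```javascript
// Hypothetical helper: the same loop as addCommas, applied to one digit
// string with a caller-supplied (non-global) regex.
function commaLoop(digits, rgx) {
  while (rgx.test(digits)) {
    digits = digits.replace(rgx, '$1,$2');
  }
  return digits;
}

// Greedy \d+ first grabs all 7 digits, then gives back 3 for \d{3}.
console.log(/(\d+)(\d{3})/.exec('2421567').slice(1)); // [ '2421', '567' ]

console.log(commaLoop('83475934', /(\d+)(\d{3})/));  // 83,475,934
// Lazy \d+? (swapped in for comparison) takes as few digits as possible.
console.log(commaLoop('83475934', /(\d+?)(\d{3})/)); // 8,3,4,7,5,934
```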

The $n tokens refer to the capture groups formed by parentheses in the regex, e.g. (\d+) = $1 and (\d{3}) = $2. In this way it can easily add characters between them.
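
In the original code, `'$1' + ',' + '$2'` simply builds the replacement string `'$1,$2'`. As a sketch for comparison, `String.prototype.replace` also accepts a function, which receives the full match followed by each captured group, so the two forms below should produce the same result:

```javascript
var rgx = /(\d+)(\d{3})/;

// Replacement string: $1 and $2 stand for the text captured by each group.
var viaString = '2421567'.replace(rgx, '$1,$2');

// Replacement function: the groups arrive as arguments after the full match.
var viaFunction = '2421567'.replace(rgx, function (match, p1, p2) {
  return p1 + ',' + p2; // here p1 is '2421' and p2 is '567'
});

console.log(viaString);   // 2421,567
console.log(viaFunction); // 2421,567
```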

In the next iteration, the greedy match stops at the newly inserted comma instead, and the loop continues until there is no longer a run of more than 3 digits to match.
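
The second pass can be inspected with `RegExp.prototype.exec`, as a quick sketch: once a comma has been inserted, the greedy `\d+` can only run up to it, so backtracking now happens inside the leading run of digits. When every run is 3 digits or fewer, `exec` finds nothing and the loop stops.

```javascript
var rgx = /(\d+)(\d{3})/;

// After pass 1: \d+ grabs '2421', hits the comma, and backs off to '2'.
var second = rgx.exec('2421,567');
console.log(second[1], second[2]); // 2 421

// After pass 2: no run of 4+ digits remains, so exec returns null.
console.log(rgx.exec('2,421,567')); // null
```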

#2

It isn't actually reading right-to-left. What's really happening is that it's repeatedly applying the (\d+)(\d{3}) pattern (via a while loop) and replacing until it no longer matches the pattern. In other words:

Iteration 1:

x1 = '83475934.89';
x1 = x1.replace(/(\d+)(\d{3})/, '$1' + ',' + '$2');
// x1 is now '83475,934.89'

Iteration 2:

x1 = '83475,934.89';
x1 = x1.replace(/(\d+)(\d{3})/, '$1' + ',' + '$2');
// x1 is now '83,475,934.89'

Iteration 3:

x1 = '83,475,934.89';
/(\d+)(\d{3})/.test(x1); // false: no run of 4+ digits is left
// no match; end loop
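
The trace above can be reproduced by recording `x1` after each pass. `traceCommas` is a hypothetical helper written for this illustration; it uses the same regex and replacement as `addCommas`, applied to the full string as in the trace.

```javascript
// Hypothetical helper: returns x1 after each iteration of the while loop.
function traceCommas(x1) {
  var rgx = /(\d+)(\d{3})/;
  var states = [];
  while (rgx.test(x1)) {
    x1 = x1.replace(rgx, '$1' + ',' + '$2');
    states.push(x1);
  }
  return states; // the loop exits once rgx.test() is false
}

console.log(traceCommas('83475934.89'));
// [ '83475,934.89', '83,475,934.89' ]
```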

Edit:

Also, what do $1 and $2 mean?

Those are back references to the matching groups (\d+) and (\d{3}) respectively.

Here's a great reference for learning how Regular Expressions actually work:
http://www.regular-expressions.info/quickstart.html

#3

This explanation was further down on the same page:

Code Explanation: The code starts off by dividing the string into two parts (nStr and nStrEnd) if there is a decimal point. A regular expression is used on nStr to add the commas. Then nStrEnd is added back. If nStrEnd weren't temporarily removed from the string, the regular expression would format 10.0004 as 10.0,004
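
To see why the decimal part is set aside first, this sketch compares the question's `addCommas` (with `var` declarations added) against a hypothetical `commasNoSplit` helper that runs the same loop over the whole string:

```javascript
// addCommas from the question: only the integer part goes through the regex.
function addCommas(nStr) {
  nStr += '';
  var x = nStr.split('.');
  var x1 = x[0];
  var x2 = x.length > 1 ? '.' + x[1] : '';
  var rgx = /(\d+)(\d{3})/;
  while (rgx.test(x1)) {
    x1 = x1.replace(rgx, '$1' + ',' + '$2');
  }
  return x1 + x2;
}

// Hypothetical contrast: the same loop applied to the whole string.
function commasNoSplit(s) {
  var rgx = /(\d+)(\d{3})/;
  while (rgx.test(s)) {
    s = s.replace(rgx, '$1,$2');
  }
  return s;
}

console.log(addCommas('10.0004'));     // 10.0004
console.log(commasNoSplit('10.0004')); // 10.0,004
```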

Regular Expression Explanation: \d+ in combination with \d{3} will match a group of 3 numbers preceded by any amount of numbers. This tricks the search into replacing from right to left.

The $1 and $2 are captured group matches from the regular expression. You can read more about this topic in the Regex Tutorial.

#4

The code does read from right to left: it searches for the longest run of digits (\d+) that is followed by 3 digits (\d{3}). $1 and $2 are, respectively, that longest run of digits and the 3 digits, so it places the comma between them. By repeating this process, it formats the whole number this way.

#5

I wrote a regular expression which does the same thing in a single pass:

/(?!\b)(\d{3}(?=(\d{3})*\b))/g

Try this, for example, with a varying number of digits:

var num = '1234567890123456';

for(var i = 1; i <= num.length; i++)
{
  console.log(num.slice(0, -i).replace(/(?!\b)(\d{3}(?=(\d{3})*\b))/g, ',$1'));
}
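
As a rough consistency check, the sketch below runs the single-pass regex and the `while` loop from the question over the same prefixes as the loop above and collects any disagreements. The function names are hypothetical, and the check only covers digit-only strings.

```javascript
// Single-pass version from this answer (global flag, one replace call).
function commasSinglePass(s) {
  return s.replace(/(?!\b)(\d{3}(?=(\d{3})*\b))/g, ',$1');
}

// Loop version, as in the question's addCommas (integer part only).
function commasLoop(s) {
  var rgx = /(\d+)(\d{3})/;
  while (rgx.test(s)) {
    s = s.replace(rgx, '$1,$2');
  }
  return s;
}

var num = '1234567890123456';
var mismatches = [];
for (var i = 1; i <= num.length; i++) {
  var s = num.slice(0, -i); // 15 digits down to the empty string
  if (commasSinglePass(s) !== commasLoop(s)) mismatches.push(s);
}
console.log(mismatches.length);            // 0
console.log(commasSinglePass('1234567'));  // 1,234,567
```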


I'll try to break it down here:

Ignore the leading (?!\b) for now; I'll come back to that.

(?!\b)(\d{3}(?=(\d{3})*\b))

It still reads from left to right, trying to capture blocks of 3 digits. Here's the capturing group, (\d{3}(?=(\d{3})*\b)):

(?!\b)(\d{3}(?=(\d{3})*\b))

However, inside the capturing group, it uses a lookahead, (?=(\d{3})*\b):

(?!\b)(\d{3}(?=(\d{3})*\b))

The lookahead looks for any multiple of 3 digits anchored to the end of the number by the terminating word boundary (\b). This aligns the captures to multiples of 3 counted from the right-hand end of the number. It also means it works with decimal numbers (unless they have more than 3 decimal places, in which case it will put commas in those too. It ain't perfect).

(?!\b)(\d{3}(?=(\d{3})*\b))
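
The decimal caveat can be reproduced directly. The first call applies the single-pass regex to a number with four decimal places; `commasIntegerPart` is a hypothetical workaround (not from the original answer) that groups only the part before the decimal point, the same split `addCommas` makes.

```javascript
var rgx = /(?!\b)(\d{3}(?=(\d{3})*\b))/g;

// Whole string: the 4-digit fractional part gets a comma as well.
console.log('1234.5678'.replace(rgx, ',$1')); // 1,234.5,678

// Workaround sketch: group only the integer part, then re-attach the rest.
function commasIntegerPart(nStr) {
  var parts = String(nStr).split('.');
  parts[0] = parts[0].replace(rgx, ',$1'); // replace() resets lastIndex for /g
  return parts.join('.');
}
console.log(commasIntegerPart('1234.5678')); // 1,234.5678
```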

The problem I had was that JavaScript didn't support look-behinds (they were only added in ES2018), so when the number had a multiple of 3 digits, it matched the first 3 digits and placed a comma at the very start of the number.
You can't match a character before the 3-digit match without throwing off the repetition, so I had to use a negative lookahead on a word boundary, (?!\b). It's kinda the opposite of putting ^ at the start.

(?!\b)(\d{3}(?=(\d{3})*$))

Essentially, it stops the expression from matching at the start of the number, which would be bad.
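
The effect of the leading `(?!\b)` shows up on a number whose digit count is a multiple of 3. In this sketch, the first regex drops the guard and the second is the answer's pattern. The third swaps in `\B`, JavaScript's non-word-boundary assertion, which matches exactly where `\b` does not; it is offered here as an assumed equivalent spelling, not as part of the original answer.

```javascript
// Answer's pattern without the leading guard.
var withoutGuard = /(\d{3}(?=(\d{3})*\b))/g;
// Answer's pattern as written.
var withGuard = /(?!\b)(\d{3}(?=(\d{3})*\b))/g;
// Same idea using \B (non-word-boundary) instead of (?!\b).
var withB = /\B(\d{3}(?=(\d{3})*\b))/g;

console.log('123456'.replace(withoutGuard, ',$1')); // ,123,456
console.log('123456'.replace(withGuard, ',$1'));    // 123,456
console.log('123456'.replace(withB, ',$1'));        // 123,456
```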
