
Time: 2022-08-22 11:33:22

I was looking for a short snippet that puts commas into numbers (thousands separators), and eventually came across this site.


The code:


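function addCommas(nStr) {
    nStr += '';                                   // make sure we are working with a string
    var x = nStr.split('.');                      // split into integer and decimal parts
    var x1 = x[0];
    var x2 = x.length > 1 ? '.' + x[1] : '';
    var rgx = /(\d+)(\d{3})/;
    while (rgx.test(x1)) {
        x1 = x1.replace(rgx, '$1' + ',' + '$2');  // insert one comma per pass
    }
    return x1 + x2;
}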

It works really well. Given this example number:

83475934.89

it returns "83,475,934.89". But reading the code, I would expect it to return 8,3,4,7,5,934.89. The site explains that:


\d+ in combination with \d{3} will match a group of 3 numbers preceded by any amount of numbers. This tricks the search into replacing from right to left.


This confuses me. How does this code read from right to left? Plus, what do $1 and $2 mean?


5 answers



It matches from right to left because it uses greedy pattern matching. This means that \d+ first matches as many digits as it can, and only then does the regex try to match \d{3}. In the number 2421567.56, for example, \d+ would first match all the digits up to the '.' (2421567), then backtrack so that the next part of the regex, \d{3}, can match the last 3 digits (567). It does this in a loop, adding a comma between $1 and $2.
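The backtracking step can be observed by running the pattern once with RegExp.prototype.exec and looking at the captured groups. A minimal sketch; the sample value is just the integer part of the number above:

```javascript
// The pattern from addCommas: a greedy run of digits followed by exactly 3 digits.
const rgx = /(\d+)(\d{3})/;

// \d+ first takes all of "2421567", then gives back 3 digits so that \d{3} can match.
const m = rgx.exec('2421567');

console.log(m[1]); // 2421  (captured by (\d+), i.e. $1)
console.log(m[2]); // 567   (captured by (\d{3}), i.e. $2)
```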


The $ tokens refer to the capturing groups formed by the parentheses in the regex: (\d+) is $1 and (\d{3}) is $2. This makes it easy to insert characters between them.


In the next iteration, the greedy \d+ stops at the newly inserted comma instead, and the loop continues until there is no longer a run of more than 3 digits to match.
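A short sketch of that second iteration, reusing 2421567 as the sample: once a comma is in the string, \d+ cannot run past it, so the next match is confined to the digits before the comma.

```javascript
const rgx = /(\d+)(\d{3})/;

// First pass: "2421567" becomes "2421,567".
const afterFirst = '2421567'.replace(rgx, '$1,$2');

// Second pass: \d+ cannot cross the comma, so the match is limited to "2421".
const second = rgx.exec(afterFirst);
console.log(second[1], second[2]); // 2 421

const afterSecond = afterFirst.replace(rgx, '$1,$2');
console.log(afterSecond); // 2,421,567

// No run of 4 or more digits remains, so the while loop would stop here.
console.log(rgx.test(afterSecond)); // false
```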




It isn't actually reading right-to-left. What's really happening is that it's repeatedly applying the (\d+)(\d{3}) pattern (via a while loop) and replacing until it no longer matches the pattern. In other words:


Iteration 1:


x1 = '83475934'   // the '.89' has already been split off into x2
x1 = x1.replace(/(\d+)(\d{3})/, '$1' + ',' + '$2');
// x1 is now '83475,934'

Iteration 2:


x1 = '83475,934'
x1 = x1.replace(/(\d+)(\d{3})/, '$1' + ',' + '$2');
// x1 is now '83,475,934'

Iteration 3:


x1 = '83,475,934'
// rgx.test(x1) is false: no run of 4+ digits is left; end loop
// return x1 + x2 gives '83,475,934.89'
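The iterations above can be reproduced by recording x1 after every pass of the while loop. A minimal sketch; the helper name traceCommas is hypothetical and not part of the original code:

```javascript
// Hypothetical helper: the same loop as addCommas, but it records every intermediate value.
function traceCommas(intPart) {
  const rgx = /(\d+)(\d{3})/;
  const steps = [intPart];
  let x1 = intPart;
  while (rgx.test(x1)) {
    x1 = x1.replace(rgx, '$1,$2');
    steps.push(x1);
  }
  return steps;
}

console.log(traceCommas('83475934'));
// [ '83475934', '83475,934', '83,475,934' ]
```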



Plus, what do $1 and $2 mean?


Those refer to the text captured by the groups (\d+) and (\d{3}), respectively.
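To make this concrete: in String.prototype.replace, $1 and $2 in the replacement string insert the captured text, and a replacer function receives the same captures as arguments. A backreference in the strict sense (\1) is written inside the pattern itself and must match the same text again. A small sketch; the sample strings are arbitrary:

```javascript
const rgx = /(\d+)(\d{3})/;

// $1 and $2 in the replacement string insert the text captured by groups 1 and 2.
const viaString = '83475934'.replace(rgx, '$1,$2');

// A replacer function gets the same captures as its 2nd and 3rd arguments.
const viaFunction = '83475934'.replace(rgx, (match, g1, g2) => g1 + ',' + g2);

// \1 inside the pattern is a backreference: it must match group 1's text again,
// so (\d)\1 finds a digit that is immediately repeated.
const repeated = /(\d)\1/.exec('12334');

console.log(viaString);   // 83475,934
console.log(viaFunction); // 83475,934
console.log(repeated[0]); // 33
```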


Here's a great reference for learning how Regular Expressions actually work:




This explanation was further down on the same page


Code Explanation: The code starts off dividing the string into two parts (nStr and nStrEnd) if there is a decimal. A regular expression is used on nStr to add the commas. Then nStrEnd is added back. If the string didn't have nStrEnd temporarily removed, then the regular expression would format 10.0004 as 10.0,004
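The 10.0004 case can be checked directly by running the same comma loop on the whole string and on the integer part only. A minimal sketch; the helper name insertCommas is hypothetical:

```javascript
// Hypothetical helper: the comma loop from addCommas, applied to whatever string it gets.
function insertCommas(s) {
  const rgx = /(\d+)(\d{3})/;
  while (rgx.test(s)) {
    s = s.replace(rgx, '$1,$2');
  }
  return s;
}

// Without splitting off the decimal part, the fractional digits get a comma too.
console.log(insertCommas('10.0004')); // 10.0,004

// Splitting first (as addCommas does) leaves the fraction untouched.
const [intPart, fracPart] = '10.0004'.split('.');
console.log(insertCommas(intPart) + '.' + fracPart); // 10.0004
```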


Regular Expression Explanation: \d+ in combination with \d{3} will match a group of 3 numbers preceded by any amount of numbers. This tricks the search into replacing from right to left.


The $1 and $2 are captured group matches from the regular expression. You can read more on this topic on Regex Tutorial.




The code doesn't actually read from right to left. What it does is search for the longest run of digits (\d+) that is followed by 3 more digits (\d{3}). $1 and $2 are, respectively, that longest run and the final 3 digits, so it places the comma between them. By repeating this process, it formats the whole number.
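The repetition really is needed: a single replace call inserts only one comma, and adding the g flag does not help, because the first greedy match already consumes every digit. A minimal sketch:

```javascript
const once = '83475934'.replace(/(\d+)(\d{3})/, '$1,$2');

// With /g the scan resumes after the end of the first match, but that match
// ("83475" + "934") already covered the whole string, so nothing else is found.
const withG = '83475934'.replace(/(\d+)(\d{3})/g, '$1,$2');

console.log(once);  // 83475,934
console.log(withG); // 83475,934
```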




I wrote a regular expression which does the same thing in a single pass:

/(?!\b)(\d{3}(?=(\d{3})*\b))/g


Try this, for example, with numbers of varying length:


var num = '1234567890123456';

// Print the number with 1, 2, 3, ... digits removed from the end.
for (var i = 1; i < num.length; i++) {
  console.log(num.slice(0, -i).replace(/(?!\b)(\d{3}(?=(\d{3})*\b))/g, ',$1'));
}

I'll try to break it down here:


Ignore this bit, the leading (?!\b), for now. I'll come back to it.

(?!\b)(\d{3}(?=(\d{3})*\b))

It still reads from left to right, trying to capture blocks of 3 digits. Here's the capturing group, (\d{3}(?=(\d{3})*\b)):

(?!\b)(\d{3}(?=(\d{3})*\b))

However, inside the capturing group, it uses a lookahead, (?=(\d{3})*\b):

(?!\b)(\d{3}(?=(\d{3})*\b))

The lookahead looks for any multiple of 3 digits, (\d{3})*, anchored to the end of the number by the terminating boundary \b. This aligns each capture to a multiple of 3 digits from the right-hand end of the number. It also means it works with decimal numbers, unless they have more than 3 decimal places, in which case it puts commas in those too. It ain't perfect.

(?!\b)(\d{3}(?=(\d{3})*\b))
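The alignment can be observed by listing where each match starts with String.prototype.matchAll (ES2020), using the answer's regex. A sketch; the sample inputs are arbitrary:

```javascript
const re = /(?!\b)(\d{3}(?=(\d{3})*\b))/g;

// Each match is a 3-digit block followed by a multiple of 3 digits up to a word boundary.
const starts = [...'1234567'.matchAll(re)].map(m => m.index);
console.log(starts);                      // [ 1, 4 ]  (blocks "234" and "567")
console.log('1234567'.replace(re, ',$1')); // 1,234,567

// The "." also ends a run of digits, so short decimals work...
console.log('1234.56'.replace(re, ',$1'));   // 1,234.56
// ...but more than 3 decimal places get a comma as well, as the answer warns.
console.log('1234.5678'.replace(re, ',$1')); // 1,234.5,678
```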

The problem I had was that JavaScript didn't support look-behinds at the time (they were added in ES2018), so when the number had a multiple of 3 digits, the regex matched the first 3 digits and placed a comma at the very start of the number.
You can't match a character before the 3-digit block without throwing off the repetition, so I used a negative lookahead on a word boundary, (?!\b), instead. It's kinda the opposite of putting ^ at the start.

(?!\b)(\d{3}(?=(\d{3})*\b))

Essentially, it prevents the expression from matching at the start of the number, which would be bad.
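The effect of (?!\b) shows up on any number whose digit count is a multiple of 3. A minimal sketch comparing the regex with and without it; the variable names are illustrative:

```javascript
// Same regex with and without the leading (?!\b) guard.
const withoutGuard = /(\d{3}(?=(\d{3})*\b))/g;
const withGuard = /(?!\b)(\d{3}(?=(\d{3})*\b))/g;

// 6 digits: the very first block already has a multiple of 3 digits after it,
// so without the guard it is matched and gets a comma in front of it.
console.log('123456'.replace(withoutGuard, ',$1')); // ,123,456
console.log('123456'.replace(withGuard, ',$1'));    // 123,456

// 7 digits: no block can start at index 0 anyway, so both versions agree.
console.log('1234567'.replace(withoutGuard, ',$1')); // 1,234,567
console.log('1234567'.replace(withGuard, ',$1'));    // 1,234,567
```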




