JavaScript regex to split words in a comma-separated string

Time: 2022-04-29 15:40:55

I am trying to split a comma separated string using regex.

var a = 'hi,mr.007,bond,12:25PM'; //there are no white spaces between commas
var b = /(\S+?),(?=\S|$)/g;
b.exec(a); // does not catch the last item.

Any suggestions for catching all the items?

3 solutions

#1


7  

Use a negated character class:

/([^,]+)/g

will match groups of non-commas.

< a = 'hi,mr.007,bond,12:25PM'
> "hi,mr.007,bond,12:25PM"
< b=/([^,]+)/g
> /([^,]+)/g
< a.match(b)
> ["hi", "mr.007", "bond", "12:25PM"]
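The same idea as a small script, offered as a minimal sketch (the helper name `splitItems` is made up for illustration). One thing to check against your own data: because `[^,]+` needs at least one character, empty fields are skipped rather than returned as empty strings.

```javascript
// Hypothetical helper: returns every run of non-comma characters.
function splitItems(str) {
  // With the g flag, match() returns all matches, or null when there are none.
  return str.match(/[^,]+/g) || [];
}

console.log(splitItems('hi,mr.007,bond,12:25PM'));
// [ 'hi', 'mr.007', 'bond', '12:25PM' ]

// Empty fields are dropped here, unlike with split(','):
console.log(splitItems('a,,b'));  // [ 'a', 'b' ]
console.log('a,,b'.split(','));   // [ 'a', '', 'b' ]
```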

#2


5  

Why not just use .split?

>'hi,mr.007,bond,12:25PM'.split(',')
["hi", "mr.007", "bond", "12:25PM"]

If you must use regex for some reason:

str.match(/(\S+?)(?:,|$)/g)
["hi,", "mr.007,", "bond,", "12:25PM"]

(note that every match except the last includes its trailing comma).
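If the trailing commas are unwanted, one option is to read capture group 1 from each match instead of the whole match. The sketch below assumes an environment that supports `String.prototype.matchAll` (ES2020) and input without empty fields, as in the question; `itemsWithoutCommas` is a hypothetical name.

```javascript
// Hypothetical helper: same pattern as above, but keeps only group 1.
function itemsWithoutCommas(str) {
  const pattern = /(\S+?)(?:,|$)/g;
  // matchAll() yields one match object per item; m[1] is the text before the comma.
  return Array.from(str.matchAll(pattern), (m) => m[1]);
}

console.log(itemsWithoutCommas('hi,mr.007,bond,12:25PM'));
// [ 'hi', 'mr.007', 'bond', '12:25PM' ]
```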

#3


0  

If you are parsing a CSV file, some of your values may be wrapped in double quotes, so you may need something a little more complicated. For example, in Java:

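import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each match is one column; group 1 holds the column's value.
Pattern splitCommas = Pattern.compile("(?:^|,)((?:[^\",]|\"[^\"]*\")*)");
Matcher m = splitCommas.matcher("11,=\"12,345\",ABC,,JKL");

while (m.find()) {
    System.out.println(m.group(1));
}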

or in Groovy:

java.util.regex.Pattern.compile('(?:^|,)((?:[^",]|"[^"]*")*)')
        .matcher("11,=\"12,345\",ABC,,JKL")
            .iterator()
                .collect { it[1] }

This code:

  • handles blank lines
  • handles empty columns, including the last column being empty
  • handles values wrapped in quotes
  • but does not handle double quotes used for escaping a double quote itself

The pattern consists of:

  • (?:^|,) matches the start of the line or the comma that follows the previous column, without adding it to the captured group

  • ((?:[^",]|"[^"]*")*) matches the value of the column, and consists of:

    • a collecting group, which collects zero or more items, each of which is either:

      • [^",], a single character that is not a comma or a double quote
      • "[^"]*", a double quote, followed by zero or more non-quote characters, followed by a closing double quote

    • those two alternatives are or-ed together using a non-collecting group: (?:[^",]|"[^"]*")

    • a * repeats the above any number of times: (?:[^",]|"[^"]*")*

    • and the result is wrapped in a collecting group to give the column's value: ((?:[^",]|"[^"]*")*)

Handling escaped double quotes is left as an exercise for the reader.
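Since the question is about JavaScript, here is a rough port of the same pattern, offered as a sketch rather than a full CSV parser. `splitCsvLine` is a hypothetical helper name and `matchAll` needs ES2020. The same limitation applies: surrounding quotes are kept in the returned values, and escaped quotes are not interpreted.

```javascript
// Hypothetical helper: applies the pattern described above to one line of text.
function splitCsvLine(line) {
  // (?:^|,) consumes the line start or the separating comma;
  // group 1 captures the column, where a "quoted" run may contain commas.
  const pattern = /(?:^|,)((?:[^",]|"[^"]*")*)/g;
  return Array.from(line.matchAll(pattern), (m) => m[1]);
}

console.log(splitCsvLine('11,="12,345",ABC,,JKL'));
// [ '11', '="12,345"', 'ABC', '', 'JKL' ]
```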

