JS/RegEx: Split a string into words, but with regex exceptions

Date: 2022-05-18 21:45:16

I need to split a string into single words, but there are some cases which should not be split.

An example for type I string
An example for degree II string

So every combination of type | degree followed by I | II | III | IV | V should be kept as a single string.

The result for the example strings should be:

['An', 'example', 'for', 'type I', 'string']
['An', 'example', 'for', 'degree II', 'string']

In my regex I have to search for type or degree, followed by a space, followed by a string made of the characters I or V with a maximum length of 3. Those matches should not be split.

const regex = '/(type|degree)\s(I{1,3}|V{1})/' // <-- regex is wrong, it does not work
const result = string.split(' ')

I'm not quite sure how to combine the regex with splitting so that all matches are treated as exceptions when splitting on the space character.

1 Answer

#1



You may match the words type and degree followed by any Roman numeral, or else any run of 1+ non-whitespace chars, with

var s = "An example for degree II string";
var rx = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;
console.log(s.match(rx));

I borrowed and shortened the Roman numeral regex from here. The pattern matches:

  • \b - a word boundary
  • (?:type|degree) - a non-capturing group matching either the type or the degree substring
  • \s+ - 1 or more whitespace chars
  • M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3}) - the Roman numeral regex
  • \b - a trailing word boundary, so the numeral must end at the end of a word. Note that every part of the Roman numeral pattern is optional, so this boundary alone does not guarantee that a numeral is present: for "degree string" the first branch matches "degree " including the trailing space.
  • | - or
  • \S+ - 1 or more non-whitespace chars.
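Because every part of the Roman numeral pattern is optional, the first branch can succeed with an empty numeral whenever type or degree is followed by an ordinary word. The sketch below is one possible way to guard against that, not part of the original answer: it adds a lookahead requiring a numeral letter after the whitespace and swaps the trailing \b for (?!\w). The names rxOriginal, rxGuarded and tokenize are hypothetical and only used for illustration.

```javascript
// The answer's pattern, unchanged.
const rxOriginal = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;

// Guarded variant (an assumption, not part of the original answer):
//   (?=[MDCLXVI])  the text after the whitespace must start with a numeral letter
//   (?!\w)         replaces the trailing \b, so the numeral can neither be empty
//                  nor be the prefix of a longer word such as "Iron"
const rxGuarded = /\b(?:type|degree)\s+(?=[MDCLXVI])M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})(?!\w)|\S+/g;

// Hypothetical helper: returns the tokens, or [] when nothing matches.
function tokenize(text, rx = rxGuarded) {
  return text.match(rx) || [];
}

console.log(tokenize("A degree string", rxOriginal)); // [ 'A', 'degree ', 'string' ]
console.log(tokenize("A degree string"));             // [ 'A', 'degree', 'string' ]
console.log(tokenize("An example for degree II string"));
// [ 'An', 'example', 'for', 'degree II', 'string' ]
```

If only the numerals I to V from the question are expected, the guarded pattern still works unchanged, since those are ordinary Roman numerals.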

Note that if any symbol or punctuation char appears directly in front of the word degree or type, that word will be matched by the \S+ branch instead, so you need to handle those cases before applying this regex.
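One possible way to handle such cases, sketched here under the assumption that punctuation can simply be discarded, is to replace it with spaces before matching. The helper name tokenizeIgnoringPunctuation is hypothetical; the pattern is the one from the answer, unchanged.

```javascript
// The answer's pattern, unchanged.
const rx = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;

// Hypothetical helper: replaces every run of characters that are neither
// word characters nor whitespace with a single space, then applies the pattern.
// Caveat: \w is ASCII-only in JavaScript, so non-ASCII letters are replaced too.
function tokenizeIgnoringPunctuation(text) {
  const cleaned = text.replace(/[^\w\s]+/g, " ");
  return cleaned.match(rx) || [];
}

// Without the cleanup, "(type" and "I)" end up as separate tokens.
console.log("See (type I) and degree II.".match(rx));
// [ 'See', '(type', 'I)', 'and', 'degree II', '.' ]

console.log(tokenizeIgnoringPunctuation("See (type I) and degree II."));
// [ 'See', 'type I', 'and', 'degree II' ]
```

Whether dropping punctuation is acceptable depends on the input; if it has to be preserved, it would need to be split off as separate tokens instead.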
