JS/RegEx: Split a string into words, but with regex-defined exceptions

I need to split a string into single words, but there are some cases which should not be split.

An example for type I string
An example for degree II string

So every occurrence of type or degree followed by I, II, III, IV or V should be kept together as a single string.

The result for the example strings should be:

['An', 'example', 'for', 'type I', 'string']
['An', 'example', 'for', 'degree II', 'string']

In my regex I have to search for type or degree, followed by a space, followed by a string of the characters I or V with a maximum length of 3. Those matches should not be split.

const regex = '/(type|degree)\s(I{1,3}|V{1})/' // <-- regex is wrong, it does not work
const result = string.split(' ')

I'm not quite sure how to combine the regex with splitting so that all matches are treated as exceptions when splitting on the space character.

1 solution

#1

You may match the words type and degree followed by any Roman numeral, or otherwise any run of 1+ non-whitespace chars, with

var s = "An example for degree II string";
var rx = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;
console.log(s.match(rx));
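
To check the pattern against both example strings from the question, the match call can be wrapped in a small function. This is only a sketch: the name splitKeepingNumerals is made up for illustration, and the fallback to an empty array is an added assumption for input that contains no matches at all.

```javascript
// Same pattern as above: keyword plus Roman numeral, or any run of non-whitespace.
const rx = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;

// Hypothetical helper name, not part of the original answer.
// String.prototype.match returns null when nothing matches, hence the fallback.
function splitKeepingNumerals(text) {
  return text.match(rx) || [];
}

console.log(splitKeepingNumerals("An example for type I string"));
// [ 'An', 'example', 'for', 'type I', 'string' ]
console.log(splitKeepingNumerals("An example for degree II string"));
// [ 'An', 'example', 'for', 'degree II', 'string' ]
```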

I borrowed and shortened the Roman numeral regex from here. The pattern matches

\b - a word boundary
(?:type|degree) - a non-capturing group matching either the type or the degree substring
\s+ - 1 or more whitespace chars
M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3}) - the Roman numeral regex
\b - a trailing word boundary. Since every part of the Roman numeral regex is optional, this boundary alone does not guarantee that a numeral is present: with input such as "type of", this branch matches "type " including the trailing space
| - or
\S+ - 1 or more non-whitespace chars.
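
Because every part of the Roman numeral sub-pattern is optional, the first branch can also match a keyword followed by whitespace and an ordinary word, which leaves a trailing space in the result. If only the numerals I to V from the question are needed, a narrower pattern that requires at least one numeral avoids this. The following is a sketch under that assumption; strictRx and tokenize are illustrative names, not part of the answer above.

```javascript
// Pattern from the answer, kept here for comparison.
const rx = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;

// Hypothetical stricter variant: only I, II, III, IV and V are accepted,
// and one of them must be present after the keyword.
const strictRx = /\b(?:type|degree)\s+(?:I{1,3}|IV|V)\b|\S+/g;

// Hypothetical helper name; falls back to [] when nothing matches.
function tokenize(text, pattern) {
  return text.match(pattern) || [];
}

console.log(tokenize("This type of string", rx));
// [ 'This', 'type ', 'of', 'string' ]  (note the trailing space in 'type ')
console.log(tokenize("This type of string", strictRx));
// [ 'This', 'type', 'of', 'string' ]
console.log(tokenize("A degree IV burn", strictRx));
// [ 'A', 'degree IV', 'burn' ]
```

The trade-off is that the stricter variant no longer recognizes numerals above V, so it only fits when the input is limited to the cases listed in the question.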

Note that if any symbol or punctuation char directly precedes the word degree or type, the whole chunk will be matched by the \S+ branch, so you need to handle those cases before applying this regex.
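
One possible way to handle such cases is to replace punctuation with spaces before matching. This is a rough sketch, not the only option: the helper name splitIgnoringPunctuation is hypothetical, the cleanup assumes ASCII text (since \w only covers ASCII letters, digits and underscore), and it is lossy because it also breaks up words such as "don't".

```javascript
const rx = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;

// Hypothetical helper: turns every run of characters that are neither word
// characters nor whitespace into a single space, then applies the pattern.
function splitIgnoringPunctuation(text) {
  const cleaned = text.replace(/[^\w\s]+/g, " ");
  return cleaned.match(rx) || [];
}

// Without preprocessing, the parenthesis glues itself to the keyword.
console.log("See (type I) here".match(rx));
// [ 'See', '(type', 'I)', 'here' ]

// With preprocessing, the keyword and numeral stay together.
console.log(splitIgnoringPunctuation("See (type I) here"));
// [ 'See', 'type I', 'here' ]
```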
