I need to split a string into single words, but there are some cases which should not be split.
An example for type I string
An example for degree II string
So every (type | degree) + (I | II | III | IV | V) should be kept as one string.
The result of the example strings should be
['An', 'example', 'for', 'type I', 'string']
['An', 'example', 'for', 'degree II', 'string']
In my regex I have to search for type or degree, followed by a space, followed by a string of the characters I or V with a maximum length of 3. Those matches should not be split.
const regex = '/(type|degree)\s(I{1,3}|V{1})/' // <-- regex is wrong as it is not working
const result = string.split(' ')
I'm not quite sure how to combine the regex with splitting so that all matches are treated as exceptions when splitting on the space character.
1 solution
#1
You may match the words type or degree followed by any Roman numeral, or otherwise any run of 1+ non-whitespace chars, with
var s = "An example for degree II string";
var rx = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;
console.log(s.match(rx));
I borrowed and shortened the Roman number regex from here. The pattern matches
- \b - a word boundary
- (?:type|degree) - a non-capturing group matching either the type or the degree substring
- \s+ - 1 or more whitespace chars
- M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3}) - the Roman number regex
- \b - a trailing word boundary, so the numeral must end at the end of a word. Every part of the Roman number regex is optional, so this boundary alone does not guarantee that a numeral is present: for "degree string" the first branch matches "degree " (including the trailing space), because a word boundary also exists between the space and the next word
- | - or
- \S+ - 1 or more non-whitespace chars.
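If only the numerals I to V from the question are needed, the same match-instead-of-split idea can be sketched with a much shorter pattern. This is a minimal sketch, not the answer's original regex: the helper name splitKeepingLabels is hypothetical, and the numeral list is assumed to be exactly I, II, III, IV and V. Because each numeral alternative needs at least one character, the first branch cannot match type or degree followed by an ordinary word.

```javascript
// Hypothetical helper: tokenizes on whitespace, but keeps
// "type <numeral>" and "degree <numeral>" together as one token.
// Assumption: only the numerals I, II, III, IV and V are relevant.
function splitKeepingLabels(text) {
  const rx = /\b(?:type|degree)\s+(?:I{1,3}|IV|V)\b|\S+/g;
  // match() returns null when nothing matches, so fall back to [].
  return text.match(rx) || [];
}

console.log(splitKeepingLabels("An example for type I string"));
// [ 'An', 'example', 'for', 'type I', 'string' ]
console.log(splitKeepingLabels("An example for degree II string"));
// [ 'An', 'example', 'for', 'degree II', 'string' ]
```

With this variant, "type" followed by a non-numeral word falls through to the \S+ branch and is returned as a separate token.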
Note that if any symbol or punctuation char directly precedes the word degree or type, that text will be matched by the \S+ branch instead, so you need to handle those cases before applying this regex.
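One possible way to handle such input is to detach the punctuation before matching. The sketch below is only an illustration under assumptions: the helper name detachLeadingPunctuation is hypothetical, and it assumes that inserting a space after a punctuation char glued to type or degree is acceptable for your data.

```javascript
// The regex from the answer, unchanged.
const rx = /\b(?:type|degree)\s+M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b|\S+/g;

// Hypothetical helper: inserts a space after any non-word,
// non-whitespace char that sits directly in front of "type" or "degree".
function detachLeadingPunctuation(text) {
  return text.replace(/([^\w\s])(?=(?:type|degree)\b)/g, "$1 ");
}

const s = "See (type I) example";

// Without preprocessing, "(type" is consumed by the \S+ branch.
console.log(s.match(rx));
// [ 'See', '(type', 'I)', 'example' ]

// With preprocessing, "type I" is kept together.
console.log(detachLeadingPunctuation(s).match(rx));
// [ 'See', '(', 'type I', ')', 'example' ]
```

The punctuation ends up as separate tokens, so filter or re-attach it afterwards if your use case needs that.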