Regex to match a substring and return the string between specific delimiters

Let's say I have a string that can be formatted a few different ways, for example:

"languages:(ruby AND python) role:(software engineer or data scientist)"
"role:(software engineer or data scientist) languages:(ruby AND python)"
"languages:'python' role:'software engineer'"
"languages:(ruby AND python)role:(software engineer or data scientist)"
"languages:'python'role:'software engineer'"
"languages:'python'"

I want to parse this string, identify whether role: is present, and then capture whatever word(s) belong to "role", that is, only the text wrapped in parentheses ( ) or in single quotes '. So in this example, "languages:'python'role:'software engineer'" would return "software engineer" and "role:(software engineer or data scientist) languages:(ruby AND python)" would return "software engineer or data scientist".

Is there a way to do this with something LIKE a word boundary? Specifically, the region after the match on role: would be delimited by either quotes or ()?

1 solution

#1

You may use

s.scan(/role:(?:\(\K[^()]+(?=\))|'\K[^']+(?='))/)

See the regex demo

Details

role: - a literal substring
(?: - start of a non-capturing alternation group:
- \( - a ( char
- \K - match reset operator discarding the text matched so far
- [^()]+ - 1+ chars other than ( and )
- (?=\)) - a ) should follow the current position
| - or
- ' - a ' char
- \K - match reset operator discarding the text matched so far
- [^']+ - 1+ chars other than '
- (?=') - there must be a ' char immediately to the right
) - end of the alternation group.
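
Since the question also asks whether role: is present at all, one option is to pass the same pattern to String#[], which returns the first match or nil. This is only a minimal sketch: ROLE_RE and role_value are names made up here for illustration and are not part of the original answer.

```ruby
# Same pattern as above, stored in a constant (the name is hypothetical).
ROLE_RE = /role:(?:\(\K[^()]+(?=\))|'\K[^']+(?='))/

# Hypothetical helper: returns the text wrapped in (...) or '...' right after
# "role:", or nil when no such "role:" section exists in the string.
def role_value(str)
  # String#[] with a regex returns the matched text; because of \K that is
  # only the part after the opening delimiter.
  str[ROLE_RE]
end

p role_value("languages:'python'role:'software engineer'")
p role_value("role:(software engineer or data scientist) languages:(ruby AND python)")
p role_value("languages:'python'")
```

With the sample strings from the question, the first two calls should give the role text and the last one nil, so a plain nil check is enough to tell whether a role was supplied.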

NOTE: if you do not care if there is a ) or trailing ', remove the lookaheads to simplify the regex.
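
To illustrate the note, here is a rough comparison of the pattern with and without the lookaheads. The constant names and the truncated sample string are assumptions made up for this sketch.

```ruby
# Strict: the closing ) or ' must be present (pattern from the answer).
STRICT_RE  = /role:(?:\(\K[^()]+(?=\))|'\K[^']+(?='))/
# Relaxed: lookaheads removed, so an unterminated value is still captured.
RELAXED_RE = /role:(?:\(\K[^()]+|'\K[^']+)/

# Hypothetical input whose closing parenthesis is missing.
TRUNCATED = "languages:'python' role:(software engineer"

p TRUNCATED[STRICT_RE]   # no match, because no ) follows the value
p TRUNCATED[RELAXED_RE]  # matches up to the end of the string
```

On well-formed input both patterns should return the same text; they differ only when the closing delimiter is absent.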

Ruby demo:

s  = "languages:(ruby AND python) role:(software engineer or data scientist) role:(software engineer or data scientist) languages:(ruby AND python) languages:'python' role:'software engineer'  languages:(ruby AND python)role:(software engineer or data scientist) languages:'python'role:'software engineer' languages:'python'"
puts s.scan(/role:(?:\(\K[^()]+(?=\))|'\K[^']+(?='))/)

Output:

software engineer or data scientist
software engineer or data scientist
software engineer
software engineer or data scientist
software engineer
