Regex to match a substring and return the string between specific delimiters

Date: 2022-09-13 16:44:38

Let's say I have a string that can be formatted a few different ways, for example:

  • "languages:(ruby AND python) role:(software engineer or data scientist)"
  • "role:(software engineer or data scientist) languages:(ruby AND python)"
  • "languages:'python' role:'software engineer'"
  • "languages:(ruby AND python)role:(software engineer or data scientist)"
  • "languages:'python'role:'software engineer'"
  • "languages:'python'"

I want to parse this string, check whether role: is present, and if so capture the word(s) that belong to "role", that is, only the text wrapped in parentheses ( ) or in single quotes ' '. For example, "languages:'python'role:'software engineer'" would return "software engineer", and "role:(software engineer or data scientist) languages:(ruby AND python)" would return "software engineer or data scientist".

Is there a way to do this with something like a word boundary? Specifically, the region after the match on role: would be delimited by either single quotes or parentheses.

1 answer

#1


You may use

s.scan(/role:(?:\(\K[^()]+(?=\))|'\K[^']+(?='))/)

See the regex demo

Details

  • role: - a literal substring
  • (?: - start of a non-capturing alternation group:
    • \( - a ( char
    • \K - match reset operator that discards the text matched so far
    • [^()]+ - one or more chars other than ( and )
    • (?=\)) - a ) must immediately follow the current position
    • | - or
    • ' - a ' char
    • \K - match reset operator that discards the text matched so far
    • [^']+ - one or more chars other than '
    • (?=') - a ' char must be immediately to the right
  • ) - end of the alternation group.

NOTE: if you do not care whether the closing ) or the trailing ' is present, remove the lookaheads to simplify the regex.
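
As a rough illustration of what dropping the lookaheads changes, the sketch below compares both variants on a closed and an unclosed value. The constant names STRICT and LOOSE and the two sample strings are assumptions made for this example.

```ruby
# STRICT is the regex from the answer; LOOSE is the same regex with both
# lookaheads removed. Both constant names are hypothetical.
STRICT = /role:(?:\(\K[^()]+(?=\))|'\K[^']+(?='))/
LOOSE  = /role:(?:\(\K[^()]+|'\K[^']+)/

closed   = "languages:'python' role:(software engineer)"
unclosed = "languages:'python' role:(software engineer"

p closed.scan(STRICT)    # ["software engineer"]
p closed.scan(LOOSE)     # ["software engineer"]

# Without a closing ) the strict pattern finds nothing,
# while the loose one still returns the value.
p unclosed.scan(STRICT)  # []
p unclosed.scan(LOOSE)   # ["software engineer"]
```

Which behavior is preferable depends on whether malformed input should be rejected or tolerated.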

Ruby demo:

# Sample input: the query formats from the question joined into one string
s = "languages:(ruby AND python) role:(software engineer or data scientist) role:(software engineer or data scientist) languages:(ruby AND python) languages:'python' role:'software engineer'  languages:(ruby AND python)role:(software engineer or data scientist) languages:'python'role:'software engineer' languages:'python'"
# Print every role value found, one per line
puts s.scan(/role:(?:\(\K[^()]+(?=\))|'\K[^']+(?='))/)

Output:

software engineer or data scientist
software engineer or data scientist
software engineer
software engineer or data scientist
software engineer
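
The demo above scans one long string that contains several queries. For the case in the question, where a single query string either has a role: field or does not, a small helper along these lines may be more convenient. This is only a sketch: the method name role_from and the constant ROLE_VALUE are hypothetical, and it relies on String#[] returning nil when the regex does not match.

```ruby
# Same regex as in the answer, stored under a hypothetical constant name.
ROLE_VALUE = /role:(?:\(\K[^()]+(?=\))|'\K[^']+(?='))/

# Hypothetical helper: returns the first role value in the query,
# or nil when no properly delimited role: field is present.
def role_from(query)
  query[ROLE_VALUE]
end

p role_from("languages:'python'role:'software engineer'")
# "software engineer"
p role_from("role:(software engineer or data scientist) languages:(ruby AND python)")
# "software engineer or data scientist"
p role_from("languages:'python'")
# nil
```

Because the result is nil when role: is absent, the same call answers both "is role: present?" and "what is its value?".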
