Scala: get all possible matches of a regular expression

Time: 2021-08-05 05:15:41

I need to find all the pairs of words joined by the word "and".

So far I have tried the following:

val salute = """.*?(\w+\W+)and(\W+\w+).*""".r

val salute(a,b) = "hello ladies and gentlemen, mesdames and messieurs, how are you?"
a: String = "ladies "
b: String = " gentlemen"

Now I'd like something like this:


salute.findAllMatches("hello ladies and gentlemen, mesdames and messieurs, how are you?")
List[(java.lang.String, java.lang.String)] = List((ladies,gentlemen), (mesdames,messieurs))

I tried:

salute.findAllIn("hello ladies and gentlemen, mesdames and messieurs, how are you?").toList
res14: List[String] = List(hello ladies and gentlemen, mesdames and messieurs, how are you?)

But, as you can see, without success...


2 solutions

#1


Your regex

.*?(\w+\W+)and(\W+\w+).*

already matches the entire string in a single match, because of the .*? at the start and the .* at the end. Change it to the following (or something similar, depending on your requirements):

(\w+\W+)and(\W+\w+)
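
As a rough sketch of how this pattern could be used to get the list of tuples from the question: findAllMatchIn returns one Regex.Match per occurrence, and group(n) reads the n-th capture. The names pairPattern, sentence and pairs are made up for this example, and the top-level definitions assume a REPL, worksheet or Scala 3 script. Note that the groups include the separators around "and", so they are trimmed here (trim only strips whitespace).

```scala
import scala.util.matching.Regex

// Pattern from the answer above: with no leading or trailing .*,
// each "X and Y" pair becomes its own match.
val pairPattern: Regex = """(\w+\W+)and(\W+\w+)""".r

val sentence = "hello ladies and gentlemen, mesdames and messieurs, how are you?"

// One Regex.Match per occurrence; group(1) and group(2) are the two captures.
// The captures carry the surrounding whitespace ("ladies ", " gentlemen"),
// so trim them before building the tuple.
val pairs: List[(String, String)] =
  pairPattern
    .findAllMatchIn(sentence)
    .map(m => (m.group(1).trim, m.group(2).trim))
    .toList

println(pairs)
```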

#2


To get the result as a list of tuples, as you described above, you can do two things:

First, change your regex so that it is less greedy, i.e. so that it does not consume the whole string at once. For example:

""".(\w+) and (\w+)""".r

Second, use findAllIn and apply the regex as an extractor to each match to get the parts in the capturing parentheses.

Putting everything together, a solution that produces the desired result might look like this:

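val salute = """.(\w+) and (\w+)""".r
val string = "hello ladies and gentlemen, mesdames and messieurs, how are you?"

// findAllIn yields each matched substring; the salute(left, right) pattern
// then extracts the two capture groups from that substring.
val results = for {
  salute(left, right) <- salute.findAllIn(string)
} yield (left, right)

println(results.toList)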

This prints:

List((ladies,gentlemen), (mesdames,messieurs))
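
One caveat: the leading dot in that pattern consumes one character before the first word. That is harmless for the sample sentence, where a space precedes each pair, but a pair at the very start of the string would lose its first letter. A possible variant, sketched here with word boundaries instead of the dot (the names pairRegex, dotted, text, found and dottedFound are made up for this example, and the top-level definitions assume a REPL, worksheet or Scala 3 script):

```scala
import scala.util.matching.Regex

// Variant without the leading dot; \b anchors both ends at a word boundary.
val pairRegex: Regex = """\b(\w+) and (\w+)\b""".r
// The pattern from the answer, kept for comparison.
val dotted: Regex = """.(\w+) and (\w+)""".r

// Here the first pair sits at the very start of the string.
val text = "ladies and gentlemen, mesdames and messieurs"

// collect applies the regex as an extractor to each matched substring.
val found: List[(String, String)] =
  pairRegex.findAllIn(text).toList.collect {
    case pairRegex(left, right) => (left, right)
  }

// With the leading dot, the "l" of "ladies" is consumed by the dot.
val dottedFound: List[(String, String)] =
  dotted.findAllIn(text).toList.collect {
    case dotted(left, right) => (left, right)
  }

println(found)       // List((ladies,gentlemen), (mesdames,messieurs))
println(dottedFound) // List((adies,gentlemen), (mesdames,messieurs))
```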
