Regex: matching part of a string when the string contains part of the regex pattern

Date: 2022-09-13 13:30:57

I want to reduce the number of patterns I have to write by using a single regex that matches any leading part of the pattern, or all of it, when it appears in a string.

Is this possible with regex?

E.g. Pattern is: "the cat sat on the mat"

I would like the pattern to match the following strings:
"the"
"the cat"
"the cat sat"
...
"the cat sat on the mat"

But it should not match the following string because, although some words match, they are separated by a non-matching word: "the dog sat"

4 answers

#1


This:

the( cat( sat( on( the( mat)?)?)?)?)?

would answer your question. Remove the optional-group parentheses "(...)?" around parts that are not optional, and add further groups for things that must match together.

the                       // complete match
the cat                   // complete match
the cat sat               // complete match
the cat sat on            // complete match
the cat sat on the        // complete match
the cat sat on the mat    // complete match
the dog sat on the mat    // two partial matches ("the")

You might want to add a pre-condition, such as a start-of-line anchor, to prevent the expression from matching the second "the" in the last line:

^the( cat( sat( on( the( mat)?)?)?)?)?

EDIT: If you add a post-condition, such as the end-of-line anchor, the last example will not match at all:

the( cat( sat( on( the( mat)?)?)?)?)?$

Credits for the tip go to VonC. Thanks!

The post-condition may of course be something else you expect to follow the match.
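
To see what the anchors change, here is a small sketch in Python (my choice for illustration; the question does not name a language) that runs the three variants against the last example line. The variable names are made up for this example.

```python
import re

text = "the dog sat on the mat"
core = r"the( cat( sat( on( the( mat)?)?)?)?)?"

# No anchors: each stand-alone "the" is found as a separate match.
unanchored = [m.group(0) for m in re.finditer(core, text)]

# Start anchor only: the leading "the" still matches.
start_only = re.search("^" + core, text)

# End anchor only: neither "the" is followed by the end of the string.
end_only = re.search(core + "$", text)

print(unanchored)                                # ['the', 'the']
print(start_only.group(0))                       # the
print(end_only)                                  # None
print(bool(re.search(core + "$", "the cat sat")))  # True
```

With only the end anchor, a string that merely ends in "the" would still match, so in practice you may want both `^` and `$`.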

Alternatively, you can remove the last question mark:

the( cat( sat( on( the( mat)?)?)?)?)

Be aware, though: this makes a single "the" a non-match, so the first line will not match either.
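
Writing the nested groups by hand gets tedious for longer phrases. A minimal sketch of generating them, assuming Python and a non-empty, space-separated phrase (the helper name `build_prefix_pattern` is hypothetical), could look like this:

```python
import re


def build_prefix_pattern(phrase: str) -> str:
    """Build '^w1( w2( w3)?)?$' from a space-separated phrase."""
    words = phrase.split()
    nested = ""
    # Wrap from the last word outwards so each later word sits inside
    # the optional group of the word before it.
    for word in reversed(words[1:]):
        nested = f"( {re.escape(word)}{nested})?"
    # Both anchors are added so that only whole-string matches count.
    return f"^{re.escape(words[0])}{nested}$"


regex = re.compile(build_prefix_pattern("the cat sat on the mat"))

for text in ["the", "the cat", "the cat sat on the mat", "the dog sat"]:
    print(f"{text!r}: {bool(regex.match(text))}")
```

Only the last line, "the dog sat", should print False. The anchors are my addition; drop them if you want to search inside longer text.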

#2


It could be fairly complicated:

(?ms)the(?=(\s+cat)|[\r\n]+)(?:\s+cat(?=(\s+sat)|[\r\n]+))?(?:\s+sat(?=(\s+on)|[\r\n]+))?(?:\s+on(?=(\s+the)|[\r\n]+))?(?:\s+the(?=(\s+mat)|[\r\n]+))?(?:\s+mat)?[\r\n]+

Meaning:

  • I want "the" only if it is followed by "cat" or by the end of the line

  • then I want "cat" (optional) only if it is followed by "sat" or by the end of the line

  • and so on

  • followed by an end of line (which ensures that a partial "the cat walk..." does not match)

It does match the first three lines below:

the cat sat on the mat
the cat
the cat sat
the cat sat aa on the mat (nothing is matched either)
the dog sat (nothing is matched there)
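
For anyone who wants to try this, here is a rough Python sketch. It assumes the groups are meant to be non-capturing, written `(?:...)`, and that every input line ends with a newline, since the pattern finishes with `[\r\n]+`. The helper `matches` is a name I made up.

```python
import re

# Each word is followed by a lookahead that demands either the next word
# of the phrase or a line break.
PATTERN = re.compile(
    r"(?ms)the(?=(\s+cat)|[\r\n]+)"
    r"(?:\s+cat(?=(\s+sat)|[\r\n]+))?"
    r"(?:\s+sat(?=(\s+on)|[\r\n]+))?"
    r"(?:\s+on(?=(\s+the)|[\r\n]+))?"
    r"(?:\s+the(?=(\s+mat)|[\r\n]+))?"
    r"(?:\s+mat)?[\r\n]+"
)


def matches(line: str) -> bool:
    # The pattern needs a trailing line break, so one is appended here.
    return PATTERN.search(line + "\n") is not None


for line in [
    "the cat sat on the mat",
    "the cat",
    "the cat sat",
    "the cat sat aa on the mat",
    "the dog sat",
]:
    print(f"{line!r}: {matches(line)}")
```

The first three lines should print True and the last two False.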


On second thought, Tomalak's answer is simpler (if fixed, that is, ended with a '$').
I'll keep mine as a wiki post.

#3


If you know the match always begins at the first character, it would be much faster to compare the characters directly in a loop. I don't think regex will do it anyway.
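
A minimal sketch of the loop idea in Python, assuming the comparison is done word by word rather than character by character so that a cut-off word like "the ca" is rejected (the function name `is_word_prefix` is hypothetical):

```python
def is_word_prefix(candidate: str, phrase: str) -> bool:
    """Return True if candidate's words are a leading run of phrase's words."""
    candidate_words = candidate.split()
    phrase_words = phrase.split()
    # An empty candidate, or one longer than the phrase, cannot qualify.
    if not candidate_words or len(candidate_words) > len(phrase_words):
        return False
    # Walk both word lists in step and stop at the first mismatch.
    for got, expected in zip(candidate_words, phrase_words):
        if got != expected:
            return False
    return True


PHRASE = "the cat sat on the mat"
print(is_word_prefix("the cat sat", PHRASE))  # True
print(is_word_prefix("the dog sat", PHRASE))  # False
```

Note that `split()` collapses runs of whitespace, so "the  cat" with two spaces is treated the same as "the cat"; that may or may not be what you want.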

#4


Perhaps it would be easier and more logical to think about the problem a little differently.

Instead of matching the pattern against the string, how about using the string as the pattern and looking for it in the original pattern?

For example, where

string = "the cat sat on"
pattern = "the cat sat on the mat"

In the matching cases, string is always a substring of pattern, so the check is simply a regex match of string against pattern.
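
Roughly, in Python, that could look like the sketch below. Two things here are my assumptions rather than part of the idea above: the input is escaped so it is treated as literal text, and a lookahead requires the input to end on a word boundary of the phrase. The function name `found_in_pattern` is made up.

```python
import re


def found_in_pattern(string: str, pattern: str) -> bool:
    """Use the input string as the regex and the full phrase as the subject."""
    # Escape the input, then require a space or the end of the phrase
    # right after it so that "the ca" does not count.
    probe = re.escape(string) + r"(?= |$)"
    # re.match only tries at the start of the phrase.
    return re.match(probe, pattern) is not None


pattern = "the cat sat on the mat"
print(found_in_pattern("the cat sat on", pattern))  # True
print(found_in_pattern("the dog sat", pattern))     # False
```

If a match anywhere in the phrase is acceptable (for example "sat on"), `re.search` with a leading boundary check would be the variant to try.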

If that makes sense ;-)
