Regex matching non-greedy on one optional string, greedy on another

Date: 2022-02-05 20:28:55

I've researched for a while and haven't found a way to match the following pattern (I'm also very new to regex, though). It looks either like

/abc/foo/bar(/*) 

or

/abc/foo/bar/stop

In both cases I want to match and capture /abc/foo/bar. "/stop" is an optional string that may be appended to the end of the path. The goal is to get that capture while ignoring "/stop" if it is present (if "stop" occurs several times, stop at the first one), and to allow any number of slashes in the middle while dropping the slashes at the end of the line.

If I simply do:

^(/.*[^/])/*$

This is greedy: it includes every slash except the trailing run, which it strips off. But to also accept the second case, with the optional "/stop", I need to match non-greedily until I find the first "/stop" and stop there.

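To make the problem concrete, here is a minimal JavaScript sketch that runs the pattern above on two of the inputs. It uses JavaScript because the answers below test in it. `naiveCapture` is a hypothetical helper name, not part of the question.

```javascript
// The greedy pattern from the question, as a JavaScript literal
// (slashes escaped). Group 1 is everything before the trailing slashes.
const naive = /^(\/.*[^\/])\/*$/;

// Hypothetical helper: return group 1, or null if there is no match.
function naiveCapture(s) {
  const m = s.match(naive);
  return m ? m[1] : null;
}

console.log(naiveCapture("/abc/foo/bar///"));   // /abc/foo/bar
console.log(naiveCapture("/abc/foo/bar/stop")); // /abc/foo/bar/stop (".*" runs past "/stop")
```

Trailing slashes are handled, but nothing stops `.*` from consuming `/stop`.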

How can I craft a single regex that matches both cases?

EDIT: In case my previous example wasn't clear enough, here are more. I want to match and capture "/abc/foo/bar" in all of the following strings:

/abc/foo/bar
/abc/foo/bar/
/abc/foo/bar///
/abc/foo/bar/stop
/abc/foo/bar/stop/foo/bar/stop/stop
/abc/foo/bar//stop

However, it should not stop at "/abc/foo/bar" in either of the following:

/abc/foo/bar/sto (will match the whole "/abc/foo/bar/sto" instead)
/abc/foo/bar/abc/foo/bar (it will catch "/abc/foo/bar/abc/foo/bar" instead)

Let me know if this is clear enough. Thanks!

2 Answers

#1

Try this:

/^(?:\/+(?!$|(?:stop\/?))[^\/]+)*/

Regex101 Demo

Explanation:

This matches the start of the string (^), followed by zero or more instances of the following pattern:

  • one or more slashes (\/+) that are not followed by the end of the string ($) or by stop, followed by
  • one or more non-slash characters ([^\/]+)

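The pattern can be checked directly in JavaScript. The sketch below runs it on the question's inputs; the helper name `capture1` and the extra `/abc/foo/bar/stopper` input are illustrative additions. The optional `\/?` inside the lookahead makes no difference, so any segment that merely begins with `stop` also ends the match.

```javascript
// Answer #1's first pattern: repeat "one or more slashes + one segment",
// refusing slashes that lead only to the end of the string or to "stop".
const re1 = /^(?:\/+(?!$|(?:stop\/?))[^\/]+)*/;

// The whole group is optional (*), so match() always succeeds and [0]
// is the captured path (possibly the empty string).
function capture1(s) {
  return s.match(re1)[0];
}

const inputs = [
  "/abc/foo/bar",
  "/abc/foo/bar/",
  "/abc/foo/bar///",
  "/abc/foo/bar/stop",
  "/abc/foo/bar/stop/foo/bar/stop/stop",
  "/abc/foo/bar//stop",
  "/abc/foo/bar/sto",         // expected: the whole string
  "/abc/foo/bar/abc/foo/bar", // expected: the whole string
];
for (const s of inputs) {
  console.log(s, "->", capture1(s));
}

// "stop\/?" does not require a boundary after "stop":
console.log(capture1("/abc/foo/bar/stopper")); // /abc/foo/bar
```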

Here's a Debuggex Demo with working unit tests.

EDIT: Here is an alternative, arguably simpler, regex:

/^.+?(?=\/*$|\/+stop\b)/

This matches one or more characters in a non-greedy manner, then stops when whatever is after the match is one of the following:

  1. the end of the string ($), possibly preceded by one or more slashes (\/*)
  2. one or more slashes (\/+), the word stop, and a word boundary (\b).

Here's a Regex101 demo of this option.

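The non-greedy `+?` is what makes this work. The sketch below compares it with a greedy `+`, a hypothetical variant shown only for comparison. The greedy version first takes the whole string, and `\/*$` already holds there, so it never stops at the first `/stop`. The `\b` keeps a segment such as `/stopper` (an illustrative input) from being mistaken for `/stop`.

```javascript
// The answer's pattern: lazy ".+?" stops at the FIRST position where the
// lookahead (trailing slashes to end, or slashes + "stop" + boundary) holds.
const lazy = /^.+?(?=\/*$|\/+stop\b)/;

// Greedy variant for comparison: ".+" first grabs the whole string, and the
// lookahead "\/*$" already succeeds there, so nothing is trimmed.
const greedy = /^.+(?=\/*$|\/+stop\b)/;

const s = "/abc/foo/bar/stop/foo/bar/stop/stop";
console.log(s.match(lazy)[0]);   // /abc/foo/bar
console.log(s.match(greedy)[0]); // /abc/foo/bar/stop/foo/bar/stop/stop

// "\b" after "stop" means "/stopper" is an ordinary segment:
console.log("/abc/foo/bar/stopper".match(lazy)[0]); // /abc/foo/bar/stopper
```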

EDIT 2: If you'd like to test this, here's a simple JavaScript test that tests the second regex above against various test strings and logs the results to the console:

var re = /^.+?(?=\/*$|\/+stop\b)/,
    test_strings = ["/abc/foo/bar",
                    "/abc/foo/bar/",
                    "/abc/foo/bar///",
                    "/abc/foo/bar/stop",
                    "/abc/foo/bar/stop/foo/bar/stop/stop",
                    "/abc/foo/bar//stop",
                    "/abc/foo/bar/sto",
                    "/abc/foo/bar/abc/foo/bar"];
for(var s = 0; s < test_strings.length; s++) {
    console.log(test_strings[s].match(re)[0]);
}

/*
Results:

/abc/foo/bar
/abc/foo/bar
/abc/foo/bar
/abc/foo/bar
/abc/foo/bar
/abc/foo/bar
/abc/foo/bar/sto
/abc/foo/bar/abc/foo/bar 

*/

#2

You can try something like this:

^((?:/[^/]+)+?)(?:/*|/+stop(?:/.*)?)$

demo

and if atomic groups are available, it is better to write:

^((?:/[^/]+)+?)(?>/*$|/+stop(?:/.*)?)

demo

If lookaheads are available:

^/(?>[^/]+|/(?!/*(?:$|stop(?:/|$))))+

demo

ps: don't forget to escape slashes if your delimiters are slashes.

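Following the PS, here is a sketch of the first and third patterns as JavaScript literals with the slashes escaped. The helper names are hypothetical. Two assumptions should be flagged:

- In the first pattern, the end alternative is written `\/*` (zero or more) instead of `/+`. With `/+`, a path with no trailing slash, such as `/abc/foo/bar`, is not matched at all.
- JavaScript has no atomic groups, so the third pattern's `(?>...)` is replaced by a plain `(?:...)`. For this pattern, that should give the same match, possibly with more backtracking.

```javascript
// First pattern: group 1 is a lazy run of "/segment" pieces, followed by
// either trailing slashes or slashes + "stop" + anything, then end of string.
// Uses \/* (not \/+) so a path without a trailing slash also matches.
const rePath = /^((?:\/[^\/]+)+?)(?:\/*|\/+stop(?:\/.*)?)$/;

// Same pattern with the original \/+ alternative, for comparison.
const rePathOrig = /^((?:\/[^\/]+)+?)(?:\/+|\/+stop(?:\/.*)?)$/;

// Third pattern, with (?>...) replaced by (?:...) since JavaScript lacks
// atomic groups. Here the whole match (not a group) is the path.
const reLookahead = /^\/(?:[^\/]+|\/(?!\/*(?:$|stop(?:\/|$))))+/;

function capturePath(s) {
  const m = s.match(rePath);
  return m ? m[1] : null;
}

function captureLookahead(s) {
  const m = s.match(reLookahead);
  return m ? m[0] : null;
}

console.log(rePathOrig.test("/abc/foo/bar"));        // false: \/+ needs a trailing slash
console.log(capturePath("/abc/foo/bar"));            // /abc/foo/bar
console.log(captureLookahead("/abc/foo/bar//stop")); // /abc/foo/bar
```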

As Ed Cottrell noticed, features like atomic grouping are not available in languages like JavaScript or in Python's re module (Python added atomic groups to re in version 3.11). However, atomic grouping can be emulated efficiently using the fact that a lookahead is naturally atomic: (?>a+) <=> (?=(a+))\1

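Here is a sketch of that trick in JavaScript, applied to the atomic-group pattern. The alternation is captured inside a lookahead as group 2 and then consumed with `\2`. This sketch writes the end alternative as `\/*$` rather than `/+$`, because with `/+$` a path with no trailing slash, such as `/abc/foo/bar`, does not match. `captureAtomic` is a hypothetical helper name.

```javascript
// (?>X) emulated as (?=(X))\2: JavaScript lookaheads are not re-entered once
// they succeed, so group 2 is fixed, and \2 then consumes exactly that text.
const reAtomic = /^((?:\/[^\/]+)+?)(?=(\/*$|\/+stop(?:\/.*)?))\2/;

function captureAtomic(s) {
  const m = s.match(reAtomic);
  return m ? m[1] : null; // group 1: the path without trailing "/" or "/stop..."
}

for (const s of ["/abc/foo/bar", "/abc/foo/bar///", "/abc/foo/bar//stop", "/abc/foo/bar/sto"]) {
  console.log(s, "->", captureAtomic(s));
}
```

The backreference number depends on group order. Adding a capturing group before the lookahead would turn `\2` into `\3`.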
