Match the first occurrence of a semicolon in a string only if it is not prefixed with '--'

Time: 2021-04-08 16:56:10

I'm trying to write a regular expression for Java that matches if there is a semicolon that does not have two (or more) leading '-' characters.

I'm only able to get the opposite working: A semicolon that has at least two leading '-' characters.

([\-]{2,}.*?;.*)

But I need something like

([^([\-]{2,})])*?;.*

I'm somehow not able to express 'not at least two - characters'.

Here are some examples I need to evaluate with the expression:

; -- a           : should match
-- a ;           : should not match
-- ;             : should not match
--;              : should not match
-;-              : should match
---;             : should not match
-- semicolon ;   : should not match
bla ; bla        : should match
bla              : should not match (; is mandatory)
-;--;            : should match (the first occurring semicolon must not have two or more consecutive leading '-')

5 solutions

#1


2  

It seems that this regex matches what you want

String regex = "[^-]*(-[^-]+)*-?;.*";

Explanation: matches() will accept a string that is built as follows:

  • [^-]* the string can start with any number of non-dash characters

  • (-[^-]+)*-?; is a bit tricky, because before we match ; we need to make sure that no - is immediately followed by another -, so:
    • (-[^-]+)* each - has at least one non-dash character after it
    • -? or a single - is placed right before ;

  • ;.* if the earlier conditions were fulfilled, we can accept ; and any characters after it (.*).
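
The breakdown above can be checked against the examples from the question. The following is only a minimal sketch: the class name DashSemicolonDemo and the helper accepts are made up for illustration, while the pattern and the inputs are taken from the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class DashSemicolonDemo {
    // Pattern quoted from the answer; everything else here is illustrative.
    static final Pattern FIRST_SEMICOLON = Pattern.compile("[^-]*(-[^-]+)*-?;.*");

    // matches() only succeeds if the pattern covers the entire input.
    static boolean accepts(String input) {
        return FIRST_SEMICOLON.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Inputs and expected verdicts copied from the question.
        Map<String, Boolean> examples = new LinkedHashMap<>();
        examples.put("; -- a", true);
        examples.put("-- a ;", false);
        examples.put("-- ;", false);
        examples.put("--;", false);
        examples.put("-;-", true);
        examples.put("---;", false);
        examples.put("-- semicolon ;", false);
        examples.put("bla ; bla", true);
        examples.put("bla", false);
        examples.put("-;--;", true);

        for (Map.Entry<String, Boolean> e : examples.entrySet()) {
            boolean actual = accepts(e.getKey());
            System.out.println("\"" + e.getKey() + "\" expected=" + e.getValue()
                    + " actual=" + actual);
        }
    }
}
```

Because matches() is used, no explicit anchors are needed; with find() the same pattern would have to be anchored at the start of the input.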


A more readable version, though probably a little slower:

((?!--)[^;])*;.*

Explanation:

To make sure that there is a ; in the string, we can use .*;.* with matches().
But we need to add some conditions on the characters before the first ;.

So, to make sure that the matched ; is the first one, we can write the regex as

[^;]*;.*

which means:

  • [^;]* zero or more non-semicolon characters

  • ; the first semicolon

  • .* zero or more of any characters (by default, . can't match line separators such as \n or \r)
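
The remark about line separators matters once matches() is used, because the trailing .* then has to reach the end of the input. A small sketch of the effect, assuming multi-line input is possible at all (the question does not say so); the class name DotAndLineBreaks and its helper methods are hypothetical:

```java
import java.util.regex.Pattern;

final class DotAndLineBreaks {
    // Same pattern twice; the second lets '.' match line terminators too.
    static final Pattern DEFAULT_DOT = Pattern.compile("[^;]*;.*");
    static final Pattern DOTALL_DOT = Pattern.compile("[^;]*;.*", Pattern.DOTALL);

    static boolean matchesDefault(String input) {
        return DEFAULT_DOT.matcher(input).matches();
    }

    static boolean matchesDotAll(String input) {
        return DOTALL_DOT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String breakBefore = "bla\n; bla"; // line break before the first ';'
        String breakAfter = "bla ;\nbla";  // line break after the first ';'

        // [^;] is a negated class, so it consumes the line break without any flag.
        System.out.println(matchesDefault(breakBefore));
        // '.' stops at the line break, so matches() cannot cover the whole input.
        System.out.println(matchesDefault(breakAfter));
        // With DOTALL the trailing .* runs through the line break.
        System.out.println(matchesDotAll(breakAfter));
    }
}
```

If the input is always a single line, the default behaviour is fine and no flag is needed.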

So now all we need to do is make sure that no character matched by [^;] is part of --. To do so we can use look-around mechanisms, for instance:

  • (?!--)[^;] before matching [^;], the lookahead (?!--) checks that the next two characters are not --; in other words, the character matched by [^;] can't be the first - of a -- pair

  • [^;](?<!--) after matching [^;], the lookbehind (?<!--) checks that the two characters just consumed are not --; in other words, the character matched by [^;] can't be the last - of a -- pair.
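
Both look-around forms can be dropped into the [^;]*;.* skeleton. The lookahead pattern below is the one given in the answer; the full lookbehind pattern is assembled here by analogy from the second bullet and is an assumption, as are the class and method names.

```java
import java.util.regex.Pattern;

final class LookaroundVariants {
    // From the answer: no character before the first ';' may start a "--" pair.
    static final Pattern LOOKAHEAD = Pattern.compile("((?!--)[^;])*;.*");
    // Assumed counterpart: no character before the first ';' may end a "--" pair.
    static final Pattern LOOKBEHIND = Pattern.compile("([^;](?<!--))*;.*");

    static boolean viaLookahead(String input) {
        return LOOKAHEAD.matcher(input).matches();
    }

    static boolean viaLookbehind(String input) {
        return LOOKBEHIND.matcher(input).matches();
    }

    public static void main(String[] args) {
        // A subset of the examples from the question.
        String[] inputs = {"; -- a", "-- a ;", "--;", "-;-", "---;", "bla", "-;--;"};
        for (String input : inputs) {
            System.out.println("\"" + input + "\" lookahead=" + viaLookahead(input)
                    + " lookbehind=" + viaLookbehind(input));
        }
    }
}
```

On the question's examples the two variants should agree; the lookahead form rejects a -- pair at its first dash, the lookbehind form at its second.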

#2


0  

How about just splitting the string on -- and, if there are two or more substrings, checking whether the last one contains a semicolon?
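
A regex-free check along these lines might look like the sketch below. It deviates from the suggestion in one respect, and that deviation is an assumption: it inspects the text before the first -- rather than the last substring, because examples such as "; -- a" and "-;--;" from the question are only accepted that way. The class name SplitOnDoubleDash is hypothetical.

```java
final class SplitOnDoubleDash {
    static boolean accepts(String input) {
        // Limit 2 splits at the first "--" only: [text before it, everything after it].
        // If there is no "--", the array holds the whole input as its single element.
        String beforeFirstDoubleDash = input.split("--", 2)[0];
        // The first semicolon is free of leading "--" exactly when it sits in that prefix.
        return beforeFirstDoubleDash.contains(";");
    }

    public static void main(String[] args) {
        String[] inputs = {"; -- a", "-- a ;", "--;", "-;-", "---;", "bla ; bla", "bla", "-;--;"};
        for (String input : inputs) {
            System.out.println("\"" + input + "\" -> " + accepts(input));
        }
    }
}
```

String.split takes a regular expression, but "--" contains no metacharacters, so it is treated as a literal separator here.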

#3


0  

How about using this regex in Java:

[^;]*;(?<!--[^;]{0,999};).*

The only caveat is that it works only when at most 999 characters separate -- and ;.
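
The 999-character limit can be made visible with a short experiment. This is a sketch only: the pattern is the one from the answer, while the class name BoundedLookbehindDemo and the helper dashesThenGap are invented for the demonstration.

```java
import java.util.regex.Pattern;

final class BoundedLookbehindDemo {
    // Pattern quoted from the answer above.
    static final Pattern BOUNDED = Pattern.compile("[^;]*;(?<!--[^;]{0,999};).*");

    static boolean accepts(String input) {
        return BOUNDED.matcher(input).matches();
    }

    // Hypothetical helper: "--", then gapLength filler characters, then ";".
    static String dashesThenGap(int gapLength) {
        StringBuilder sb = new StringBuilder("--");
        for (int i = 0; i < gapLength; i++) {
            sb.append('x');
        }
        return sb.append(';').toString();
    }

    public static void main(String[] args) {
        System.out.println(accepts("-;--;"));             // first ';' has no "--" before it
        System.out.println(accepts("-- semicolon ;"));    // "--" well inside the window
        System.out.println(accepts(dashesThenGap(999)));  // gap still inside the window
        System.out.println(accepts(dashesThenGap(1000))); // gap exceeds the window
    }
}
```

With a gap of 1000 characters the lookbehind can no longer reach the --, so that input is accepted even though the rule from the question says it should be rejected. Raising the upper bound widens the window but does not remove the limit.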

#4


0  

I think this is what you're looking for:

^(?:(?!--).)*;.*$

In other words, match from the start of the string (^), zero or more characters (.*) followed by a semicolon. But replacing the dot with (?:(?!--).) causes it to match any character unless it's the beginning of a two-hyphen sequence (--).

If performance is an issue, you can exclude the semicolon as well, so it never has to backtrack:

^(?:(?!--|;).)*;.*$

EDIT: I just noticed your comment that the regex should work with the matches() method, so I padded it out with .*. With matches() the anchors aren't really necessary, but they do no harm.
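
The point about the anchors can be illustrated by comparing matches() with find(). In this sketch the anchored pattern is the one from the answer; the unanchored copy, the class name AnchorsDemo and the helper methods are assumptions added for the comparison.

```java
import java.util.regex.Pattern;

final class AnchorsDemo {
    // Pattern from the answer, and the same pattern with ^ and $ removed.
    static final Pattern ANCHORED = Pattern.compile("^(?:(?!--|;).)*;.*$");
    static final Pattern UNANCHORED = Pattern.compile("(?:(?!--|;).)*;.*");

    // matches() must cover the whole input, so it behaves as if anchored.
    static boolean fullMatch(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    // find() may start anywhere, so only an explicit ^ pins it to the start.
    static boolean partialMatch(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "--;"; // should be rejected according to the question
        System.out.println("anchored   matches(): " + fullMatch(ANCHORED, input));
        System.out.println("unanchored matches(): " + fullMatch(UNANCHORED, input));
        System.out.println("anchored   find():    " + partialMatch(ANCHORED, input));
        System.out.println("unanchored find():    " + partialMatch(UNANCHORED, input));
    }
}
```

Only the unanchored pattern combined with find() accepts "--;", because the search can then begin at the second dash and skip the -- check. Keeping the anchors makes the pattern safe with either method.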

#5


0  

You need a negative lookahead!

This regex will match any string which does not contain your original match pattern:

(?!-{2,}.*?;.*).*?;.*

This regex matches a string that contains a semicolon, but not one occurring after 2 or more dashes.
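
One property of this pattern is worth checking before relying on it: the lookahead is evaluated only at the very start of the input. The sketch below uses the pattern from this answer; the class name LeadingDashLookahead and the extra input "bla -- ;" are not from the thread and are added as an assumption about what else might need to be rejected.

```java
import java.util.regex.Pattern;

final class LeadingDashLookahead {
    // Pattern quoted from this answer.
    static final Pattern START_ONLY = Pattern.compile("(?!-{2,}.*?;.*).*?;.*");

    static boolean accepts(String input) {
        return START_ONLY.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Examples from the question: the dashes are at the start of the input.
        System.out.println(accepts("-- a ;"));
        System.out.println(accepts("-;--;"));
        // Extra case (not in the question): the dashes follow other text.
        System.out.println(accepts("bla -- ;"));
    }
}
```

All ten examples from the question get the expected verdict, since in each rejected example the dashes open the string. The extra input "bla -- ;" is accepted, however, so if a -- anywhere before the first semicolon should cause a rejection, one of the per-character look-around patterns from the other answers is the safer choice.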
