Why doesn't this Java regular expression work?

Time: 2021-01-12 21:45:10

I need to create a regular expression that allows a string to contain any number of:

  • alphanumeric characters
  • spaces
  • (
  • )
  • &
  • .

No other characters are permitted. I used RegexBuddy to construct the following regex, which works correctly when I test it within RegexBuddy:

\w* *\(*\)*&*\.*

Then I used RegexBuddy's "Use" feature to convert this into Java code, but it doesn't appear to work correctly using a simple test program:

public class RegexTest
{
  public static void main(String[] args)
  {
    String test = "(AT) & (T)."; // Should be valid
    System.out.println("Test string matches: "
      + test.matches("\\w* *\\(*\\)*&*\\.*")); // Outputs false
  }
}

I must admit that I have a bit of a blind spot when it comes to regular expressions. Can anyone explain why it doesn't work, please?

4 solutions

#1


That regular expression tests for any number of word characters (\w: letters, digits, and underscore), followed by any number of spaces, followed by any number of open parens, followed by any number of close parens, followed by any number of ampersands, followed by any number of periods, in exactly that order.

What you want is...

test.matches("[\\w \\(\\)&\\.]*")

As mentioned by mmyers, this allows the empty string. If you do not want to allow the empty string...

test.matches("[\\w \\(\\)&\\.]+")

Though that will also allow a string that is only spaces, or only periods, etc.. If you want to ensure at least one alpha-numeric character...

test.matches("[\\w \\(\\)&\\.]*\\w+[\\w \\(\\)&\\.]*")

So that you understand what the regular expression is saying: anything within square brackets ("[]") denotes a set of characters. So, where "a*" means zero or more a's, "[abc]*" means zero or more characters, each of which is an a, b, or c.
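
As a rough illustration of the three character-class patterns above, here is a minimal sketch. The class name CharClassDemo, the constant names, and the sample strings other than "(AT) & (T)." are made up for this example; the pattern strings are the ones given in this answer.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // The three patterns from the answer, compiled once (names are hypothetical).
    static final Pattern ANY = Pattern.compile("[\\w \\(\\)&\\.]*");        // empty string allowed
    static final Pattern NON_EMPTY = Pattern.compile("[\\w \\(\\)&\\.]+");  // at least one character
    static final Pattern HAS_WORD_CHAR =
            Pattern.compile("[\\w \\(\\)&\\.]*\\w+[\\w \\(\\)&\\.]*");      // at least one \w character

    // matches() succeeds only if the pattern covers the entire input.
    static boolean check(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "(AT) & (T).", "", "   ", "AT&T!" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> any=" + check(ANY, s)
                    + ", nonEmpty=" + check(NON_EMPTY, s)
                    + ", hasWordChar=" + check(HAS_WORD_CHAR, s));
        }
    }
}
```

The string "AT&T!" is rejected by all three patterns because "!" is not in the set, while a string of only spaces passes the first two and fails the third.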

#2


Maybe I'm misunderstanding your description, but aren't you essentially defining a class of characters in no particular order rather than a specific sequence? Shouldn't your regexp have the structure [xxxx]+, where xxxx are the actual characters you want?

#3


The difference between your Java code snippet and the Test tab in RegexBuddy is that the matches() method in Java requires the regular expression to match the whole string, while the Test tab in RegexBuddy allows partial matches. If you use your original regex in RegexBuddy, you'll see multiple blocks of yellow and blue highlighting. That indicates RegexBuddy found multiple partial matches in your string. To get a regex that works as intended with matches(), you need to edit it until the whole test subject is highlighted in yellow, or if you turn off highlighting, until the Find First button selects the whole text.
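
To see the partial matches from Java itself, something like the following sketch may help. It uses Matcher.find() to approximate what a tool that highlights successive matches would show; it is not RegexBuddy's implementation, and the class and method names (PartialMatchDemo, partialMatches) are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartialMatchDemo {
    // Collects every non-empty substring that find() reports, left to right.
    static List<String> partialMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (!m.group().isEmpty()) {   // every part is optional, so empty matches can occur
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String regex = "\\w* *\\(*\\)*&*\\.*";   // the regex from the question
        String test = "(AT) & (T).";
        System.out.println("matches(): " + test.matches(regex));
        System.out.println("find() pieces: " + partialMatches(regex, test));
    }
}
```

Each piece on its own fits the fixed order of the original regex, but no single match spans the whole string, which is why matches() reports false.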

Alternatively, you can use the anchors \A and \Z at the start and the end of your regex to force it to match the whole string. Note that \Z still matches just before a final line terminator; use \z if the match must end at the very end of the input. When you do that, your regex behaves the same way whether you test it in RegexBuddy or use matches() or another method in Java. Only matches() requires a full string match; the other Matcher methods, such as find() and lookingAt(), allow partial matches.
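
A small sketch of the anchoring idea, under a couple of assumptions: it uses \z (lowercase) rather than \Z as the end anchor so that a trailing line terminator is not tolerated, and the class and method names (AnchorDemo, findUnanchored, findAnchored, fullMatch) are hypothetical.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // The regex from the question, unchanged.
    static final String ORIGINAL = "\\w* *\\(*\\)*&*\\.*";

    // find() without anchors: succeeds if any part of the input matches.
    static boolean findUnanchored(String input) {
        return Pattern.compile(ORIGINAL).matcher(input).find();
    }

    // find() with \A ... \z: the match must span the whole input.
    static boolean findAnchored(String input) {
        return Pattern.compile("\\A" + ORIGINAL + "\\z").matcher(input).find();
    }

    // String.matches() always requires the whole input to match.
    static boolean fullMatch(String input) {
        return input.matches(ORIGINAL);
    }

    public static void main(String[] args) {
        for (String s : new String[] { "(AT) & (T).", "AT ()&." }) {
            System.out.println("\"" + s + "\": find=" + findUnanchored(s)
                    + ", anchored find=" + findAnchored(s)
                    + ", matches=" + fullMatch(s));
        }
    }
}
```

With the anchors in place, find() and matches() agree. Anchoring does not make the original regex accept "(AT) & (T)."; it only makes the rejection consistent across methods.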

#4


The regex

\w* *\(*\)*&*\.*

will match the items you described, but only in the order you listed them, each repeated any number of times. So "skjhsklasdkjgsh((((())))))&&&&&....." matches, but a string that mixes the characters does not.
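
A quick way to check the ordering behaviour described above is a sketch like this one. The class name OrderDemo, the helper matchesInOrder, and the short sample strings ".abc" and "a(b" are made up for illustration.

```java
class OrderDemo {
    // The regex from the question: each group is optional, but the order is fixed.
    static final String ORDERED = "\\w* *\\(*\\)*&*\\.*";

    static boolean matchesInOrder(String input) {
        return input.matches(ORDERED);
    }

    public static void main(String[] args) {
        String[] samples = {
            "skjhsklasdkjgsh((((())))))&&&&&.....",  // groups appear in the expected order
            ".abc",                                  // period before word characters
            "a(b",                                   // word character after a parenthesis
            "(AT) & (T)."                            // the string from the question
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + matchesInOrder(s));
        }
    }
}
```

Only the first sample keeps every group in the order the regex expects, so it is the only one that matches.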

You want a regex like this:

[\w \(\)&\.]+

which will allow a mix of all characters.

edit: my regex knowledge is limited, so the above syntax may not be perfect.
