Scala: match an optional set of characters

I am using a Scala regex to extract a token from a URL.

My URL is http://www.google.com?x=10&id=x10_23&y=2 and I want to extract the value x10 that follows id=. Note that _23 is optional: it may or may not appear, but if it does appear it must be removed.

The regex which I have written is

val regex = "^.*id=(.*)(\\_\\d+)?.*$".r
x match {
    case regex(id) => print(id)
    case _ => print("none")
}

This should work because (\\_\\d+)? should make the _23 optional as a whole.

So I don't understand why it prints none.

2 solutions

#1

Note that your pattern ^.*id=(.*)(\\_\\d+)?.*$ actually puts x10_23&y=2 into Group 1, because the dot-matching subpattern (.*) inside that group is greedy. Since (\\_\\d+)? is optional, the greedy subpattern does not have to give back any characters to that second capture group.
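
To make the behaviour described above visible, here is a minimal sketch that runs the question's pattern unchanged. The object and method names (GreedyGroupDemo, oneBinder, twoBinders) are made up for illustration. It also shows a second effect: the pattern has two capturing groups, so a `case` with a single binder cannot match at all, which is why "none" is printed.

```scala
object GreedyGroupDemo {
  // The pattern from the question, unchanged.
  val original = "^.*id=(.*)(\\_\\d+)?.*$".r
  val url = "http://www.google.com?x=10&id=x10_23&y=2"

  // One binder against a two-group pattern: the extractor yields two
  // values, so this case never matches and the fallback is taken.
  def oneBinder(s: String): String = s match {
    case original(id) => id
    case _            => "none"
  }

  // Two binders: the match succeeds. An optional group that did not
  // participate in the match comes back as null, hence Option(second).
  def twoBinders(s: String): Option[(String, Option[String])] = s match {
    case original(first, second) => Some((first, Option(second)))
    case _                       => None
  }

  def main(args: Array[String]): Unit = {
    println(oneBinder(url))  // none
    println(twoBinders(url)) // Some((x10_23&y=2,None))
  }
}
```

Group 1 swallows everything after id=, and the optional group stays empty, so even with two binders the value would still need trimming.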

You can use

val regex = "(?s).*[?&]id=([^\\W&]+?)(?:_\\d+)?(?:&.*)?".r
val x = "http://www.google.com?x=10&id=x10_23&y=2"
x match {
    case regex(id) => print(id)
    case _ => print("none")
}
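
As a rough check of the pattern above, the sketch below wraps it in a small helper and tries it on a few inputs. The names IdExtractor and extractId, and the extra sample URLs, are assumptions added for illustration; only the regex itself comes from the answer.

```scala
object IdExtractor {
  // Same regex as above: lazy id value, optional _digits suffix,
  // optional rest of the query string.
  private val IdPattern = "(?s).*[?&]id=([^\\W&]+?)(?:_\\d+)?(?:&.*)?".r

  // Returns the id without its numeric suffix, or None if there is no id parameter.
  def extractId(url: String): Option[String] = url match {
    case IdPattern(id) => Some(id)
    case _             => None
  }

  def main(args: Array[String]): Unit = {
    println(extractId("http://www.google.com?x=10&id=x10_23&y=2")) // Some(x10)
    println(extractId("http://www.google.com?x=10&id=x10&y=2"))    // Some(x10)
    println(extractId("http://www.google.com?id=x10_23"))          // Some(x10)
    println(extractId("http://www.google.com?x=10&y=2"))           // None
  }
}
```

Returning an Option instead of printing keeps the "no id" case explicit for the caller.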

See the IDEONE demo (regex demo)

Note that there is no need to add ^ and $: when a Scala regex is used in a pattern match, it must match the entire input by default. (?s) ensures the whole input string is matched even if it contains newline characters.
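
The two points above can be illustrated with a small sketch. The object and value names are made up for the example; `findFirstIn` and `unanchored` are methods of scala.util.matching.Regex.

```scala
object AnchoringDemo {
  val digits    = "\\d+".r
  val plainDot  = ".*".r
  val dotAll    = "(?s).*".r
  // unanchored lets the pattern match anywhere inside the input.
  val idAnywhere = "[?&]id=(\\w+)".r.unanchored

  // In a pattern match, the regex has to cover the whole string.
  def wholeMatch(r: scala.util.matching.Regex, s: String): Boolean = s match {
    case r(_*) => true
    case _     => false
  }

  def main(args: Array[String]): Unit = {
    println(wholeMatch(digits, "123"))     // true
    println(wholeMatch(digits, "abc123"))  // false: not the whole input
    println(digits.findFirstIn("abc123"))  // Some(123): searching is not anchored
    println(wholeMatch(plainDot, "a\nb"))  // false: . stops at the newline
    println(wholeMatch(dotAll, "a\nb"))    // true: (?s) lets . match newlines
    "http://www.google.com?x=10&id=x10_23&y=2" match {
      case idAnywhere(v) => println(v)     // x10_23
      case _             => println("none")
    }
  }
}
```

The unanchored variant still returns the _23 suffix here because \w+ includes underscores and digits, so the suffix has to be handled separately.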

#2

Another idea, instead of using a regular expression to extract tokens, would be to use the built-in Java URI class with its getQuery() method. You can then split the query on &, check whether one of the pairs starts with id=, and extract the value.

For instance (just as an example):

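import java.net.URI

val x = "http://www.google.com?x=10&id=x10_23&y=2"
val uri = new URI(x)

// Split the query into key=value pairs and pick the one for id
uri.getQuery.split('&').find(_.startsWith("id=")) match {
    // Drop the key, then strip an optional trailing _digits suffix
    case Some(param) => println(param.stripPrefix("id=").replaceFirst("_\\d+$", ""))
    case None => println("None")
}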

I find it simpler to maintain than the regular expression you have, but that's just my thought!
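
If more than one parameter is needed, the same idea can be pushed a little further. The following is only a sketch under assumptions: QueryParams, parse and idWithoutSuffix are hypothetical names, values are not URL-decoded beyond what getQuery does, and a repeated key keeps its last value.

```scala
import java.net.URI

object QueryParams {
  // Hypothetical helper: turn "a=1&b=2" into Map("a" -> "1", "b" -> "2").
  def parse(url: String): Map[String, String] = {
    // getQuery returns null when the URL has no query part.
    val query = Option(new URI(url).getQuery).getOrElse("")
    query.split('&').iterator
      .filter(_.nonEmpty)
      .map { pair =>
        // Limit 2 keeps any further '=' inside the value.
        pair.split("=", 2) match {
          case Array(k, v) => k -> v
          case Array(k)    => k -> ""
        }
      }
      .toMap
  }

  // Looks up id and strips an optional trailing _digits suffix.
  def idWithoutSuffix(url: String): Option[String] =
    parse(url).get("id").map(_.replaceFirst("_\\d+$", ""))

  def main(args: Array[String]): Unit = {
    println(idWithoutSuffix("http://www.google.com?x=10&id=x10_23&y=2")) // Some(x10)
    println(idWithoutSuffix("http://www.google.com?x=10&y=2"))           // None
  }
}
```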
