I am using a Scala regex to extract a token from a URL.
My URL is http://www.google.com?x=10&id=x10_23&y=2. I want to extract the value x10 that follows id=. Note that _23 is optional: it may or may not appear, but if it does, it must be removed.
The regex I have written is:
val regex = "^.*id=(.*)(\\_\\d+)?.*$".r
x match {
case regex(id) => print(id)
case _ => print("none")
}
This should work, because (\\_\\d+)? should make the _23 optional as a whole, so I don't understand why it prints none.
2 Solutions
#1
1
Note that your pattern ^.*id=(.*)(\\_\\d+)?.*$ actually puts x10_23&y=2 into Group 1, because the greedy (.*) consumes everything after id=. Since (\\_\\d+)? is optional, the greedy subpattern does not have to give back any characters to that second capture group. In addition, the pattern has two capturing groups while case regex(id) binds only one, so the extractor never matches and the code prints none.
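To see what the pattern from the question really captures, here is a small sketch. The object and method names (GreedyCaptureDemo, groups, oneBinder, twoBinders) are made up for illustration; only Regex.unapplySeq and the pattern itself come from Scala and the question.

```scala
object GreedyCaptureDemo {
  // The pattern from the question: it contains two capturing groups.
  val original = "^.*id=(.*)(\\_\\d+)?.*$".r

  // Contents of every capture group, or None if the whole input does not match.
  // A group that did not take part in the match is reported as null.
  def groups(input: String): Option[List[String]] =
    original.unapplySeq(input)

  // One binder against a two-group pattern: compiles, but never matches.
  def oneBinder(input: String): String = input match {
    case original(id) => id
    case _            => "none"
  }

  // Two binders: matches, and shows that group 1 swallowed the suffix.
  def twoBinders(input: String): String = input match {
    case original(id, _) => id
    case _               => "none"
  }

  def main(args: Array[String]): Unit = {
    val x = "http://www.google.com?x=10&id=x10_23&y=2"
    println(groups(x))
    println(oneBinder(x))
    println(twoBinders(x))
  }
}
```

The second group comes back as null because the greedy first group left nothing for it, which is why a non-capturing, lazily fed suffix is needed.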
You can use:
val regex = "(?s).*[?&]id=([^\\W&]+?)(?:_\\d+)?(?:&.*)?".r
val x = "http://www.google.com?x=10&id=x10_23&y=2"
x match {
case regex(id) => print(id)
case _ => print("none")
}
Note that there is no need to add ^ and $: when used in a pattern match, a Scala regex is anchored by default. (?s) ensures the whole input string is matched even if it contains newline characters.
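The anchoring behaviour can be illustrated with a short sketch. The pattern idParam below is a hypothetical variant that covers only the id parameter rather than the whole URL, and the object and method names are assumptions; Regex.unanchored is part of the Scala standard library.

```scala
object AnchoringDemo {
  // Hypothetical pattern for the id parameter alone, not the whole URL.
  val idParam = "[?&]id=(\\w+?)(?:_\\d+)?(?:&|$)".r

  // Default (anchored): the entire input must match, so a full URL fails.
  def anchored(url: String): Option[String] = url match {
    case idParam(id) => Some(id)
    case _           => None
  }

  // Unanchored view of the same pattern: it may match anywhere in the input.
  private val anywhere = idParam.unanchored

  def unanchored(url: String): Option[String] = url match {
    case anywhere(id) => Some(id)
    case _            => None
  }
}
```

With the unanchored form, the leading .* and the trailing (?:&.*)? are no longer required, at the cost of having to delimit the value explicitly with (?:&|$).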
#2
1
Another idea, instead of using a regular expression to extract tokens, is to use the built-in Java URI class and its getQuery() method. You can then split the query by &, check whether one of the pairs starts with id=, and extract the value.
For instance (just as an example):
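import java.net.URI

val x = "http://www.google.com?x=10&id=x10_23&y=2"
val uri = new URI(x)
// Split the query into key=value pairs and look for the id parameter
uri.getQuery.split('&').find(_.startsWith("id=")) match {
  // Drop the "id=" prefix, then strip an optional trailing _<digits> suffix
  case Some(param) => println(param.stripPrefix("id=").replaceAll("_\\d+$", ""))
  case None => println("None")
}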
I find it simpler to maintain than the regular expression you have, but that's just my thought!
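If more than one parameter is needed, the URI approach could be wrapped into a small helper along these lines. This is only a sketch: QueryParams, parse and extractId are hypothetical names, and it assumes the optional suffix is always an underscore followed by digits at the end of the value.

```scala
import java.net.URI

object QueryParams {
  // Parses the query string of a URL into a Map of key -> value.
  // A pair without '=' is stored with an empty value.
  def parse(url: String): Map[String, String] = {
    // getQuery returns null when the URL has no query part.
    val query = Option(new URI(url).getQuery).getOrElse("")
    query.split('&').filter(_.nonEmpty).map { pair =>
      pair.split("=", 2) match {
        case Array(k, v) => k -> v
        case Array(k)    => k -> ""
      }
    }.toMap
  }

  // Looks up id and drops an optional trailing "_<digits>" suffix.
  def extractId(url: String): Option[String] =
    parse(url).get("id").map(_.replaceAll("_\\d+$", ""))
}
```

Calling QueryParams.extractId on the URL from the question yields Some("x10"), and a URL without an id parameter yields None.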