I want to extract bell.com from each of the following inputs using a Scala regex. I have tried a few variations without success.
"www.bell.com"
"bell.com"
"http://www.bell.com"
"https://www.bell.com"
"https://bell.com/about"
"https://www.bell.com?token=123"
This is my code, but it does not work.
val pattern = """(?:([http|https]://)?)(?:(www\.)?)([A-Za-z0-9._%+-]+)[/]?(?:.*)""".r
url match {
  case pattern(domain) =>
    print(domain)
  case _ => print("not found!")
}
EDIT: My regex was wrong. Thanks to @Tabo, this is the correct one:
(?:https?://)?(?:www\.)?([A-Za-z0-9._%+-]+)/?.*
3 Answers
#1
5
You can try:
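import java.net.URL
import scala.util.Try
val t = "https://www.bell.com?token=123"
// Parse the string; the result is None if it is not a valid URL
val url = Try { new URL(t) }.toOption
// Host name of the parsed URL (still includes the "www." prefix)
val host = url.map(_.getHost)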
#2
3
You can use the Java URL class to get the host, or you can check the Apache library.
new URL("https://www.bell.com?token=123").getHost
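As a minimal sketch of how this could cover all six inputs from the question: java.net.URL throws on strings without a protocol (such as "www.bell.com"), so the snippet below prepends one when it is missing and then strips a leading "www.". The helper name hostOf and the choice of "http://" as the default scheme are assumptions for illustration, not part of the original answer.

```scala
import java.net.URL
import scala.util.Try

// Hypothetical helper: returns the host without a leading "www.", if the input parses.
def hostOf(input: String): Option[String] = {
  // java.net.URL needs a protocol, so add one for inputs like "www.bell.com"
  val withScheme = if (input.contains("://")) input else "http://" + input
  Try(new URL(withScheme).getHost).toOption
    .filter(_.nonEmpty)            // treat an empty host as "not found"
    .map(_.stripPrefix("www."))    // drop the "www." prefix when present
}
```

Calling hostOf on each of the inputs in the question should give Some("bell.com"). Note that this only strips "www."; any other subdomain is kept as part of the host.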
#3
0
You should probably use the java.net.URL approach, but...
For future reference, you have a couple of issues in your regex. Square brackets match character sets, so [http|https] is the same as [htps|] (meaning a single 'h', 't', 'p', 's', or '|'). I think you meant the alternation http|https, or more simply https? (with an optional 's').
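To make the difference concrete, here is a small sketch comparing the character class with the alternation on one of the inputs from the question. The value names are only for illustration.

```scala
// A character class matches exactly ONE character from the set {h, t, p, s, |}
val charClass = "[http|https]".r
// Literal "http" followed by an optional "s"
val alternation = "https?".r

val sample = "https://www.bell.com"

// Some("h"): the class consumes a single character only
val firstCharClassHit = charClass.findFirstIn(sample)
// Some("https"): the whole scheme is matched
val firstAlternationHit = alternation.findFirstIn(sample)
// true: inside square brackets the pipe is a literal character, not "or"
val pipeMatchesClass = "|".matches("[http|https]")
```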
Also, if you only want to match the domain, you should have only one capturing group. Note that (?:blah) denotes a non-capturing group, while (blah) is a capturing group. The three capturing groups in your regex are ([http|https]://), (www\.), and ([A-Za-z0-9._%+-]+). You really only want the last one. In Scala this matters because case pattern(domain) only matches when the regex has exactly one capturing group.
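The effect of the group count on Scala's pattern matching can be sketched as follows. The regex here is a simplified three-group variant (with the https? fix already applied), and the function names are made up for the example.

```scala
// Three capturing groups: scheme, "www." prefix, and domain
val threeGroups = """(https?://)?(www\.)?([A-Za-z0-9._%+-]+)/?.*""".r

// One binder against three groups: the extractor never matches,
// so every input falls through to the default case
def withOneBinder(url: String): String = url match {
  case threeGroups(domain) => domain
  case _                   => "not found!"
}

// One binder per group: matches; groups that did not participate are null,
// which the wildcard accepts
def withThreeBinders(url: String): String = url match {
  case threeGroups(_, _, domain) => domain
  case _                         => "not found!"
}
```

With "https://www.bell.com" as input, withOneBinder returns "not found!" while withThreeBinders returns "bell.com". Turning the first two groups into non-capturing groups avoids the extra binders altogether.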
Try:
(?:https?://)?(?:www\.)?([A-Za-z0-9._%+-]+)/?.*
Test it here - https://regex101.com/r/xW4iY7/2
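For completeness, a short sketch of the regex above used as a Scala extractor on the six inputs from the question. The names domainPattern and extractDomain are assumptions for illustration.

```scala
val domainPattern = """(?:https?://)?(?:www\.)?([A-Za-z0-9._%+-]+)/?.*""".r

// Hypothetical helper: the single capturing group binds directly to `domain`
def extractDomain(url: String): Option[String] = url match {
  case domainPattern(domain) => Some(domain)
  case _                     => None
}

val inputs = List(
  "www.bell.com",
  "bell.com",
  "http://www.bell.com",
  "https://www.bell.com",
  "https://bell.com/about",
  "https://www.bell.com?token=123"
)

// Every element is Some("bell.com")
val domains = inputs.map(extractDomain)
```

The pattern is permissive: a subdomain other than www stays in the captured group, so "https://shop.bell.com" would yield "shop.bell.com".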