I am trying to match a URL with six or more levels (sub-paths), such as:
http://www.domain.com/level1/level2/level3/level4/level5/level6/level7/level8/level9/level10/level11/level12.html
I came up with this expression:
^http:\/\/([a-zA-Z\.-]*)\W(\b\w+\b)
...which matches level1 (demo)
However, when I am trying to match a URL with six or more levels it doesn't seem to work.
^http:\/\/([a-zA-Z\.-]*)\W(\b\w+\b){6,}
(demo)
2 Answers
#1
1
I think this is what you were trying for:
^http://([a-zA-Z.-]+)/(?:[^/]+/){6,}.*$
This matches six or more levels, which is what you said you wanted in the question. However in the question's title you phrased it "more than six". If that's what you really want, change the quantifier from {6,} to {7,}.
On a side note, the forward slash (/) has no special meaning in regexes and doesn't need to be escaped. Rubular forces you to escape the slash because that's what it uses as the regex delimiter. Nutch uses Java's built-in regexes, so you should use a tester that uses the same flavor, like this one.
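Since Nutch uses Java's regex flavor, one way to check the pattern is to run it through java.util.regex directly. This is only a minimal sketch, not anything from Nutch itself: the class name LevelMatch, the helper matches, and the sample URLs are made up for illustration.

```java
import java.util.regex.Pattern;

class LevelMatch {
    // The pattern from this answer. The slashes are left unescaped because
    // Java has no regex delimiter; only the Java string literal needs quoting.
    static final Pattern SIX_OR_MORE =
        Pattern.compile("^http://([a-zA-Z.-]+)/(?:[^/]+/){6,}.*$");

    // Hypothetical helper: true if the URL matches the pattern.
    static boolean matches(String url) {
        return SIX_OR_MORE.matcher(url).find();
    }

    public static void main(String[] args) {
        String deep = "http://www.domain.com/level1/level2/level3/level4/level5/level6/level7.html";
        String shallow = "http://www.domain.com/level1/level2/level3.html";

        System.out.println(matches(deep));    // true: six "name/" segments, then a file
        System.out.println(matches(shallow)); // false: only two "name/" segments
    }
}
```

Changing {6,} to {7,} in the pattern string should make the first URL stop matching as well, since it has only six segments that end in a slash.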
#2
2
Try the following:
^http:\/\/([a-zA-Z\.-]*)(\/[\w\.]+){6,}
http://rubular.com/r/QZlidUqheq
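The two patterns on this page do not count levels in quite the same way, so it may be worth checking which one fits your URLs. The sketch below runs both through java.util.regex; the class name CompareAnswers, the helper found, and the sample URLs are assumptions for illustration only. As far as I can tell, the first pattern requires six segments that each end in a slash, while this one counts the trailing file name as a level.

```java
import java.util.regex.Pattern;

class CompareAnswers {
    // Pattern from answer #1: six or more "name/" segments, then anything.
    static final Pattern ANSWER_1 =
        Pattern.compile("^http://([a-zA-Z.-]+)/(?:[^/]+/){6,}.*$");

    // Pattern from answer #2: six or more "/name" segments.
    // The escaped slashes are harmless in Java, just unnecessary.
    static final Pattern ANSWER_2 =
        Pattern.compile("^http:\\/\\/([a-zA-Z\\.-]*)(\\/[\\w\\.]+){6,}");

    // Hypothetical helper: true if the pattern is found in the URL.
    static boolean found(Pattern p, String url) {
        return p.matcher(url).find();
    }

    public static void main(String[] args) {
        // Five directories plus a file: six levels if the file counts as one.
        String fiveDirsAndFile =
            "http://www.domain.com/level1/level2/level3/level4/level5/level6.html";
        // Six directories plus a file.
        String sixDirsAndFile =
            "http://www.domain.com/level1/level2/level3/level4/level5/level6/level7.html";

        System.out.println(found(ANSWER_1, fiveDirsAndFile)); // false
        System.out.println(found(ANSWER_2, fiveDirsAndFile)); // true
        System.out.println(found(ANSWER_1, sixDirsAndFile));  // true
        System.out.println(found(ANSWER_2, sixDirsAndFile));  // true
    }
}
```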