Given the following input string: 3481.7.1071.html
I want to confirm that:
- The string has 1 or more numbers followed by a period.
- The string ends in html.
Finally, I want to extract the left-most number (i.e. 3481).
My current regex is nearly there but I can't capture the correct group:
final Pattern p = Pattern.compile("(\\d++\\.)+html");
final Matcher m = p.matcher("3481.7.1071.html");
if (m.matches()) {
final String corrected = m.group(1)+"html"; // WRONG! Gives 1071.html
}
How do I capture the first match?
6 solutions
#1
7
You can just factor it out:
(\d+\.)(\d+\.)*html
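As a rough illustration of the factoring idea, the sketch below gives the first "digits plus dot" chunk its own group so that the repeated group no longer overwrites it. The class and method names are hypothetical, chosen only for this example. Note that group 1 still includes the trailing dot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class FactoredGroupDemo {
    // Group 1 is the first "digits + dot" chunk; group 2 repeats for the rest.
    private static final Pattern FACTORED =
            Pattern.compile("(\\d+\\.)(\\d+\\.)*html");

    /** Returns the left-most chunk including its dot, or null if the whole input does not match. */
    static String firstChunk(String input) {
        Matcher m = FACTORED.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Group 1 is "3481.", so appending "html" yields "3481.html".
        System.out.println(firstChunk("3481.7.1071.html") + "html");
    }
}
```

Because a repeated capturing group in Java keeps only its last iteration, moving the first chunk out of the repetition is what makes it retrievable.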
#2
3
"^(\\d+)\\.(\\d+\\.)*html$"
#3
0
Java style: "(\\d+)\\..*?\\.html$"
This will 1) grab the first group of consecutive digits, 2) require a dot afterwards, 3) skip over everything up to 4) the literal string '.html' at the end of the input.
If you mean "one or more [groups] of numbers followed by a period" then this is more along the lines of your requirements.
"(\\d+)(?:\\.\\d+)*\\.html$"
This way you capture the number without the dot. None of the other parts need to be captured, so they sit in a non-capturing group.
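To make the difference between the two patterns in this answer concrete, here is a small sketch that runs both with find(). The class and method names are hypothetical. Neither pattern is anchored at the start, so with find() any leading text is tolerated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two patterns from this answer.
final class NonCapturingDemo {
    // Loose form: digits, a dot, anything, then ".html" at the end.
    private static final Pattern LOOSE =
            Pattern.compile("(\\d+)\\..*?\\.html$");
    // Strict form: dot-separated digit runs only, then ".html" at the end.
    private static final Pattern STRICT =
            Pattern.compile("(\\d+)(?:\\.\\d+)*\\.html$");

    /** Returns group 1 of the first match found, or null when nothing matches. */
    private static String leftmost(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    static String loose(String input) {
        return leftmost(LOOSE, input);
    }

    static String strict(String input) {
        return leftmost(STRICT, input);
    }

    public static void main(String[] args) {
        System.out.println(loose("3481.7.1071.html"));
        System.out.println(strict("3481.7.1071.html"));
    }
}
```

The loose form needs two dots (one after the number and one before html), so it rejects 3481.html, while the strict form accepts it but rejects non-digit segments such as 3481.x.html.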
#4
0
groovy:000> p = java.util.regex.Pattern.compile("(\\d+).*")
===> (\d+).*
groovy:000> m = p.matcher("3481.7.1071.html")
===> java.util.regex.Matcher[pattern=(\d+).* region=0,16 lastmatch=]
groovy:000> m.find()
===> true
groovy:000> m.group(1)+".html"
===> 3481.html
groovy:000>
#5
0
Yes, you can.
If 123.html and 1.23html are both valid, use this:
^(?:(\d+)\.).*?html$
If 123.html is invalid but 1.23html is valid, use this:
^(?:(\d+)\.(?!h)).*?html$
If 123.html and 1.23html are invalid and only 1.23.html is valid, use this:
^(?:(\d+)\.).*?\.html$
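The three variants above can be checked side by side with a sketch like the following. The class and method names are hypothetical labels for this example; the patterns are the ones from the answer, escaped for a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the three variants in this answer.
final class AnswerFiveDemo {
    // Accepts 123.html and 1.23html.
    private static final Pattern BOTH_VALID =
            Pattern.compile("^(?:(\\d+)\\.).*?html$");
    // Rejects 123.html: the dot must not be followed directly by 'h'.
    private static final Pattern NO_BARE_NUMBER =
            Pattern.compile("^(?:(\\d+)\\.(?!h)).*?html$");
    // Requires a second dot right before "html", so only 1.23.html passes.
    private static final Pattern DOTTED_ONLY =
            Pattern.compile("^(?:(\\d+)\\.).*?\\.html$");

    /** Returns the left-most number, or null when the input is rejected. */
    private static String capture(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    static String bothValid(String input) {
        return capture(BOTH_VALID, input);
    }

    static String noBareNumber(String input) {
        return capture(NO_BARE_NUMBER, input);
    }

    static String dottedOnly(String input) {
        return capture(DOTTED_ONLY, input);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123.html", "1.23html", "1.23.html"}) {
            System.out.println(s + " -> " + bothValid(s) + ", "
                    + noBareNumber(s) + ", " + dottedOnly(s));
        }
    }
}
```

In each variant the outer (?:...) is non-capturing, so the number is still available as group 1.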
#6
-1
jpalecek's solution fails; it captures the rightmost number. The original poster was a lot closer, but he got the right-most number. To get the left-most number, ignore anything after the first dot:
[^\d]*(\d+)\..*html
[^\d]* ignores everything before the left-most number (so X1.html captures the number 1).
(\d+)\. captures the first run of digits, provided it is followed by a dot.
.* ignores everything between the dot and the final html.
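A minimal sketch of how this pattern behaves with matches(), using hypothetical class and method names. It assumes the whole input must match, as in the question's original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the pattern in this answer.
final class LeadingJunkDemo {
    // Skip leading non-digits, capture the first digit run that is
    // followed by a dot, then allow anything up to a final "html".
    private static final Pattern PATTERN =
            Pattern.compile("[^\\d]*(\\d+)\\..*html");

    /** Returns the left-most number, or null if the whole input does not match. */
    static String leftmostNumber(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(leftmostNumber("3481.7.1071.html"));
        System.out.println(leftmostNumber("X1.html"));
    }
}
```

Unlike the stricter patterns in the other answers, this one does not check that the segments between the first dot and html are numeric.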