How do I write a regular expression to extract the numbers from these URLs?

Time: 2022-09-13 11:10:52

I'm trying to write a regex to match the numbers in these URLs (12345678 and 1234567890).

http://www.example.com/p/12345678
http://www.example.com/p/12345678?foo=bar
http://www.example.com/p/some-text-123/1234567890?foo=bar

Rules:

  • the numbers always come after a slash
  • the numbers can be varying lengths
  • the regex must check that the URLs have /p/ in them
  • the numbers may be at the end of the URL, or there could be variables after them

My attempt:

\/p\/([0-9]+)

That matches the first and second, but not the third. So I tried:

\/p\/[^\/?]*\/?([0-9]+)

No joy.

REGEX 101
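To see what each attempt actually captures, it can help to run both patterns over the three sample URLs. The following is a minimal Python sketch; Python is an assumption here, since the question does not name a language, and the names URLS, ATTEMPTS, and first_group are made up for illustration.

```python
import re

URLS = [
    "http://www.example.com/p/12345678",
    "http://www.example.com/p/12345678?foo=bar",
    "http://www.example.com/p/some-text-123/1234567890?foo=bar",
]

# The two patterns from the question, unchanged.
ATTEMPTS = [
    r"\/p\/([0-9]+)",
    r"\/p\/[^\/?]*\/?([0-9]+)",
]

def first_group(pattern, url):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, url)
    return match.group(1) if match else None

for pattern in ATTEMPTS:
    print(pattern)
    for url in URLS:
        print("   ", url, "->", first_group(pattern, url))
```

The first pattern finds nothing in the third URL because a non-digit follows /p/. The second pattern does match all three URLs, but in the first two the greedy [^\/?]* consumes the digits and backtracks by only one character, so the group holds just the final 8.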

5 solutions

#1


2  

Regex might not be the right tool for this job. It looks like in every case, splitting the URL with a URL parser would make more sense. From your examples, it appears that the number portion is always the last item in the path portion of the URL. I'm not sure what language you're using, but many languages offer functions that can parse URLs into their constituent parts.

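$path = parse_url($url, PHP_URL_PATH);   // path only, without the query string
if (strpos($path, "/p/") === 0) {        // the path must start with /p/
    $base = basename($path);             // last path segment, e.g. "1234567890"
} else {
    // error: the URL does not have the expected /p/ prefix
}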

This works for the example URLs, assuming $url is the string you are parsing.
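For readers not using PHP, the same parse-then-check idea can be sketched in Python with the standard urllib.parse module. This is only an illustration: extract_id is a hypothetical helper name, and the final digits-only check is an extra step that the PHP snippet does not perform.

```python
from urllib.parse import urlparse

def extract_id(url):
    """Return the trailing number of a /p/... URL, or None if the URL does not qualify."""
    path = urlparse(url).path          # path only; the query string is dropped
    if not path.startswith("/p/"):     # same prefix check as the PHP strpos() test
        return None
    last = path.rsplit("/", 1)[-1]     # last path segment, roughly like PHP's basename()
    # Extra check that is not in the PHP snippet: accept ASCII digits only.
    return last if last.isascii() and last.isdigit() else None

print(extract_id("http://www.example.com/p/some-text-123/1234567890?foo=bar"))
```

Because the query string is split off before the path is inspected, digits inside the variables can never be picked up by mistake.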

#2


1  

I extended your version; it now works with all the examples:

\/p\/(.+\/)*(\d+)(\?.+=.+(&.+=.+)*)?$

If you don't care whether the URL is valid, you could shrink the regex to:

\/p\/(.+\/)*(\d+)($|\?)

https://regex101.com/r/pW5qB3/2
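The difference between the two patterns shows up on a URL whose query string has no key=value pair. A small Python sketch, assuming the number is read from capture group 2; STRICT, LOOSE, and number_from are illustrative names, not part of the answer:

```python
import re

# Both patterns are copied from the answer above; group 2 holds the number.
STRICT = re.compile(r"\/p\/(.+\/)*(\d+)(\?.+=.+(&.+=.+)*)?$")
LOOSE = re.compile(r"\/p\/(.+\/)*(\d+)($|\?)")

def number_from(pattern, url):
    """Return capture group 2 of the first match, or None if nothing matches."""
    match = pattern.search(url)
    return match.group(2) if match else None

for url in [
    "http://www.example.com/p/some-text-123/1234567890?foo=bar",
    "http://www.example.com/p/123?foo",  # query string without "="
]:
    print(url, "->", number_from(STRICT, url), "/", number_from(LOOSE, url))
```

The strict pattern rejects the second URL because it insists on well-formed variables after the question mark, while the short pattern only checks that the digits are followed by the end of the string or by a question mark.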

#3


0  

If I understand correctly, the digits you want can only be:

  • right after the last slash of the URL
  • not part of the variables, i.e. /p/123?foo=bar456 matches 123 and
    /p/foobar?foo=bar456 matches nothing

You can then use the following regex:

(?=/p/).*/\K\d+

Explanation

(?=/p/)  # lookahead: check '/p/' is in the URL
.*/      # go to the last '/' thanks to greediness
\K       # leave everything we have so far out of the final match
\d+      # select the digits just after the last '/'

To avoid escaping forward slashes, don't use them as regex delimiters; #(?=/p/).*/\K\d+# will do fine.

See demo here.
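Not every regex engine supports \K. Python's built-in re module, for one, does not, so a sketch in Python has to swap \K for a capture group around the digits; the rest of the pattern is unchanged. The names PATTERN and digits_after_last_slash are hypothetical.

```python
import re

# Python's built-in re module does not support \K, so the digits are wrapped in
# a capture group instead; the lookahead and the greedy .*/ are as in the answer.
PATTERN = re.compile(r"(?=/p/).*/(\d+)")

def digits_after_last_slash(url):
    """Return the digits that follow the last '/' of a /p/ URL, or None."""
    match = PATTERN.search(url)
    return match.group(1) if match else None

print(digits_after_last_slash("http://www.example.com/p/123?foo=bar456"))
print(digits_after_last_slash("http://www.example.com/p/foobar?foo=bar456"))
```

With a capture group the overall match still includes everything from /p/ onward, so the number has to be read from group 1 rather than from the whole match.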

#4


0  

\/p\/(?:.*\/)?(\d+)\b

You can try this. It captures the integers according to your conditions. See the demo, and read the number from capture group 1.

https://regex101.com/r/dU7oN5/29

$re = "/\\/p\\/(?:.*\\/)?(\\d+)\\b/";
$str = "http://www.example.com/p/12345678\nhttp://www.example.com/p/12345678?foo=bar\nhttp://www.example.com/p/some-text-123/1234567890?foo=bar";

preg_match_all($re, $str, $matches);
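A rough Python counterpart of the preg_match_all call, offered as a sketch rather than a translation of record: re.findall returns the contents of the single capture group for every match. PATTERN, TEXT, and numbers are illustrative names.

```python
import re

# Same pattern as in the PHP snippet; a raw string avoids the doubled backslashes.
PATTERN = re.compile(r"\/p\/(?:.*\/)?(\d+)\b")

TEXT = "\n".join([
    "http://www.example.com/p/12345678",
    "http://www.example.com/p/12345678?foo=bar",
    "http://www.example.com/p/some-text-123/1234567890?foo=bar",
])

# With exactly one capture group, findall yields the group text for each match.
numbers = PATTERN.findall(TEXT)
print(numbers)
```

Since . does not match a newline by default, the greedy .* stays within one line, so each URL is handled separately.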

#5


-2  

var regex = new Regex(@"/(?<ticket>\d+)");

var subject = "http://www.example.com/p/some-text-123/1234567890?foo=bar";

var ticket = regex.Match(subject).Groups["ticket"].Value;

Output: 1234567890
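The same pattern can be tried in Python, where a named group is written (?P<name>...). This sketch uses a hypothetical helper called ticket. It reproduces the output above, and it also shows that the pattern never looks for /p/, which the rules in the question require.

```python
import re

# Same idea as the C# pattern; Python spells the named group (?P<ticket>...).
PATTERN = re.compile(r"/(?P<ticket>\d+)")

def ticket(url):
    """Return the first run of digits that directly follows a slash, or None."""
    match = PATTERN.search(url)
    return match.group("ticket") if match else None

print(ticket("http://www.example.com/p/some-text-123/1234567890?foo=bar"))
print(ticket("http://www.example.com/q/42"))  # accepted even though there is no /p/
```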
