Matching up to the first occurrence of a character in a regex?

Time: 2021-03-09 20:12:30

I have the following regex:

http://([^:]*):?([0-9]*)(/.*)

When I match that against http://brandonhsiao.com/essays/showers.html, the parentheses grab: http://brandonhsiao.com/essays and /showers.html. How can I get it to grab http://brandonhsiao.com and /essays/showers.html?

3 solutions

#1


3  

Put a question mark after the first * to make it non-greedy. Right now the part of your pattern that matches the hostname is greedy, so it grabs everything all the way up to the last /.

http://([^:]*?):?([0-9]*)(/.*)
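
To see the difference the extra ? makes, here is a minimal sketch in Python, whose re module treats these constructs the same way for this pattern. The variable names are only illustrative and are not from the question.

```python
import re

url = "http://brandonhsiao.com/essays/showers.html"

# Greedy: [^:]* first swallows the rest of the URL, then backtracks only
# as far as the LAST slash so that (/.*) can still match.
greedy = re.match(r"http://([^:]*):?([0-9]*)(/.*)", url)

# Non-greedy: [^:]*? grows one character at a time and stops as soon as
# the rest of the pattern can match, which is at the FIRST slash.
lazy = re.match(r"http://([^:]*?):?([0-9]*)(/.*)", url)

greedy_groups = greedy.groups()
lazy_groups = lazy.groups()

print(greedy_groups)  # ('brandonhsiao.com/essays', '', '/showers.html')
print(lazy_groups)    # ('brandonhsiao.com', '', '/essays/showers.html')
```

Note that in both patterns the first group holds only the hostname, because http:// sits outside the parentheses.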

But that's not even what I would recommend. Try this instead:

(http://[^\s/]+)([^\s?#]*)

$1 should contain http://brandonhsiao.com and $2 should contain /essays/showers.html; any query string or fragment after the path is left out of the match.

Note that this is not designed to validate a URL, just to divide a URL up into the portion before the path, and the path itself. For example, it would happily accept invalid characters as part of the hostname. However, it does work fine for URLs with or without paths.
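
As a rough illustration of that behaviour, the same pattern can be tried in Python. This is only a sketch: split_url and SPLIT_RE are hypothetical names made up for the example, and the URL with a query string is an invented test input.

```python
import re

# Group 1: scheme plus everything up to the first slash or whitespace.
# Group 2: the path, stopping before any '?' or '#'.
SPLIT_RE = re.compile(r"(http://[^\s/]+)([^\s?#]*)")

def split_url(url):
    """Return (prefix, path) for an http URL, or None if it does not match."""
    m = SPLIT_RE.match(url)
    return m.groups() if m else None

print(split_url("http://brandonhsiao.com/essays/showers.html"))
# ('http://brandonhsiao.com', '/essays/showers.html')

print(split_url("http://brandonhsiao.com"))
# ('http://brandonhsiao.com', '')   <- no path, still matches

print(split_url("http://brandonhsiao.com/essays/showers.html?x=1#top"))
# ('http://brandonhsiao.com', '/essays/showers.html')
```

Because the first group accepts anything that is not whitespace or a slash, a port such as :8080 stays attached to the prefix rather than being captured separately.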

P.S. I don't know exactly what you are doing with this in Lisp, so I have taken the liberty of only testing it in other PCRE-compatible environments. Usually I test my answers in the exact context where they will be used.

$_ = "http://brandonhsiao.com/essays/showers.html";
m|(http://[^\s/]+)([^\s?#]*)|;
print "1 = '$1' and 2 = '$2'\n";

# [j@5 ~]$ perl test2.pl
# 1 = 'http://brandonhsiao.com' and 2 = '/essays/showers.html'

#2


0  

http://([^/:]*):?([0-9]*)(/.*)

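The first group originally matched every character except :, and I have now added / to the excluded characters. A [^...] character class matches any character except the ones listed inside the brackets, so the hostname group now stops at the first slash. Everything else is the same as in your regex.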
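
A small Python sketch of this pattern, assuming the same example URL plus a made-up variant with a port. The name parse is hypothetical and exists only for the example.

```python
import re

# [^/:]* cannot cross a '/' or a ':', so the hostname group ends at
# whichever of the two comes first.
PATTERN = re.compile(r"http://([^/:]*):?([0-9]*)(/.*)")

def parse(url):
    """Return (host, port, path), or None when the pattern does not match."""
    m = PATTERN.match(url)
    return m.groups() if m else None

print(parse("http://brandonhsiao.com/essays/showers.html"))
# ('brandonhsiao.com', '', '/essays/showers.html')

print(parse("http://brandonhsiao.com:8080/essays/showers.html"))
# ('brandonhsiao.com', '8080', '/essays/showers.html')

print(parse("http://brandonhsiao.com"))
# None, because (/.*) requires at least one slash after the host
```

No laziness is needed here, since the character class itself prevents the first group from running past the slash.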

Hope it helped!

#3


0  

http:\/\/([^:]*?)(\/.*)

The *? makes the first group non-greedy, so it stops at the first slash (the one just after .com).

See http://rubular.com/r/VmU2ghAX0k for match groups
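
For readers who cannot open the link, here is a hedged Python sketch of the same pattern. The URL with a port is an invented input, added only to show one limitation of dropping the port part from the regex.

```python
import re

# Same pattern as above; escaping '/' is harmless in Python's re module.
pattern = re.compile(r"http:\/\/([^:]*?)(\/.*)")

plain = pattern.match("http://brandonhsiao.com/essays/showers.html")
with_port = pattern.match("http://brandonhsiao.com:8080/essays/showers.html")

plain_groups = plain.groups()
print(plain_groups)  # ('brandonhsiao.com', '/essays/showers.html')

# The first group may not contain ':' and nothing else in the pattern
# can consume it, so a URL with an explicit port does not match at all.
print(with_port)     # None
```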
