Find URLs in a text string with PHP and regex? [duplicate]

Date: 2021-06-12 09:00:39

This question already has an answer here:

I know the question title looks very repetitive, but I could not find a solution to my particular problem here.

I need to find URLs in a text string:

$pattern = '`.*?((http|https)://[\w#$&+,\/:;=?@.-]+)[^\w#$&+,\/:;=?@.-]*?`i';

    if (preg_match_all($pattern,$url_string,$matches)) {
        print_r($matches[1]);
    }

Using this pattern I was able to find URLs that start with http:// and https://, which is okay. But I have user input where people add URLs like www.domain.com or even domain.com.

So either I need to preprocess the string first, putting the common protocol http:// in front of www.domain.com and domain.com, or I need to come up with a better pattern.

I am not good with regex and don't know what to do.

My idea is to first find the URLs with http:// and https:// and put them in an array, then replace those URLs with a space (" ") in the text string, then run other patterns on what is left. But I am not sure what pattern to use.

I am using $url_string = preg_replace($pattern, ' ', $url_string); but that also removes any www.domain.com or domain.com URL that sits between two valid URLs with http:// or https://.

If you can help, that would be great.

To make things clearer:

I need a pattern or some other method that finds all URLs in a text string. Examples of the URLs are:

  1. domain.com
  2. www.domain.com
  3. http://www.domain.com
  4. http://domain.com
  5. https://www.domain.com
  6. https://domain.com

Thanks!

2 answers

#1


3  

$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);
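To see what this pattern picks up, it can be run over a string containing the forms listed in the question. This is only a sketch: the sample text in $str is made up for illustration, and the rtrim() step is an addition, not part of the answer. It is there because \S* also swallows punctuation that directly follows a URL. Note that the pattern is permissive and would also match non-URLs of the same shape, such as a file name like notes.txt.

```php
<?php
// Sample input made up for illustration; not taken from the question.
$str = 'Visit domain.com or www.domain.com, also http://www.domain.com '
     . 'and https://domain.com/path?x=1 today';

$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);

// $matches[0] holds the full matches. Strip trailing punctuation
// (for example the comma after www.domain.com) that \S* picked up.
$urls = array_map(function ($u) {
    return rtrim($u, '.,;:!?)');
}, $matches[0]);

print_r($urls);
```

With PREG_PATTERN_ORDER, $matches[1] holds only the optional www. or http(s):// prefix of each match, so the complete URLs are read from $matches[0].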

#2


0  

I'm not sure I've understood correctly what you need, but could you use something like this:

preg_match('#^.+?://#', $url);

to check whether the string specifies a protocol, and if not, just prepend http://
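A minimal sketch of that idea, wrapped in a helper so it can be applied to each URL found. The function name addDefaultScheme is hypothetical, chosen only for this illustration, and http:// is used as the default because that is what the question asks for.

```php
<?php
// Hypothetical helper: the name is chosen for illustration only.
// Returns the URL unchanged if it already has a scheme, otherwise
// puts http:// in front of it.
function addDefaultScheme($url)
{
    // Same check as in the answer: some characters followed by "://",
    // anchored at the start of the string.
    if (preg_match('#^.+?://#', $url)) {
        return $url;
    }
    return 'http://' . $url;
}

echo addDefaultScheme('www.domain.com'), "\n";
echo addDefaultScheme('domain.com'), "\n";
echo addDefaultScheme('https://domain.com'), "\n";
```

One caveat: because .+? can expand across the string, a value such as domain.com/go?to=http://other.com would be treated as already having a protocol. If that matters, a stricter check like '#^[a-z][a-z0-9+.-]*://#i', which only accepts a scheme name before ://, may be a better fit.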
