Get URLs from a String

Time: 2023-02-05 20:12:18

For a little while now I've been searching for code to get URLs out of a string using PHP. I'm basically trying to get a shortened URL out of a message, and then later do a HEAD request to find the actual link.

Anyone have any code that returns URLs from strings?

Thanks in advance.

Edit for Ghost Dog:

Here is a sample of what I am parsing:

$test = "I am testing this application for http://test.com YAY!";

And here is the response I got that solved it:

$regex = '$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i';

preg_match_all($regex, $test, $result, PREG_PATTERN_ORDER);
$A = $result[0];

foreach($A as $B)
{
    $URL = GetRealURL($B);
    echo "$URL<BR>";    
}


function GetRealURL( $url ) 
{ 
    $options = array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => "",
        CURLOPT_USERAGENT      => "spider",
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_CONNECTTIMEOUT => 120,
        CURLOPT_TIMEOUT        => 120,
        CURLOPT_MAXREDIRS      => 10,
    ); 

    $ch      = curl_init( $url ); 
    curl_setopt_array( $ch, $options ); 
    $content = curl_exec( $ch ); 
    $err     = curl_errno( $ch ); 
    $errmsg  = curl_error( $ch ); 
    $header  = curl_getinfo( $ch ); 
    curl_close( $ch ); 
    return $header['url']; 
} 

See Answer for the Details.

2 Solutions

#1


This code may be helpful (see MadTechie's latest post):

http://www.phpfreaks.com/forums/index.php/topic,245248.msg1146218.html#msg1146218

<?php
$string = "some random text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988";

$regex = '$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i';

preg_match_all($regex, $string, $result, PREG_PATTERN_ORDER);
$A = $result[0];

foreach($A as $B)
{
   $URL = GetRealURL($B);
   echo "$URL<BR>";   
}


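function GetRealURL( $url ) 
{ 
   // Follow redirects and report the URL that cURL finally lands on.
   $options = array(
      CURLOPT_RETURNTRANSFER => true,     // return the response instead of printing it
      CURLOPT_HEADER         => true,     // include response headers in the output
      CURLOPT_FOLLOWLOCATION => true,     // follow Location: redirects
      CURLOPT_ENCODING       => "",       // accept any encoding cURL supports
      CURLOPT_USERAGENT      => "spider",
      CURLOPT_AUTOREFERER    => true,     // set Referer: when following a redirect
      CURLOPT_CONNECTTIMEOUT => 120,      // seconds to wait for the connection
      CURLOPT_TIMEOUT        => 120,      // seconds allowed for the whole request
      CURLOPT_MAXREDIRS      => 10,       // give up after 10 redirects
   ); 

   $ch      = curl_init( $url ); 
   curl_setopt_array( $ch, $options ); 
   $content = curl_exec( $ch );          // response body is fetched but not used
   $header  = curl_getinfo( $ch );       // transfer info, including the last URL
   curl_close( $ch ); 
   return $header['url']; 
}  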

?>
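
The question mentions a HEAD request, while the code above issues a regular GET and downloads the response body. A minimal sketch of a HEAD-based variant follows. The function name `resolveFinalUrl` and the 10-second default timeout are assumptions for illustration, not part of MadTechie's post; it relies only on `CURLOPT_NOBODY` and `CURLINFO_EFFECTIVE_URL` from PHP's cURL extension.

```php
<?php
// Hypothetical helper (not from the original answer): resolve a shortened
// URL with HEAD requests and return the URL reached after all redirects.
function resolveFinalUrl(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // send HEAD instead of GET
        CURLOPT_RETURNTRANSFER => true,  // never echo the response
        CURLOPT_FOLLOWLOCATION => true,  // follow Location: redirects
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    // curl_exec() returns false on failure (timeout, DNS error, ...).
    $ok = curl_exec($ch);
    $finalUrl = ($ok === false) ? null : curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

    // An empty effective URL is treated the same as a failure.
    return $finalUrl ?: null;
}
```

Calling `resolveFinalUrl('http://tinyurl.com/9uxdwc')` would return the expanded address or `null` on failure. Some servers answer HEAD differently from GET, so falling back to a GET may be needed in practice.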

#2


Something like:

$matches = array();
preg_match_all('/http:\/\/[a-zA-Z0-9.-]+\/[a-zA-Z0-9.-]+/', $text, $matches);
print_r($matches);

You'll need to tune the regexp to get exactly what you want.
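
As one possible tuning, the sketch below also accepts `https`, does not require a path, and strips punctuation that commonly trails a URL in running text. The function name `extractUrls` and the set of stripped characters are assumptions for illustration; it needs PHP 7.4 or later for the arrow function.

```php
<?php
// Hypothetical helper: collect http/https URLs from free text.
function extractUrls(string $text): array
{
    // Match the scheme, then everything up to whitespace, quotes or angle brackets.
    $pattern = '~\bhttps?://[^\s<>"\']+~i';

    if (!preg_match_all($pattern, $text, $matches)) {
        return [];  // no match (0) or regex error (false)
    }

    // Drop sentence punctuation that got glued to the end of a URL.
    $urls = array_map(
        static fn(string $u): string => rtrim($u, '.,;:!?)'),
        $matches[0]
    );

    // Remove duplicates and reindex from 0.
    return array_values(array_unique($urls));
}

print_r(extractUrls('I am testing this application for http://test.com YAY!'));
```

Stripping a trailing `)` is a trade-off: it cleans up URLs written in parentheses but would also cut a URL that legitimately ends with one.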

To get the URL out, consider something as simple as:

curl -I http://url.com/path | grep Location: | awk '{print $2}'
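
Note that `grep Location:` is case-sensitive, so it would miss a header sent as `location:`. If the raw headers are already available in PHP, the same extraction could be sketched as below; the function name `extractLocation` is an assumption for illustration, and the match is case-insensitive because HTTP header names are.

```php
<?php
// Hypothetical helper: pull the Location value out of raw response headers.
function extractLocation(string $rawHeaders): ?string
{
    // "i" ignores case in the header name; "m" lets ^ and $ anchor per line.
    // The optional \r handles the CRLF line endings used by HTTP.
    if (preg_match('/^Location:[ \t]*(.*?)[ \t]*\r?$/im', $rawHeaders, $m)) {
        return $m[1] !== '' ? $m[1] : null;
    }
    return null;  // no Location header present
}

$headers = "HTTP/1.1 301 Moved Permanently\r\n"
         . "location: http://example.com/long/path\r\n"
         . "Content-Length: 0\r\n\r\n";

echo extractLocation($headers), "\n";
```

Like the shell pipeline, this reads only the first `Location` header it finds, so a chain of several redirects has to be followed one hop at a time.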
