For a little while now I've been searching for code to get URLs out of a string using PHP. I'm basically trying to pull a shortened URL out of a message, and then later do a HEAD request to find the actual link.
Anyone have any code that returns URLs from strings?
Thanks in advance.
Edit for Ghost Dog:
Here is a sample of what I am parsing:
$test = "I am testing this application for http://test.com YAY!";
And here is the response I got that solved it:
// "$" serves as the regex delimiter here; the trailing "i" makes the match case-insensitive.
$regex = '$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i';
preg_match_all($regex, $test, $result, PREG_PATTERN_ORDER);
$A = $result[0]; // full matches, i.e. the URLs found in $test
foreach ($A as $B)
{
    $URL = GetRealURL($B);
    echo "$URL<BR>";
}

// Follows redirects for $url and returns the final URL.
function GetRealURL( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => "",
        CURLOPT_USERAGENT      => "spider",
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_CONNECTTIMEOUT => 120,
        CURLOPT_TIMEOUT        => 120,
        CURLOPT_MAXREDIRS      => 10,
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    return $header['url']; // the last effective URL after redirects
}
See the answer for details.
2 solutions
#1
This code may be helpful (see MadTechie's latest post):
http://www.phpfreaks.com/forums/index.php/topic,245248.msg1146218.html#msg1146218
<?php
$string = "some random text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988";
$regex = '$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i';
preg_match_all($regex, $string, $result, PREG_PATTERN_ORDER);
$A = $result[0];
foreach ($A as $B)
{
    $URL = GetRealURL($B);
    echo "$URL<BR>";
}

function GetRealURL( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => "",
        CURLOPT_USERAGENT      => "spider",
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_CONNECTTIMEOUT => 120,
        CURLOPT_TIMEOUT        => 120,
        CURLOPT_MAXREDIRS      => 10,
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    return $header['url'];
}
?>
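The question asks for a HEAD request, while the code above issues a normal GET and downloads each response body. A minimal sketch of a HEAD-only variant follows, assuming the cURL extension is available. The function name resolveFinalUrl and its timeout parameter are hypothetical and not part of the original answer; note also that some servers respond differently to HEAD than to GET, so this may not work for every shortener.

```php
<?php
// Hypothetical helper (not from the original answer): follows redirects
// using HEAD requests only and returns the final URL, or null on failure.
function resolveFinalUrl(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // send HEAD instead of GET
        CURLOPT_RETURNTRANSFER => true,  // do not echo the response
        CURLOPT_FOLLOWLOCATION => true,  // follow Location redirects
        CURLOPT_MAXREDIRS      => 10,    // stop after 10 hops
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $ok = curl_exec($ch);
    // CURLINFO_EFFECTIVE_URL is the last URL cURL requested after redirects.
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    if ($ok === false || !is_string($finalUrl) || $finalUrl === '') {
        return null;
    }
    return $finalUrl;
}
```

Returning null on failure lets the caller tell an unreachable link apart from a URL that simply did not redirect, which the original GetRealURL does not distinguish.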
#2
Something like:
$matches = array();
preg_match_all('/http:\/\/[a-zA-Z0-9.-]+\/[a-zA-Z0-9.-]+/', $text, $matches);
print_r($matches);
You'll need to tune the regexp to get exactly what you want.
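One possible tuning, as a sketch rather than a complete URL grammar: accept both http and https, allow any non-whitespace characters after the scheme, and strip punctuation that usually belongs to the surrounding sentence. The function name extractUrls is hypothetical, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: returns every http/https URL found in $text.
function extractUrls(string $text): array
{
    // "~" is the delimiter; match the scheme, then everything up to
    // whitespace, an angle bracket, or a quote character.
    preg_match_all('~\bhttps?://[^\s<>"\']+~i', $text, $matches);

    // Trailing punctuation is more likely part of the sentence than the URL.
    return array_map(
        static fn(string $url): string => rtrim($url, '.,;:!?)'),
        $matches[0]
    );
}
```

Called on the sample string from the question, extractUrls('I am testing this application for http://test.com YAY!') returns an array containing only http://test.com. The trade-off is that a URL which genuinely ends in one of the stripped characters will be shortened.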
To get the redirect target out of the response headers, consider something as simple as:
curl -sI http://url.com/path | grep -i '^location:' | awk '{print $2}'
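The same single-hop lookup can be sketched in PHP by reading the Location header out of raw response headers. The function name parseLocationHeader is hypothetical, and it assumes you already have the header text as a string, for example from curl_exec() with CURLOPT_HEADER and CURLOPT_NOBODY enabled.

```php
<?php
// Hypothetical helper: returns the value of the first Location header
// in a raw HTTP header block, or null if there is none.
function parseLocationHeader(string $rawHeaders): ?string
{
    // "m" lets ^ match at the start of each line; "i" is needed because
    // header names are case-insensitive (HTTP/2 sends them in lowercase).
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $m) === 1) {
        return $m[1];
    }
    return null;
}
```

Like the shell pipeline, this only reads one hop, so a chain of shorteners would need the lookup repeated on each returned URL.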