How do I check if a URL exists (not 404) in PHP?
17 solutions
#1
239
Here:
$file = 'http://www.domain.com/somefile.jpg';
$file_headers = @get_headers($file);
if (!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
    $exists = false;
}
else {
    $exists = true;
}
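From here, right below the above post, there's a curl solution:
function url_exists($url) {
    // curl_init() only creates a handle; the request has to be sent to learn anything
    if (!$ch = curl_init($url)) return false;
    curl_setopt_array($ch, array(CURLOPT_NOBODY => true, CURLOPT_RETURNTRANSFER => true));
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response arrived
    return $code >= 200 && $code < 400;
}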
#2
46
When figuring out whether a URL exists from PHP, there are a few things to pay attention to:
- Is the URL itself valid (a string, not empty, good syntax)? This is quick to check server side.
- Waiting for a response might take time and block code execution.
- Not all headers returned by get_headers() are well formed.
- Use curl (if you can).
- Avoid fetching the entire body/content; request only the headers.
- Consider redirecting URLs:
  - Do you want the first code returned?
  - Or follow all redirects and return the last code?
  - You might end up with a 200, but the page could still redirect using meta tags or JavaScript. Figuring out what happens after that is tough.
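The first point in the list, a cheap syntax check before any network request is made, could look roughly like the sketch below. The function name isHttpUrlSyntax is made up for this illustration; filter_var() and parse_url() are standard PHP. It only tells you whether the string looks like an http(s) URL, not whether anything answers there.

```php
<?php
// Hypothetical helper: syntax-only check, no network access involved.
function isHttpUrlSyntax($url)
{
    if (!is_string($url) || $url === '') {
        return false;
    }
    // FILTER_VALIDATE_URL accepts any scheme, so the scheme is checked separately.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, array('http', 'https'), true);
}

var_dump(isHttpUrlSyntax('http://www.example.com/page.html')); // bool(true)
var_dump(isHttpUrlSyntax('stp://google.com'));                 // bool(false)
var_dump(isHttpUrlSyntax('not a url'));                        // bool(false)
```

Running this before the slow network check means obviously malformed input never costs a request.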
Keep in mind that whatever method you use, it takes time to wait for a response.
All code might (and probably will) halt until you either know the result or the requests have timed out.
For example: the code below could take a LONG time to display the page if the urls are invalid or unreachable:
<?php
$urls = getUrls(); // some function getting say 10 or more external links
foreach ($urls as $k => $url) {
    // this could potentially take 0-30 seconds each
    // (more or less depending on connection, target site, timeout settings...)
    if (!isValidUrl($url)) {
        unset($urls[$k]);
    }
}
echo "yay all done! now show my site";
foreach ($urls as $url) {
    echo "<a href=\"{$url}\">{$url}</a><br/>";
}
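One way to soften the blocking described above is to send the requests concurrently instead of one after another. The following is only a sketch under assumptions: the function name getStatusCodesInParallel and the 5-second default are invented for this example, while the curl_multi_* functions are part of PHP's cURL extension. Total wait time is then roughly that of the slowest URL rather than the sum of all of them.

```php
<?php
// Hypothetical helper: returns array(url => HTTP status code).
// A code of 0 means no HTTP response arrived (DNS failure, timeout, ...).
function getStatusCodesInParallel(array $urls, $timeoutSeconds = 5)
{
    $urls = array_values(array_unique($urls));
    if (count($urls) === 0) {
        return array();
    }

    $multi = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not print anything
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for activity instead of spinning
        }
    } while ($running && $status === CURLM_OK);

    $codes = array();
    foreach ($handles as $url => $ch) {
        $codes[$url] = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $ch);
    }
    return $codes;
}
```

A call such as getStatusCodesInParallel($urls) could replace the per-URL loop; entries whose code is 0 or 404 can then be dropped from the list.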
The functions below could be helpful; you will probably want to modify them to suit your needs:
function isValidUrl($url){
    // first do some quick sanity checks:
    if(!$url || !is_string($url)){
        return false;
    }
    // quick check url is roughly a valid http request: ( http://blah/... )
    if( ! preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url) ){
        return false;
    }
    // the next bit could be slow:
    if(getHttpResponseCode_using_curl($url) != 200){
    // if(getHttpResponseCode_using_getheaders($url) != 200){ // use this one if you cant use curl
        return false;
    }
    // all good!
    return true;
}
function getHttpResponseCode_using_curl($url, $followredirects = true){
    // returns int responsecode, or false (if url does not exist or connection timeout occurs)
    // NOTE: could potentially take up to 30 seconds, blocking further code execution (more or less depending on connection, target site, and local timeout settings)
    // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
    // if $followredirects == true : return the LAST known httpcode (when redirected)
    if(! $url || ! is_string($url)){
        return false;
    }
    $ch = @curl_init($url);
    if($ch === false){
        return false;
    }
    @curl_setopt($ch, CURLOPT_HEADER, true);         // we want headers
    @curl_setopt($ch, CURLOPT_NOBODY, true);         // dont need body
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // catch output (do NOT print!)
    if($followredirects){
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        @curl_setopt($ch, CURLOPT_MAXREDIRS, 10); // fairly random number, but could prevent unwanted endless redirects with followlocation=true
    }else{
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    }
    // @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // fairly random number (seconds)... but could prevent waiting forever to get a result
    // @curl_setopt($ch, CURLOPT_TIMEOUT, 6); // fairly random number (seconds)... but could prevent waiting forever to get a result
    // @curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"); // pretend we're a regular browser
    @curl_exec($ch);
    if(@curl_errno($ch)){ // should be 0
        @curl_close($ch);
        return false;
    }
    $code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
    @curl_close($ch);
    return $code;
}
function getHttpResponseCode_using_getheaders($url, $followredirects = true){
    // returns string responsecode, or false if no responsecode found in headers (or url does not exist)
    // NOTE: could potentially take up to 30 seconds, blocking further code execution (more or less depending on connection, target site, and local timeout settings)
    // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
    // if $followredirects == true : return the LAST known httpcode (when redirected)
    if(! $url || ! is_string($url)){
        return false;
    }
    $headers = @get_headers($url);
    if($headers && is_array($headers)){
        if($followredirects){
            // we want the last status code, reverse array so we start at the end:
            $headers = array_reverse($headers);
        }
        foreach($headers as $hline){
            // search for things like "HTTP/1.1 200 OK" , "HTTP/1.0 200 OK" , "HTTP/1.1 301 PERMANENTLY MOVED" , "HTTP/1.1 404 Not Found" , etc.
            // note that the exact syntax/version/output differs, so there is some string magic involved here
            if(preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])\s+.*/', $hline, $matches) ){// "HTTP/*** ### ***"
                $code = $matches[1];
                return $code;
            }
        }
        // no HTTP/xxx found in headers:
        return false;
    }
    // no headers :
    return false;
}
#3
45
$headers = @get_headers($this->_value);
// get_headers() returns false when the request fails, so guard before indexing
if (!$headers || strpos($headers[0], '200') === false) return false;
So any time you contact a website and get anything other than 200 OK, this returns false.
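One caveat with looking only at $headers[0]: get_headers() follows redirects by default and returns the headers of every response in a single array, so the first element is the status line of the first response (often a 301 or 302), not the final one. A minimal sketch of picking the last status code instead is shown below; the function name lastStatusCode and the sample data are invented for this illustration.

```php
<?php
// Hypothetical helper: returns the status code of the LAST response found in a
// get_headers()-style array, or null if the array contains no status line.
function lastStatusCode(array $headerLines)
{
    $code = null;
    foreach ($headerLines as $line) {
        // Matches "HTTP/1.1 200 OK", "HTTP/1.0 404 Not Found", "HTTP/2 200", ...
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})\b#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Sample data shaped like get_headers() output for a redirected URL.
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/new',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
);
var_dump(lastStatusCode($sample)); // int(200)
```

With real data the call would be lastStatusCode($headers) after checking that get_headers() did not return false.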
#4
15
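You cannot use curl on certain servers; in that case you can use this code:
<?php
$url = 'http://www.example.com';
$array = @get_headers($url);
$string = $array ? $array[0] : ''; // get_headers() returns false on failure
// strpos() can legitimately return 0, so compare against false explicitly
if (strpos($string, "200") !== false)
{
    echo 'url exists';
}
else
{
    echo 'url does not exist';
}
?>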
#5
7
function URLIsValid($URL)
{
    $exists = true;
    $file_headers = @get_headers($URL);
    // no response at all (DNS failure, timeout, ...) means the URL is not valid
    if (!$file_headers) return false;
    $InvalidHeaders = array('404', '403', '500');
    foreach($InvalidHeaders as $HeaderVal)
    {
        if(strstr($file_headers[0], $HeaderVal))
        {
            $exists = false;
            break;
        }
    }
    return $exists;
}
#6
6
$url = 'http://google.com';
$not_url = 'stp://google.com';
if (@file_get_contents($url)): echo "Found '$url'!";
else: echo "Can't find '$url'.";
endif;
if (@file_get_contents($not_url)): echo "Found '$not_url'!";
else: echo "Can't find '$not_url'.";
endif;
// Found 'http://google.com'!Can't find 'stp://google.com'.
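Note that file_get_contents() downloads the whole response body just to learn whether the URL answers. If curl is not available, a lighter variant is to send a HEAD request through a stream context and read only the headers. The sketch below assumes PHP 7.1 or later (for the context argument of get_headers()); the names buildHeadContext and headStatusLine are invented for this example.

```php
<?php
// Hypothetical helper: builds a stream context that asks for headers only.
function buildHeadContext($timeoutSeconds = 5)
{
    return stream_context_create(array(
        'http' => array(
            'method'  => 'HEAD',          // no response body is transferred
            'timeout' => $timeoutSeconds, // give up instead of blocking indefinitely
        ),
    ));
}

// Hypothetical helper: first status line for $url, or null if the request failed.
function headStatusLine($url, $timeoutSeconds = 5)
{
    // The third (context) argument of get_headers() needs PHP 7.1 or later.
    $headers = @get_headers($url, false, buildHeadContext($timeoutSeconds));
    return ($headers && isset($headers[0])) ? $headers[0] : null;
}
```

Something like headStatusLine('http://google.com') would then return a status line string for a reachable URL and null for an unsupported scheme such as 'stp://google.com'. Some servers answer HEAD differently from GET, so treat the result as a hint.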
#7
5
I use this function:
/**
 * @param $url
 * @param array $options
 * @return string
 * @throws Exception
 */
function checkURL($url, array $options = array()) {
    if (empty($url)) {
        throw new Exception('URL is empty');
    }
    // list of HTTP status codes
    $httpStatusCodes = array(
        100 => 'Continue',
        101 => 'Switching Protocols',
        102 => 'Processing',
        200 => 'OK',
        201 => 'Created',
        202 => 'Accepted',
        203 => 'Non-Authoritative Information',
        204 => 'No Content',
        205 => 'Reset Content',
        206 => 'Partial Content',
        207 => 'Multi-Status',
        208 => 'Already Reported',
        226 => 'IM Used',
        300 => 'Multiple Choices',
        301 => 'Moved Permanently',
        302 => 'Found',
        303 => 'See Other',
        304 => 'Not Modified',
        305 => 'Use Proxy',
        306 => 'Switch Proxy',
        307 => 'Temporary Redirect',
        308 => 'Permanent Redirect',
        400 => 'Bad Request',
        401 => 'Unauthorized',
        402 => 'Payment Required',
        403 => 'Forbidden',
        404 => 'Not Found',
        405 => 'Method Not Allowed',
        406 => 'Not Acceptable',
        407 => 'Proxy Authentication Required',
        408 => 'Request Timeout',
        409 => 'Conflict',
        410 => 'Gone',
        411 => 'Length Required',
        412 => 'Precondition Failed',
        413 => 'Payload Too Large',
        414 => 'Request-URI Too Long',
        415 => 'Unsupported Media Type',
        416 => 'Requested Range Not Satisfiable',
        417 => 'Expectation Failed',
        418 => 'I\'m a teapot',
        422 => 'Unprocessable Entity',
        423 => 'Locked',
        424 => 'Failed Dependency',
        425 => 'Unordered Collection',
        426 => 'Upgrade Required',
        428 => 'Precondition Required',
        429 => 'Too Many Requests',
        431 => 'Request Header Fields Too Large',
        449 => 'Retry With',
        450 => 'Blocked by Windows Parental Controls',
        500 => 'Internal Server Error',
        501 => 'Not Implemented',
        502 => 'Bad Gateway',
        503 => 'Service Unavailable',
        504 => 'Gateway Timeout',
        505 => 'HTTP Version Not Supported',
        506 => 'Variant Also Negotiates',
        507 => 'Insufficient Storage',
        508 => 'Loop Detected',
        509 => 'Bandwidth Limit Exceeded',
        510 => 'Not Extended',
        511 => 'Network Authentication Required',
        599 => 'Network Connect Timeout Error'
    );
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    if (isset($options['timeout'])) {
        $timeout = (int) $options['timeout'];
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    }
    curl_exec($ch);
    $returnedStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // a code of 0 (no response) is not in the list and falls through to "does not exist"
    if (array_key_exists($returnedStatusCode, $httpStatusCodes)) {
        return "URL: '{$url}' - Status code: {$returnedStatusCode} - Definition: {$httpStatusCodes[$returnedStatusCode]}";
    } else {
        return "'{$url}' does not exist";
    }
}
#8
4
karim79's get_headers() solution didn't work for me, as I got crazy results with Pinterest:
get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
get_headers(): Failed to enable crypto
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
Anyway, this developer demonstrates that cURL is way faster than get_headers():
http://php.net/manual/fr/function.get-headers.php#104723
Since many people asked karim79 to fix his cURL solution, here's the solution I built today.
/**
 * Send an HTTP request to the $url and check the header sent back.
 *
 * @param $url String url to which we must send the request.
 * @param $failCodeList Int array list of codes for which the page is considered invalid.
 *
 * @return Boolean
 */
public static function isUrlExists($url, array $failCodeList = array(404)){
    $exists = false;
    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){
        $url = "https://" . $url;
    }
    if (preg_match(RegularExpression::URL, $url)){
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($handle, CURLOPT_HEADER, true);
        curl_setopt($handle, CURLOPT_NOBODY, true);
        // CURLOPT_USERAGENT expects a string; this value is only a placeholder
        curl_setopt($handle, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; UrlChecker/1.0)");
        $headers = curl_exec($handle);
        curl_close($handle);
        if (empty($failCodeList) or !is_array($failCodeList)){
            $failCodeList = array(404);
        }
        if (!empty($headers)){
            $exists = true;
            $headers = explode(PHP_EOL, $headers);
            foreach($failCodeList as $code){
                if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){
                    $exists = false;
                    break;
                }
            }
        }
    }
    return $exists;
}
Let me explain the curl options:
CURLOPT_RETURNTRANSFER: return the response as a string instead of printing it on the calling page.
CURLOPT_SSL_VERIFYPEER: when set to false, cURL does not verify the server's certificate (this avoids the SSL errors shown above, at the cost of security).
CURLOPT_HEADER: include the headers in the string.
CURLOPT_NOBODY: don't include the body in the string.
CURLOPT_USERAGENT: some sites need it to function properly (for example: https://plus.google.com).
Additional note: In this function I'm using Diego Perini's regex for validating the URL before sending the request:
const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini
Additional note 2: I explode the header string and use $headers[0] to be sure to validate only the return code and message (example: 200, 404, 405, etc.)
Additional note 3: Sometimes checking only for code 404 is not enough (see the unit test), so there's an optional $failCodeList parameter to supply the full list of codes to reject.
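As a possible refinement of notes 2 and 3, the status code can be extracted as a number and compared exactly, instead of searching the status line for a substring. The sketch below is not part of the function above; the name hasFailCode is invented, and it assumes the raw header string comes from curl_exec() with CURLOPT_HEADER enabled.

```php
<?php
// Hypothetical helper: true if the first status line in $rawHeaders carries one
// of the codes in $failCodeList. A missing status line counts as a failure.
function hasFailCode($rawHeaders, array $failCodeList = array(404))
{
    // HTTP header lines end in "\r\n" regardless of the local platform.
    $lines = preg_split('/\r\n|\r|\n/', (string) $rawHeaders);
    if (!preg_match('#^HTTP/\S+\s+(\d{3})\b#', $lines[0], $m)) {
        return true;
    }
    return in_array((int) $m[1], array_map('intval', $failCodeList), true);
}

var_dump(hasFailCode("HTTP/1.1 404 Not Found\r\nServer: x\r\n\r\n"));             // bool(true)
var_dump(hasFailCode("HTTP/1.1 200 OK\r\nContent-Length: 404\r\n\r\n"));           // bool(false)
var_dump(hasFailCode("HTTP/1.1 405 Method Not Allowed\r\n\r\n", array(404, 405))); // bool(true)
```

Splitting on any line ending also avoids relying on PHP_EOL, which differs between platforms while HTTP always uses "\r\n".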
And, of course, here's the unit test (including all the popular social networks) to validate my code:
public function testIsUrlExists(){
    //invalid
    $this->assertFalse(ToolManager::isUrlExists("woot"));
    $this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));
    $this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));
    $this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));
    $this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));
    $this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));
    $this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));
    $this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));
    $this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));
    //valid
    $this->assertTrue(ToolManager::isUrlExists("www.google.ca"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));
    $this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));
    $this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));
    $this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}
Great success to all,
Jonathan Parent-Lévesque from Montreal
#9
3
Pretty fast:
function http_response($url){
    $resURL = curl_init();
    curl_setopt($resURL, CURLOPT_URL, $url);
    curl_setopt($resURL, CURLOPT_RETURNTRANSFER, 1); // capture the response instead of printing it
    // the original set CURLOPT_HEADERFUNCTION to 'curlHeaderCallback', which is not defined in this snippet;
    // only the status code is needed, so the body is skipped instead
    curl_setopt($resURL, CURLOPT_NOBODY, 1);
    curl_setopt($resURL, CURLOPT_FAILONERROR, 1);
    curl_exec($resURL);
    $intReturnCode = curl_getinfo($resURL, CURLINFO_HTTP_CODE);
    curl_close($resURL);
    if ($intReturnCode != 200 && $intReturnCode != 302 && $intReturnCode != 304) { return 0; } else return 1;
}
echo 'google:';
echo http_response('http://www.google.com');
echo '/ ogogle:';
echo http_response('http://www.ogogle.com');
#10
2
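function urlIsOk($url)
{
    $headers = @get_headers($url);
    // without this guard a failed request would yield status 0, which is < 400
    if (!$headers)
    {
        return false;
    }
    // assumes a status line of the form "HTTP/1.x NNN ..."
    $httpStatus = intval(substr($headers[0], 9, 3));
    if ($httpStatus < 400)
    {
        return true;
    }
    return false;
}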
#11
2
All above solutions + extra sugar. (Ultimate AIO solution)
/**
 * Check that given URL is valid and exists.
 * @param string $url URL to check
 * @return bool TRUE when valid | FALSE otherwise
 */
function urlExists ( $url ) {
    // Remove all illegal characters from a url
    $url = filter_var($url, FILTER_SANITIZE_URL);
    // Validate URI
    if (filter_var($url, FILTER_VALIDATE_URL) === FALSE
        // check only for http/https schemes.
        || !in_array(strtolower(parse_url($url, PHP_URL_SCHEME)), ['http','https'], true )
    ) {
        return false;
    }
    // Check that URL exists
    $file_headers = @get_headers($url);
    return !(!$file_headers || $file_headers[0] === 'HTTP/1.1 404 Not Found');
}
Example:
var_dump ( urlExists('http://*.com/') );
// Output: true;
#12
1
Here is a solution that reads only the first byte of source code... returning false if the file_get_contents fails... This will also work for remote files like images.
function urlExists($url)
{
    if (@file_get_contents($url,false,NULL,0,1))
    {
        return true;
    }
    return false;
}
#13
1
To check if a URL is online or offline:
function get_http_response_code($theURL) {
    $headers = @get_headers($theURL);
    if (!$headers) return false; // request failed: treat as offline
    return substr($headers[0], 9, 3);
}
#14
0
The simple way is curl (and it's faster too):
<?php
$mylinks = "http://site.com/page.html";
$handlerr = curl_init($mylinks);
curl_setopt($handlerr, CURLOPT_RETURNTRANSFER, TRUE);
$resp = curl_exec($handlerr);
$ht = curl_getinfo($handlerr, CURLINFO_HTTP_CODE);
// 0 means no response at all; 404 means the page was not found
if ($ht != 0 && $ht != 404)
{ echo 'OK'; }
else { echo 'NO'; }
?>
#15
0
Another way to check whether a URL is valid:
<?php
if (isValidURL("http://www.gimepix.com")) {
    echo "URL is valid...";
} else {
    echo "URL is not valid...";
}
function isValidURL($url) {
    $file_headers = @get_headers($url);
    // guard against a failed request before reading the status line
    if ($file_headers && strpos($file_headers[0], "200 OK") !== false) {
        return true;
    } else {
        return false;
    }
}
?>
#16
0
get_headers() returns an array with the headers sent by the server in response to an HTTP request.
$image_path = 'https://your-domain.com/assets/img/image.jpg';
$file_headers = @get_headers($image_path);
//Prints the response out in an array
//print_r($file_headers);
if (!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
    echo 'Failed because path does not exist.<br>';
} else {
    echo 'It works. You\'re good to go!<br>';
}
#17
0
function url_exists($url) {
    $headers = @get_headers($url);
    // false when the request failed or the first status line does not contain 200
    return $headers && strpos($headers[0], '200') !== false;
}
#1
239
Here:
在这里:
$file = 'http://www.domain.com/somefile.jpg';
$file_headers = @get_headers($file);
if(!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
$exists = false;
}
else {
$exists = true;
}
From here and right below the above post, there's a curl solution:
从这里和右下方,有一个旋度解:
function url_exists($url) {
if (!$fp = curl_init($url)) return false;
return true;
}
#2
46
When figuring out if an url exists from php there are a few things to pay attention to:
在判断php是否存在url时,需要注意以下几点:
- Is the url itself valid (a string, not empty, good syntax), this is quick to check server side.
- url本身是有效的(一个字符串,不是空的,良好的语法),这是快速检查服务器端的。
- Waiting for a response might take time and block code execution.
- 等待响应可能需要时间和块代码执行。
- Not all headers returned by get_headers() are well formed.
- 并非所有get_headers()返回的头都是格式良好的。
- Use curl (if you can).
- 使用旋度(如果可以的话)。
- Prevent fetching the entire body/content, but only request the headers.
- 避免获取整个主体/内容,但只请求头部。
- Consider redirecting urls:
- Do you want the first code returned?
- 您想要返回第一个代码吗?
- Or follow all redirects and return the last code?
- 或者遵循所有重定向并返回最后的代码?
- You might end up with a 200, but it could redirect using meta tags or javascript. Figuring out what happens after is tough.
- 您可能会得到一个200,但是它可以使用元标记或javascript重定向。弄清楚之后会发生什么是很困难的。
- 考虑重定向url:要返回第一个代码吗?或者遵循所有重定向并返回最后的代码?您可能会得到一个200,但是它可以使用元标记或javascript重定向。弄清楚之后会发生什么是很困难的。
Keep in mind that whatever method you use, it takes time to wait for a response.
All code might (and probably will) halt untill you either know the result or the requests have timed out.
记住,无论使用什么方法,都需要等待响应。在您知道结果或请求超时之前,所有代码都可能(也可能)停止。
For example: the code below could take a LONG time to display the page if the urls are invalid or unreachable:
例如:如果url无效或无法访问,下面的代码可能需要很长时间才能显示页面:
<?php
$urls = getUrls(); // some function getting say 10 or more external links
foreach($urls as $k=>$url){
// this could potentially take 0-30 seconds each
// (more or less depending on connection, target site, timeout settings...)
if( ! isValidUrl($url) ){
unset($urls[$k]);
}
}
echo "yay all done! now show my site";
foreach($urls as $url){
echo "<a href=\"{$url}\">{$url}</a><br/>";
}
The functions below could be helpfull, you probably want to modify them to suit your needs:
下面的功能可能会有帮助,您可能想要修改它们以适应您的需要:
function isValidUrl($url){
// first do some quick sanity checks:
if(!$url || !is_string($url)){
return false;
}
// quick check url is roughly a valid http request: ( http://blah/... )
if( ! preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url) ){
return false;
}
// the next bit could be slow:
if(getHttpResponseCode_using_curl($url) != 200){
// if(getHttpResponseCode_using_getheaders($url) != 200){ // use this one if you cant use curl
return false;
}
// all good!
return true;
}
function getHttpResponseCode_using_curl($url, $followredirects = true){
// returns int responsecode, or false (if url does not exist or connection timeout occurs)
// NOTE: could potentially take up to 0-30 seconds , blocking further code execution (more or less depending on connection, target site, and local timeout settings))
// if $followredirects == false: return the FIRST known httpcode (ignore redirects)
// if $followredirects == true : return the LAST known httpcode (when redirected)
if(! $url || ! is_string($url)){
return false;
}
$ch = @curl_init($url);
if($ch === false){
return false;
}
@curl_setopt($ch, CURLOPT_HEADER ,true); // we want headers
@curl_setopt($ch, CURLOPT_NOBODY ,true); // dont need body
@curl_setopt($ch, CURLOPT_RETURNTRANSFER ,true); // catch output (do NOT print!)
if($followredirects){
@curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,true);
@curl_setopt($ch, CURLOPT_MAXREDIRS ,10); // fairly random number, but could prevent unwanted endless redirects with followlocation=true
}else{
@curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,false);
}
// @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,5); // fairly random number (seconds)... but could prevent waiting forever to get a result
// @curl_setopt($ch, CURLOPT_TIMEOUT ,6); // fairly random number (seconds)... but could prevent waiting forever to get a result
// @curl_setopt($ch, CURLOPT_USERAGENT ,"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"); // pretend we're a regular browser
@curl_exec($ch);
if(@curl_errno($ch)){ // should be 0
@curl_close($ch);
return false;
}
$code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
@curl_close($ch);
return $code;
}
function getHttpResponseCode_using_getheaders($url, $followredirects = true){
// returns string responsecode, or false if no responsecode found in headers (or url does not exist)
// NOTE: could potentially take up to 0-30 seconds , blocking further code execution (more or less depending on connection, target site, and local timeout settings))
// if $followredirects == false: return the FIRST known httpcode (ignore redirects)
// if $followredirects == true : return the LAST known httpcode (when redirected)
if(! $url || ! is_string($url)){
return false;
}
$headers = @get_headers($url);
if($headers && is_array($headers)){
if($followredirects){
// we want the the last errorcode, reverse array so we start at the end:
$headers = array_reverse($headers);
}
foreach($headers as $hline){
// search for things like "HTTP/1.1 200 OK" , "HTTP/1.0 200 OK" , "HTTP/1.1 301 PERMANENTLY MOVED" , "HTTP/1.1 400 Not Found" , etc.
// note that the exact syntax/version/output differs, so there is some string magic involved here
if(preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])\s+.*/', $hline, $matches) ){// "HTTP/*** ### ***"
$code = $matches[1];
return $code;
}
}
// no HTTP/xxx found in headers:
return false;
}
// no headers :
return false;
}
#3
45
$headers = @get_headers($this->_value);
if(strpos($headers[0],'200')===false)return false;
so anytime you contact a website and get something else than 200 ok it will work
所以任何时候你联系一个网站,得到超过200的东西都可以
#4
15
you cannot use curl in certain servers u can use this code
您不能在某些服务器中使用curl,您可以使用此代码
<?php
$url = 'http://www.example.com';
$array = get_headers($url);
$string = $array[0];
if(strpos($string,"200"))
{
echo 'url exists';
}
else
{
echo 'url does not exist';
}
?>
#5
7
function URLIsValid($URL)
{
$exists = true;
$file_headers = @get_headers($URL);
$InvalidHeaders = array('404', '403', '500');
foreach($InvalidHeaders as $HeaderVal)
{
if(strstr($file_headers[0], $HeaderVal))
{
$exists = false;
break;
}
}
return $exists;
}
#6
6
$url = 'http://google.com';
$not_url = 'stp://google.com';
if (@file_get_contents($url)): echo "Found '$url'!";
else: echo "Can't find '$url'.";
endif;
if (@file_get_contents($not_url)): echo "Found '$not_url!";
else: echo "Can't find '$not_url'.";
endif;
// Found 'http://google.com'!Can't find 'stp://google.com'.
#7
5
I use this function:
我使用这个函数:
/**
* @param $url
* @param array $options
* @return string
* @throws Exception
*/
function checkURL($url, array $options = array()) {
if (empty($url)) {
throw new Exception('URL is empty');
}
// list of HTTP status codes
$httpStatusCodes = array(
100 => 'Continue',
101 => 'Switching Protocols',
102 => 'Processing',
200 => 'OK',
201 => 'Created',
202 => 'Accepted',
203 => 'Non-Authoritative Information',
204 => 'No Content',
205 => 'Reset Content',
206 => 'Partial Content',
207 => 'Multi-Status',
208 => 'Already Reported',
226 => 'IM Used',
300 => 'Multiple Choices',
301 => 'Moved Permanently',
302 => 'Found',
303 => 'See Other',
304 => 'Not Modified',
305 => 'Use Proxy',
306 => 'Switch Proxy',
307 => 'Temporary Redirect',
308 => 'Permanent Redirect',
400 => 'Bad Request',
401 => 'Unauthorized',
402 => 'Payment Required',
403 => 'Forbidden',
404 => 'Not Found',
405 => 'Method Not Allowed',
406 => 'Not Acceptable',
407 => 'Proxy Authentication Required',
408 => 'Request Timeout',
409 => 'Conflict',
410 => 'Gone',
411 => 'Length Required',
412 => 'Precondition Failed',
413 => 'Payload Too Large',
414 => 'Request-URI Too Long',
415 => 'Unsupported Media Type',
416 => 'Requested Range Not Satisfiable',
417 => 'Expectation Failed',
418 => 'I\'m a teapot',
422 => 'Unprocessable Entity',
423 => 'Locked',
424 => 'Failed Dependency',
425 => 'Unordered Collection',
426 => 'Upgrade Required',
428 => 'Precondition Required',
429 => 'Too Many Requests',
431 => 'Request Header Fields Too Large',
449 => 'Retry With',
450 => 'Blocked by Windows Parental Controls',
500 => 'Internal Server Error',
501 => 'Not Implemented',
502 => 'Bad Gateway',
503 => 'Service Unavailable',
504 => 'Gateway Timeout',
505 => 'HTTP Version Not Supported',
506 => 'Variant Also Negotiates',
507 => 'Insufficient Storage',
508 => 'Loop Detected',
509 => 'Bandwidth Limit Exceeded',
510 => 'Not Extended',
511 => 'Network Authentication Required',
599 => 'Network Connect Timeout Error'
);
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
if (isset($options['timeout'])) {
$timeout = (int) $options['timeout'];
curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
}
curl_exec($ch);
$returnedStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if (array_key_exists($returnedStatusCode, $httpStatusCodes)) {
return "URL: '{$url}' - Error code: {$returnedStatusCode} - Definition: {$httpStatusCodes[$returnedStatusCode]}";
} else {
return "'{$url}' does not exist";
}
}
#8
4
karim79's get_headers() solution didn't worked for me as I gotten crazy results with Pinterest.
karim79的get_headers()解决方案对我不起作用,因为我得到了Pinterest的疯狂结果。
get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
get_headers(): Failed to enable crypto
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
Anyway, this developer demonstrates that cURL is way faster than get_headers():
无论如何,这个开发人员演示了cURL比get_headers()快得多:
http://php.net/manual/fr/function.get-headers.php#104723
http://php.net/manual/fr/function.get-headers.php # 104723
Since many people asked for karim79 to fix is cURL solution, here's the solution I built today.
由于很多人要求karim79修复cURL解决方案,所以我今天构建了这个解决方案。
/**
* Send an HTTP request to a the $url and check the header posted back.
*
* @param $url String url to which we must send the request.
* @param $failCodeList Int array list of code for which the page is considered invalid.
*
* @return Boolean
*/
public static function isUrlExists($url, array $failCodeList = array(404)){
$exists = false;
if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){
$url = "https://" . $url;
}
if (preg_match(RegularExpression::URL, $url)){
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($handle, CURLOPT_HEADER, true);
curl_setopt($handle, CURLOPT_NOBODY, true);
curl_setopt($handle, CURLOPT_USERAGENT, true);
$headers = curl_exec($handle);
curl_close($handle);
if (empty($failCodeList) or !is_array($failCodeList)){
$failCodeList = array(404);
}
if (!empty($headers)){
$exists = true;
$headers = explode(PHP_EOL, $headers);
foreach($failCodeList as $code){
if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){
$exists = false;
break;
}
}
}
}
return $exists;
}
Let me explains the curl options:
让我来解释旋度选项:
CURLOPT_RETURNTRANSFER: return a string instead of displaying the calling page on the screen.
CURLOPT_RETURNTRANSFER:返回一个字符串,而不是在屏幕上显示调用页面。
CURLOPT_SSL_VERIFYPEER: cUrl won't checkout the certificate
CURLOPT_SSL_VERIFYPEER: cUrl不会签出证书
CURLOPT_HEADER: include the header in the string
CURLOPT_HEADER:在字符串中包含header
CURLOPT_NOBODY: don't include the body in the string
CURLOPT_NOBODY:不要将主体包含在字符串中。
CURLOPT_USERAGENT: some site needs that to function properly (by example : https://plus.google.com)
CURLOPT_USERAGENT:有些站点需要它正常工作(例如:https://plus.google.com)
Additional note: In this function I'm using Diego Perini's regex for validating the URL before sending the request:
附加说明:在这个函数中,我使用Diego Perini的regex在发送请求之前验证URL:
const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini
Additional note 2: I explode the header string and user headers[0] to be sure to only validate only the return code and message (example: 200, 404, 405, etc.)
附加说明2:我将标题字符串和用户标题[0]分开,以确保只验证返回代码和消息(例如:200、404、405等等)。
Additional note 3: Sometime validating only the code 404 is not enough (see the unit test), so there's an optional $failCodeList parameter to supply all the code list to reject.
附加说明3:有时仅验证代码404是不够的(请参阅单元测试),因此有一个可选的$failCodeList参数来提供要拒绝的所有代码列表。
And, of course, here's the unit test (including all the popular social network) to legitimates my coding:
当然,这里有单元测试(包括所有流行的社交网络)来合法化我的代码:
public function testIsUrlExists(){
//invalid
$this->assertFalse(ToolManager::isUrlExists("woot"));
$this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));
$this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));
$this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));
$this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));
$this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));
$this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));
$this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));
$this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));
//valid
$this->assertTrue(ToolManager::isUrlExists("www.google.ca"));
$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
$this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));
$this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));
$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
$this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));
$this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));
$this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));
$this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));
$this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}
Great success to all,
Jonathan Parent-Lévesque from Montreal
#9
3
Pretty fast:
function http_response($url){
    $resURL = curl_init();
    curl_setopt($resURL, CURLOPT_URL, $url);
    // Request the headers only; the body is not needed to read the status code
    curl_setopt($resURL, CURLOPT_NOBODY, true);
    // Keep the response out of the page output
    curl_setopt($resURL, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($resURL, CURLOPT_FAILONERROR, true);
    curl_exec($resURL);
    $intReturnCode = curl_getinfo($resURL, CURLINFO_HTTP_CODE);
    curl_close($resURL);
    // 200, 302 and 304 count as "exists"; anything else (including 0 for no response) does not
    return ($intReturnCode == 200 || $intReturnCode == 302 || $intReturnCode == 304) ? 1 : 0;
}
echo 'google:';
echo http_response('http://www.google.com');
echo '/ ogogle:';
echo http_response('http://www.ogogle.com');
#10
2
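function urlIsOk($url)
{
    $headers = @get_headers($url);
    if ($headers === false)
    {
        return false; // no response at all
    }
    // The status line looks like "HTTP/1.1 200 OK"; the code starts at offset 9
    $httpStatus = intval(substr($headers[0], 9, 3));
    return $httpStatus > 0 && $httpStatus < 400;
}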
#11
2
All of the above solutions plus some extra sugar (the ultimate all-in-one solution):
/**
* Check that given URL is valid and exists.
* @param string $url URL to check
* @return bool TRUE when the URL is valid and exists, FALSE otherwise
*/
function urlExists ( $url ) {
// Remove all illegal characters from a url
$url = filter_var($url, FILTER_SANITIZE_URL);
// Validate URI
if (filter_var($url, FILTER_VALIDATE_URL) === FALSE
// check only for http/https schemes.
|| !in_array(strtolower(parse_url($url, PHP_URL_SCHEME)), ['http','https'], true )
) {
return false;
}
// Check that URL exists
$file_headers = @get_headers($url);
return !(!$file_headers || $file_headers[0] === 'HTTP/1.1 404 Not Found');
}
Example:
var_dump ( urlExists('http://*.com/') );
// Output: true;
#12
1
Here is a solution that reads only the first byte of the response and returns false if file_get_contents() fails. It also works for remote files such as images.
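function urlExists($url)
{
    // Compare against false explicitly: a first byte of "0" or an empty body
    // is a successful read but would be treated as falsy by a plain if()
    return @file_get_contents($url, false, NULL, 0, 1) !== false;
}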
#13
1
To check whether a URL is online or offline:
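function get_http_response_code($theURL) {
    $headers = @get_headers($theURL);
    // get_headers() returns false when the request fails (e.g. host unreachable)
    if ($headers === false) {
        return false;
    }
    // "HTTP/1.1 200 OK" -> "200"
    return substr($headers[0], 9, 3);
}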
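One thing to keep in mind with the $headers[0] approach: as far as I know, get_headers() follows redirects by default and puts the headers of every response into the same array, so index 0 holds the status line of the first response, not the final one. Here is a rough sketch that picks the last status line instead. The name lastStatusCode() and the sample array are assumptions for illustration, and the sample only imitates the shape of a get_headers() result.

```php
<?php
/**
 * Hypothetical helper: return the status code of the last response found in
 * an array shaped like the result of get_headers(), or null if none is found.
 */
function lastStatusCode($headers): ?int
{
    if (!is_array($headers)) {
        return null; // get_headers() returns false on failure
    }

    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 301 Moved Permanently" or "HTTP/1.0 200 OK".
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1]; // keep overwriting so the last status line wins
        }
    }
    return $code;
}

// Sample data standing in for a real get_headers() result after one redirect.
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
var_dump(lastStatusCode($sample)); // int(200)
var_dump(lastStatusCode(false));   // NULL
```

Matching the code with a pattern instead of a fixed offset also avoids depending on the exact length of the protocol prefix.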
#14
0
The simple way is curl (and it's faster too):
<?php
$mylinks = "http://site.com/page.html";
$handlerr = curl_init($mylinks);
curl_setopt($handlerr, CURLOPT_RETURNTRANSFER, TRUE);
$resp = curl_exec($handlerr);
$ht = curl_getinfo($handlerr, CURLINFO_HTTP_CODE);
curl_close($handlerr);
// 404 means the page is missing; 0 means no HTTP response was received at all
if ($ht == 404 || $ht == 0)
{ echo 'NO';}
else { echo 'OK';}
?>
#15
0
Another way to check whether a URL is valid:
<?php
if (isValidURL("http://www.gimepix.com")) {
echo "URL is valid...";
} else {
echo "URL is not valid...";
}
function isValidURL($url) {
    $file_headers = @get_headers($url);
    // get_headers() returns false on failure; strpos() returns false when "200 OK" is absent
    return $file_headers !== false && strpos($file_headers[0], "200 OK") !== false;
}
?>
#16
0
get_headers() returns an array with the headers sent by the server in response to an HTTP request.
$image_path = 'https://your-domain.com/assets/img/image.jpg';
$file_headers = @get_headers($image_path);
//Prints the response out in an array
//print_r($file_headers);
// get_headers() returns false when the request fails, so check that first
if(!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found'){
    echo 'Failed because path does not exist.<br>';
}else{
    echo "It works. You're good to go!<br>";
}
#17
0
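function url_exists($url) {
    $headers = @get_headers($url);
    // get_headers() returns false on failure; only a status line containing 200 counts as existing
    return $headers !== false && strpos($headers[0], '200') !== false;
}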