I'm teaching myself some basic scraping and I've found that sometimes the URLs that I feed into my code return 404, which gums up all the rest of my code.
So I need a test at the top of the code to check if the URL returns 404 or not.
This would seem like a pretty straightforward task, but Google's not giving me any answers. I worry I'm searching for the wrong stuff.
One blog recommended I use this:
$valid = @fsockopen($url, 80, $errno, $errstr, 30);
and then test to see if $valid is empty or not.
But I think the URL that's giving me problems has a redirect on it, so $valid is coming up empty for all values. Or perhaps I'm doing something else wrong.
I've also looked into a "head request" but I've yet to find any actual code examples I can play with or try out.
Suggestions? And what's this about curl?
14 Answers
#1
246
If you are using PHP's curl bindings, you can check the error code using curl_getinfo as such:
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);
/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 404) {
/* Handle 404 here. */
}
curl_close($handle);
/* Handle $response here. */
#2
92
If you're running PHP 5 you can use:
$url = 'http://www.example.com';
print_r(get_headers($url, 1));
Alternatively, for PHP 4, a user has contributed the following:
/**
This is a modified version of code from "stuart at sixletterwords dot com", at 14-Sep-2005 04:52. This version tries to emulate get_headers() function at PHP4. I think it works fairly well, and is simple. It is not the best emulation available, but it works.
Features:
- supports (and requires) full URLs.
- supports changing of default port in URL.
- stops downloading from socket as soon as end-of-headers is detected.
Limitations:
- only gets the root URL (see line with "GET / HTTP/1.1").
- doesn't support HTTPS (nor the default HTTPS port).
*/
if(!function_exists('get_headers'))
{
function get_headers($url,$format=0)
{
$url=parse_url($url);
$end = "\r\n\r\n";
$fp = fsockopen($url['host'], (empty($url['port'])?80:$url['port']), $errno, $errstr, 30);
if ($fp)
{
$out = "GET / HTTP/1.1\r\n";
$out .= "Host: ".$url['host']."\r\n";
$out .= "Connection: Close\r\n\r\n";
$var = '';
fwrite($fp, $out);
while (!feof($fp))
{
$var.=fgets($fp, 1280);
if(strpos($var,$end))
break;
}
fclose($fp);
$var=preg_replace("/\r\n\r\n.*\$/",'',$var);
$var=explode("\r\n",$var);
if($format)
{
foreach($var as $i)
{
if(preg_match('/^([a-zA-Z -]+): +(.*)$/',$i,$parts))
$v[$parts[1]]=$parts[2];
}
return $v;
}
else
return $var;
}
}
}
Both would have a result similar to:
Array
(
[0] => HTTP/1.1 200 OK
[Date] => Sat, 29 May 2004 12:28:14 GMT
[Server] => Apache/1.3.27 (Unix) (Red-Hat/Linux)
[Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
[ETag] => "3f80f-1b6-3e1cb03b"
[Accept-Ranges] => bytes
[Content-Length] => 438
[Connection] => close
[Content-Type] => text/html
)
Therefore you could just check to see that the header response was OK, e.g.:
$headers = get_headers($url, 1);
if ($headers[0] == 'HTTP/1.1 200 OK') {
//valid
}
if ($headers[0] == 'HTTP/1.1 301 Moved Permanently') {
//moved or redirect page
}
W3C specifications and definitions
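Exact string matches like the above tie the check to the "HTTP/1.1" prefix; as a rough sketch (not from the original answer), you could instead pull the numeric code out of the status line so the check also works for other protocol versions:
$headers = get_headers($url, 1);
/* Grab the three-digit status code after the protocol token, e.g. "200" from "HTTP/1.1 200 OK". */
$statusCode = (int) substr($headers[0], strpos($headers[0], ' ') + 1, 3);
if ($statusCode == 404) {
    // handle the missing page here
}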
#3
31
With strager's code, you can also check CURLINFO_HTTP_CODE for other codes. Some websites do not report a 404; rather, they simply redirect to a custom 404 page and return 302 (redirect) or something similar. I used this to check whether an actual file (e.g. robots.txt) existed on the server or not. Clearly this kind of file would not cause a redirect if it existed, but if it didn't it would redirect to a 404 page, which, as I said before, may not have a 404 code.
function is_404($url) {
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);
/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
curl_close($handle);
/* If the document has loaded successfully without any redirection or error */
if ($httpCode >= 200 && $httpCode < 300) {
return false;
} else {
return true;
}
}
#4
20
As strager suggests, look into using cURL. You may also be interested in setting CURLOPT_NOBODY with curl_setopt to skip downloading the whole page (you just want the headers).
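A minimal sketch of that approach, assuming all you need is the status code (the variable names are only illustrative):
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_NOBODY, true);         // skip downloading the response body
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // don't echo anything to output
curl_exec($handle);
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
curl_close($handle);
if ($httpCode == 404) {
    /* Handle 404 here, e.g. skip this URL in the scraper. */
}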
#5
15
If you are looking for the easiest solution, and one you can try in one go on PHP 5, do:
// The http:// scheme is required so the HTTP wrapper is used and $http_response_header gets populated.
@file_get_contents('http://www.yoursite.com');
//and check by echoing
echo $http_response_header[0];
#6
6
I found this answer here:
if(($twitter_XML_raw=file_get_contents($timeline))==false){
// Retrieve HTTP status code
list($version,$status_code,$msg) = explode(' ',$http_response_header[0], 3);
// Check the HTTP Status code
switch($status_code) {
case 200:
$error_status="200: Success";
break;
case 401:
$error_status="401: Login failure. Try logging out and back in. Password are ONLY used when posting.";
break;
case 400:
$error_status="400: Invalid request. You may have exceeded your rate limit.";
break;
case 404:
$error_status="404: Not found. This shouldn't happen. Please let me know what happened using the feedback link above.";
break;
case 500:
$error_status="500: Twitter servers replied with an error. Hopefully they'll be OK soon!";
break;
case 502:
$error_status="502: Twitter servers may be down or being upgraded. Hopefully they'll be OK soon!";
break;
case 503:
$error_status="503: Twitter service unavailable. Hopefully they'll be OK soon!";
break;
default:
$error_status="Undocumented error: " . $status_code;
break;
}
}
Essentially, you use file_get_contents() to retrieve the URL, which automatically populates the $http_response_header variable, whose first entry contains the status code.
#7
3
Addendum: I tested those three methods with performance in mind.
The result, at least in my testing environment:
Curl wins
This test was done under the assumption that only the headers (no body) are needed. Test it yourself:
$url = "http://de.wikipedia.org/wiki/Pinocchio";
$start_time = microtime(TRUE);
$headers = get_headers($url);
echo $headers[0]."<br>";
$end_time = microtime(TRUE);
echo $end_time - $start_time."<br>";
$start_time = microtime(TRUE);
$response = file_get_contents($url);
echo $http_response_header[0]."<br>";
$end_time = microtime(TRUE);
echo $end_time - $start_time."<br>";
$start_time = microtime(TRUE);
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($handle, CURLOPT_NOBODY, 1); // and *only* get the header
/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);
/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
// if($httpCode == 404) {
// /* Handle 404 here. */
// }
echo $httpCode."<br>";
curl_close($handle);
$end_time = microtime(TRUE);
echo $end_time - $start_time."<br>";
#8
2
As an additional hint to the great accepted answer:
When using a variation of the proposed solution, I got errors because of the PHP setting 'max_execution_time'. So what I did was the following:
$originalLimit = ini_get('max_execution_time'); // remember the configured limit, since set_time_limit() overwrites it
set_time_limit(120);
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_NOBODY, true);
$result = curl_exec($curl);
set_time_limit((int) $originalLimit);
curl_close($curl);
First I set the time limit to a higher number of seconds; at the end I set it back to the value defined in the PHP settings.
#9
1
You can use this code too, to see the status of any link:
<?php
function get_url_status($url, $timeout = 10)
{
$ch = curl_init();
// set cURL options
$opts = array(CURLOPT_RETURNTRANSFER => true, // do not output to browser
CURLOPT_URL => $url, // set URL
CURLOPT_NOBODY => true, // do a HEAD request only
CURLOPT_TIMEOUT => $timeout); // set timeout
curl_setopt_array($ch, $opts);
curl_exec($ch); // do it!
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // find HTTP status
curl_close($ch); // close handle
echo $status; //or return $status;
//example checking
if ($status == '302') { echo 'HEY, redirection';}
}
get_url_status('http://yourpage.comm');
?>
#10
1
<?php
$url= 'www.something.com';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.4");
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$output = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
echo $httpcode;
?>
#11
1
Here is a short solution.
$handle = curl_init($uri);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($handle,CURLOPT_HTTPHEADER,array ("Accept: application/rdf+xml"));
curl_setopt($handle, CURLOPT_NOBODY, true);
curl_exec($handle);
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 200||$httpCode == 303)
{
echo "you might get a reply";
}
curl_close($handle);
In your case, you can change application/rdf+xml to whatever you use.
#12
0
This is just a slice of code; hope it works for you.
$ch = @curl_init();
@curl_setopt($ch, CURLOPT_URL, 'http://example.com');
@curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
@curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
@curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
@curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$response = @curl_exec($ch);
$errno = @curl_errno($ch);
$error = @curl_error($ch);
$info = @curl_getinfo($ch);
@curl_close($ch); // release the handle before returning the status code
return $info['http_code'];
#13
0
To catch all errors (4XX and 5XX), I use this little script:
function URLIsValid($URL){
$headers = @get_headers($URL);
preg_match("/ [45][0-9]{2} /", (string)$headers[0] , $match);
return count($match) === 0;
}
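For example (the URL below is only a placeholder), a quick pre-flight check in a scraper could look like:
// Only scrape URLs whose status line does not carry a 4XX or 5XX code.
if (URLIsValid('http://www.example.com/some-page')) {
    // safe to fetch and parse the page
} else {
    // log the bad URL and move on
}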
#14
0
This will give you true if the URL does not return 200 OK:
function check_404($url) {
$headers=get_headers($url, 1);
if ($headers[0]!='HTTP/1.1 200 OK') return true; else return false;
}