I am loading a bunch of RSS feeds using DOM, and sometimes one will return a 404 instead of the file. The problem is that the web server sends an HTML 404 page in place of the expected XML file, so with this code:
$rssDom = new DOMDocument();
$rssDom->load($url);
$channel = $rssDom->getElementsByTagName('channel');
$channel = $channel->item(0);
$items = $channel->getElementsByTagName('item');
I get this warning:
Warning: DOMDocument::load() [domdocument.load]: Entity 'nbsp' not defined
Followed by this error:
Fatal error: Call to a member function getElementsByTagName() on a non-object
Normally, this code works fine, but on the occasion that I get a 404 it fails to do anything. I tried a standard try-catch around the load statement but it doesn't seem to catch it.
5 Solutions
#1
4
You can suppress the output of parsing errors with
libxml_use_internal_errors(true);
To check whether the returned response is a 404, you can check $http_response_header after the call to DOMDocument::load().
Example:
libxml_use_internal_errors(true);
$rssDom = new DOMDocument();
$rssDom->load($url);
if (isset($http_response_header[0]) && strpos($http_response_header[0], '404') !== false) {
    die('file not found. exiting.');
}
The alternative would be to use file_get_contents, then check the response header and, if it's not a 404, load the markup with DOMDocument::loadXML. This would prevent DOMDocument from parsing invalid XML.
Note that all this assumes that the server correctly returns a 404 header in the response.
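The file_get_contents alternative could be sketched roughly as follows. The helper names statusCodeFromHeaders and loadFeedFromUrl are made up for illustration, and the sketch assumes that the HTTP stream wrapper is enabled (allow_url_fopen) and that the server sends a correct status line.

```php
<?php
// Hypothetical helper: pull the numeric status code out of the header
// lines PHP stores in $http_response_header after an HTTP request.
function statusCodeFromHeaders(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Redirects add one status line per hop; keep the last one seen.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: fetch the feed and parse it only on a 200 response.
function loadFeedFromUrl(string $url): ?DOMDocument
{
    // ignore_errors makes file_get_contents return the body even for 4xx/5xx,
    // so the status check below decides what happens next.
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false || $body === ''
        || statusCodeFromHeaders($http_response_header ?? []) !== 200) {
        return null;
    }

    $previous = libxml_use_internal_errors(true); // keep parse errors quiet
    $dom = new DOMDocument();
    $ok = $dom->loadXML($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $dom : null;
}
```

A caller would then treat a null result from loadFeedFromUrl($url) as "skip this feed" instead of touching the document.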
#2
2
Load the HTML manually with file_get_contents or curl (which allows you to do your own error checks) and, if all goes well, feed the results to DOMDocument::loadHTML.
There are lots of curl examples here (e.g. look at this one, although it's surely not the best); to get the HTTP status code you would use curl_getinfo.
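A minimal sketch of the curl route, assuming the curl extension is installed. The function name fetchWithCurl and the timeout values are illustrative choices, not part of the answer.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body only when
// the server answers 200; returns null on a transport error or other status.
function fetchWithCurl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects to the final URL
        CURLOPT_CONNECTTIMEOUT => 5,    // example values, tune as needed
        CURLOPT_TIMEOUT        => 10,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false || $status !== 200) {
        return null;
    }
    return $body;
}
```

A non-null result can then be handed to DOMDocument for parsing; a null result means the feed should be skipped.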
#3
0
To avoid the warning, you could use LIBXML_NOWARNING (note: suppressing warnings normally isn't a good thing to do).
The more important problem here is the fatal error: to avoid it, you should check whether the document has been loaded correctly. To do this, just save load()'s return value and use it:
$loaded = $rssDom->load($url, LIBXML_NOWARNING);
if ($loaded) {
    $channel = $rssDom->getElementsByTagName('channel');
    $channel = $channel->item(0);
    $items = $channel->getElementsByTagName('item');
} else {
    // show error-message or something like that
}
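Checking the return value covers a failed parse, but a document that parses and simply has no channel element would still hit the same fatal error, because item(0) returns null in that case. The sketch below guards both cases. The helper name feedItems is made up for illustration, and it works on a string via loadXML so it can be tried without a network; the same checks apply after load($url).

```php
<?php
// Hypothetical helper: return the <item> elements of a feed, or an empty
// array when the markup does not parse or has no <channel> element.
function feedItems(string $xml): array
{
    if ($xml === '') {
        return [];
    }
    $previous = libxml_use_internal_errors(true); // no warnings on bad markup
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return [];
    }
    // item(0) returns null when there is no <channel>; calling a method on
    // that null is what triggers the fatal error in the question.
    $channel = $dom->getElementsByTagName('channel')->item(0);
    if ($channel === null) {
        return [];
    }
    return iterator_to_array($channel->getElementsByTagName('item'));
}
```

With this shape, an HTML 404 page and a valid feed both go through the same call, and the caller only has to loop over whatever comes back.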
#4
0
Like this:
$rssDom = new DOMDocument();
if ($rssDom->load($url)) {
    $channel = $rssDom->getElementsByTagName('channel');
    $channel = $channel->item(0);
    $items = $channel->getElementsByTagName('item');
}
#5
0
In case someone needs a solution, this works like a charm:
$objDOM = new DOMDocument();
$loaded = @$objDOM->load($url);
if (!$loaded) {
    // something went terribly wrong
} else {
    // this is going ok!!
}
This works because '@' suppresses the warnings and load() returns true on success or false on failure.