I am trying to parse this site to get the image links: http://statigr.am/feed/parishilton
This is my code:
include 'parse/simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://statigr.am/feed/parishilton/');
// Find all images
foreach($html->find('img') as $element)
{
echo $element->src . '<br>';
}
The script doesn't output anything! Why is that? I want the img links.
It's because all the images are inside a CDATA section and the parser ignores its contents, so the solution is:
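include 'parse/simple_html_dom.php';
// Fetch the raw feed as a string (no need to parse it before cleaning)
$raw = file_get_contents('http://statigr.am/feed/parishilton/');
// Strip the CDATA markers so the <img> tags inside become real markup
$raw = str_replace(array("<![CDATA[", "]]>"), "", $raw);
// Build the DOM object from the cleaned string
$html = str_get_html($raw);
// Loop over the images inside each item's description
foreach($html->find('item description img') as $el)
{
    echo $el->src . "<br />";
}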
Strip the CDATA markers from the returned content, then use str_get_html to create a DOM object from that string and loop through the images. (Tested and works.)
Output:
http://distilleryimage3.s3.amazonaws.com/cc25d8562c9611e3a8b922000a1f8ac2_8.jpg
http://distilleryimage7.s3.amazonaws.com/4d8e22da2c8911e3a6a022000ae81e78_8.jpg
http://distilleryimage5.s3.amazonaws.com/ce6aa38a2be711e391ae22000ae9112d_8.jpg
http://distilleryimage3.s3.amazonaws.com/d64ab4c42bc811e39cbd22000a1fafdb_8.jpg
...
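As an alternative to removing the CDATA markers by hand, the feed can be read with PHP's built-in XML tools. This is a minimal sketch, assuming the page is an RSS 2.0 feed with the images inside each `<item><description>` (which is what the `item description img` selector implies); `extractImageSources` and the sample feed are hypothetical stand-ins, not part of the original answer. `simplexml_load_string` with the `LIBXML_NOCDATA` flag turns each CDATA section into ordinary text, and `DOMDocument::loadHTML` then parses that text as an HTML fragment.

```php
<?php
// Sketch: read the RSS feed with SimpleXML so CDATA sections become plain
// text, then parse each <description> as HTML with DOMDocument.
// Assumption: RSS 2.0 layout with the images inside <item><description>.

function extractImageSources(string $rssXml): array
{
    // LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
    $feed = simplexml_load_string($rssXml, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($feed === false) {
        return [];
    }

    $sources = [];
    foreach ($feed->channel->item as $item) {
        $descriptionHtml = trim((string) $item->description);
        if ($descriptionHtml === '') {
            continue; // loadHTML() rejects empty input
        }

        // The description is an HTML fragment; parse it on its own.
        $doc = new DOMDocument();
        // Collect libxml warnings about imperfect HTML instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($descriptionHtml);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        foreach ($doc->getElementsByTagName('img') as $img) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Hypothetical sample feed standing in for the live URL.
$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>sample</title>
    <item>
      <title>first</title>
      <description><![CDATA[<a href="#"><img src="http://example.com/a.jpg"></a> caption]]></description>
    </item>
    <item>
      <title>second</title>
      <description><![CDATA[<img src="http://example.com/b.jpg" />]]></description>
    </item>
  </channel>
</rss>
XML;

foreach (extractImageSources($sample) as $src) {
    echo $src . "<br />\n";
}
```

To try it on the real feed, pass the fetched string instead of `$sample`, e.g. `extractImageSources(file_get_contents('http://statigr.am/feed/parishilton/'))`, assuming the URL still serves the same feed. This version only needs the simplexml, dom and libxml extensions, which are enabled by default in standard PHP builds, so simple_html_dom is not required.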