Here is the situation. I'm retrieving a page with curl into a variable, so I now have all of the HTML in one variable. I need to access the contents of a certain div node from code. There is one div node on the page with the ID 'image', and it looks roughly like this:
<html>
<body>
..........
<div id="image">
<a href="somelocation">
<img src="location.jpg"/> <!-- I need to grab the src of this image object -->
</a>
</div>
<div> Other stuff blah blah</div>
</body>
</html>
I need to grab the src attribute of the img tag nested within the div tag with the id 'image', which is tucked away somewhere in the HTML page.
How do I do this on the server side, given that I'm retrieving the page with curl?
Thanks again.
1 solution
#1
Have you considered using an HTML DOM parser?
This will handle all the parsing (even of irregular HTML) and the subsequent querying of elements.
(I wouldn't use regexps: HTML isn't a regular language, so it is poorly suited to regexp matching, and a huge number of edge cases exist to trip you up.)
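To make the parser suggestion concrete, here is a minimal sketch. The question doesn't say which server-side language is in use, so Python and its standard-library `html.parser` module are an assumption on my part; the names `ImageSrcFinder` and `find_image_src` are made up for illustration. It walks the markup, tracks when it is inside the element whose id is 'image', and records the src of the first img it meets there.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ImageSrcFinder(HTMLParser):
    """Records the src of the first <img> inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # nesting depth inside the target element (0 = outside)
        self.src = None   # filled in once a matching <img> is seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Still outside: check whether this tag opens the target element.
            if attrs.get("id") == self.target_id and tag not in VOID_TAGS:
                self.depth = 1
            return
        if tag == "img" and self.src is None:
            self.src = attrs.get("src")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def find_image_src(html, target_id="image"):
    """Return the src of the first <img> inside #target_id, or None."""
    finder = ImageSrcFinder(target_id)
    finder.feed(html)
    finder.close()
    return finder.src


# Stand-in for the variable that holds the page fetched with curl.
page = """<html><body>
<div id="image">
<a href="somelocation">
<img src="location.jpg"/>
</a>
</div>
<div> Other stuff <img src="other.jpg"> </div>
</body></html>"""

print(find_image_src(page))
```

This depth counting assumes reasonably well-nested markup. A full DOM library in whatever language you are actually using will cope better with badly broken pages and will let you query by id directly.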