Is there a way to parse a website's source on the iPhone to get the URLs of the photos on that page? If so, how would you do that?
6 solutions
I recommend regular expressions. There's a great open source Regex library for Cocoa called RegexKit. For the most part, you can just drop it in your code and it'll "just work".
Getting the URLs of all the images wouldn't be too difficult (less than 20 lines of code) if you assume that all images are in <img> tags. You'd grab all the image tags (with something like: <img\s+[^>]+>), then iterate through those matches. For each match, you'd pull out whatever is in the src attribute: src\s*=\s*("|')?\s*([^\s"']+)(\s|"|')
You might need to tweak that a bit, but it shouldn't be too bad.
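As a quick way to try the two patterns before porting them to RegexKit, here is a minimal sketch in Python (not Objective-C, and not RegexKit's API). The function name `image_urls` is made up for illustration. The src pattern is one possible tweak of the one above: it also stops at `>` and drops the trailing group, so an unquoted value at the end of a tag (`<img src=c.gif>`) is still picked up.

```python
import re

# Step 1: whole <img ...> tags, same pattern as in the answer above.
IMG_TAG = re.compile(r"<img\s+[^>]+>", re.IGNORECASE)

# Step 2: the src value inside one tag. Tweaked (assumption): '>' is
# excluded from the value and no closing quote or space is required.
SRC_ATTR = re.compile(r"""src\s*=\s*["']?\s*([^\s"'>]+)""", re.IGNORECASE)


def image_urls(html):
    """Return the src value of every <img> tag found in html, in order."""
    urls = []
    for tag in IMG_TAG.findall(html):
        match = SRC_ATTR.search(tag)
        if match:  # skip <img> tags that have no src attribute
            urls.append(match.group(1))
    return urls
```

The values come back exactly as written in the page, so relative paths still have to be resolved against the page's URL.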
I'd say go for regular expressions - there is a one-page library that wraps C regexes that you can drop into your project.
There is no super easy way. When I had to do it, I wrote a libxml2 SAX parser. libxml2 has an HTML reader that works fairly well with malformed HTML, and libxml2 is included with the base system.
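To show the shape of the event-driven approach, here is a rough sketch that uses Python's standard `html.parser` as a stand-in. It is not libxml2 and not the parser described above; the names `ImgSrcCollector` and `collect_image_urls` are made up for illustration. The idea carries over: the parser calls back for each start tag, and the handler only keeps the src of img elements.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> start tag it is fed."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of
        # (name, value) pairs. Self-closing <img ... /> also ends up here.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(value)


def collect_image_urls(html):
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.urls
```

Like the HTML reader mentioned above, this parser tolerates unclosed tags, so sloppy markup still yields its image URLs.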
You could try it using regular expressions, but I wouldn't recommend that. You should have a look at NSXMLParser, assuming the webpage is coded to be XHTML compliant. TouchXML is another good library.
Are you OK with any approach you use not picking up on images loaded dynamically via JavaScript?
The closest thing I could see working is to parse out any JavaScript imports, load those up too, and then use a regular expression across the whole file looking for anything that ends in ".jpg/.gif/.png" and grab the full URL out from that. The libxml approach would miss out on references to images not in img tags, but it might well be good enough.
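A minimal sketch of that whole-file scan, in Python for brevity. It assumes you have already downloaded the page and any imported scripts and joined them into one string. It only looks for absolute http/https URLs (an assumption; relative paths would need a looser pattern), and the name `find_image_urls` is made up for illustration.

```python
import re

# Absolute URL ending in .jpg, .gif or .png. The lazy quantifier stops at
# the first image extension; the character class keeps the match inside
# quotes, parentheses and angle brackets.
IMAGE_URL = re.compile(
    r"""https?://[^\s"'<>()]+?\.(?:jpg|gif|png)\b""",
    re.IGNORECASE,
)


def find_image_urls(text):
    """Return each distinct image URL in text, in first-seen order."""
    found = []
    for url in IMAGE_URL.findall(text):
        if url not in found:
            found.append(url)
    return found
```

Because it ignores markup entirely, this also catches URLs in CSS `url(...)` values and JavaScript string literals, which the img-tag approaches miss.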
Take a look at Event Driven XML Parsing in the iPhone reference library.