What is the most efficient way to fully resolve a URL (using PHP and cURL)?

Date: 2022-10-10 20:24:26

I'm looking for the most efficient way to resolve a given URL to its final endpoint, following all 30x redirects and Location headers.

Basically, I have a bunch of URLs like http://foo.com that, when you go to them, end up at a page like http://foo.com/Welcome.html, and I need to find that last URL.

Right now, I'm using CURLOPT_FOLLOWLOCATION and CURLOPT_NOBODY (since I really don't care about the text returned), and once it's exec'd, I run curl_getinfo() and save the 'url' key from that array.

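For reference, a minimal sketch of that approach (the helper name resolve_final_url() and the CURLOPT_MAXREDIRS cap are illustrative additions, not part of the question):

<?php
// Sketch: let cURL follow every 30x redirect with a body-less
// (HEAD-style) request, then read the final URL via curl_getinfo().
function resolve_final_url(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => true,  // follow all Location headers
        CURLOPT_NOBODY         => true,  // we don't care about the body
        CURLOPT_RETURNTRANSFER => true,  // don't echo output to stdout
        CURLOPT_MAXREDIRS      => 10,    // assumption: cap redirect loops
    ]);
    curl_exec($ch);
    // CURLINFO_EFFECTIVE_URL is the same value as the 'url' key in the
    // array returned by curl_getinfo($ch) with no option argument.
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $final;
}

echo resolve_final_url('http://foo.com'); // e.g. http://foo.com/Welcome.html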

I just keep thinking that this is such a huge waste of <something> and there is likely a better way.

EDIT: For those who read this later: I did end up finding a better solution (that didn't involve cURL); see get_headers() in PHP 5+.

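A rough sketch of that get_headers() approach follows. The helper name and the HEAD-request context are illustrative assumptions (the context parameter requires PHP 7.1+); the key point is that get_headers() follows redirects itself, so every hop's headers land in one flat array:

<?php
// Sketch: the http stream wrapper behind get_headers() follows
// redirects on its own, and every response's headers (including each
// intermediate Location header) are returned in one flat array.
function resolve_with_get_headers(string $url): string
{
    // Assumption: send HEAD instead of GET so no body is transferred.
    $context = stream_context_create(['http' => ['method' => 'HEAD']]);
    $headers = get_headers($url, 0, $context);
    if ($headers === false) {
        return $url; // request failed, fall back to the input URL
    }
    // The last Location header in the chain points at the final page.
    // Caveat: a relative Location value would need to be resolved
    // against the previous URL, which this sketch does not do.
    foreach ($headers as $header) {
        if (stripos($header, 'Location:') === 0) {
            $url = trim(substr($header, 9));
        }
    }
    return $url;
}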

1 Solution

#1


You can do this manually in PHP by analysing the received headers, but cURL does exactly the same thing under the hood. There are no other direct methods, and cURL is the most convenient one, so don't worry about it.

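For completeness, a hedged sketch of what "manually" would look like: turn off CURLOPT_FOLLOWLOCATION and walk the Location headers yourself, one request per hop (the function name and the hop cap are illustrative):

<?php
// Sketch: disable automatic redirect following and chase each
// Location header ourselves, one request per hop.
function resolve_manually(string $url, int $maxHops = 10): string
{
    for ($i = 0; $i < $maxHops; $i++) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_FOLLOWLOCATION => false, // handle redirects ourselves
            CURLOPT_NOBODY         => true,  // headers only, no body
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HEADER         => true,  // include headers in output
        ]);
        $raw  = curl_exec($ch);
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        // Stop unless this hop is a 30x response carrying a Location.
        if ($raw === false || $code < 300 || $code >= 400
                || !preg_match('/^Location:\s*(\S+)/mi', $raw, $m)) {
            break;
        }
        $url = $m[1]; // caveat: relative Location values not resolved
    }
    return $url;
}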

Or you could use information from a search engine whose crawler has already retrieved those URLs.
