HTTP GET request using httplib returns "Moved Permanently"

Time: 2022-01-17 19:16:23

Scope:

I am currently trying to write a web scraper for this specific page. I have a pretty strong web-crawling background in C#, but this httplib is getting the better of me.

Problem:

When I try to make an HTTP GET request for the page specified above, I get a "Moved Permanently" response that points to the very same URL. I can make the request using the requests library, but I want to make it work with httplib so I can understand what I am doing wrong.

Code Sample:

I am completely new to Python, so any broken language convention or syntax is C#'s fault.

import httplib

# Wrapper for an HTTP GET request
class HttpClient(object):
    def HttpGet(self, host, url):
        # host is the server name, url is the path on that server
        connection = httplib.HTTPConnection(host)
        connection.request('GET', url)
        return connection.getresponse().read()


# Using the "HttpClient" class
httpclient = HttpClient()

# This is the full URL I need to make a GET request for: https://420101.com/strain-database

httpResponseText = httpclient.HttpGet('www.420101.com', '/strain-database')
print httpResponseText

I really want to make it work using the httplib library, instead of requests or any other fancy one because I feel like I am missing something really small here.

1 Solution

#1


The problem is that I've had too little or too much caffeine in my system.

To GET an https URL, I needed the HTTPSConnection class.

Also, there is no 'www' in the address I wanted to GET. So, it shouldn't be included in the host.
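
Both points above come down to building the connection from the URL exactly as written. Here is a minimal sketch of that idea, assuming Python 3, where httplib was renamed to http.client. The helper name open_connection is hypothetical and not part of the library; it only constructs the connection object and does not send anything.

```python
import http.client
from urllib.parse import urlsplit


def open_connection(url, timeout=10):
    """Return (connection, path) for a full URL without sending a request yet."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_class = http.client.HTTPConnection
    else:
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    # Use the host exactly as written in the URL (no 'www.' added or removed).
    conn = conn_class(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return conn, path


conn, path = open_connection("https://420101.com/strain-database")
print(type(conn).__name__, conn.host, path)
```

Deriving the class and host from one URL string avoids the mismatch in the question, where the comment named an https URL without 'www' but the call used plain HTTP with 'www'.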

Both of the wrong addresses redirect me to the correct one with a 301 status code (a redirect, not really an error). If I were using requests or a more full-featured module, it would have automatically followed the redirect.
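
For anyone who wants to stay with the low-level module and still follow redirects, this is roughly what the higher-level libraries do for you. It is a rough sketch, assuming Python 3 (http.client instead of httplib). The function name get_following_redirects and the max_redirects limit are my own illustrative choices, not library API.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def get_following_redirects(url, max_redirects=5, timeout=10):
    """GET url, following Location headers by hand.

    Returns (final_url, status, body).
    """
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                      else http.client.HTTPConnection)
        conn = conn_class(parts.netloc, timeout=timeout)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("GET", path)
            response = conn.getresponse()
            body = response.read()
            location = response.getheader("Location")
        finally:
            conn.close()
        if response.status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            continue
        return url, response.status, body
    raise RuntimeError("too many redirects")
```

Each hop opens a fresh connection because a redirect can change the scheme or the host, which is exactly what happened with the http and 'www' variants here.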

My Validation:

c = httplib.HTTPSConnection('420101.com')
c.request("GET", "/strain-database")
r = c.getresponse()
print r.status, r.reason

200 OK
