I am trying to log into a site using Python, but the site appears to send a cookie and a redirect in the same response. Python follows the redirect, which prevents me from reading the cookie sent by the login page. How do I prevent urlopen in Python's urllib (or urllib2) from following the redirect?
4 solutions
#1
33
You could do a couple of things:
- Build your own HTTPRedirectHandler that intercepts each redirect.
- Create an instance of HTTPCookieProcessor and install an opener built with it, so that you have access to the cookiejar.
Here is a quick example that shows both:
import urllib2

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Inspect or modify cookies here, before the redirect is followed.
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    # Treat the other redirect codes the same way.
    http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("WHEREEVER")  # placeholder URL
print response.read()
print cookieprocessor.cookiejar
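For readers on Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar, the same idea can be sketched as below. This is only an illustration under assumptions: RecordingRedirectHandler, LoginStub, run_demo, the /login and /home paths, and the session=abc123 cookie are all made up for the demo, and a throwaway local server stands in for the real site. It hooks redirect_request rather than http_error_302, records the redirecting response's Set-Cookie headers, and then follows the redirect as usual.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record each redirect's details, then follow it as usual."""

    def __init__(self):
        self.seen = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # 'headers' belongs to the redirecting response, so its cookies are visible here.
        self.seen.append((code, newurl, headers.get_all("Set-Cookie")))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


class LoginStub(BaseHTTPRequestHandler):
    """Hypothetical login page: /login sets a cookie and redirects to /home."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc123")
            self.send_header("Location", "/home")
            body = b""
        else:
            self.send_response(200)
            body = b"welcome"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    """Start the stub server, fetch /login through the recording opener, return what was seen."""
    server = HTTPServer(("127.0.0.1", 0), LoginStub)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        recorder = RecordingRedirectHandler()
        jar = CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # ignore any proxy settings for the local demo
            recorder,
            urllib.request.HTTPCookieProcessor(jar),
        )
        url = "http://127.0.0.1:%d/login" % server.server_port
        with opener.open(url) as response:
            body = response.read()
        return recorder.seen, body, [cookie.name for cookie in jar]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

Because the handler is passed to build_opener as an instance of an HTTPRedirectHandler subclass, it takes the place of the default redirect handler instead of being added alongside it.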
#2
28
If all you need is to stop redirection, there is a simple way to do it. For example, suppose I only want the cookies and, for better performance, don't want to be redirected to any other page. I also want the status code to be kept as 3xx. Let's use 302 as an example.
import cookielib
import urllib2

class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):
    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # Only this check is added: hand a 302 back to the caller unchanged.
        if code == 302:
            return response

        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response

    https_response = http_response

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)
This way, you don't even need to go into urllib2.HTTPRedirectHandler.http_error_302().
A more common case is that we simply want to stop redirection altogether:
class NoRedirection(urllib2.HTTPErrorProcessor):
    def http_response(self, request, response):
        return response

    https_response = http_response
It is normally used like this:
import cookielib
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
data = {}
response = opener.open('http://www.example.com', urllib.urlencode(data))

if response.code == 302:
    redirection_target = response.headers['Location']
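A rough Python 3 rendering of the NoRedirection approach, using urllib.request and http.cookiejar, might look like the sketch below. The local LoginStub server, the run_demo helper, the /login path, and the session=abc123 cookie are assumptions invented so the example can run without a real site. The point it illustrates is that the 302 comes back as an ordinary response, with its Location header readable and its cookie already stored in the jar.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Return every response untouched, so a 3xx never reaches the redirect handler."""

    def http_response(self, request, response):
        return response

    https_response = http_response


class LoginStub(BaseHTTPRequestHandler):
    """Hypothetical login page: always answers 302 with a cookie and a Location."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Set-Cookie", "session=abc123")
        self.send_header("Location", "/home")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    """Fetch the stub login page without following its redirect."""
    server = HTTPServer(("127.0.0.1", 0), LoginStub)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        jar = CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # ignore any proxy settings for the local demo
            NoRedirection,
            urllib.request.HTTPCookieProcessor(jar),
        )
        url = "http://127.0.0.1:%d/login" % server.server_port
        with opener.open(url) as response:
            status = response.status
            location = response.headers.get("Location")
        return status, location, [cookie.name for cookie in jar]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

Since NoRedirection subclasses HTTPErrorProcessor, build_opener uses it in place of the default error processor, so non-2xx responses in general are returned rather than raised as HTTPError.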
#3
11
urllib2.urlopen calls build_opener(), which uses this list of handler classes:
handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
            HTTPDefaultErrorHandler, HTTPRedirectHandler,
            FTPHandler, FileHandler, HTTPErrorProcessor]
You could try building an opener yourself with a set of handlers that omits HTTPRedirectHandler, then call the open() method on the result to open your URL. Note that urllib2.build_opener() adds its default handlers automatically unless you pass a subclass or instance that replaces one, so leaving HTTPRedirectHandler out means assembling a urllib2.OpenerDirector by hand. If you really dislike redirects, you could even call urllib2.install_opener(opener) to make your own non-redirecting opener the default.
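As a sketch of that idea in Python 3 (urllib.request), an opener can be assembled by hand from the handler classes listed above, minus HTTPRedirectHandler. The helper names build_non_redirecting_opener, open_without_following, and has_redirect_handler are made up for illustration, and the handler list mirrors the one quoted above, so it has no HTTPS handler. With no redirect handler present, a 3xx response should surface as an HTTPError whose headers can still be read.

```python
import urllib.error
import urllib.request


def build_non_redirecting_opener():
    """Assemble an opener from the handlers listed above, minus HTTPRedirectHandler."""
    opener = urllib.request.OpenerDirector()
    handler_classes = [
        urllib.request.ProxyHandler,
        urllib.request.UnknownHandler,
        urllib.request.HTTPHandler,
        urllib.request.HTTPDefaultErrorHandler,
        urllib.request.FTPHandler,
        urllib.request.FileHandler,
        urllib.request.HTTPErrorProcessor,
    ]
    for handler_class in handler_classes:
        opener.add_handler(handler_class())
    return opener


def open_without_following(opener, url):
    """Return (status, headers); a 3xx arrives as HTTPError instead of being followed."""
    try:
        with opener.open(url) as response:
            return response.status, response.headers
    except urllib.error.HTTPError as err:
        return err.code, err.headers


def has_redirect_handler(opener):
    """Report whether the opener contains any HTTPRedirectHandler instance."""
    return any(
        isinstance(handler, urllib.request.HTTPRedirectHandler)
        for handler in opener.handlers
    )
```

Calling has_redirect_handler on the result of urllib.request.build_opener(), with or without extra handlers passed in, is a quick way to confirm that build_opener keeps the default redirect handler, which is why the opener is built manually here.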
It sounds like your real problem is that urllib2 isn't handling cookies the way you'd like. See also: How to use Python to login to a webpage and retrieve cookies for later usage?
#4
3
This question has been asked before here.
EDIT: If you have to deal with quirky web applications, you should probably try mechanize. It's a great library that simulates a web browser. You can control redirects, cookies, page refreshes... If the website doesn't rely [heavily] on JavaScript, you'll get along very nicely with mechanize.