Why do requests and urllib, urllib2, urllib3 differ in their URL-encoding behavior?

Time: 2021-07-15 18:11:29

#!/usr/bin/env python
#coding:utf-8
import requests,urllib,urllib2,urllib3,urlparse

url = "http://xxx.com/index.php?Q=u=%OS%26%20"
print "original:",url
#print requests.get(url).content
print "-------------------"
url_parts = urlparse.urlparse(url)
print "splited:",url_parts.query
print "-------------------"
params = dict(urlparse.parse_qsl(url_parts.query,True))
print "parsed:","Q="+params['Q']
print "-------------------"
# Reassemble the URL with the query string re-encoded by urllib.urlencode
url_dealed = urlparse.urlunsplit((url_parts.scheme, url_parts.netloc,
                                  url_parts.path, urllib.urlencode(params),
                                  url_parts.fragment))
print "unsplited(dealed_final):",url_dealed
print "-------------------"

#requests.get(url="http://xxx.com/index.php?Q=u=%OS%26%20").content
#requests.get(url="http://xxx.com/index.php",params=params).content
print "requests:","http://xxx.com/index.php?Q=u=%25OS%2526%2520"

#urllib.urlopen(url="http://xxx.com/index.php?Q=u=%OS%26%20").read()
print "urllib:","http://xxx.com/index.php?Q=u=%OS%26%20"

#urllib2.urlopen(url="http://xxx.com/index.php?Q=u=%OS%26%20").read()
print "urllib2:","http://xxx.com/index.php?Q=u=%OS%26%20"

#urllib3.ProxyManager('http://localhost:8888/').request("GET", "http://xxx.com/index.php?Q=u%3D%25OS%26+").data
print "urllib3:","http://xxx.com/index.php?Q=u=%OS%26%20"

#url_dealed = "http://xxx.com/index.php?Q=u%3D%25OS%26+"
#requests.get(url_dealed).content
#urllib3.ProxyManager('http://localhost:8888/').request("GET", url_dealed).data
#urllib2.urlopen(url_dealed).read()
#urllib.urlopen(url_dealed).read()
print "all requests dealed_final:","http://xxx.com/index.php?Q=u%3D%25OS%26+"

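Borrowing an account to post a learning thread. I hope the experts here can explain how the default encoding behavior of these four Python libraries differs:
While testing an injection today, I pasted the URL from my browser into requests and never got the expected response. I captured the traffic and diffed it, and it turned out requests really was rewriting the URL.
1. Given http://xxx.com/index.php?Q=u=%OS%26%20, requests sends http://xxx.com/index.php?Q=u=%25OS%2526%2520: every % in the value of Q is URL-encoded to %25.
2. Given the URL after the split, encode, and reassemble steps above, http://xxx.com/index.php?Q=u%3D%25OS%26+, requests sends it unchanged.
3. urllib, urllib2, and urllib3 send the URLs from both 1 and 2 unchanged.
requests is supposed to be a wrapper around urllib3, so why does it change this default behavior?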
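Point 3 can be checked offline for the standard library. The following is a minimal sketch, assuming Python 3 (where urllib2 became urllib.request); it only builds a Request object and sends nothing, so it shows what would go on the request line rather than a captured packet.

```python
from urllib.request import Request

url = "http://xxx.com/index.php?Q=u=%OS%26%20"

# Constructing the Request parses the URL but does not open a connection.
req = Request(url)

# full_url is the URL as given; selector is the path-and-query part
# that would be written on the HTTP request line.
print(req.full_url)
print(req.selector)
```

Both values keep the malformed %OS escape as written, which matches the behavior reported for urllib and urllib2.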

  1# answer | 2016-03-03 20:55

    Bump. Same question here.

  2# 隐形人真忙 (focused on security R&D and vulnerabilities) | 2016-03-03 20:59

    This is a known pitfall. requests encodes very aggressively. I usually use this urllib function instead:
    urllib.quote(string[, safe])
    It encodes a string; the safe parameter lists the characters that should not be encoded. As for why requests does this, you will have to read the source...
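To illustrate the safe parameter mentioned in the reply above, here is a small sketch. It assumes Python 3, where urllib.quote lives at urllib.parse.quote; the variable names are only for illustration.

```python
from urllib.parse import quote

# Decoded value of the Q parameter from the original post.
value = "u=%OS& "

# With safe="" every reserved character is escaped, including '%' itself.
encoded_all = quote(value, safe="")

# Listing '=' and '%' in safe leaves them untouched, so the raw %OS survives.
encoded_keep = quote(value, safe="=%")

print(encoded_all)   # u%3D%25OS%26%20
print(encoded_keep)  # u=%OS%26%20
```

The second form reproduces the query value exactly as it appeared in the browser URL.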

  3# null_z | 2016-03-03 21:18

    If you assemble the URL yourself and pass it to requests, it is not encoded.
    If you pass the parameters as a dict, requests encodes them once.
    requests is not part of the standard library.

  4# Larry (I finally figured out how to flip my signature) | 2016-03-03 21:28

    Same as reply 2. I also wanted to urlencode something today, and I think urllib.quote_plus(string[, safe]) works too: it encodes a string in the style of an HTML form query string, and then you assemble the URL yourself.
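A sketch of the quote_plus approach with manual assembly, assuming Python 3 (urllib.parse.quote_plus). The base and value names are illustrative, not from the thread.

```python
from urllib.parse import quote_plus

base = "http://xxx.com/index.php"
value = "u=%OS& "  # decoded value of Q

# quote_plus escapes reserved characters and writes a space as '+',
# which is the HTML form convention for query strings.
query_value = quote_plus(value)

# Assemble the final URL by hand, as the reply suggests.
url = base + "?Q=" + query_value
print(url)  # http://xxx.com/index.php?Q=u%3D%25OS%26+
```

The result is the same URL the original post obtained from its split, encode, and reassemble steps.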

  5# fate0 (waiting for you in the future) | 2016-03-03 22:58

    https://github.com/kennethreitz/requests/blob/master/requests%2Futils.py#L443

    Because OS is not a valid hexadecimal value.
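The fallback this reply points at can be approximated as follows. This is a simplified sketch written from the behavior reported in this thread and the source comment quoted in reply 8, assuming Python 3; it is not the code from requests itself, and the function names are hypothetical.

```python
from urllib.parse import quote

UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def unquote_unreserved_sketch(uri):
    """Decode %XX only when it stands for an unreserved character.

    Raises ValueError when an escape such as %OS is not valid hex.
    """
    parts = uri.split("%")
    for i in range(1, len(parts)):
        hex_pair = parts[i][:2]
        if len(hex_pair) == 2 and hex_pair.isalnum():
            char = chr(int(hex_pair, 16))  # raises ValueError for "OS"
            if char in UNRESERVED:
                parts[i] = char + parts[i][2:]
                continue
        # Anything else keeps its percent sign as written.
        parts[i] = "%" + parts[i]
    return "".join(parts)


def requote_sketch(uri):
    """Re-quote a URI; on an invalid escape, quote every '%' as well."""
    safe_with_percent = "!#$%&'()*+,/:;=?@[]~"
    safe_without_percent = "!#$&'()*+,/:;=?@[]~"
    try:
        return quote(unquote_unreserved_sketch(uri), safe=safe_with_percent)
    except ValueError:
        # One bad escape makes the whole URI go through the stricter path,
        # so the valid %26 and %20 are escaped again too.
        return quote(uri, safe=safe_without_percent)


print(requote_sketch("http://xxx.com/index.php?Q=u=%OS%26%20"))
print(requote_sketch("http://xxx.com/index.php?Q=u%3D%25OS%26+"))
```

Under these assumptions the first URL comes out with every % turned into %25, while the second, which contains only valid escapes, is returned unchanged. That matches points 1 and 2 of the original post.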

  6# 一个小渣渣 | 2016-03-04 08:37

    I have been struggling with this too. When writing exploits I sometimes use encodings like %u0027 to bypass certain filters, but a URL containing that kind of encoding fails with an error right away.

  7# zxx | 2016-03-04 11:16

    from requests import Request, Session
    s = Session()
    url="http://xxx.com/index.php?Q=u=%OS%26%20"
    req = Request('GET', url)
    prepped = s.prepare_request(req)
    prepped.url = prepped.url.replace('%25', '%')
    resp = s.send(prepped)
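One caveat of the replace('%25', '%') step in the reply above is that it is a blanket string substitution. The sketch below isolates just that string operation (no request is made, and the helper name is hypothetical) to show that it restores the URL from this thread but would also alter a URL that legitimately contains an encoded percent sign.

```python
def restore_percent_signs(prepared_url):
    # Same string operation as in the reply: turn every "%25" back into "%".
    return prepared_url.replace("%25", "%")


# The case from this thread: the over-escaped URL maps back to the original.
escaped = "http://xxx.com/index.php?Q=u=%25OS%2526%2520"
restored = restore_percent_signs(escaped)
print(restored)

# A URL whose value really is "100%" (encoded as 100%25) gets damaged.
legit = "http://xxx.com/index.php?Q=100%25"
damaged = restore_percent_signs(legit)
print(damaged)
```

For payload testing this is usually acceptable, but it is worth keeping in mind when the original URL already contains %25.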

  8# ModNar | 2016-03-04 11:46

    @fate0
    # We couldn't unquote the given URI, so let's try quoting it, but
    # there may be unquoted '%'s in the URI. We need to make sure they're
    # properly quoted so they do not cause issues elsewhere
    So because of one hex sequence that cannot be decoded, it bluntly encodes every %, which also affects the hex sequences that could be decoded. I find that hard to understand... Mainstream browsers do not do this, and requests does not even provide a switch to turn it off.

  9# ModNar | 2016-03-04 11:49

    @隐形人真忙 Processing the query like this before passing it to requests solves this kind of problem: urllib.urlencode(dict(urlparse.parse_qsl(urlparse.urlparse(url).query,True))). It also decodes and then re-encodes, and it preserves the meaning of %26 and %20 (the space is re-encoded as +)~
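The one-liner in this reply can be written out for Python 3 roughly as follows. This is a sketch under the assumption that urlparse and urllib.urlencode map to urllib.parse; the function name normalize_query is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_query(url):
    """Decode the query string and re-encode it with urlencode."""
    parts = urlsplit(url)
    # keep_blank_values=True mirrors the second argument used in the reply.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Passing the list of pairs (rather than a dict) keeps repeated keys.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


normalized = normalize_query("http://xxx.com/index.php?Q=u=%OS%26%20")
print(normalized)  # http://xxx.com/index.php?Q=u%3D%25OS%26+
```

Because the invalid %OS is left alone during decoding and then escaped as %25OS, the resulting URL contains only valid escapes, which is the form the original post found requests sends unchanged.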

  10# 隐形人真忙 (focused on security R&D and vulnerabilities) | 2016-03-04 17:20

    @ModNar Indeed. When I was trying to solve this I read the documentation carefully, but there is no such flag.