What does pool_connections in requests.adapters.HTTPAdapter mean?

Time: 2020-12-18 18:10:47

When a requests Session is initialized, two HTTPAdapter objects are created and mounted to http:// and https://.

This is how HTTPAdapter is defined:

class requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10,
                                    max_retries=0, pool_block=False)

While I understand the meaning of pool_maxsize (which is the number of sessions a pool can save), I don't understand what pool_connections means or what it does. The doc says:

Parameters: 
pool_connections – The number of urllib3 connection pools to cache.

But what does "to cache" mean? And what's the point of using multiple connection pools?

2 Answers

#1 (7 votes)

Requests uses urllib3 to manage its connections and other features.

Re-using connections is an important factor in keeping recurring HTTP requests performant. The urllib3 README explains:

Why do I want to reuse connections?

Performance. When you normally do a urllib call, a separate socket connection is created with each request. By reusing existing sockets (supported since HTTP 1.1), the requests will take up less resources on the server's end, and also provide a faster response time at the client's end. [...]

To answer your question, "pool_maxsize" is the number of connections to keep around per host (this is useful for multi-threaded applications), whereas "pool_connections" is the number of host-pools to keep around. For example, if you're connecting to 100 different hosts, and pool_connections=10, then only the latest 10 hosts' connections will be re-used.
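
For example (a minimal sketch, not from the original answer; the numbers are only illustrative), you could raise both values by mounting a custom adapter:

import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# Keep pools for up to 100 distinct hosts, each caching up to 10 connections.
adapter = HTTPAdapter(pool_connections=100, pool_maxsize=10)
s.mount('https://', adapter)
s.mount('http://', adapter)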

#2 (16 votes)

I wrote an article about this and pasted it here:

Requests' secret: pool_connections and pool_maxsize

Requests is one of the most well-known third-party Python libraries, if not the most well-known. With its simple API and high performance, people tend to use requests instead of the standard library's urllib2 for HTTP requests. However, people who use requests every day may not know the internals, and today I want to introduce two of them: pool_connections and pool_maxsize.

Let's start with Session:

import requests

s = requests.Session()
s.get('https://www.google.com')

It's pretty simple. You probably know that requests' Session can persist cookies. Cool. But do you know that Session has a mount method?

mount(prefix, adapter)
Registers a connection adapter to a prefix.
Adapters are sorted in descending order by key length.

No? Well, in fact you've already used this method when you initialized a Session object:

class Session(SessionRedirectMixin):

    def __init__(self):
        ...
        # Default connection adapters.
        self.adapters = OrderedDict()
        self.mount('https://', HTTPAdapter())
        self.mount('http://', HTTPAdapter())

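You can see these two default adapters on any fresh Session (a quick doctest-style check, based on the code above):

>>> import requests
>>> list(requests.Session().adapters)
['https://', 'http://']
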
Now comes the interesting part. If you've read Ian Cordasco's article Retries in Requests, you should know that HTTPAdapter can be used to provide retry functionality. But what is an HTTPAdapter, really? Quoting from the doc:

class requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False)

The built-in HTTP Adapter for urllib3.

Provides a general-case interface for Requests sessions to contact HTTP and HTTPS urls by implementing the Transport Adapter interface. This class will usually be created by the Session class under the covers.

Parameters:

* pool_connections – The number of urllib3 connection pools to cache.
* pool_maxsize – The maximum number of connections to save in the pool.
* max_retries (int) – The maximum number of retries each connection should attempt. Note, this applies only to failed DNS lookups, socket connections and connection timeouts, never to requests where data has made it to the server. By default, Requests does not retry failed connections. If you need granular control over the conditions under which we retry a request, import urllib3's Retry class and pass that instead.
* pool_block – Whether the connection pool should block for connections.

Usage:

>>> import requests
>>> s = requests.Session()
>>> a = requests.adapters.HTTPAdapter(max_retries=3)
>>> s.mount('http://', a)

If the above documentation confuses you, here's my explanation: what an HTTPAdapter does is simply provide different configurations for different requests according to the target url. Remember the code above?

self.mount('https://', HTTPAdapter())
self.mount('http://', HTTPAdapter())

It creates two HTTPAdapter objects with the default arguments pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False, and mounts them to https:// and http:// respectively, which means the configuration of the first HTTPAdapter() will be used if you try to send a request to https://xxx, and the second HTTPAdapter() will be used for requests to http://xxx. Though in this case the two configurations are the same, requests to http and https are still handled separately. We'll see what that means later.
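
Since the two prefixes are handled separately, you can also mount differently configured adapters to each. A minimal sketch (the numbers are only illustrative):

import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# http:// and https:// each get their own adapter, and thus their own pools.
s.mount('http://', HTTPAdapter(pool_connections=5, pool_maxsize=5))
s.mount('https://', HTTPAdapter(pool_connections=20, pool_maxsize=20))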

As I said, the main purpose of this article is to explain pool_connections and pool_maxsize.

First, let's look at pool_connections. Yesterday I raised a question on * because I wasn't sure if my understanding was correct; the answer eliminated my uncertainty. HTTP, as we all know, is based on the TCP protocol. An HTTP connection is also a TCP connection, which is identified by a tuple of five values:

(<protocol>, <src addr>, <src port>, <dest addr>, <dest port>)

Say you've established an HTTP/TCP connection with www.example.com. Assuming the server supports Keep-Alive, the next time you send a request to www.example.com/a or www.example.com/b, you could just use the same connection, because none of the five values has changed. In fact, requests' Session automatically does this for you and will reuse connections as long as it can.

The question is, what determines whether you can reuse an old connection or not? Yes, pool_connections!

pool_connections – The number of urllib3 connection pools to cache.

I know, I know, I don't want to bring in so many terminologies either; this is the last one, I promise. For easy understanding: one connection pool corresponds to one host. That's what it is.

Here's an example (unrelated lines are omitted; see the logging sketch below for the setup):
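
The omitted lines are just setup. Something like this minimal sketch (my assumption, using the standard logging module; not part of the original snippet) makes urllib3's connection messages visible:

import logging

# Show the INFO/DEBUG messages from requests.packages.urllib3.connectionpool.
logging.basicConfig(level=logging.DEBUG)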

import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1))
s.get('https://www.baidu.com')
s.get('https://www.zhihu.com')
s.get('https://www.baidu.com')

"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2621
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
"""

HTTPAdapter(pool_connections=1) is mounted to https://, which means only one connection pool persists at a time. After calling s.get('https://www.baidu.com'), the cached connection pool is connectionpool('https://www.baidu.com'). Now s.get('https://www.zhihu.com') comes along, and the session finds that it cannot use the previously cached connection because it's not for the same host (one connection pool corresponds to one host, remember?). The session therefore has to create a new connection pool (or connection, if you like). Since pool_connections=1, the session cannot hold two connection pools at the same time, so it abandons the old one, connectionpool('https://www.baidu.com'), and keeps the new one, connectionpool('https://www.zhihu.com'). The next get is the same. This is why we see three "Starting new HTTPS connection" lines in the log.

What if we set pool_connections to 2?

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=2))
s.get('https://www.baidu.com')
s.get('https://www.zhihu.com')
s.get('https://www.baidu.com')
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2623
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
"""

Great, now we only created connections twice, and saved the time of establishing one connection.

Finally, pool_maxsize.

First and foremost, you should care about pool_maxsize only if you use Session in a multithreaded environment, for example making concurrent requests from multiple threads using the same Session.

Actually, pool_maxsize is an argument for initializing urllib3's HTTPConnectionPool, which is exactly the connection pool we mentioned above. HTTPConnectionPool is a container for a collection of connections to a specific host, and pool_maxsize is the number of reusable connections to save. If you're running your code in one thread, it's neither possible nor necessary to create multiple connections to the same host, because the requests library is blocking, so HTTP requests are always sent one after another.

Things are different if there are multiple threads.

from threading import Thread

import requests
from requests.adapters import HTTPAdapter

def thread_get(url):
    s.get(url)

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=2))
t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start();t2.start()
t1.join();t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start();t4.start()
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2606
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57556
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
"""

See? It established two connections for the same host, www.zhihu.com; like I said, this can only happen in a multithreaded environment. In this case, we created a connection pool with pool_maxsize=2, and there are never more than two connections at the same time, so that's enough. We can see that the requests from t3 and t4 did not create new connections; they reused the old ones.

What if the pool isn't big enough?

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=1))
t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start()
t2.start()
t1.join();t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start();t4.start()
t3.join();t4.join()
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2606
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (3): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57556
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: www.zhihu.com
"""

Now, with pool_maxsize=1, the warning came as expected:

Connection pool is full, discarding connection: www.zhihu.com

We can also notice that, since only one connection can be saved in the pool, a new connection is created again for t3 or t4. Obviously this is very inefficient. That's why urllib3's documentation says:

If you’re planning on using such a pool in a multithreaded environment, you should set the maxsize of the pool to a higher number, such as the number of threads.

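A minimal sketch of that advice (the thread count here is a hypothetical value):

import requests
from requests.adapters import HTTPAdapter

NUM_THREADS = 10  # hypothetical number of worker threads sharing the Session
s = requests.Session()
# One host-pool is enough here; size it so every thread can keep its own connection.
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=NUM_THREADS))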

Last but not least, HTTPAdapter instances mounted to different prefixes are independent.

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=2))
s.mount('https://baidu.com', HTTPAdapter(pool_connections=1, pool_maxsize=1))
t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start();t2.start()
t1.join();t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start();t4.start()
t3.join();t4.join()
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2623
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57669
"""

The above code is easy to understand, so I won't explain it.

I guess that's all. I hope this article helps you understand requests better. BTW, I created a gist here which contains all of the testing code used in this article. Just download and play with it :)

Appendix

  1. For https, requests uses urllib3's HTTPSConnectionPool, but it's pretty much the same as HTTPConnectionPool, so I don't differentiate between them in this article.
  2. Session's mount method will ensure the longest prefix gets matched first. Its implementation is pretty interesting, so I've posted it here.

    def mount(self, prefix, adapter):
        """Registers a connection adapter to a prefix.
        Adapters are sorted in descending order by key length."""
        self.adapters[prefix] = adapter
        keys_to_move = [k for k in self.adapters if len(k) < len(prefix)]
        for key in keys_to_move:
            self.adapters[key] = self.adapters.pop(key)
    

    Note that self.adapters is an OrderedDict.

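    A quick doctest-style check of that ordering (a sketch; note that the longest prefix ends up first):

    >>> import requests
    >>> from requests.adapters import HTTPAdapter
    >>> s = requests.Session()
    >>> s.mount('https://baidu.com', HTTPAdapter())
    >>> list(s.adapters)
    ['https://baidu.com', 'https://', 'http://']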
