Summary of the problem
We have a setup in which a Linux box receives a large number of incoming connections (800 to 2400 per second), and there is a NAT device between the clients and the server. As a result, a lot of TIME_WAIT sockets are left on the system. To overcome that we set tcp_tw_recycle to 1, but that led to incoming connections being dropped. After searching the net we found references explaining why frames are dropped when tcp_tw_recycle is combined with a NAT device.
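For reference, a minimal sketch of how the number of TIME_WAIT sockets could be measured. It assumes the usual /proc/net/tcp layout, where the fourth column holds the socket state in hexadecimal and 06 means TIME_WAIT. The function name count_states and the SAMPLE text are made up for illustration.

```python
from collections import Counter

# State codes found in the "st" column of /proc/net/tcp (hexadecimal).
# Only the ones of interest here are named; others are reported as-is.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "0A": "LISTEN",
}


def count_states(proc_net_tcp_text):
    """Count sockets per TCP state from the contents of /proc/net/tcp."""
    counts = Counter()
    lines = proc_net_tcp_text.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        state = fields[3].upper()
        counts[TCP_STATES.get(state, state)] += 1
    return counts


# Hypothetical excerpt in the /proc/net/tcp format, for demonstration only.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0050 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1234
   1: 0100007F:0050 0100007F:C350 06 00000000:00000000 03:00000A5B 00000000     0        0 0
   2: 0100007F:0050 0100007F:C351 06 00000000:00000000 03:00000A60 00000000     0        0 0
"""

if __name__ == "__main__":
    print(count_states(SAMPLE))
```

On a real system the same function can be fed the text read from /proc/net/tcp (and /proc/net/tcp6 for IPv6 sockets).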
Resolution tried
We then tried setting tcp_tw_reuse to 1 instead, and it worked fine, without any issues, on the same setup and configuration.
But the documentation says that tcp_tw_recycle and tcp_tw_reuse should not be used when connections go through TCP state-aware nodes, such as firewalls, NAT devices or load balancers, because dropped frames may be seen. The more connections there are, the more likely you are to see this issue.
Queries
1) Can tcp_tw_reuse be used in this type of scenario?
2) If not, which part of the Linux code prevents tcp_tw_reuse from being used in such a scenario?
3) Generally, what is the difference between tcp_tw_recycle and tcp_tw_reuse?
1 solution
#1
48
By default, when both tcp_tw_reuse and tcp_tw_recycle are disabled, the kernel will make sure that sockets in TIME_WAIT state will remain in that state long enough -- long enough to be sure that packets belonging to future connections will not be mistaken for late packets of the old connection.
When you enable tcp_tw_reuse, sockets in TIME_WAIT state can be used before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If you enable tcp_timestamps (the option that PAWS, Protection Against Wrapped Sequence numbers, relies on), it will make sure that those collisions cannot happen. However, you need TCP timestamps to be enabled on both ends (at least, that's my understanding). See the definition of tcp_twsk_unique for the gory details.
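To make the timestamp condition more concrete, here is a rough Python model of the kind of check tcp_twsk_unique performs. It is a sketch under assumptions, not the kernel's code: the class TimeWaitSocket, the function may_reuse and its parameters are invented for illustration, wraparound-safe time comparison is omitted, and the "at least one second later" rule reflects my reading of the kernel source, where this check appears to sit on the outgoing connect() path.

```python
from dataclasses import dataclass


@dataclass
class TimeWaitSocket:
    # Second at which the peer's last TCP timestamp was recorded.
    # 0 means the old connection never carried timestamps.
    ts_recent_stamp: int


def may_reuse(tw, now, tw_reuse_enabled):
    """Simplified model: may a TIME_WAIT socket be reused at second `now`?"""
    if not tw_reuse_enabled:
        return False
    if tw.ts_recent_stamp == 0:
        # Without timestamps nothing protects against old duplicates.
        return False
    # Require the clock to have moved on, so that the new connection's
    # timestamps are strictly newer than those of the old connection.
    return now > tw.ts_recent_stamp


if __name__ == "__main__":
    old = TimeWaitSocket(ts_recent_stamp=100)
    print(may_reuse(old, now=100, tw_reuse_enabled=True))
    print(may_reuse(old, now=101, tw_reuse_enabled=True))
```

In this model a socket whose old connection never used timestamps is never reused early, which matches the remark that timestamps are needed on both ends.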
When you enable tcp_tw_recycle, the kernel becomes much more aggressive, and will make assumptions about the timestamps used by remote hosts. It will track the last timestamp used by each remote host having a connection in TIME_WAIT state, and allow a socket to be reused if the timestamp has correctly increased. However, if the timestamp used by the host changes (i.e. warps back in time), the SYN packet will be silently dropped, and the connection won't be established (you will see an error similar to "connect timeout"). If you want to dive into kernel code, the definition of tcp_timewait_state_process might be a good starting point.
Now, timestamps should never go back in time, unless:
- the host is rebooted (but then, by the time it comes back up, the TIME_WAIT sockets will probably have expired, so it will be a non-issue);
- the IP address is quickly reused by something else (TIME_WAIT connections will stay a bit, but other connections will probably be struck by a TCP RST and that will free up some space);
- network address translation (or a smarty-pants firewall) is involved in the middle of the connection.
In the latter case, you can have multiple hosts behind the same IP address, and therefore different sequences of timestamps (or the timestamps are randomized at each connection by the firewall). In that case, some hosts will be randomly unable to connect, because they are mapped to a port for which the TIME_WAIT bucket of the server has a newer timestamp. That's why the docs tell you that "NAT devices or load balancers may start drop frames because of the setting".
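The NAT failure mode can be illustrated with a small simulation. This is a simplified model, not kernel code: the class RecycleModel and its method are hypothetical, the timestamp is remembered when a SYN is accepted rather than when the old connection enters TIME_WAIT, and the two constants are assumed to match the 60 second window and 1 tick tolerance used by kernels that still had tcp_tw_recycle.

```python
TCP_PAWS_MSL = 60    # assumed: seconds a remembered timestamp stays relevant
TCP_PAWS_WINDOW = 1  # assumed: tolerated backwards step, in timestamp ticks


class RecycleModel:
    """Toy model of per-source-IP timestamp tracking with tcp_tw_recycle."""

    def __init__(self):
        # source IP -> (last timestamp value, second at which it was seen)
        self.last_seen = {}

    def accept_syn(self, src_ip, tsval, now):
        seen = self.last_seen.get(src_ip)
        if seen is not None:
            last_ts, last_time = seen
            recent = now - last_time < TCP_PAWS_MSL
            went_back = last_ts - tsval > TCP_PAWS_WINDOW
            if recent and went_back:
                return False  # the SYN is silently dropped
        self.last_seen[src_ip] = (tsval, now)
        return True


if __name__ == "__main__":
    server = RecycleModel()
    nat_ip = "203.0.113.10"  # documentation address shared by two hosts

    # Host A has been up for a long time, host B booted recently, so
    # their timestamp clocks are far apart although the server sees one IP.
    print(server.accept_syn(nat_ip, tsval=5_000_000, now=0))   # host A
    print(server.accept_syn(nat_ip, tsval=1_000, now=1))       # host B
    print(server.accept_syn(nat_ip, tsval=5_000_200, now=2))   # host A again
    print(server.accept_syn(nat_ip, tsval=1_500, now=70))      # host B, later
```

In this model host B is refused while host A's newer timestamp is still fresh, and gets through again only after the window has passed, which is the "randomly unable to connect" behaviour described above.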
Some people recommend leaving tcp_tw_recycle alone, but enabling tcp_tw_reuse and lowering tcp_fin_timeout. I concur :-)
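Before changing anything, it can help to check what the box is currently using. The sketch below reads the relevant values from /proc/sys/net/ipv4; the function name read_tcp_settings and the SETTINGS tuple are made up for illustration. A missing file is reported as None, which is what one would expect for tcp_tw_recycle on Linux 4.12 and later, where the option was removed.

```python
from pathlib import Path

# The settings discussed in this answer, plus tcp_timestamps.
SETTINGS = ("tcp_tw_reuse", "tcp_tw_recycle", "tcp_fin_timeout", "tcp_timestamps")


def read_tcp_settings(base="/proc/sys/net/ipv4"):
    """Return {setting name: value as string, or None if not available}."""
    values = {}
    for name in SETTINGS:
        path = Path(base) / name
        try:
            values[name] = path.read_text().strip()
        except OSError:
            # File absent (option not present in this kernel) or unreadable.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tcp_settings().items():
        shown = value if value is not None else "not available"
        print(f"{name} = {shown}")
```

The base directory is a parameter only so that the function can be tried against a scratch directory instead of the live system.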