What common programming mistakes cause sockets to get stuck in CLOSE_WAIT with epoll in edge-triggered mode?

Time: 2021-10-16 13:00:07

I'm wondering what common programming situations or bugs might cause a server process of mine to leave sockets in CLOSE_WAIT without ever actually closing them.

What I want to do is trigger this situation so that I can fix it. In a normal development environment I've not been able to trigger it, but the same code running on a live server occasionally produces these sockets, so that after many days we have hundreds of them.

Googling for close_wait suggests it is actually a very common problem, even in mature and supposedly well-written services like nginx.

1 Answer

#1


CLOSE_WAIT is the state a socket is in when the remote end has shut down the connection but the local application has not yet invoked close() on it. This usually happens when you are not expecting to read data from the socket and thus aren't watching it for readability.
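
To make the definition concrete, here is a minimal sketch that puts a loopback TCP socket into CLOSE_WAIT and reads the state back from the kernel. It assumes Linux and Python 3.8+; the helper names (tcp_state, wait_for_peer_close, demo_close_wait) are made up for illustration, and the state numbers are taken from the Linux kernel's TCP state enum.

```python
import select
import socket

# State numbers from the Linux kernel's TCP state enum (assumption: Linux only).
TCP_ESTABLISHED = 1
TCP_CLOSE_WAIT = 8


def tcp_state(sock):
    """Return the kernel's TCP state number for a connected socket."""
    # The first byte of struct tcp_info is tcpi_state.
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 64)
    return info[0]


def wait_for_peer_close(sock, timeout_ms=2000):
    """Wait (level-triggered) until the peer's FIN has been received."""
    poller = select.poll()
    poller.register(sock, select.POLLRDHUP)
    return bool(poller.poll(timeout_ms))


def demo_close_wait():
    listener = socket.create_server(("127.0.0.1", 0))
    peer = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    before = tcp_state(conn)   # connection is up, nobody has closed yet
    peer.close()               # the remote end shuts down and sends its FIN
    wait_for_peer_close(conn)
    after = tcp_state(conn)    # FIN received, but we have not called close()

    conn.close()               # only this call takes the socket out of CLOSE_WAIT
    listener.close()
    return before, after


if __name__ == "__main__":
    print(demo_close_wait())
```

The socket stays in the second state for as long as the application holds the descriptor open, which is why a server that never notices the peer's shutdown accumulates them.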

For convenience's sake, many applications always monitor a socket for readability so that they can detect a close.

A scenario to try out is this:

  1. Peer sends 2k of data and immediately closes the connection
  2. Your socket is then registered with epoll and gets a notification for readability
  3. Your application only reads 1k of data
  4. You stop monitoring the socket for readability
  5. (I'm not sure if edge-triggered epoll will end up delivering the shutdown event as a separate event).
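
The steps above can be sketched roughly as follows. This is an illustration under assumptions rather than a definitive test: it assumes Linux, Python 3.8+ and a loopback TCP connection, the function name reproduce_partial_read is made up, and the extra select.poll() wait is only there so that the peer's FIN is already queued before the socket is registered. Instead of unregistering the socket, the sketch simply waits again, which in edge-triggered mode has the same effect because nothing new arrives.

```python
import select
import socket


def reproduce_partial_read():
    listener = socket.create_server(("127.0.0.1", 0))
    peer = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    conn.setblocking(False)

    # Step 1: the peer sends 2k and closes right away.
    peer.sendall(b"x" * 2048)
    peer.close()

    # Make the run deterministic: wait until the FIN has actually arrived.
    waiter = select.poll()
    waiter.register(conn, select.POLLRDHUP)
    waiter.poll(2000)

    # Step 2: register with edge-triggered epoll and take the notification.
    ep = select.epoll()
    ep.register(conn.fileno(), select.EPOLLIN | select.EPOLLET)
    first = ep.poll(1.0)

    # Step 3: read only 1k, leaving 1k of data and the EOF unread.
    chunk = conn.recv(1024)

    # Step 4: wait for more events. Nothing new arrives on the wire,
    # so no further edge is expected and the socket is never closed.
    second = ep.poll(0.5)

    # What a handler that kept reading would have found:
    rest = conn.recv(4096)   # the remaining data
    eof = conn.recv(4096)    # b"" signals that the peer has closed

    ep.close()
    conn.close()
    listener.close()
    return {
        "got_first_event": bool(first) and bool(first[0][1] & select.EPOLLIN),
        "chunk_len": len(chunk),
        "second_events": second,
        "rest_len": len(rest),
        "saw_eof": eof == b"",
    }


if __name__ == "__main__":
    print(reproduce_partial_read())
```

In this particular ordering, where the data and the close are both already queued when the first notification is delivered, I would not expect a separate shutdown event afterwards, so a handler that stops after a partial read leaves the descriptor open and the connection in CLOSE_WAIT.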

See also (from man epoll_ctl):

EPOLLRDHUP (since Linux 2.6.17) Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)
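
One way to use that flag, sketched under the same assumptions as before (Linux, Python 3.8+, loopback TCP; drain and serve_one_connection are hypothetical names): register for EPOLLIN | EPOLLRDHUP | EPOLLET, and on every event read until recv() either would block or returns an empty result, closing the socket in the latter case.

```python
import select
import socket


def drain(conn):
    """Read until the socket would block or EOF is reached.

    Returns (bytes_read, saw_eof).
    """
    total = 0
    while True:
        try:
            data = conn.recv(1024)
        except BlockingIOError:
            return total, False   # fully drained; safe to wait for the next edge
        if not data:
            return total, True    # EOF: the peer has closed its side
        total += len(data)


def serve_one_connection():
    listener = socket.create_server(("127.0.0.1", 0))
    peer = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    conn.setblocking(False)

    ep = select.epoll()
    # EPOLLRDHUP asks the kernel to flag the peer's shutdown in the event mask.
    ep.register(conn.fileno(),
                select.EPOLLIN | select.EPOLLRDHUP | select.EPOLLET)

    peer.sendall(b"x" * 2048)
    peer.close()

    received = 0
    closed = False
    while not closed:
        ready = ep.poll(2.0)
        if not ready:
            break                 # timed out; give up in this demo
        for _fd, mask in ready:
            n, eof = drain(conn)  # always drain completely in edge-triggered mode
            received += n
            if eof or mask & (select.EPOLLHUP | select.EPOLLERR):
                ep.unregister(conn.fileno())
                conn.close()      # leaves CLOSE_WAIT
                closed = True

    ep.close()
    listener.close()
    return received, closed


if __name__ == "__main__":
    print(serve_one_connection())
```

The design point is that an edge-triggered handler never returns to epoll with unread data or an unseen EOF still pending; the flag makes the shutdown visible in the mask, while draining to EOF is what guarantees the close() actually happens.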
