socket recv() hangs on a large message with MSG_WAITALL

Time: 2022-11-15 23:57:25

I have an application that reads large files from a server and hangs frequently on a particular machine. It has worked successfully under RHEL5.2 for a long time. We have recently upgraded to RHEL6.1 and it now hangs regularly.

I have created a test app that reproduces the problem. It hangs approx 98 times out of 100.

#include <errno.h>
#include <stdint.h>   /* uint32_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/param.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/time.h>

int mFD = 0;

void open_socket()
{
  struct addrinfo hints, *res;
  memset(&hints, 0, sizeof(hints));
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_family = AF_INET;

  if (getaddrinfo("localhost", "60000", &hints, &res) != 0)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  mFD = socket(res->ai_family, res->ai_socktype, res->ai_protocol);

  if (mFD == -1)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  if (connect(mFD, res->ai_addr, res->ai_addrlen) < 0)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  freeaddrinfo(res);
}

/* Read exactly 'size' bytes from the socket into 'data'. */
void read_message(int size, char* data)
{
  int bytesLeft = size;
  int numRd = 0;

  while (bytesLeft != 0)
  {
    fprintf(stderr, "reading %d bytes\n", bytesLeft);

    /* Replacing MSG_WAITALL with 0 works fine */
    int num = recv(mFD, data, bytesLeft, MSG_WAITALL);

    if (num == 0)
    {
      break;
    }
    else if (num < 0 && errno != EINTR)
    {
      fprintf(stderr, "Exit %d\n", __LINE__);
      exit(1);
    }
    else if (num > 0)
    {
      numRd += num;
      data += num;
      bytesLeft -= num;
      fprintf(stderr, "read %d bytes - remaining = %d\n", num, bytesLeft);
    }
  }

  fprintf(stderr, "read total of %d bytes\n", numRd);
}

int main(int argc, char **argv)
{
  if (argc < 2)
  {
    fprintf(stderr, "usage: %s <message size in bytes>\n", argv[0]);
    exit(1);
  }

  open_socket();

  uint32_t raw_len = atoi(argv[1]);
  char raw[raw_len];   /* VLA: the whole message is buffered on the stack */

  read_message(raw_len, raw);

  close(mFD);
  return 0;
}

Some notes from my testing:

  • If "localhost" maps to the loopback address 127.0.0.1, the app hangs on the call to recv() and NEVER returns.
  • 如果“localhost”映射到环回地址127.0.0.1,应用程序将挂起对recv()的调用并且NEVER返回。
  • If "localhost" maps to the ip of the machine, thus routing the packets via the ethernet interface, the app completes successfully.
  • 如果“localhost”映射到计算机的ip,从而通过以太网接口路由数据包,则应用程序成功完成。
  • When I experience a hang, the server sends a "TCP Window Full" message, and the client responds with a "TCP ZeroWindow" message (see image and attached tcpdump capture). From this point, it hangs forever with the server sending keep-alives and the client sending ZeroWindow messages. The client never seems to expand its window, allowing the transfer to complete.
  • 当我遇到挂起时,服务器发送“TCP Window Full”消息,客户端响应“TCP ZeroWindow”消息(参见图像并附上tcpdump capture)。从这一点来看,它会永远挂起,服务器发送keep-alives,客户端发送ZeroWindow消息。客户端似乎永远不会扩展其窗口,允许传输完成。
  • During the hang, if I examine the output of "netstat -a", there is data in the servers send queue but the clients receive queue is empty.
  • 在挂起期间,如果我检查“netstat -a”的输出,则服务器发送队列中有数据,但客户端接收队列为空。
  • If I remove the MSG_WAITALL flag from the recv() call, the app completes successfully.
  • 如果我从recv()调用中删除MSG_WAITALL标志,则应用程序成功完成。
  • The hanging issue only arises using the loopback interface on 1 particular machine. I suspect this may all be related to timing dependencies.
  • 悬挂问题仅在1台特定机器上使用环回接口时出现。我怀疑这可能都与时序依赖性有关。
  • As I drop the size of the 'file', the likelihood of the hang occurring is reduced
  • 当我删除'文件'的大小时,挂起的可能性就会降低

The source for the test app can be found here:

Socket test source

The tcpdump capture from the loopback interface can be found here:

tcpdump capture

I reproduce the issue by issuing the following commands:

>  gcc socket_test.c -o socket_test
>  perl -e 'for (1..6000000){ print "a" }' | nc -l 60000
>  ./socket_test 6000000

This sends 6000000 bytes to the test app, which tries to read the data using a single call to recv().

I would love to hear any suggestions on what I might be doing wrong or any further ways to debug the issue.

2 Answers

#1

MSG_WAITALL should block until all data has been received. From the manual page on recv:

This flag requests that the operation block until the full request is satisfied.

However, the buffers in the network stack probably are not large enough to contain everything, which is the reason for the error messages on the server. The client network stack simply can't hold that much data.

The solution is either to increase the buffer size (the SO_RCVBUF option to setsockopt), split the message into smaller pieces, or receive smaller chunks and assemble them into your own buffer. The last is what I would recommend.

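For instance, enlarging the receive buffer might look like the sketch below. The 8 MB figure and the grow_rcvbuf name are illustrative assumptions, not from the answer; on Linux the kernel caps the request at net.core.rmem_max, and the option should be set after socket() but before connect() so the TCP window scale is negotiated accordingly.

/* Sketch: ask the kernel for a larger socket receive buffer.
   Call on the file descriptor after socket() and before connect(). */
#include <stdio.h>
#include <sys/socket.h>

static void grow_rcvbuf(int fd)
{
  int size = 8 * 1024 * 1024;  /* 8 MB - illustrative value only */

  /* The kernel may silently clamp this to net.core.rmem_max. */
  if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size)) < 0)
  {
    perror("setsockopt(SO_RCVBUF)");
  }

  /* Read back what was actually granted (Linux reports twice the
     requested value to account for bookkeeping overhead). */
  socklen_t len = sizeof(size);
  if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) == 0)
  {
    fprintf(stderr, "SO_RCVBUF is now %d bytes\n", size);
  }
}
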
Edit: I see in your code that you already do what I suggested (read smaller chunks with your own buffering), so just remove the MSG_WAITALL flag and it should work.

Oh, and when recv returns zero, that means the other end has closed the connection, and you should close yours too.

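For reference, a minimal sketch of the test app's loop with both suggestions applied (MSG_WAITALL dropped, socket closed when recv() returns zero). It reuses the mFD global and the headers from the question's code, so it is a modification sketch rather than a standalone program.

/* Same loop as the test app, but without MSG_WAITALL: each recv()
   returns whatever is currently available in the socket buffer. */
void read_message(int size, char* data)
{
  int bytesLeft = size;

  while (bytesLeft != 0)
  {
    int num = recv(mFD, data, bytesLeft, 0);

    if (num == 0)
    {
      /* Peer closed the connection: close our end as well and stop. */
      close(mFD);
      break;
    }
    else if (num < 0 && errno != EINTR)
    {
      fprintf(stderr, "recv failed: %s\n", strerror(errno));
      exit(1);
    }
    else if (num > 0)
    {
      data += num;
      bytesLeft -= num;
    }
  }
}
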
#2

Consider these two possible rules:

  1. The receiver may wait for the sender to send more before receiving what has already been sent.

  2. The sender may wait for the receiver to receive what has already been sent before sending more.

We can have either of these rules, but we cannot have both of these rules.

Why? Because if the receiver is permitted to wait for the sender, that means the sender cannot wait for the receiver to receive before sending more, otherwise we deadlock. And if the sender is permitted to wait for the receiver, that means the receiver cannot wait for the sender to send before receiving more, otherwise we deadlock.

If both of these things happen at the same time, we deadlock. The sender will not send more until the receiver receives what has already been sent, and the receiver will not receive what has already been sent unless the sender sends more. Boom.

TCP chooses rule 2 (for reasons that should be obvious). Thus it cannot support rule 1. But in your code, you are the receiver, and you are waiting for the sender to send more before you receive what has already been sent. So this will deadlock.
