What is the correct way of reading from a TCP socket in C/C++?

Time: 2023-01-19 16:25:32

Here's my code:

// Not all headers are relevant to the code snippet.
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

char *buffer;
stringstream readStream;
bool readData = true;

while (readData)
{
    cout << "Receiving chunk... ";

    // Read a bit at a time, eventually "end" string will be received.
    bzero(buffer, BUFFER_SIZE);
    int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);
    if (readResult < 0)
    {
        THROW_VIMRID_EX("Could not read from socket.");
    }

    // Concatenate the received data to the existing data.
    readStream << buffer;

    // Continue reading while end is not found.
    readData = readStream.str().find("end;") == string::npos;

    cout << "Done (length: " << readStream.str().length() << ")" << endl;
}

It's a little bit of C and C++ as you can tell. The BUFFER_SIZE is 256 - should I just increase the size? If so, what to? Does it matter?

I know that if "end" is not received for whatever reason, this will be an endless loop, which is bad. If you can suggest a better way, please do.

6 Answers

#1


29  

Without knowing your full application it is hard to say what the best way to approach the problem is, but a common technique is to use a header which starts with a fixed length field, which denotes the length of the rest of your message.

Assume that your header consists only of a 4-byte integer which denotes the length of the rest of your message. Then simply do the following.

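// This assumes buffer is at least x bytes long,
// and that the socket is blocking.
void ReadXBytes(int socket, unsigned int x, void* buffer)
{
    // Pointer arithmetic is not allowed on void*, so work through a char*.
    char* out = static_cast<char*>(buffer);
    unsigned int bytesRead = 0;
    while (bytesRead < x)
    {
        ssize_t result = read(socket, out + bytesRead, x - bytesRead);
        if (result < 1)
        {
            // Throw your error. 0 means the peer closed the connection,
            // -1 means read() failed (check errno). Do not fall through:
            // the loop would never finish.
        }

        bytesRead += result;
    }
}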

Then later in the code:

unsigned int length = 0;
char* buffer = 0;
// we assume that sizeof(length) will return 4 here.
ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
buffer = new char[length];
ReadXBytes(socketFileDescriptor, length, (void*)buffer);

// Then process the data as needed.

delete [] buffer;

This makes a few assumptions:

  • ints are the same size on the sender and receiver.
  • Endianness is the same on both the sender and receiver.
  • You have control of the protocol on both sides.
  • When you send a message you can calculate the length up front.

Since you usually want to know the exact size of the integers you send across the network, define fixed-size types in a header file and use them explicitly, such as:

// These typedefs will vary across different platforms
// such as linux, win32, OS/X etc, but the idea
// is that a Int8 is always 8 bits, and a UInt32 is always
// 32 bits regardless of the platform you are on.
// These vary from compiler to compiler, so you have to 
// look them up in the compiler documentation.
typedef signed char Int8; // plain char may be unsigned on some platforms
typedef short int Int16;
typedef int Int32;

typedef unsigned char UInt8;
typedef unsigned short int UInt16;
typedef unsigned int UInt32;

This would change the above to:

UInt32 length = 0;
char* buffer = 0;

ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
buffer = new char[length];
ReadXBytes(socketFileDescriptor, length, (void*)buffer);

// process

delete [] buffer;
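
The first two assumptions in the list above can be relaxed with standard facilities. The following is only a sketch, assuming the sender writes the length as a 4-byte integer in network byte order (big-endian). It uses std::uint32_t from <cstdint> instead of hand-written typedefs and ntohl() to convert to host order. The names readExactly, readMessage and kMaxMessageSize are made up for this illustration, and the size cap is an arbitrary choice meant to stop a bad length from triggering a huge allocation.

```cpp
#include <arpa/inet.h>   // ntohl
#include <unistd.h>      // read
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical upper bound on one message; pick a value that suits your protocol.
const std::uint32_t kMaxMessageSize = 1024 * 1024;

// Reads exactly `count` bytes from a blocking descriptor, or throws.
void readExactly(int fd, void* dest, std::size_t count)
{
    char* out = static_cast<char*>(dest);
    std::size_t done = 0;
    while (done < count)
    {
        ssize_t n = read(fd, out + done, count - done);
        if (n == 0) throw std::runtime_error("peer closed the connection");
        if (n < 0)  throw std::runtime_error("read failed");
        done += static_cast<std::size_t>(n);
    }
}

// Reads one message: a 4-byte big-endian length followed by that many bytes.
std::vector<char> readMessage(int fd)
{
    std::uint32_t wireLength = 0;
    readExactly(fd, &wireLength, sizeof(wireLength));
    std::uint32_t length = ntohl(wireLength); // network to host byte order
    if (length > kMaxMessageSize) throw std::runtime_error("message too large");

    std::vector<char> payload(length);
    if (length > 0) readExactly(fd, payload.data(), length);
    return payload;
}
```

The sender would have to apply htonl() to the length before writing it. Returning a std::vector also removes the need for the manual delete [].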

I hope this helps.

#2


9  

Several pointers:

You need to handle a return value of 0, which tells you that the remote host closed the socket.

For nonblocking sockets, you also need to check for an error return value (-1) and make sure that errno isn't EAGAIN or EWOULDBLOCK, which is expected when no data is available yet. (EINPROGRESS is reported by connect(), not read().)

You definitely need better error handling - you're potentially leaking the buffer pointed to by 'buffer'. Which, I noticed, you don't allocate anywhere in this code snippet.

Someone else made a good point about how your buffer isn't a null terminated C string if your read() fills the entire buffer. That is indeed a problem, and a serious one.

Your buffer size is a bit small, but should work as long as you don't try to read more than 256 bytes, or whatever you allocate for it.

If you're worried about getting into an infinite loop when the remote host sends you a malformed message (a potential denial of service attack) then you should use select() with a timeout on the socket to check for readability, and only read if data is available, and bail out if select() times out.

Something like this might work for you:

fd_set read_set;
struct timeval timeout;

timeout.tv_sec = 60; // Time out after a minute
timeout.tv_usec = 0;

FD_ZERO(&read_set);
FD_SET(socketFileDescriptor, &read_set);

int r=select(socketFileDescriptor+1, &read_set, NULL, NULL, &timeout);

if( r<0 ) {
    // Handle the error
}

if( r==0 ) {
    // Timeout - handle that. You could try waiting again, close the socket...
}

if( r>0 ) {
    // The socket is ready for reading - call read() on it.
}

Depending on the volume of data you expect to receive, the way you scan the entire message repeatedly for the "end;" token is very inefficient. This is better done with a state machine (the states being 'e'->'n'->'d'->';') so that you only look at each incoming character once.
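
A minimal sketch of such a state machine follows. The class name EndMarkerScanner is invented for this example. The restart rule is only valid because 'e' appears nowhere in "end;" except at the start. A terminator with repeated prefixes would need a proper failure table.

```cpp
#include <cstddef>

// Incrementally searches a byte stream for the terminator "end;".
class EndMarkerScanner
{
public:
    EndMarkerScanner() : matched_(0) {}

    // Feeds one chunk; returns true once "end;" has been seen in the stream so far.
    bool feed(const char* data, std::size_t size)
    {
        static const char marker[] = "end;";
        const std::size_t markerLength = sizeof(marker) - 1;
        for (std::size_t i = 0; i < size && matched_ < markerLength; ++i)
        {
            if (data[i] == marker[matched_])
                ++matched_;                           // advance to the next state
            else
                matched_ = (data[i] == 'e') ? 1 : 0;  // restart; an 'e' may begin a new match
        }
        return matched_ == markerLength;
    }

private:
    std::size_t matched_; // number of marker characters matched so far
};
```

Each chunk returned by read() is passed to feed() exactly once, so the state carries over when the terminator is split across two reads.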

And seriously, you should consider finding a library to do all this for you. It's not easy getting it right.

#3


3  

If you actually create the buffer as per dirk's suggestion, then:

  int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);

may completely fill the buffer, possibly overwriting the terminating zero character which you depend on when extracting to a stringstream. You need:

  int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE - 1 );
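
Another option, sketched here as an alternative rather than as what this answer proposes, is to stop relying on the terminating zero altogether and append exactly the number of bytes that read() reported. That also keeps any '\0' bytes inside the data intact. The helper name appendChunk is made up, and BUFFER_SIZE is 256 as in the question.

```cpp
#include <unistd.h>
#include <cstddef>
#include <string>

const std::size_t BUFFER_SIZE = 256;

// Appends whatever one read() call returns to `out`.
// Returns the read() result: >0 bytes appended, 0 on end of stream, -1 on error.
ssize_t appendChunk(int fd, std::string& out)
{
    char buffer[BUFFER_SIZE];
    ssize_t n = read(fd, buffer, sizeof(buffer));
    if (n > 0)
        out.append(buffer, static_cast<std::size_t>(n)); // explicit length, no '\0' needed
    return n;
}
```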

#4


3  

1) Others (especially dirkgently) have noted that buffer needs to be allocated some memory space. For smallish values of N (say, N <= 4096), you can also allocate it on the stack:

#define BUFFER_SIZE 4096
char buffer[BUFFER_SIZE];

This saves you the worry of ensuring that you delete[] the buffer should an exception be thrown.

But remember that stacks are finite in size (so are heaps, but stacks are finiter), so you don't want to put too much there.

2) On a -1 return code, you should not simply return immediately (throwing an exception immediately is even more sketchy.) There are certain normal conditions that you need to handle, if your code is to be anything more than a short homework assignment. For example, EAGAIN may be returned in errno if no data is currently available on a non-blocking socket. Have a look at the man page for read(2).

#5


1  

Where are you allocating memory for your buffer? The line where you invoke bzero invokes undefined behavior since buffer does not point to any valid region of memory.

char *buffer = new char[ BUFFER_SIZE ];
// do processing

// don't forget to release
delete[] buffer;
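
If you would rather not pair new[] with delete[] by hand, a std::vector<char> can own the buffer and release it automatically, including when an exception is thrown. This is a sketch of that alternative, not something the snippet above requires. The function name readAll and the 4096-byte size are arbitrary choices for the example.

```cpp
#include <unistd.h>
#include <cstddef>
#include <string>
#include <vector>

// Reads until end of stream, collecting everything into `out`.
// Returns false if read() reports an error.
bool readAll(int fd, std::string& out)
{
    std::vector<char> buffer(4096); // released automatically when the function exits
    for (;;)
    {
        ssize_t n = read(fd, buffer.data(), buffer.size());
        if (n == 0) return true;   // end of stream
        if (n < 0) return false;   // read error (check errno)
        out.append(buffer.data(), static_cast<std::size_t>(n));
    }
}
```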

#6


1  

This is an article that I always refer to when working with sockets.

THE WORLD OF SELECT()

It will show you how to reliably use 'select()' and contains some other useful links at the bottom for further info on sockets.
