Writing a stream protocol: message size field or message delimiter?

Time: 2022-09-23 00:09:08

I am about to write a message protocol going over a TCP stream. The receiver needs to know where the message boundaries are.

I can either send 1) fixed length messages, 2) size fields so the receiver knows how big the message is, or 3) a unique message terminator (I guess this can't be used anywhere else in the message).

I won't use #1 for efficiency reasons.

I like #2 but is it possible for the stream to get out of sync?

I don't like idea #3 because it means the receiver can't know the size of the message ahead of time, and it also requires that the terminator doesn't appear elsewhere in the message.

With #2, if it's possible to get out of sync, can I add a terminator or am I guaranteed to never get out of sync as long as the sender program is correct in what it sends? Is it necessary to do #2 AND #3?

Please let me know.

Thanks, jbu

6 answers

#1


You are using TCP, so delivery is reliable: the connection either drops or times out, or you read the whole message. So option #2 is OK.
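
As an illustration of option #2, here is a minimal Python sketch of length-prefixed framing over a socket. The 4-byte big-endian header and the names `send_message`, `recv_exact`, and `recv_message` are assumptions made for this example, not something the question specifies. Note that a single `recv()` may return fewer bytes than requested, so the reader loops until it has the whole frame or the connection ends.

```python
import socket
import struct

# Assumed header layout: 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Write one frame: the length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes, or raise if the peer closes first."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # recv() returns b"" once the peer has closed
            raise ConnectionError("connection closed in the middle of a message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Read one frame and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

With this shape, a dropped connection surfaces as an exception rather than as a silently truncated message.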

#2


I agree with sigjuice. If you have a size field, it's not necessary to add an end-of-message delimiter; however, it's a good idea. Having both makes things much more robust and easier to debug.

Consider using the standard netstring format, which includes both a size field and an end-of-string character. Because it has a size field, it's OK for the end-of-string character to be used inside the message.
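
For reference, a netstring is the decimal length, a colon, the payload, and a trailing comma, so `hello world!` becomes `12:hello world!,`. A rough Python sketch follows; the function names are made up for this example, and the decoder is deliberately simple (for instance, it does not reject leading zeros in the length).

```python
def encode_netstring(payload: bytes) -> bytes:
    """Wrap `payload` as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(buffer: bytes) -> tuple[bytes, bytes]:
    """Decode one netstring from the front of `buffer`.

    Returns (payload, rest_of_buffer). Raises ValueError if the data is
    malformed or incomplete.
    """
    head, sep, tail = buffer.partition(b":")
    if not sep or not head.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(head)
    if len(tail) < length + 1:
        raise ValueError("incomplete netstring")
    # The comma is checked at the position the length predicts, which is
    # why commas inside the payload are harmless.
    if tail[length:length + 1] != b",":
        raise ValueError("missing trailing comma")
    return tail[:length], tail[length + 1:]
```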

#3


Depending on the level at which you're working, #2 may actually not have any issues with going out of sync (TCP has sequence numbers in its segments, and reassembles the stream in the correct order for you if they arrive out of order).

Thus, #2 is probably your best bet. In addition, knowing the message size early on in the transmission will make it easier to allocate memory on the receiving end.
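
To make the allocation point concrete, here is a hedged Python sketch in which the receiver reads the length first, allocates the buffer once, and fills it in place with `recv_into`. The 4-byte header, the `MAX_MESSAGE_SIZE` cap, and the function names are assumptions for this example; the cap is there because a size field taken at face value lets a buggy or hostile peer request an arbitrarily large allocation.

```python
import socket
import struct

MAX_MESSAGE_SIZE = 1 << 20  # assumed cap of 1 MiB; choose one that suits the protocol


def recv_into_exact(sock: socket.socket, view: memoryview) -> None:
    """Fill `view` completely from the socket, or raise on early close."""
    while len(view) > 0:
        received = sock.recv_into(view)
        if received == 0:
            raise ConnectionError("connection closed in the middle of a message")
        view = view[received:]  # advance past the bytes already filled


def recv_message_prealloc(sock: socket.socket) -> bytearray:
    """Read a 4-byte big-endian length, then the payload into one buffer."""
    header = bytearray(4)
    recv_into_exact(sock, memoryview(header))
    (length,) = struct.unpack("!I", header)
    if length > MAX_MESSAGE_SIZE:
        raise ValueError("declared message size exceeds the allowed maximum")
    payload = bytearray(length)  # single allocation, sized from the header
    recv_into_exact(sock, memoryview(payload))
    return payload
```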

#4


Interesting that there is no clear answer here. #2 is safe over TCP no matter what, and is done "in the real world" quite often. This is because TCP guarantees that all data arrives both uncorrupted and in the order that it was sent, so there is no possibility that a correct implementation could get out of sync.
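
One way to see why a correct implementation stays in sync: the decoder only depends on the byte sequence, not on how TCP happens to split it across reads. The sketch below is an illustration with an assumed 4-byte big-endian header and a made-up `FrameDecoder` class; it buffers incoming data and yields the same messages whether the stream arrives one byte at a time or all at once.

```python
import struct

HEADER_SIZE = 4  # assumed: 4-byte unsigned big-endian length prefix


class FrameDecoder:
    """Incremental decoder for length-prefixed frames."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list[bytes]:
        """Add received bytes and return every message completed so far."""
        self._buffer += data
        messages = []
        while len(self._buffer) >= HEADER_SIZE:
            (length,) = struct.unpack_from("!I", self._buffer)
            end = HEADER_SIZE + length
            if len(self._buffer) < end:
                break  # payload not fully received yet; wait for more data
            messages.append(bytes(self._buffer[HEADER_SIZE:end]))
            del self._buffer[:end]  # drop the consumed frame
        return messages
```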

#5


If you are developing both the transmit and receive code from scratch, it wouldn't hurt to use both length headers and delimiters. This would provide robustness and error detection. Consider the case where you just use #2. If you write a length field of N to the TCP stream, but end up sending a message whose size is different from N, the receiving end wouldn't know any better and would end up confused.

If you use both #2 and #3, while not foolproof, the receiver can have a greater degree of confidence that it received the message correctly if it encounters the delimiter after consuming N bytes from the TCP stream. You can also safely use the delimiter inside your message.
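
A small Python sketch of that check, under assumed details: a 4-byte big-endian length, a one-byte `\n` terminator, and hypothetical helper names. It reads from any binary file-like object (for a socket, something like `sock.makefile("rb")` would serve). The terminator is only looked for at the position the length predicts, so the same byte may appear inside the payload.

```python
import io
import struct

TERMINATOR = b"\n"  # assumed terminator; any fixed byte sequence would do


def encode_frame(payload: bytes) -> bytes:
    """Length header, then payload, then terminator."""
    return struct.pack("!I", len(payload)) + payload + TERMINATOR


def read_frame(stream: io.BufferedIOBase) -> bytes:
    """Read one frame and verify the terminator sits where the length says."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream ended before a full header arrived")
    (length,) = struct.unpack("!I", header)
    body = stream.read(length + len(TERMINATOR))
    if len(body) < length + len(TERMINATOR):
        raise EOFError("stream ended in the middle of a frame")
    if body[length:] != TERMINATOR:
        # The length field and the actual payload disagree.
        raise ValueError("framing error: terminator not found after payload")
    return body[:length]
```

If the sender writes a length of 5 but then sends 6 payload bytes, the receiver finds a payload byte where the terminator should be and can fail loudly instead of misreading every later message.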

Take a look at HTTP Chunked Transfer Coding for a real-world example of using both #2 and #3.
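
In chunked transfer coding, each chunk is a hexadecimal size line ending in CRLF, then the data, then another CRLF, and a zero-size chunk ends the body. The Python sketch below is a simplified illustration with made-up function names: it works on a complete byte string, ignores chunk extensions, and does not parse trailer fields.

```python
def encode_chunked(parts: list[bytes]) -> bytes:
    """Encode each non-empty part as one chunk, then add the final chunk."""
    out = bytearray()
    for part in parts:
        if part:  # a zero-length chunk would be read as the end of the body
            out += b"%x\r\n" % len(part) + part + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk, with no trailer fields
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Decode a complete chunked body; raises ValueError on bad framing."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)  # ValueError if no size line
        size_field = data[pos:line_end].split(b";", 1)[0]  # drop extensions
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)  # trailer fields, if any, are ignored here
        chunk_end = pos + size
        # The CRLF after the data plays the delimiter role (#3), checked
        # at the offset given by the size field (#2).
        if data[chunk_end:chunk_end + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        body += data[pos:chunk_end]
        pos = chunk_end + 2
```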

#6


There is a fourth alternative: a self-describing protocol such as XML.
