How can I tell that the output buffer is too small when decompressing with LZ4?

Date: 2022-01-01 20:20:23

The documentation of LZ4_decompress_safe says:

    /*! LZ4_decompress_safe() :
        compressedSize : is the precise full size of the compressed block.
        maxDecompressedSize : is the size of destination buffer, which must be already allocated.
        return : the number of bytes decompressed into destination buffer (necessarily <= maxDecompressedSize)
                 If destination buffer is not large enough, decoding will stop and output an error code (<0).
                 If the source stream is detected malformed, the function will stop decoding and return a negative result.
                 This function is protected against buffer overflow exploits, including malicious data packets.
                 It never writes outside output buffer, nor reads outside input buffer.
    */
    LZ4LIB_API int LZ4_decompress_safe (const char* source, char* dest, int compressedSize, int maxDecompressedSize);

But it doesn't specify how to tell whether the failure is caused by a destination buffer that is too small or by malformed input, a bad combination of parameters, and so on.

When I don't know the decompressed size in advance, how can I tell whether I should retry with a bigger buffer?

1 Answer

There is an open issue about this, and for now there is no public API to distinguish between the different errors.

There is a heuristic, though. Looking at the decoder's source shows the possible return values:

    /* end of decoding */
    if (endOnInput)
       return (int) (((char*)op)-dest);     /* Nb of output bytes decoded */
    else
       return (int) (((const char*)ip)-source);   /* Nb of input bytes read */

    /* Overflow error detected */
    _output_error:
    return (int) (-(((const char*)ip)-source))-1;

So there are only 2 cases:

  • either decoding succeeded and you get a non-negative result (per the excerpt above, the number of output bytes decoded when endOnInput is set, which is the LZ4_decompress_safe case, or otherwise the number of input bytes read)
  • or decoding failed and you get a negative result.

When the result is negative, its value is -(position_in_input + 1), where position_in_input is the offset in the compressed input at which decoding stopped.

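A small helper can turn a raw return value into either a byte count or the input position where decoding stopped. This is a sketch that assumes the -(position_in_input + 1) encoding read from the source excerpt above, which the documentation does not promise; lz4_result and interpret_lz4_result are made-up names, not part of LZ4.

```c
#include <stdbool.h>

/* Decoded meaning of an LZ4_decompress_safe() return value.
   Assumes the encoding seen in the source excerpt:
   ret >= 0 is a byte count, ret < 0 is -(position_in_input + 1). */
typedef struct {
    bool ok;             /* true if decoding succeeded */
    int decoded_bytes;   /* meaningful only when ok */
    int error_position;  /* meaningful only when !ok: input offset where decoding stopped */
} lz4_result;

lz4_result interpret_lz4_result(int ret)
{
    lz4_result r = { ret >= 0, 0, 0 };
    if (r.ok)
        r.decoded_bytes = ret;
    else
        /* Inverts ret = -(pos + 1); written as -(ret + 1) so INT_MIN cannot overflow. */
        r.error_position = -(ret + 1);
    return r;
}
```

Two failed attempts can then be compared by error_position, or simply by the raw return value, which carries the same information.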

This suggests a way to guess, with a good chance of being right, whether the destination buffer was too small: retry with a (much) bigger buffer and check whether the failure occurs at the same position:

  • if the second attempt succeeds, you're done;
  • if it fails at the same position, the problem is most likely the input;
  • otherwise, try again with an even bigger buffer.

In other words: as long as the result keeps changing, try again; once it stops changing, that is your answer.

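The retry heuristic can be sketched as a loop around any function with the same signature as LZ4_decompress_safe, so the sketch does not need liblz4 to compile. decode_fn, next_action, decompress_with_retry and the status names are hypothetical; the growth factor of 4 and the maxCapacity limit are arbitrary choices of this sketch, not something LZ4 prescribes.

```c
#include <stdlib.h>

/* Same shape as LZ4_decompress_safe(); in real code pass that function. */
typedef int (*decode_fn)(const char *src, char *dst, int srcSize, int dstCapacity);

typedef enum { RETRY_DONE_OK, RETRY_GIVE_UP, RETRY_GROW } retry_action;

/* The heuristic: success -> done; the same negative result as the
   previous attempt -> probably bad input; anything else -> grow the buffer. */
retry_action next_action(int ret, int prev_ret, int have_prev)
{
    if (ret >= 0)
        return RETRY_DONE_OK;
    if (have_prev && ret == prev_ret)
        return RETRY_GIVE_UP;
    return RETRY_GROW;
}

typedef enum {
    DECOMP_OK, DECOMP_LIKELY_BAD_INPUT, DECOMP_HIT_LIMIT, DECOMP_NO_MEMORY
} decomp_status;

/* Hypothetical helper. On DECOMP_OK, *out is a malloc'd buffer holding *result
   bytes (caller frees). *result always receives the last decoder return value.
   maxCapacity bounds the search so hostile input cannot force unbounded allocation. */
decomp_status decompress_with_retry(decode_fn decode, const char *src, int srcSize,
                                    int initialCapacity, int maxCapacity,
                                    char **out, int *result)
{
    int cap = initialCapacity > 0 ? initialCapacity : 1;
    int prev = 0, have_prev = 0;

    *out = NULL;
    *result = 0;
    for (;;) {
        char *buf = malloc((size_t)cap);
        if (buf == NULL)
            return DECOMP_NO_MEMORY;
        *result = decode(src, buf, srcSize, cap);
        switch (next_action(*result, prev, have_prev)) {
        case RETRY_DONE_OK:
            *out = buf;
            return DECOMP_OK;
        case RETRY_GIVE_UP:
            free(buf);
            return DECOMP_LIKELY_BAD_INPUT;
        case RETRY_GROW:
            free(buf);
            if (cap >= maxCapacity)
                return DECOMP_HIT_LIMIT;
            /* "Much bigger": quadruple, without overflowing int or passing the cap. */
            cap = (cap > maxCapacity / 4) ? maxCapacity : cap * 4;
            prev = *result;
            have_prev = 1;
            break;
        }
    }
}
```

With liblz4 linked, the decoder argument would be LZ4_decompress_safe itself. A DECOMP_LIKELY_BAD_INPUT outcome is still only a guess, for the reason given under Limitation.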


Limitation

The input pointer does not necessarily advance one byte at a time: there are two places where a length is read from the input, and that length is unbounded, so a single step can cover length bytes.

If decoding fails because the output buffer was too small, and the enlarged buffer is still too small to hold length bytes, decoding will fail at the same position even though the input is not necessarily malformed.

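These unbounded lengths come from the LZ4 block format: each sequence starts with a token whose high 4 bits give the literal length and whose low 4 bits give the match length (to which the decoder adds the minimum match of 4). A 4-bit value of 15 means extra bytes follow, each added to the length, until a byte other than 255 is read. The sketch below decodes one such field; read_lz4_length is a hypothetical helper, not part of the LZ4 API, and exactly which byte the returned error position points at would still have to be checked against the decoder source.

```c
#include <stddef.h>

/* Decode one LZ4 length field: `nibble` is the 4-bit value from the token,
   `p` points at the bytes right after it, `avail` is how many are readable.
   Returns the length (without the +4 minimum match for match lengths) and
   stores the number of extension bytes read in *consumed, or returns
   (size_t)-1 if the field runs past the end of the input. */
size_t read_lz4_length(unsigned nibble, const unsigned char *p, size_t avail,
                       size_t *consumed)
{
    size_t length = nibble;
    size_t i = 0;

    *consumed = 0;
    if (nibble == 15) {
        unsigned char b;
        do {
            if (i >= avail)
                return (size_t)-1;   /* truncated length field */
            b = p[i++];
            length += b;
        } while (b == 255);          /* 255 means "another byte follows" */
    }
    *consumed = i;
    return length;
}
```

For example, a token of 0xF0 followed by the bytes 255, 255, 10 announces 15 + 255 + 255 + 10 = 535 literal bytes using only four input bytes, so an output buffer that grew by a few hundred bytes can still be too small at the same input position.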

If such false positives are a problem, one can instead:

  • decode the length by inspecting the input stream at the returned position, or
  • simply allocate 255 * <input size> - 2526 bytes, the maximum decompressed size given in Mark Adler's answer; this is reasonable for small inputs.
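
The second option amounts to computing a capacity from the compressed size. A minimal sketch, taking the 255 * <input size> - 2526 figure from the answer as given; lz4_worst_case_output is a hypothetical name, and the fallback for tiny inputs (where that expression is not positive) is an assumption of this sketch rather than part of the quoted formula.

```c
#include <limits.h>

/* Capacity to pass as maxDecompressedSize for n compressed bytes.
   Uses 255 * n - 2526 as quoted; for n small enough that this is not
   positive, falls back to 255 * n (assumed here to be a looser but safe bound).
   Capped at INT_MAX because LZ4_decompress_safe() takes an int capacity. */
int lz4_worst_case_output(int n)
{
    if (n <= 0)
        return 0;
    long long bound = 255LL * n - 2526;   /* 64-bit math: no overflow for any int n */
    if (bound <= 0)
        bound = 255LL * n;
    return bound > INT_MAX ? INT_MAX : (int)bound;
}
```

For inputs above roughly 8.4 MB the expression exceeds INT_MAX, the largest capacity LZ4_decompress_safe() can accept, so the sketch caps it there.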
