The documentation of LZ4_decompress_safe says:
/*! LZ4_decompress_safe() :
    compressedSize : is the precise full size of the compressed block.
    maxDecompressedSize : is the size of destination buffer, which must be already allocated.
    return : the number of bytes decompressed into destination buffer (necessarily <= maxDecompressedSize)
             If destination buffer is not large enough, decoding will stop and output an error code (<0).
             If the source stream is detected malformed, the function will stop decoding and return a negative result.
             This function is protected against buffer overflow exploits, including malicious data packets.
             It never writes outside output buffer, nor reads outside input buffer.
*/
LZ4LIB_API int LZ4_decompress_safe (const char* source, char* dest, int compressedSize, int maxDecompressedSize);
But it doesn't specify how to tell whether the failure comes from a destination buffer that is too small or from malformed input, a bad combination of parameters, and so on.
When I don't know the decompressed size in advance, how can I know whether I should retry with a bigger buffer or not?
1 Answer
There is an open issue about this, and for now there is no public API to distinguish between errors.
As a heuristic, looking at the code shows the possible return values:
    /* end of decoding */
    if (endOnInput)
        return (int) (((char*)op)-dest);          /* Nb of output bytes decoded */
    else
        return (int) (((const char*)ip)-source);  /* Nb of input bytes read */

    /* Overflow error detected */
_output_error:
    return (int) (-(((const char*)ip)-source))-1;
So there are only 2 cases:
- either the decoding was successful, and you get a non-negative result (whose meaning depends on whether you are in full or partial mode),
- or the decoding was unsuccessful, and you get a negative result.
In the case of the negative result, the value is -(position_in_input + 1).
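As a small illustration, the position can be recovered from the return value as sketched below. The helper name error_position_from_result is made up for this example, and it relies on the implementation detail quoted above rather than on any documented guarantee, so it may not hold for other versions of the library.

```c
/* Hypothetical helper (not part of the LZ4 API): recovers position_in_input
 * from a negative return value, assuming the implementation returns
 * -(position_in_input + 1) on error as in the code quoted above.
 * Returns -1 when the result is not an error code. */
static int error_position_from_result(int result)
{
    if (result >= 0)
        return -1;              /* decoding succeeded, there is no error position */
    return -(result + 1);       /* written this way to avoid negating INT_MIN */
}
```

For instance, a return value of -43 would correspond to a failure detected at input offset 42.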
This suggests that one can guess, with a good likelihood of success, whether the destination buffer was too small: retry with a (much) bigger buffer and check whether the failure occurs at the same position:
- if the second decompression attempt succeeds, you're good!
- if the second decompression attempt fails at the same position, then the issue is likely with the input,
- otherwise, you have to try with a bigger buffer again.
Put differently: as long as the result differs from the previous attempt, try again; otherwise, there's your result.
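The retry loop described above could look roughly like the following sketch. Everything here is an assumption made for illustration: decode_fn, decompress_with_retry, the growth factor of 4 and the maxCapacity cap are not part of LZ4, and the two toy_ functions are stand-in decoders that only exist so the loop can be exercised without linking the library.

```c
#include <stdlib.h>
#include <string.h>

/* Same parameter order as LZ4_decompress_safe, so it can be passed in directly. */
typedef int (*decode_fn)(const char* src, char* dst, int srcSize, int dstCapacity);

/* Hypothetical helper: retries with a growing buffer until decoding succeeds
 * or fails twice in a row with the same negative code.
 * Success: returns the decoded size and stores the malloc'ed buffer in *out.
 * Failure: returns a negative code (-1 if the first allocation fails) and
 * leaves *out set to NULL. */
static int decompress_with_retry(decode_fn decode, const char* src, int srcSize,
                                 int initialCapacity, int maxCapacity, char** out)
{
    int capacity = initialCapacity > 0 ? initialCapacity : 1;
    int previous = 0;               /* 0: no failure recorded yet */
    *out = NULL;

    for (;;) {
        char* dst = malloc((size_t)capacity);
        if (dst == NULL) return previous < 0 ? previous : -1;

        int result = decode(src, dst, srcSize, capacity);
        if (result >= 0) {          /* success: hand the buffer to the caller */
            *out = dst;
            return result;
        }
        free(dst);

        /* Same failure position twice in a row: the input is the likely culprit. */
        if (result == previous) return result;
        /* Give up once the capacity cap has been tried. */
        if (capacity >= maxCapacity) return result;

        previous = result;
        capacity = (capacity > maxCapacity / 4) ? maxCapacity : capacity * 4;
    }
}

/* Toy stand-in decoder: "needs" 100 output bytes and fails further into the
 * input as the capacity grows, mimicking a too-small destination buffer. */
static int toy_needs_100(const char* src, char* dst, int srcSize, int dstCapacity)
{
    (void)src; (void)srcSize;
    if (dstCapacity < 100) return -(dstCapacity / 2) - 1;
    memset(dst, 'x', 100);
    return 100;
}

/* Toy stand-in decoder: always fails at input position 5, mimicking malformed input. */
static int toy_malformed(const char* src, char* dst, int srcSize, int dstCapacity)
{
    (void)src; (void)dst; (void)srcSize; (void)dstCapacity;
    return -6;
}
```

With the real library, passing LZ4_decompress_safe as the decode argument should work since its signature has the same shape; the caller frees the buffer returned through out. The sketch inherits the heuristic's weakness: a second failure at the same position is only a hint that the input is bad.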
Limitation
The input pointer does not necessarily advance one byte at a time: it may advance by length bytes in two places, where length is read from the input and is unbounded.
If decoding fails because the output buffer was too small, and the new output buffer is still too small for length, then decoding will fail at the same position even though the input is not (necessarily) malformed.
If false positives are an issue, then one may attempt to:
- decode the length, by checking the input stream at the position returned,
- simply allocate 255 * <input size> - 2526 as per Mark Adler's answer, which is reasonable for small inputs.
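For the second option, the bound can be computed as in the sketch below. The helper name worst_case_capacity is hypothetical, and returning -1 when the bound is not positive (inputs shorter than 10 bytes) or does not fit in an int is a choice made for this example, not something stated in the answer being cited.

```c
#include <limits.h>

/* Hypothetical helper: destination capacity from the bound quoted above,
 * 255 * inputSize - 2526. Returns -1 if the bound is not positive or does
 * not fit in an int (LZ4_decompress_safe takes its sizes as int). */
static int worst_case_capacity(int inputSize)
{
    long long bound = 255LL * inputSize - 2526LL;   /* computed in 64 bits to avoid overflow */
    if (bound <= 0 || bound > INT_MAX)
        return -1;
    return (int)bound;
}
```

A 1000-byte input gives a capacity of 252474 bytes, which shows why this approach only stays practical while inputs are small.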