Given the length in characters or bytes of some array `array()`, is there any way to know what the compressed length/size of the result will be?
Example code is below.
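```vbnet
' array() holds the uncompressed input bytes
Dim c() As Byte
Using memory As System.IO.MemoryStream = New System.IO.MemoryStream()
    ' leaveOpen:=True keeps memory open after the GZipStream is disposed
    Using gzip As System.IO.Compression.GZipStream = New System.IO.Compression.GZipStream(memory, System.IO.Compression.CompressionMode.Compress, True)
        gzip.Write(array, 0, array.Length)
    End Using ' disposing gzip flushes the last compressed block into memory
    c = memory.ToArray()
End Using
' c.Length is now the compressed size in bytes
```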
I can run tests on sample data (I happen to be working entirely with ASCII characters in a simple XML document, so I am getting around 9:1 compression), but is there any way to know the compression ratio before actually compressing and checking the result?
My specific use case is a variable amount of input data in `array()`, which is compressed and sent via a web service to an API that limits the size of each call. If my compressed data is too long for a single call (which will probably happen about once every 10 calls), I can loop and split it across multiple calls, but I don't know how to tell when it is too big.
I could simply make a conservative guess (say, I know compression will be at least 1.5:1, so I never create an `array()` whose 1.5:1-compressed size would exceed what the API allows), but I would prefer to be a bit more precise. The web service also limits the number of calls per day, so sending 100 calls a day is not ideal.
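The conservative approach is simple arithmetic: if compression is assumed never to be worse than 1.5:1, the input may be at most 1.5 times the API's size limit. A minimal sketch; `MaxInputBytes` and the 100,000-byte limit are hypothetical placeholders, not values from the real API:

```vbnet
Imports System

Module ConservativeBudget
    ' Hypothetical helper: the largest input (in bytes) that should still fit
    ' under apiLimitBytes, ASSUMING compression is never worse than minRatio:1.
    Function MaxInputBytes(apiLimitBytes As Long, minRatio As Double) As Long
        Return CLng(Math.Floor(apiLimitBytes * minRatio))
    End Function

    Sub Main()
        ' 100,000 bytes is a placeholder limit, not the real API's value.
        Console.WriteLine(MaxInputBytes(100000, 1.5)) ' prints 150000
    End Sub
End Module
```

The result is only as safe as the assumed ratio. GZip can also make very small or already-compressed inputs larger, since every gzip stream carries at least 18 bytes of header and trailer.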
Answer #1
The only way to know for certain what the size will be is to actually run the data through the compression algorithm. If you want to do that without allocating space for the output bytes, you could write a null `Stream` implementation to use as the compression target, so that the compressed output is simply thrown away but the number of bytes is counted.
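A minimal sketch of that null-stream idea in VB.NET. `CountingStream` and `CompressedLength` are illustrative names, not part of .NET: the stream accepts writes, discards the bytes, and keeps only a running count, so the real GZip compressor runs but no output buffer is kept.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

' Hypothetical write-only "null" stream: discards everything written to it
' and only records how many bytes it received.
Public Class CountingStream
    Inherits Stream

    Private _count As Long

    Public ReadOnly Property BytesWritten As Long
        Get
            Return _count
        End Get
    End Property

    Public Overrides Sub Write(buffer As Byte(), offset As Integer, count As Integer)
        _count += count ' count the bytes, then throw them away
    End Sub

    Public Overrides ReadOnly Property CanRead As Boolean
        Get
            Return False
        End Get
    End Property
    Public Overrides ReadOnly Property CanSeek As Boolean
        Get
            Return False
        End Get
    End Property
    Public Overrides ReadOnly Property CanWrite As Boolean
        Get
            Return True ' GZipStream requires a writable target
        End Get
    End Property
    Public Overrides ReadOnly Property Length As Long
        Get
            Return _count
        End Get
    End Property
    Public Overrides Property Position As Long
        Get
            Return _count
        End Get
        Set(value As Long)
            Throw New NotSupportedException()
        End Set
    End Property

    Public Overrides Sub Flush()
        ' Nothing is buffered, so there is nothing to flush.
    End Sub
    Public Overrides Function Read(buffer As Byte(), offset As Integer, count As Integer) As Integer
        Throw New NotSupportedException()
    End Function
    Public Overrides Function Seek(offset As Long, origin As SeekOrigin) As Long
        Throw New NotSupportedException()
    End Function
    Public Overrides Sub SetLength(value As Long)
        Throw New NotSupportedException()
    End Sub
End Class

Module Program
    ' Runs the real GZip compressor but keeps only the size of its output.
    Function CompressedLength(data As Byte()) As Long
        Dim counter As New CountingStream()
        Using gzip As New GZipStream(counter, CompressionMode.Compress, True)
            gzip.Write(data, 0, data.Length)
        End Using ' disposing gzip flushes the final block into counter
        Return counter.BytesWritten
    End Function

    Sub Main()
        Dim xml As Byte() = Encoding.ASCII.GetBytes("<items><item>42</item></items>")
        Console.WriteLine(CompressedLength(xml))
    End Sub
End Module
```

`CompressedLength(array)` could then be compared with the API's limit before sending. This still costs a full compression pass, so when the data will be sent anyway, compressing once into a `MemoryStream` (as in the question) and checking `c.Length` is just as exact.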
How efficiently an algorithm like GZip compresses can vary wildly depending on the input. Compare the compressed size of a sequence repeating the same byte N times with the compressed size of N random bytes in a row, and you'll see what I mean. That said, if your data has a characteristic form, there might very well be a typical compression ratio that you could use to generate an approximate estimate.
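The repeated-byte versus random-bytes comparison can be run directly. This sketch assumes N = 100,000 and a fixed random seed, both arbitrary choices; `GzipLength` is a hypothetical helper that compresses into a `MemoryStream` the same way the question's code does.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module RatioDemo
    ' Hypothetical helper: gzip-compresses data in memory and returns the compressed size.
    Function GzipLength(data As Byte()) As Integer
        Using memory As New MemoryStream()
            Using gzip As New GZipStream(memory, CompressionMode.Compress, True)
                gzip.Write(data, 0, data.Length)
            End Using ' disposing gzip flushes the final block
            Return CInt(memory.Length)
        End Using
    End Function

    Sub Main()
        Const N As Integer = 100000

        ' N copies of the same byte: extremely redundant input.
        Dim repeated(N - 1) As Byte
        For i As Integer = 0 To N - 1
            repeated(i) = CByte(65) ' ASCII "A"
        Next

        ' N random bytes: essentially incompressible input.
        Dim noise(N - 1) As Byte
        Dim rng As New Random(12345) ' arbitrary fixed seed so runs are repeatable
        rng.NextBytes(noise)

        Console.WriteLine("repeated: {0} -> {1} bytes", N, GzipLength(repeated))
        Console.WriteLine("random:   {0} -> {1} bytes", N, GzipLength(noise))
    End Sub
End Module
```

The repeated input should shrink to a tiny fraction of its size, while the random input stays at roughly N bytes or slightly more. Running the same helper over a handful of representative XML payloads gives the kind of typical ratio described above, which can serve as an approximate estimate.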