I have a C / C++ program which needs to read in a file that may or may not be gzip compressed. I know we can use gzread() from zlib to read in both compressed and uncompressed files - however, I want to use the zlib functions ONLY if the file is gzip compressed (for performance reasons).
So is there any way to programmatically detect whether a given file is gzipped from C / C++?
4 solutions
#1
43
There is a magic number at the beginning of the file. Just read the first two bytes and check whether they are 0x1f and 0x8b, in that order.
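As a rough illustration of the magic-number check, here is a minimal C sketch. The function names `has_gzip_magic` and `file_has_gzip_magic` are made up for this example and only standard C I/O is used. A match only tells you the file starts with the gzip signature, not that the rest of it is valid.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the buffer starts with the
 * gzip magic bytes 0x1f 0x8b, otherwise 0. */
static int has_gzip_magic(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}

/* Hypothetical helper: returns 1 if the file starts with the gzip
 * magic bytes, 0 if it does not (or is shorter than two bytes),
 * and -1 if the file cannot be opened. */
static int file_has_gzip_magic(const char *path)
{
    unsigned char head[2];
    size_t n;
    FILE *fp = fopen(path, "rb"); /* binary mode, so bytes are not translated */

    if (fp == NULL)
        return -1;

    n = fread(head, 1, sizeof head, fp);
    fclose(fp);

    return has_gzip_magic(head, n);
}
```

If the check returns 1 you could open the file with gzopen() and read it with gzread(); otherwise fall back to plain fopen() and fread().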
#2
8
Do you prefer false positives, false negatives, or no false results at all (there goes performance down the drain...)?
RFC 1952 (GZIP file format specification version 4.3) states that the first 2 bytes of each member, and therefore of the file, are '\x1F' and '\x8B'. Use that as a first check; it can result in false positives.
#3
3
What is the difference in performance between reading compressed and uncompressed files using gzread()?
Anyway, in order to detect if a file is gzipped, you can read the magic number at the beginning of the file, which is 1f 8b according to the link.
#4
1
You can test for the signatures described in RFCs 1951 and 1952 to get an idea. For GZIP files the second one is the relevant one, and it is definitive. There are some false positives on other formats, so you should check as much of the header as possible for plausible values.
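To make "check as much of the header as possible" a bit more concrete, here is a hedged sketch that looks at the 10-byte fixed gzip header from RFC 1952: the two ID bytes, the compression method (CM, which is 8 for deflate), and the flag byte (FLG, whose top three bits are reserved and must be zero). The name `looks_like_gzip_header` is invented for this example, and the checks reduce false positives rather than eliminate them.

```c
#include <stddef.h>

/* Size of the fixed part of a gzip member header (RFC 1952):
 * ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS. */
#define GZIP_FIXED_HEADER_LEN 10

/* Hypothetical helper: returns 1 if the buffer holds a plausible
 * gzip fixed header, otherwise 0. */
static int looks_like_gzip_header(const unsigned char *buf, size_t len)
{
    if (len < GZIP_FIXED_HEADER_LEN)
        return 0;                       /* too short to be a gzip member */
    if (buf[0] != 0x1f || buf[1] != 0x8b)
        return 0;                       /* ID1/ID2 magic bytes do not match */
    if (buf[2] != 8)
        return 0;                       /* CM: 8 (deflate) is the only defined method */
    if ((buf[3] & 0xe0) != 0)
        return 0;                       /* FLG: reserved bits 5..7 must be zero */
    return 1;
}
```

Read the first 10 bytes of the file into a buffer and pass them to this function before deciding which reader to use.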
For just zlib streams it's somewhat harder, because they are even more prone to false positives. But you would rarely encounter those in the wild on their own.
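For completeness, a bare zlib stream has no fixed magic number, only a two-byte header (CMF and FLG) defined in RFC 1950, which is why it is more prone to false positives. A minimal sketch of the plausibility test, with the invented name `looks_like_zlib_header`, might look like this: the low nibble of CMF must be 8 (deflate), the high nibble (window size) must be at most 7, and the two bytes read as a big-endian 16-bit number must be a multiple of 31.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first two bytes form a
 * plausible zlib (RFC 1950) stream header, otherwise 0. */
static int looks_like_zlib_header(const unsigned char *buf, size_t len)
{
    unsigned int cmf, flg;

    if (len < 2)
        return 0;

    cmf = buf[0];
    flg = buf[1];

    if ((cmf & 0x0f) != 8)
        return 0;                       /* CM: compression method must be deflate */
    if ((cmf >> 4) > 7)
        return 0;                       /* CINFO: window size above 32K is not allowed */
    if (((cmf << 8) | flg) % 31 != 0)
        return 0;                       /* FCHECK: header must be a multiple of 31 */
    return 1;
}
```

Since only two bytes are checked, plenty of unrelated data will pass this test, so treat a positive result as a hint at best.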