TIFF graphics generation and compression: R, GIMP, IrfanView, Photoshop file sizes

I generated some high-resolution, publication-quality plots, for example:

library(plot3D)        # provides image2D()
Volcano <- volcano     # built-in 87 x 61 elevation matrix
zf <- 10               # zoom factor
tiff("Volcano.tif", width = 1800*zf, height = 900*zf, res = 175*zf, compression = "lzw")
image2D(z = Volcano, clab = "height, m",
        colkey = list(dist = -0.20, shift = 0.15, side = 3, length = 0.5, width = 0.5,
                      cex.clab = 1.2, col.clab = "white", line.clab = 2,
                      col.axis = "white", col.ticks = "white", cex.axis = 0.8))
dev.off()

The resulting file is 22 MB.

Now I open the file in GIMP and, without doing anything else (no resolution change, nothing at all), export it as "Volcano gimp.tif". GIMP produces a file ("Volcano gimp.tif") that is 1.9 MB.

ImageMagick reports similar image statistics:

$ identify Volcano.tif
Volcano.tif TIFF 18000x9000 18000x9000+0+0 8-bit DirectClass 22.37MB 0.000u 0:00.000
$ identify "Volcano gimp.tif"
Volcano gimp.tif TIFF 18000x9000 18000x9000+0+0 8-bit DirectClass 1.89MB 0.000u 0:00.000

Even with identify -verbose, the two files appear to be the same.

What is the difference between these files? Why are their file sizes so different?

UPDATE: OK, things are getting crazier. I did the same thing with IrfanView and got yet another file size. The initial file is the Volcano.tif generated from R with compression="lzw". Note how Volcano irfan.tif and Volcano gimp.tif differ in size while all other stats are the same: memory footprint, DPI, colors and resolution are identical. Only the size on disk differs.

UPDATE 2: Adobe Photoshop saves the file at 2.6 MB.

WinRAR reports that the original R-generated TIFF is highly compressible (from 22 MB to 3.6 MB).

UPDATE 3: This issue might be similar to "Montage / Join 2 TIFF images in a 2 col x 1 row tile without losing quality".

UPDATE 4: The R-generated TIFF file can be found here: http://ge.tt/7ZvRd4C1/v/0?c

1 Answer

#1

Apparently the TIFF LZW compressor used by R does not make use of an important option, the TIFF predictor, and this leads to an extremely large file. Data compression works best when it can recognize symmetries/redundancies in the data. In this case, the image data is composed of 24-bit (3-byte) pixels containing red, green and blue 8-bit values. Standard LZW compression scans a stream of bytes for repeating patterns. If it treats the color image simply as a stream of bytes, it sees repeating 3-byte patterns instead of runs of constant color. Enabling the TIFF predictor applies a differencing filter that stores the difference (delta) between each pixel and its left neighbor. If neighboring pixels are the same color, it stores 0's. A long string of 0's compresses much better than repeating patterns of non-zero values that are at least 3 bytes long.
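To make the argument about LZW concrete, here is a small sketch in R that counts how many codes a textbook LZW encoder emits for one scan line, with and without horizontal differencing. The helpers `lzw_code_count()` and `hdiff()` are hypothetical illustrations, not libtiff or R internals, and the encoder is simplified (unlimited dictionary, no 12-bit code limit or clear codes as in real TIFF LZW), so the counts are only a rough proxy for file size. The two scan lines are made-up test data.

```r
# Simplified LZW encoder that only counts emitted codes (proxy for output size).
# Assumption: unlimited dictionary; real TIFF LZW caps codes at 12 bits and resets.
lzw_code_count <- function(bytes) {
  dict <- new.env(hash = TRUE)
  for (i in 0:255) assign(as.character(i), TRUE, envir = dict)  # single-byte entries
  w <- ""
  n_codes <- 0L
  for (b in as.integer(bytes)) {
    wb <- if (w == "") as.character(b) else paste(w, b, sep = ",")
    if (exists(wb, envir = dict, inherits = FALSE)) {
      w <- wb                          # extend the current match
    } else {
      n_codes <- n_codes + 1L          # emit the code for w
      assign(wb, TRUE, envir = dict)   # learn the new string
      w <- as.character(b)
    }
  }
  if (w != "") n_codes <- n_codes + 1L # flush the final match
  n_codes
}

# Horizontal differencing for one scan line of 8-bit samples (spp = samples per pixel):
# each sample minus the same sample of the previous pixel, modulo 256.
hdiff <- function(line, spp = 3L) {
  x <- as.integer(line)
  n <- length(x)
  out <- x
  if (n > spp) {
    i <- (spp + 1L):n
    out[i] <- (x[i] - x[i - spp]) %% 256L
  }
  as.raw(out)
}

# Hypothetical scan lines of 256 RGB pixels each
ramp <- as.raw(rep(0:255, each = 3))                   # grey ramp, R = G = B
flat <- as.raw(rep(c(0x2A, 0x50, 0x40), times = 256))  # one flat colour

sizes <- data.frame(
  line          = c("ramp", "flat"),
  lzw_plain     = c(lzw_code_count(ramp), lzw_code_count(flat)),
  lzw_predictor = c(lzw_code_count(hdiff(ramp)), lzw_code_count(hdiff(flat)))
)
print(sizes)
```

The ramp is the case the predictor is designed for: differencing turns it into one long run of identical bytes, and the code count drops sharply. For a single flat colour, plain LZW also adapts to the 3-byte period, so the gap there is smaller.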

Here is an example of how it works on a 6-pixel line. When encoding, the predictor starts from the right edge and works leftward along each scan line:

Original data:
2A 50 40 2A 50 40 2A 50 40 2A 50 40 2A 50 40 2A 50 40 (6 pixels of the same color)

After horizontal differencing (TIFF predictor):
2A 50 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The data is much more compressible after the predictor, since long runs of the same value (0x00) are easy for LZW to compress.
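The worked example can be reproduced with a short R sketch of horizontal differencing (the TIFF "Predictor = 2" scheme) plus its inverse, showing that the transform is lossless. `hdiff_encode()` and `hdiff_decode()` are hypothetical helper names. Because the vectorized encoder reads only original sample values, it gives the same result as an in-place encoder that works right to left.

```r
# Horizontal differencing for one scan line of 8-bit samples.
# spp = samples per pixel (3 for RGB); arithmetic is modulo 256, so it is lossless.
hdiff_encode <- function(line, spp = 3L) {
  x <- as.integer(line)
  n <- length(x)
  out <- x
  if (n > spp) {
    i <- (spp + 1L):n
    out[i] <- (x[i] - x[i - spp]) %% 256L  # delta to the same sample of the left pixel
  }
  as.raw(out)
}

# Decoding restores the samples with a running sum, left to right.
hdiff_decode <- function(line, spp = 3L) {
  x <- as.integer(line)
  n <- length(x)
  if (n > spp) {
    for (k in (spp + 1L):n) x[k] <- (x[k] + x[k - spp]) %% 256L
  }
  as.raw(x)
}

original <- as.raw(rep(c(0x2A, 0x50, 0x40), times = 6))  # 6 pixels of the same colour
encoded  <- hdiff_encode(original)
print(encoded)  # 2a 50 40 followed by fifteen 00 bytes
stopifnot(identical(hdiff_decode(encoded), original))   # round trip is exact
```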

Conclusion: This should be filed as a bug against the maintainers of R's TIFF compression code, since using LZW on full-color images without the predictor produces poor results. In the meantime, a workaround is needed to compress the output more efficiently.
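One possible workaround is sketched below, assuming a newer R in which `grDevices::tiff()` accepts `compression = "lzw+p"` (LZW combined with the horizontal predictor); this option may not exist in the R version used in the question, so check `?tiff` in your installation. The output file name `Volcano_lzwp.tif` is a hypothetical choice; the plotting call is the one from the question.

```r
library(plot3D)  # provides image2D()

zf <- 10  # zoom factor, as in the question
# Assumption: this R version's tiff() supports "lzw+p" (LZW + horizontal predictor).
tiff("Volcano_lzwp.tif", width = 1800 * zf, height = 900 * zf,
     res = 175 * zf, compression = "lzw+p")
image2D(z = volcano, clab = "height, m",
        colkey = list(dist = -0.20, shift = 0.15, side = 3, length = 0.5,
                      width = 0.5, cex.clab = 1.2, col.clab = "white",
                      line.clab = 2, col.axis = "white", col.ticks = "white",
                      cex.axis = 0.8))
dev.off()

# Sizes in bytes; NA is returned for a file that does not exist
file.size(c("Volcano.tif", "Volcano_lzwp.tif"))
```

Outside R, libtiff's `tiffcp` can recompress an existing file with the predictor, e.g. `tiffcp -c lzw:2 Volcano.tif Volcano_pred.tif`, where `:2` selects horizontal differencing.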
