I generated some high-resolution, publication-quality plots, for example:
library(plot3D)
Volcano <- volcano
zf <- 10  # zoom factor: scales pixel dimensions and resolution together
tiff("Volcano.tif", width = 1800 * zf, height = 900 * zf, res = 175 * zf, compression = "lzw")
image2D(z = Volcano, clab = "height, m",
        colkey = list(dist = -0.20, shift = 0.15, side = 3, length = 0.5, width = 0.5,
                      cex.clab = 1.2, col.clab = "white", line.clab = 2,
                      col.axis = "white", col.ticks = "white", cex.axis = 0.8))
dev.off()
The resulting file is 22 MB.
Now I open the file with GIMP and, without doing anything else (no change of resolution or any other edit), export it as "Volcano gimp.tif". GIMP generates a file that is 1.9 MB.
ImageMagick reports similar image stats:
$ identify Volcano.tif
Volcano.tif TIFF 18000x9000 18000x9000+0+0 8-bit DirectClass 22.37MB 0.000u 0:00.000
$ identify "Volcano gimp.tif"
Volcano gimp.tif TIFF 18000x9000 18000x9000+0+0 8-bit DirectClass 1.89MB 0.000u 0:00.000
Even with identify -verbose, the two files appear similar.
What is the difference between these files? Why are their file sizes so different?
UPDATE: OK, things are getting crazier. I did the same thing with IrfanView and got yet another file size. The initial file is the Volcano.tif generated from R with compression="lzw". Note how Volcano irfan.tif and Volcano gimp.tif differ in size while all other stats are the same: memory footprint, DPI, colors and resolution are identical; only the size on disk differs.
UPDATE 2: Adobe Photoshop saves the file down to 2.6 MB
WinRAR reports that the original R-generated TIFF is highly compressible (22 MB -> 3.6 MB).
UPDATE 3: This issue might be similar to the question "Montage / Join 2 TIFF images in a 2 col x 1 row tile without losing quality".
UPDATE 4: The R-generated TIFF file can be found here: http://ge.tt/7ZvRd4C1/v/0?c
1 Answer
Apparently the TIFF LZW compressor used by R does not use an important option, the TIFF predictor, and that omission leads to an extremely large file. Data compression works best when it can recognize redundancy in the data. Here the image data consists of 24-bit (3-byte) pixels holding 8-bit red, green and blue values. Standard LZW compression scans a stream of bytes for repeating patterns. If it treats the color image simply as a stream of bytes, it sees repeating 3-byte patterns rather than runs of constant color. Enabling the TIFF predictor applies a differencing filter that stores, for each pixel, the delta from its left-hand neighbor. Where neighboring pixels have the same color, it stores 0s. A long run of 0s compresses much better than repeating non-zero patterns that are at least 3 bytes long.
Here is an example of how it works on a 6-pixel line. When encoding in place, the predictor starts at the right edge of each scan line and works left, so each pixel is differenced against its still-unmodified left neighbor:
Original data:
2A 50 40 2A 50 40 2A 50 40 2A 50 40 2A 50 40 2A 50 40 (6 pixels of the same color)
After horizontal differencing (TIFF predictor):
2A 50 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
The data is much more compressible after the predictor since long runs of the same value (0x00) are easier for LZW to compress.
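The differencing step above can be sketched in a few lines of Python. This is only an illustration of the idea for 8-bit samples, not the code R or libtiff actually runs; the function names `horizontal_difference` and `undo_horizontal_difference` are made up for this example.

```python
def horizontal_difference(row: bytes, samples_per_pixel: int = 3) -> bytes:
    """Horizontal differencing of one scan line of 8-bit samples (illustrative)."""
    out = bytearray(row)
    # Work right to left so each sample is differenced against the
    # still-unmodified sample of the pixel to its left.
    for i in range(len(out) - 1, samples_per_pixel - 1, -1):
        out[i] = (out[i] - out[i - samples_per_pixel]) & 0xFF
    return bytes(out)


def undo_horizontal_difference(row: bytes, samples_per_pixel: int = 3) -> bytes:
    """Reverse the differencing by accumulating the deltas left to right."""
    out = bytearray(row)
    for i in range(samples_per_pixel, len(out)):
        out[i] = (out[i] + out[i - samples_per_pixel]) & 0xFF
    return bytes(out)


# The 6-pixel line from the example: six pixels of color 2A 50 40.
row = bytes.fromhex("2A5040") * 6
encoded = horizontal_difference(row)
print(encoded.hex(" ").upper())
# 2A 50 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
```

The first pixel of the line is stored unchanged, so a decoder can recover the original line exactly by running the sums in the other direction.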
Conclusion: This should be reported as a bug to the maintainers of R's TIFF compression code, since LZW on full-color images without the predictor produces poor results. In the meantime, a workaround is needed to compress the file more efficiently.
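To get a rough feel for the effect, the sketch below counts how many codes a simplified LZW encoder emits with and without horizontal differencing. Several things here are assumptions made for illustration: the test image is a synthetic RGB gradient (not the Volcano plot), `lzw_code_count` and `horizontal_difference` are hypothetical helper names, and the encoder only counts codes with a 4096-entry table that is reset when full. It is not bit-exact TIFF LZW, which packs variable-width 9 to 12 bit codes and uses Clear and EndOfInformation codes.

```python
def lzw_code_count(data: bytes, max_codes: int = 4096) -> int:
    """Count the codes a simplified LZW encoder would emit for `data`."""
    table = {bytes([i]): i for i in range(256)}
    codes = 0
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate          # keep extending the current match
        else:
            codes += 1                   # emit the code for `current`
            if len(table) >= max_codes:  # table full: start over
                table = {bytes([i]): i for i in range(256)}
            table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes += 1                       # flush the final match
    return codes


def horizontal_difference(row: bytes, samples_per_pixel: int = 3) -> bytes:
    """Replace each 8-bit sample by its delta from the pixel to its left."""
    out = bytearray(row)
    for i in range(len(out) - 1, samples_per_pixel - 1, -1):
        out[i] = (out[i] - out[i - samples_per_pixel]) & 0xFF
    return bytes(out)


# Synthetic image (an assumption): 16 identical rows, each a smooth
# 256-pixel gradient in which R, G and B rise at different rates.
WIDTH, HEIGHT = 256, 16
row = bytes(c for x in range(WIDTH)
            for c in (x & 0xFF, (2 * x) & 0xFF, (3 * x) & 0xFF))
rows = [row] * HEIGHT

raw_codes = lzw_code_count(b"".join(rows))
predicted_codes = lzw_code_count(b"".join(horizontal_difference(r) for r in rows))
print(f"LZW codes without predictor: {raw_codes}")
print(f"LZW codes with predictor:    {predicted_codes}")
```

On this gradient the differenced rows turn into a short repeating delta pattern, so the encoder emits far fewer codes. The ratio for a real image such as the Volcano plot will differ.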