For my in-game AI project I use computer vision. For supervised learning, I capture the screen along with the keys being pressed.
My problem is storing this huge amount of data (lots of 320x240 images), since my hard drive space is limited. So far, saving frames as JPEG gives the smallest files (1000 frames ~20 MB).
I also tried saving an array of images using numpy (.npy, 1000 frames ~220 MB) and h5py (.h5, 1000 frames ~220 MB). Either way, the files were too big to store enough data for AI training (even with gzip compression).
However, saving as JPEG gives very slow read/write speed. So, is there a way to store an array of images in a single file that is both fast to read/write and compact?
I found some interesting research on this (https://*.com/a/41425878), but it does not seem to help in the case of images.
1 Answer
#1
Well, if you already have the images as (e.g. NumPy) arrays in memory, saving them using numpy.save or h5py is pretty much optimal, as both store the data in binary form (as compared to e.g. numpy.savetxt). To get even smaller file sizes, you can make use of one of the compression filters of HDF5/h5py.
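As a rough sketch of the compression-filter suggestion, the snippet below writes a stack of frames to a single HDF5 file with gzip enabled and one frame per chunk, so that individual frames can be read back without decompressing the whole dataset. The frame data is synthetic (an assumption, standing in for real screen captures), and the dataset name "frames", the chunk shape, and the compression level are illustrative choices, not recommendations. Real game frames will compress far less than this artificial data.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-in for captured frames (assumption): 100 RGB frames of
# 320x240, uint8, each a black background with one randomly colored rectangle.
rng = np.random.default_rng(0)
frames = np.zeros((100, 240, 320, 3), dtype=np.uint8)
frames[:, 60:180, 80:240, :] = rng.integers(
    0, 256, size=(100, 1, 1, 3), dtype=np.uint8
)

path = os.path.join(tempfile.mkdtemp(), "frames.h5")

# Write all frames into one file, gzip-compressed, one frame per chunk.
with h5py.File(path, "w") as f:
    f.create_dataset(
        "frames",
        data=frames,
        chunks=(1, 240, 320, 3),  # chunk = one frame, so single-frame reads are cheap
        compression="gzip",
        compression_opts=4,       # gzip level 0-9; higher is smaller but slower
    )

# Read back a single frame and the full stack.
with h5py.File(path, "r") as f:
    frame_10 = f["frames"][10]
    restored = f["frames"][:]

raw_bytes = frames.nbytes
file_bytes = os.path.getsize(path)
print(f"raw: {raw_bytes} bytes, on disk: {file_bytes} bytes")
```

The gzip filter is lossless, so the restored array is identical to the input. How much space it saves depends entirely on how repetitive the frames are.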
The reason you can get even smaller files by saving as JPEG is that it is a lossy compression format, meaning that you actually lose data. To make an objective comparison between "raw data" and "real image" formats, try saving to PNG instead.
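To illustrate the lossy versus lossless distinction, here is a minimal sketch that encodes one frame as PNG and as JPEG in memory using Pillow, decodes both, and compares them with the original array. The test frame is a synthetic gradient (an assumption, not real captured data), the helper name encode_frame is hypothetical, and the JPEG quality of 90 is an arbitrary choice.

```python
import io

import numpy as np
from PIL import Image

# Synthetic 320x240 RGB test frame (assumption): simple color gradients.
y, x = np.mgrid[0:240, 0:320]
frame = np.stack([x % 256, y % 256, (x + y) % 256], axis=-1).astype(np.uint8)


def encode_frame(arr, fmt, **save_kwargs):
    """Encode an RGB uint8 array to the given image format, returning bytes."""
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, format=fmt, **save_kwargs)
    return buf.getvalue()


png_bytes = encode_frame(frame, "PNG")
jpg_bytes = encode_frame(frame, "JPEG", quality=90)

# Decode both and compare against the original pixels.
png_back = np.array(Image.open(io.BytesIO(png_bytes)))
jpg_back = np.array(Image.open(io.BytesIO(jpg_bytes)))

png_lossless = np.array_equal(png_back, frame)
jpg_max_error = int(np.abs(jpg_back.astype(int) - frame.astype(int)).max())

print(f"raw:  {frame.nbytes} bytes")
print(f"PNG:  {len(png_bytes)} bytes, identical after decode: {png_lossless}")
print(f"JPEG: {len(jpg_bytes)} bytes, max per-pixel error: {jpg_max_error}")
```

PNG round-trips the pixels exactly, so its size is the fair one to compare against .npy or HDF5 with a lossless filter. JPEG's smaller files come at the cost of the per-pixel error reported above.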