I am looking for a key-value store that can handle values several gigabytes in size. I have looked at Riak, Redis, CouchDB, and MongoDB.
I want to store a user's workspace (the equivalent of a directory in a file system, including all nested subdirectories and files) in this DB. Of course I could use the file system, but then I don't get features such as caching in RAM, failover, backup, and replication/clustering, which Redis, for instance, supports.
This implies that most of the stored values will be binary data, possibly several gigabytes in size, since each file in a workspace maps to one key-value tuple.
Does anyone have experience with any of these products?
2 Answers
#1
2
First off, computing an MD5 or CRC32 over gigabytes of data is going to be painfully expensive computationally. It is probably better to avoid that. How about storing the data in a file and indexing the filename?
If you insist, though, my suggestion is still to store just the hash, not the entire data value, together with a lookup array/table that points to the final data location. The risk of this approach (two different values producing the same hash) grows with the number of large values you store. The longer the hash you create (32-bit vs. 64-bit vs. 1024-bit, etc.), the safer it gets. Almost any dictionary type in a programming language, or any database engine, has a binary data storage mechanism. Failing that, you could store the hash as a hex string in a char column.
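To make the hash-plus-lookup idea concrete, here is a minimal sketch, not a definitive implementation. It assumes Python's standard `hashlib`, and the names `file_digest`, `index_file`, and the plain dict used as the lookup table are hypothetical stand-ins for whatever database table or key-value store you end up with. Reading the file in fixed-size chunks keeps memory use flat even for multi-gigabyte files.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so memory use stays flat


def file_digest(path, algorithm="sha256"):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(block)
    return h.hexdigest()


def index_file(index, path):
    """Record digest -> absolute path in the lookup table; return the digest."""
    digest = file_digest(path)
    index[digest] = os.path.abspath(path)
    return digest


if __name__ == "__main__":
    index = {}  # hypothetical stand-in for a database table or key-value store
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"workspace file contents")
    key = index_file(index, tmp.name)
    # The store only holds the short hex key and the location of the data.
    with open(index[key], "rb") as f:
        print(key[:12], f.read())
    os.unlink(tmp.name)
```

The hex digest is a plain string, so it also fits the "hex value in a char column" fallback mentioned above.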
#2
1
We are now using MongoDB, as it supports large binary values, is very popular, and has a large user base. We may switch to another store later, but so far it looks very good!
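One caveat for gigabyte-sized values: a single MongoDB document is limited to 16 MB, so large files normally go through GridFS, which splits a file into chunks. The sketch below only illustrates that chunking idea and is not the GridFS API. It uses a plain dict as a hypothetical stand-in for the store, and `put_chunked`, `get_chunked`, the `<key>:<n>` key scheme, and the chunk size are all assumptions made for the example.

```python
import io

CHUNK_SIZE = 255 * 1024  # illustrative chunk size; tune for the real store


def put_chunked(store, key, stream, chunk_size=CHUNK_SIZE):
    """Split a binary stream into chunks stored under '<key>:<n>' keys."""
    count = 0
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        store[f"{key}:{count}"] = block
        count += 1
    store[f"{key}:meta"] = count  # number of chunks, needed for reassembly
    return count


def get_chunked(store, key):
    """Yield the chunks of a stored value in their original order."""
    for n in range(store[f"{key}:meta"]):
        yield store[f"{key}:{n}"]


if __name__ == "__main__":
    store = {}  # hypothetical stand-in for the real key-value store
    data = bytes(range(256)) * 5000
    put_chunked(store, "workspace/big.bin", io.BytesIO(data))
    assert b"".join(get_chunked(store, "workspace/big.bin")) == data
```

Because both functions stream one chunk at a time, the whole value never has to sit in memory at once, which matters once a single workspace file reaches several gigabytes.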