Fastest persistent key/value DB for fixed-size keys, insert/fetch only (no delete/update)?

Date: 2021-10-28 08:47:06

Given the following requirements of a persistent key/value store:

  • Only fetch, insert, and full iteration of all values (for exports) are required

  • No deleting values or updating values

  • Keys are always the same size

  • Code embedded in the host application

And given this usage pattern:

  • Fetches are random

  • Inserts and fetches are interleaved with no predictability

  • Keys are random, and inserted in random order

What is the best on-disk data structure/algorithm given the requirements?

Can a custom implementation exceed the performance of LSM-based (Log Structured Merge) implementations (i.e. leveldb, rocksdb)?

Would a high performance custom implementation for these requirements also be considerably simpler in implementation?

1 Answer

#1

While a custom implementation might achieve better performance for your requirements, a well-configured RocksDB should be able to beat most such custom implementations in your case. Here is how I would configure RocksDB:

First, since you have no updates or deletes, it is best to compact all data into a few big files in RocksDB. Because RocksDB sorts keys in a customizable order, having a few big files gives you faster reads: binary searching across a few big files is faster than across many small files. Typically, big files hurt the performance of compaction, the process of reorganizing a large part of the data in RocksDB so that (1) multiple updates associated with a single key are merged, (2) deletions are executed to free disk space, and (3) the data stays sorted. But since you have no updates or deletes, big files give you fast read and write performance.
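
A rough sketch of that configuration against the C++ API follows. The option names are real RocksDB options, but the sizes and the "/tmp/kvdb" path are illustrative assumptions, not tuned values:

    #include <cassert>

    #include <rocksdb/db.h>
    #include <rocksdb/options.h>

    int main() {
      rocksdb::Options options;
      options.create_if_missing = true;

      // Steer compaction toward fewer, larger files. These sizes are
      // illustrative starting points, not tuned numbers.
      options.write_buffer_size = 128ull << 20;       // 128 MiB memtables
      options.target_file_size_base = 256ull << 20;   // 256 MiB SST files
      options.max_bytes_for_level_base = 1ull << 30;  // 1 GiB level base

      rocksdb::DB* db = nullptr;
      rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/kvdb", &db);
      assert(s.ok());

      // After a bulk load, one full manual compaction merges everything
      // into a small number of big, fully sorted files.
      db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);

      delete db;
      return 0;
    }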

Second, allocate generous bits per key to the Bloom filter. This lets you avoid most of the I/O when you issue queries for keys that do not exist in RocksDB.
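
A minimal sketch of the Bloom filter setup, again using the C++ API; 10 bits per key is the commonly cited baseline, and raising it lowers the false-positive rate at the cost of memory:

    #include <rocksdb/filter_policy.h>
    #include <rocksdb/options.h>
    #include <rocksdb/table.h>

    rocksdb::Options MakeOptions() {
      rocksdb::BlockBasedTableOptions table_options;
      // More bits per key means fewer false positives, i.e. fewer wasted
      // disk reads for keys that are not in the database.
      table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10));
      // Keep filter (and index) blocks in the block cache so negative
      // lookups can be answered from memory.
      table_options.cache_index_and_filter_blocks = true;

      rocksdb::Options options;
      options.table_factory.reset(
          rocksdb::NewBlockBasedTableFactory(table_options));
      return options;
    }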

So a read request goes like this. First, RocksDB compares the query key against the key ranges of the big files to identify which files might contain it. Then, for each file whose key range covers the query key, it computes the query key's Bloom bits to check whether the key may exist in that file. If so, a binary search inside that file is triggered to locate the matching data.
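
From the application side, all of that machinery sits behind a single point lookup. A hypothetical usage sketch with a fixed-size key (the 16-byte key size is an assumption for illustration):

    #include <cassert>
    #include <string>

    #include <rocksdb/db.h>

    // `db` is an open rocksdb::DB*, as opened above.
    void InsertAndFetch(rocksdb::DB* db) {
      // Fixed-size 16-byte key, e.g. a random 128-bit identifier.
      std::string key(16, '\x42');
      rocksdb::Status s = db->Put(rocksdb::WriteOptions(), key, "payload");
      assert(s.ok());

      std::string result;
      s = db->Get(rocksdb::ReadOptions(), key, &result);
      if (s.IsNotFound()) {
        // For absent keys, the Bloom filter usually short-circuits the
        // lookup before any data-block I/O happens.
      }
    }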

As for the scan operation: since RocksDB internally sorts data in a user-customizable order, scans can be done very efficiently using RocksDB's iterator.
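
A sketch of the full-iteration export using the iterator API; WriteToExport is a hypothetical sink for illustration, not a RocksDB function:

    #include <cassert>
    #include <cstdio>
    #include <memory>

    #include <rocksdb/db.h>
    #include <rocksdb/iterator.h>

    // Hypothetical export sink: here it just prints sizes to stdout.
    void WriteToExport(const rocksdb::Slice& key, const rocksdb::Slice& value) {
      std::printf("%zu-byte key, %zu-byte value\n", key.size(), value.size());
    }

    // Walks every key/value pair in sorted key order, e.g. for an export.
    void ExportAll(rocksdb::DB* db) {
      std::unique_ptr<rocksdb::Iterator> it(
          db->NewIterator(rocksdb::ReadOptions()));
      for (it->SeekToFirst(); it->Valid(); it->Next()) {
        // it->key() and it->value() return rocksdb::Slice views; copy
        // them if the export needs them beyond this iteration step.
        WriteToExport(it->key(), it->value());
      }
      assert(it->status().ok());  // surface any iteration error at the end
    }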

More information about RocksDB basics can be found here.
