Memory-friendly, fast key-value access solution for Android

Time: 2021-11-28 07:33:26

I have an Android application that iterates through an array of thousands of integers, using each one as a key to access a pair of integers (let us call them ids) and make calculations with them. It needs to do this as fast as possible, and in the end it returns a result that is crucial to the application.


I tried loading a HashMap into memory for fast access to those numbers, but it resulted in an OOM exception. I also tried writing those ids to a RandomAccessFile and storing their file offsets in another HashMap, but that was far too slow. On top of that, the new HashMap that stores only the offsets still occupies a lot of memory.


Now I am considering SQLite, but I am not sure whether it would be any faster. Are there any data structures or libraries that could help me with this?


EDIT: The number of keys is more than 20 million, whereas I only need to access a few thousand of them. I do not know beforehand which ones I will access, because that changes with user input.


3 solutions

#1 (4 votes)

You could use Trove's TIntLongHashMap to map primitive ints to primitive longs (each long storing the two ints of your value pair). This saves you the object overhead of a plain vanilla Map, which forces you to use wrapper types.

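For illustration, a minimal sketch of the packing trick (assuming Trove 3, where the class lives in gnu.trove.map.hash; the IntPairMap wrapper name is ours): the two ints of each pair go into one long, and are unpacked on read.

```java
import gnu.trove.map.hash.TIntLongHashMap;

// Sketch: store a pair of ints per key by packing them into one long.
// Only TIntLongHashMap comes from Trove; the rest is illustrative.
public class IntPairMap {
    private final TIntLongHashMap map = new TIntLongHashMap();

    public void put(int key, int first, int second) {
        // High 32 bits hold `first`, low 32 bits hold `second`.
        map.put(key, ((long) first << 32) | (second & 0xFFFFFFFFL));
    }

    public int getFirst(int key) {
        return (int) (map.get(key) >>> 32);
    }

    public int getSecond(int key) {
        return (int) map.get(key);  // truncation keeps the low 32 bits
    }
}
```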

EDIT


Since your update states that you have more than 20 million mappings, there are likely more space-efficient structures than a hash map. Partitioning your keys into buckets, combined with some sub-key compression, will likely save you half the memory compared with even the most efficient hash map implementation.

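To make the bucketing-plus-sub-key-compression idea concrete, here is one hedged sketch (not any particular library's API): the high 16 bits of a key choose a bucket, and only the low 16 bits are stored per entry, alongside a parallel array of packed value pairs. This stores roughly 2 bytes of key per entry instead of 4, with no hash-table overhead.

```java
// Sketch: the high 16 bits of a key pick a bucket; each bucket stores only
// the low 16 bits per entry in a sorted short[] plus a parallel long[] of
// packed value pairs. Arrays are assumed to be built once, pre-sorted.
public class CompressedKeyStore {
    private final short[][] lowKeys = new short[1 << 16][];
    private final long[][] values  = new long[1 << 16][];

    /** Returns the packed value pair, or throws if the key is absent. */
    public long get(int key) {
        int bucket = key >>> 16;
        short[] lows = lowKeys[bucket];
        if (lows != null) {
            int i = binarySearch(lows, (short) (key & 0xFFFF));
            if (i >= 0) return values[bucket][i];
        }
        throw new IllegalArgumentException("no mapping for key " + key);
    }

    // Binary search over shorts compared as unsigned 16-bit values.
    private static int binarySearch(short[] a, short target) {
        int lo = 0, hi = a.length - 1, t = target & 0xFFFF;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1, v = a[mid] & 0xFFFF;
            if (v < t) lo = mid + 1;
            else if (v > t) hi = mid - 1;
            else return mid;
        }
        return -1;
    }
}
```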

#2 (1 vote)

SQLite is an embedded relational database that uses indexes. I would bet it is much faster than using a RandomAccessFile. You can give it a try.

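A minimal sketch of what this could look like on Android, using the standard SQLiteOpenHelper API (the database name, table name, and column names here are illustrative assumptions):

```java
import android.content.Context;
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;
import android.database.sqlite.SQLiteOpenHelper;

// Sketch of an indexed SQLite lookup table for (key -> id pair).
public class PairDbHelper extends SQLiteOpenHelper {
    public PairDbHelper(Context context) {
        super(context, "pairs.db", null, 1);
    }

    @Override
    public void onCreate(SQLiteDatabase db) {
        // INTEGER PRIMARY KEY gives a B-tree index for O(log n) lookups.
        db.execSQL("CREATE TABLE id_pairs (key INTEGER PRIMARY KEY, " +
                   "first_id INTEGER NOT NULL, second_id INTEGER NOT NULL)");
    }

    @Override
    public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) { }

    /** Looks up a pair by key; returns null if the key is absent. */
    public int[] lookup(int key) {
        try (Cursor c = getReadableDatabase().rawQuery(
                "SELECT first_id, second_id FROM id_pairs WHERE key = ?",
                new String[] { String.valueOf(key) })) {
            return c.moveToFirst()
                    ? new int[] { c.getInt(0), c.getInt(1) }
                    : null;
        }
    }
}
```

Since only the rows actually queried are paged in from disk, memory use stays low regardless of how many of the 20 million keys are stored.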

#3 (1 vote)

My suggestion is to rearrange the keys into buckets. What I mean is: identify (more or less) the distribution of your keys, then create one file per key range (the point being that each file must contain no more integers than can fit in memory at once). Then, when you search for a key, you just read the whole corresponding file into memory and look for it there.


For example, assuming the distribution of the keys is uniform, store the 500k values corresponding to keys 0-500k in one file, the 500k values corresponding to keys 500k-1M in the next, and so on...

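A minimal sketch of that layout, assuming fixed-width binary records of (key, first, second) int triples sorted by key, and a hypothetical bucket_N.bin naming scheme:

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Sketch of range-bucketed lookup files. The 500k bucket width and the
// file naming scheme are illustrative assumptions.
public class BucketReader {
    private static final int KEYS_PER_BUCKET = 500_000;
    private static final int RECORD_BYTES = 12;   // 3 ints of 4 bytes each
    private final File dir;

    public BucketReader(File dir) { this.dir = dir; }

    /** Reads the single bucket file that could contain `key` into memory. */
    public int[] loadBucket(int key) throws IOException {
        File f = new File(dir, "bucket_" + (key / KEYS_PER_BUCKET) + ".bin");
        int ints = (int) (f.length() / RECORD_BYTES) * 3;
        int[] data = new int[ints];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(f)))) {
            for (int i = 0; i < ints; i++) data[i] = in.readInt();
        }
        return data;   // search this array for the key
    }
}
```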

EDIT: If you tried this approach and it was still slow, I still have some tricks up my sleeve:


  1. First, make sure that your partition is actually close to equal across all the buckets.
  2. Try to make the buckets smaller by creating more of them.
  3. The idea behind dividing the keys into buckets by range is that when you search for a key, you go to the corresponding range bucket, and the key is either in it or not in the collection at all, so there is no point in concurrently reading any other bucket.
  4. I have never done this, and I am not sure concurrency helps with I/O, but it may be helpful to read the whole file with two threads, one from top to bottom and the other from bottom to top, until they meet (or something like that).
  5. Once you have read a whole bucket into memory, split it into 3-4 ArrayLists and run 3-4 worker threads to search for your key in each of the arrays; the search should then finish much faster (see the sketch after this list).
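A hedged sketch of point 5, searching a loaded bucket (in the (key, first, second) triple layout of the reader above) with several worker threads, each scanning one chunk of the records:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: linear scan of an in-memory bucket split across worker threads.
public class ParallelBucketSearch {
    /** Returns the record index holding `key`, or -1 if it is absent. */
    public static int find(int[] bucket, int key, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int records = bucket.length / 3;
            int chunk = (records + threads - 1) / threads;
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = t * chunk;
                final int to = Math.min(records, from + chunk);
                results.add(pool.submit(() -> {
                    for (int i = from; i < to; i++)
                        if (bucket[i * 3] == key) return i;  // key column
                    return -1;
                }));
            }
            for (Future<Integer> r : results) {
                int idx = r.get();
                if (idx >= 0) return idx;
            }
            return -1;
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that if the records are kept sorted by key, a single binary search over the whole array will beat these linear scans; the threaded version just mirrors the answer's suggestion.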
