python / cython中绝对最快的查找

I'd like to do a lookup mapping 32bit integer => 32bit integer.

我想做一个查找映射32位整数=> 32位整数。

The input keys aren't necessary contiguous nor cover 2^32 -1 (nor do I want this in-memory to consume that much space!).

输入键没有必要连续和封面2 ^ 32 1(我也不希望这个内存消耗那么多空间!)。

The use case is for a poker evaluator, so doing a lookup must be as fast as possible. Perfect hashing would be nice, but that might be a bit out of scope.

该用例用于扑克求值器，所以查找必须尽可能快。完美的哈希会很好，但那可能有点超出范围。

I feel like the answer is some kind of cython solution, but I'm not sure about the underpinnings of cython and if it really does any good with Python's dict() type. Of course a flat array with just a simple offset jump would be super fast, but then I'm allocating 2^32 - 1 places in memory for the table, which I don't want.

我觉得答案是某种cython解决方案，但我不确定cython的基础是什么，也不确定它对Python的dict()类型是否真正有用。当然一个平面阵列只有一个简单的抵消跳非常快,但后来我分配2 ^ 32 - 1表在内存的地方,我不想要。

Any tips / strategies? Absolute speed with minimal memory footprint is the goal.

任何建议/策略?以最小内存占用为目标的绝对速度。

3 个解决方案

#1

First, you should actually define what "fast enough" means to you, before you do anything else. You can always make something faster, so you need to set a target so you don't go insane. It is perfectly reasonable for this target to be dual-headed - say something like "Mapping lookups must execute in these parameters (min/max/mean), and when/if we hit those numbers we're willing to spend X more development hours to optimize even further, but then we'll stop."

首先，在做其他事情之前，你应该明确“足够快”对你意味着什么。你总是可以做得更快，所以你需要设定一个目标，这样你就不会发疯。对于这个目标来说，这是完全合理的，比如“映射查找必须在这些参数中执行(最小值/最大值/平均值)，并且当/如果我们达到这些数字，我们愿意花费X更多的开发时间来进一步优化，但是我们将停止。”

Second, the very first thing you should do to make this faster is to copy the code in Objects/dictobject.c in the Cpython source tree (make something new like intdict.c or something) and then modify it so that the keys are not python objects. Chasing after a better hash function will not likely be a good use of your time for integers, but eliminating INCREF/DECREF and PyObject_RichCompareBool calls for your keys will be a huge win. Since you're not deleting keys you could also elide any checks for dummy values (which exist to preserve the collision traversal for deleted entries), although it's possible that you'll get most of that win for free simply by having better branch prediction for your new object.

第二，您要做的第一件事就是在object /dictobject中复制代码。c在Cpython源代码树中(创建一些新的内容，比如intdict。然后修改它，使键不是python对象。追求一个更好的散列函数可能不会很好地利用整数的时间，但是消除INCREF/ decf和PyObject_RichCompareBool对键的调用将是一个巨大的胜利。由于您没有删除键，所以您也可以省略对假值的检查(这些假值存在是为了保存被删除条目的冲突遍历)，尽管您可以通过对新对象进行更好的分支预测来免费获得大部分的胜利。

#2

You aren't smart enough to write something faster than dict. Don't feel bad; 99.99999% of the people on the planet aren't. Use a dict.

你不够聪明，写不出比字典快的东西。地球上99.99999%的人不是。使用一个字典

#3

You are describing a perfect use case for a hash indexed collection. You are also describing a perfect scenario for the strategy of write it first, optimise it second.

您正在描述哈希索引集合的完美用例。您还描述了一个完美的方案，即首先编写它，然后优化它。

So start with the Python dict. It's fast and it absolutely will do the job you need.

所以，从Python词典开始吧，它速度很快，绝对可以满足你的需要。

Then benchmark it. Figure out how fast it needs to go, and how near you are. Then 3 choices.

然后基准。弄清楚它需要多快，以及你有多近。3选择。

It's fast enough. You're done.
这是不够快。你就完成了。
It's nearly fast enough, say within about a factor of two. Write your own hash indexing, paying attention to the hash function and the collision strategy.
这已经够快的了，在大约两倍的范围内。编写自己的哈希索引，注意哈希函数和冲突策略。
It's much too slow. You're dead. There is nothing simple that will give you a 10x or 100x improvement. At least you didn't waste any time on a better hash index.
太缓慢。你死了。没有什么简单的东西能让你得到10倍或100倍的提升。至少你没有浪费任何时间在一个更好的哈希指数上。

#1

#2