Could anyone give an explanation of how a DHT works?
Nothing too heavy, just the basics.
5 Answers
#1
214
Ok, they're fundamentally a pretty simple idea. A DHT gives you a dictionary-like interface, but the nodes are distributed across the network. The trick with DHTs is that the node that gets to store a particular key is found by hashing that key, so in effect your hash-table buckets are now independent nodes in a network.
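The key-to-node mapping can be sketched in a few lines. This is a deliberately naive illustration, not a real DHT client; `node_for_key` and the node names are made up for the example:

```python
import hashlib

def node_for_key(key, nodes):
    """Pick the node responsible for a key by hashing the key.

    `nodes` is a hypothetical list of node addresses. Any stable hash
    works; MD5 is used here just for an even spread of keys.
    """
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]
print(node_for_key("some-key", nodes))
```

The hash of the key, rather than a central index, decides which "bucket" (node) stores it; that is the whole trick.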
This gives a lot of fault-tolerance and reliability, and possibly some performance benefit, but it also throws up a lot of headaches. For example, what happens when a node leaves the network, by failing or otherwise? And how do you redistribute keys when a node joins so that the load is roughly balanced? Come to think of it, how do you evenly distribute keys anyhow? And when a node joins, how do you avoid rehashing everything? (Remember you'd have to do this in a normal hash table if you increase the number of buckets.)
One example DHT that tackles some of these problems is a logical ring of n nodes, each taking responsibility for 1/n of the keyspace. Once you add a node to the network, it finds a place on the ring to sit between two other nodes, and takes responsibility for some of the keys in its sibling nodes. The beauty of this approach is that none of the other nodes in the ring are affected; only the two sibling nodes have to redistribute keys.
For example, say in a three node ring the first node has keys 0-10, the second 11-20 and the third 21-30. If a fourth node comes along and inserts itself between nodes 3 and 0 (remember, they're in a ring), it can take responsibility for say half of 3's keyspace, so now it deals with 26-30 and node 3 deals with 21-25.
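That join can be sketched directly from the numbers above. The node names and the `split_range` helper are hypothetical, and this is only the range bookkeeping, not a real membership protocol:

```python
# Three nodes own contiguous ranges of the keyspace 0-30 on a ring.
ranges = {"node1": range(0, 11), "node2": range(11, 21), "node3": range(21, 31)}

def split_range(ranges, old, new):
    """A joining node takes the upper half of `old`'s keyspace.
    No other node's range changes."""
    r = ranges[old]
    mid = (r.start + r.stop) // 2
    ranges[old] = range(r.start, mid)
    ranges[new] = range(mid, r.stop)

split_range(ranges, "node3", "node4")
# node3 now holds 21-25 and node4 holds 26-30; node1 and node2 are untouched.
```

Only the split node and the newcomer exchange any keys, which is exactly why the ring scheme avoids a global rehash.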
There are many other overlay structures such as this that use content-based routing to find the right node on which to store a key. Locating a key in a ring requires searching round the ring one node at a time (unless you keep a local look-up table, problematic in a DHT of thousands of nodes), which is O(n)-hop routing. Other structures - including augmented rings - guarantee O(log n)-hop routing, and some claim O(1)-hop routing at the cost of more maintenance.
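The O(n)-hop case is easy to see with a toy ring where each node knows only its successor (the `ring` structure and `lookup` function here are invented for illustration):

```python
# Naive ring routing: each node knows only its successor, so a lookup
# walks the ring one hop at a time -- O(n) hops in the worst case.
ring = {
    "n1": {"range": range(0, 11), "next": "n2"},
    "n2": {"range": range(11, 21), "next": "n3"},
    "n3": {"range": range(21, 31), "next": "n1"},
}

def lookup(start, key, ring):
    node, hops = start, 0
    while key not in ring[node]["range"]:
        node = ring[node]["next"]
        hops += 1
    return node, hops

print(lookup("n1", 25, ring))  # ('n3', 2)
```

Schemes like augmented rings get O(log n) hops by giving each node shortcut pointers at exponentially increasing distances, so each hop can skip roughly half the remaining ring.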
Read the Wikipedia page, and if you really want to know in a bit of depth, check out this course page at Harvard which has a pretty comprehensive reading list.
#2
12
DHTs provide the same type of interface to the user as a normal hashtable (look up a value by key), but the data is distributed over an arbitrary number of connected nodes. Wikipedia has a good basic introduction that I would essentially be regurgitating if I wrote more -
#3
7
I'd like to add onto HenryR's useful answer as I just had an insight into consistent hashing. A normal/naive hash lookup is a function of two variables, one of which is the number of buckets. The beauty of consistent hashing is that we eliminate the number of buckets, "n", from the equation.
In naive hashing, the first variable is the key of the object to be stored in the table. We'll call the key "x". The second variable is the number of buckets, "n". So, to determine which bucket/machine the object is stored in, you have to calculate: hash(x) mod(n). Therefore, when you change the number of buckets, you also change the address at which almost every object is stored.
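You can check just how bad the remapping is with a quick count (here integer keys stand in for hash(x), which doesn't change the arithmetic):

```python
# With hash(x) mod n addressing, growing n from 4 to 5 buckets
# moves almost every key.
keys = range(1000)
before = {k: k % 4 for k in keys}
after = {k: k % 5 for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys changed bucket")  # 800 of 1000
```

Only keys where k mod 4 and k mod 5 happen to agree stay put (k mod 20 in 0..3), so 80% of the table has to move for a single added bucket.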
Compare this to consistent hashing. Let's define "R" as the range of the hash function. R is just some constant. In consistent hashing, an object's position is determined by hash(x) and R alone (think of it as the fraction hash(x)/R of the way round a ring), and each bucket owns a slice of that ring. Since the lookup is no longer a function of the number of buckets, we end up with far less remapping when we change the number of buckets.
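A minimal sketch of that ring, assuming one position per node (real systems use many virtual positions per node for balance); the `owner` helper and node names are invented for the example:

```python
import bisect
import hashlib

R = 2**32  # fixed output range of the hash function

def h(s):
    return int(hashlib.md5(str(s).encode()).hexdigest(), 16) % R

def owner(key, nodes):
    """Each node sits at h(node) on the ring; a key belongs to the
    first node clockwise from h(key)."""
    ring = sorted((h(n), n) for n in nodes)
    points = [p for p, _ in ring]
    i = bisect.bisect(points, h(key)) % len(ring)
    return ring[i][1]

keys = [f"key-{i}" for i in range(1000)]
nodes = ["a", "b", "c", "d"]
before = {k: owner(k, nodes) for k in keys}
after = {k: owner(k, nodes + ["e"]) for k in keys}
# Only keys falling in the new node's arc move; everything else stays put.
moved = [k for k in keys if before[k] != after[k]]
```

Adding node "e" reassigns only the keys in its arc of the ring (all of which land on "e"); contrast that with the ~80% remapping of the mod-n scheme.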
#5
-2
Check out Amazon's Dynamo; it explains a simple yet novel DHT implementation based on consistent hashing on a ring.