RDBMS中有序列表最合适的数据结构？

I'm storing an ordered list of several million items in a MySQL database. Reasonably often, items need to be added or removed from the list; equally often, the position within the list of an item must be determined. I'd say the read/write ratio is about 50:50.

我在MySQL数据库中存储了数百万项的有序列表。通常,需要在列表中添加或删除项目;同样经常,必须确定项目列表中的位置。我会说读/写比率大约是50:50。

Starting with a linked-list model, I read [1] and the various models discussed there. For a strict linked-list, the adjacency list model would work just fine, but since the read/write ratio is more or less equal, I went for a divide and conquer approach using standard contiguous lists:

从链表模型开始,我阅读[1]和那里讨论的各种模型。对于严格的链表,邻接列表模型可以正常工作,但由于读/写比率或多或少相等,我采用标准连续列表进行分而治之:

Divide the entire list into 'buckets' of approximate length (say ~10000), maintaining an index of bucket sizes and their relative position within the main list. Each item is assigned to a specific bucket and keeps track of its position within that bucket.

将整个列表划分为近似长度的“桶”(比如~10000),保持桶大小的索引及其在主列表中的相对位置。每个项目都分配给一个特定的存储桶,并跟踪其在该存储桶中的位置。

With this approach, an item's position is determined by summing the sizes of the buckets that preceed the item's bucket in the list, then adding the item's position within its own bucket. To insert/remove an item from the list, the 'shifting' of items that results is localized to the bucket into which an item is being added or removed; the size of that bucket must also be updated accordingly.

通过这种方法,项目的位置是通过将列表中项目桶之前的桶的大小相加,然后在其自己的桶中添加项目的位置来确定的。要从列表中插入/删除项目,结果项目的“移位”将本地化到要添加或删除项目的存储区;该桶的大小也必须相应更新。

There is some denormalization in this approach (the bucket sizes), and it is not inherently thread-safe, even with transactions, because during a remove/insert the table of items must be queried to determine the bucket position of the item being modified, and then updated to perform the 'shift' on all the other items in that item's bucket. Unless these actions are atomic (via stored procedure maybe?) threads consistently deadlock.

在这种方法中存在一些非规范化(桶大小),即使对于事务,它也不具有本质上的线程安全性,因为在删除/插入期间必须查询项目表以确定要修改的项目的桶位置,然后更新以对该项目的存储桶中的所有其他项执行“移位”。除非这些操作是原子的(通过存储过程可能?)线程始终是死锁。

Are there any more approriate approaches to keeping this kind of data in an RDBMS? The thread-safety issue is causing me a big headache and it feels like there ought to be a better way to solve this problem than forcing me into using stored procedures.

是否有更复杂的方法将这种数据保存在RDBMS中?线程安全问题让我头痛不已,感觉应该有更好的方法来解决这个问题,而不是强迫我使用存储过程。

Many thanks, Matt.

非常感谢,马特。

[1] Database Structure for Tree Data Structure

[1]树数据结构的数据库结构

1 个解决方案

#1

If you need a linked list (not a hierarchy), you can just use the approach described in this article in my blog:

如果您需要链接列表(不是层次结构),您可以在我的博客中使用本文中描述的方法:

Sorting lists

, with this simple query:

,使用这个简单的查询:

SELECT  @r AS _parent,
        @r := (
        SELECT  id
        FROM    t_list
        WHERE   parent = _parent
        ) AS id
FROM    (
        SELECT  @r := 0
        ) vars,
        t_list

Make sure your id and parent have UNIQUE indexes defined for this to be efficient.

确保您的id和parent具有为此定义的UNIQUE索引才有效。

Replace @r := 0 with @r := @id_of_record_to_start_with to start browsing from any given id.

用@r:= @id_of_record_to_start_with替换@r:= 0以从任何给定的id开始浏览。

To find out the item's position, just reverse the query:

要找出项目的位置,只需反转查询:

SELECT  COUNT(*)
FROM    (
        SELECT  @r AS _id,
                @r := (
                SELECT  parent
                FROM    t_list
                WHERE   id = _id
                ) AS id
        FROM    (
                SELECT  @r := @item_id
                ) vars,
                t_list
        ) q

#1