I have a MySQL database, and a particular table in that database will need to be self-referencing, in a one-to-many fashion. For scalability, I need to find the most efficient solution possible. The two ways most evident to me are:
1) Add a text field to the table, and store a serialized list of primary keys there
2) Keep a linker table, with each row holding a single one-to-one link.
In case #1, I see the table growing very very wide (using a spatial analogy), but in case #2, I see the linker table growing to a very large number of rows, which would slow down lookups (by far the most common operation).
What's the most efficient manner in which to implement such a one-to-many relationship in MySQL? Or, perhaps, there is a much saner solution keeping the data all directly on the filesystem somehow, or else some other storage engine?
4 solutions
#1
Just keep a table for the "many", with a key column for the primary table.
I guarantee you'll have lots of other, more important problems to solve before you run into efficiency or capacity constraints in a standard industrial-strength relational DBMS.
IMHO the most likely second option (with numerous alternative products) is to use an ISAM.
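The "table with a key column" suggestion above can be sketched for the self-referencing case as a single table whose rows point at their parent. This is a minimal sketch, not a definitive design: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the `item` table, its columns, and the `children_of` helper are hypothetical names chosen for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; all table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE item (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES item(id),  -- NULL for a top-level row
        name      TEXT NOT NULL
    );
    -- Index the key column so "find the children of X" avoids a full scan.
    CREATE INDEX idx_item_parent ON item (parent_id);
""")

conn.executemany(
    "INSERT INTO item (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "child a"),
        (3, 1, "child b"),
        (4, 2, "grandchild"),
    ],
)


def children_of(conn, parent_id):
    """Return the names of the rows whose parent is parent_id, ordered by id."""
    rows = conn.execute(
        "SELECT name FROM item WHERE parent_id = ? ORDER BY id",
        (parent_id,),
    )
    return [name for (name,) in rows]


print(children_of(conn, 1))
```

Each "many" row carries exactly one key, so no serialized list has to be parsed and no separate linker table is needed for a strict one-to-many.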
#2
If you need to do deep/recursive traversals into the data, a graph database like Neo4j (where I'm on the team) is a good choice. You'll find some information in the article Should you go Beyond Relational Databases? and in this post at High Scalability. For a use case that may be similar to yours, read this thread on MetaFilter. For information on language bindings and other things you may also find the Neo4j wiki and mailing list useful.
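To make "deep/recursive traversal" concrete, here is a minimal sketch of one written in plain SQL rather than in Neo4j. It assumes a hypothetical self-referencing `item` table with a `parent_id` column and runs on Python's built-in sqlite3 as a stand-in; MySQL 8.0 and later also accept `WITH RECURSIVE`. The `descendants` helper is an illustrative name, not part of any library.

```python
import sqlite3

# SQLite stands in for MySQL; the item table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES item(id))"
)
conn.executemany(
    "INSERT INTO item (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)],
)


def descendants(conn, root_id):
    """Return (id, depth) for every row below root_id, nearest levels first."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id, depth) AS (
            -- Anchor: the direct children of the starting row.
            SELECT id, 1 FROM item WHERE parent_id = ?
            UNION ALL
            -- Recursive step: children of rows already found.
            SELECT item.id, subtree.depth + 1
            FROM item JOIN subtree ON item.parent_id = subtree.id
        )
        SELECT id, depth FROM subtree ORDER BY depth, id
        """,
        (root_id,),
    )
    return rows.fetchall()


print(descendants(conn, 1))
```

Every additional level is another join step against the table, which is the kind of workload where a graph database may be worth evaluating.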
#3
Not so much an answer but a few questions and a possible approach....
If you want to make the table self-referencing and only use one field ... there are some options. A calculated maskable 'join' field describes a way to associate many rows with each other.
The best solution will probably depend on the nature of the data and relationships. What is the nature of the data and lookups? What sort of relationship are you trying to capture? Association? Related? Parent/children?
#4
My first comment would be that you'll get better responses if you can describe how the data will be used (frequency of adds/updates vs lookups, adds vs updates, etc) in addition to what you've already described. That being said, my first thought would be to just go with a generic representation of
CREATE TABLE IF NOT EXISTS one_table (
    `one_id` INT UNSIGNED NOT NULL AUTO_INCREMENT
        COMMENT 'The ID of the items in the one table',
    -- ... other data
    PRIMARY KEY (`one_id`)  -- AUTO_INCREMENT columns must be part of a key
);

CREATE TABLE IF NOT EXISTS many_table (
    `many_id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT
        COMMENT 'The ID of the items in the many table',
    `one_id` INT UNSIGNED NOT NULL
        COMMENT 'The ID of the item in the one table that this many item belongs to',
    -- ... other data
    PRIMARY KEY (`many_id`),
    INDEX (`one_id`)  -- speeds up lookups of the many rows for a given one_id
);
Making sure, of course, to create an index on the one_id in both tables.
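As a rough way to check that the index on one_id is actually used for lookups, the sketch below builds a cut-down version of the two tables and asks the database for its query plan. It assumes Python's built-in sqlite3 as a stand-in for MySQL (where `EXPLAIN` plays the equivalent role); the index name `idx_many_one_id` and the helper functions are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; only the key columns of the schema are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one_table (
        one_id INTEGER PRIMARY KEY
    );
    CREATE TABLE many_table (
        many_id INTEGER PRIMARY KEY,
        one_id  INTEGER NOT NULL REFERENCES one_table(one_id)
    );
    -- Hypothetical index name; this is the index the lookup relies on.
    CREATE INDEX idx_many_one_id ON many_table (one_id);
""")
conn.executemany("INSERT INTO one_table (one_id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO many_table (many_id, one_id) VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 2)],
)


def many_ids_for(conn, one_id):
    """Return the many_id values that belong to the given one_id."""
    rows = conn.execute(
        "SELECT many_id FROM many_table WHERE one_id = ? ORDER BY many_id",
        (one_id,),
    )
    return [row[0] for row in rows]


def query_plan(conn, one_id):
    """Return SQLite's plan text for the lookup (last column of each row)."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT many_id FROM many_table WHERE one_id = ?",
        (one_id,),
    )
    return " ".join(row[-1] for row in rows)


print(many_ids_for(conn, 1))
print(query_plan(conn, 1))
```

If the plan text names the index, the lookup is an index search rather than a full table scan, so the row count of the many table matters far less than the question feared.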