This question comes up after reading a comment in this question:
Database design
When you create a many-to-many table, should you create a composite primary key on the two foreign key columns, or create an auto-increment surrogate "ID" primary key, and just put indexes on your two FK columns (and maybe a unique constraint)? What are the implications on performance for inserting new records/re-indexing in each case?
Basically, this:
PartDevice
----------
PartID (PK/FK)
DeviceID (PK/FK)
vs. this:
PartDevice
----------
ID (PK/auto-increment)
PartID (FK)
DeviceID (FK)
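As a rough sketch only, the two layouts above could be written out like this, using Python's built-in sqlite3 module so the snippet runs on its own. The table names PartDeviceComposite and PartDeviceSurrogate, the minimal Part and Device parent tables, and the helper insert_pair are assumptions made for illustration; other engines use different auto-increment syntax and store the primary key differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Minimal parent tables (assumed; the question does not show them).
    CREATE TABLE Part   (PartID   INTEGER PRIMARY KEY);
    CREATE TABLE Device (DeviceID INTEGER PRIMARY KEY);
    INSERT INTO Part   (PartID)   VALUES (1);
    INSERT INTO Device (DeviceID) VALUES (1);

    -- Option 1: composite primary key on the two FK columns.
    CREATE TABLE PartDeviceComposite (
        PartID   INTEGER NOT NULL REFERENCES Part (PartID),
        DeviceID INTEGER NOT NULL REFERENCES Device (DeviceID),
        PRIMARY KEY (PartID, DeviceID)
    );

    -- Option 2: surrogate auto-increment key, plus a unique constraint
    -- so the same pair still cannot be stored twice.
    CREATE TABLE PartDeviceSurrogate (
        ID       INTEGER PRIMARY KEY AUTOINCREMENT,
        PartID   INTEGER NOT NULL REFERENCES Part (PartID),
        DeviceID INTEGER NOT NULL REFERENCES Device (DeviceID),
        UNIQUE (PartID, DeviceID)
    );
""")


def insert_pair(table, part_id, device_id):
    """Return True if the pair was stored, False if rejected as a duplicate."""
    try:
        conn.execute(
            f"INSERT INTO {table} (PartID, DeviceID) VALUES (?, ?)",
            (part_id, device_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


for table in ("PartDeviceComposite", "PartDeviceSurrogate"):
    first = insert_pair(table, 1, 1)
    second = insert_pair(table, 1, 1)  # the same pair a second time
    print(table, first, second)
```

In both layouts the second insert of the same pair is rejected. In the surrogate layout that only holds because of the extra UNIQUE constraint.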
The commenter says:
making the two IDs the PK means the table is physically sorted on the disk in that order. So if we insert (Part1/Device1), (Part1/Device2), (Part2/Device3), then (Part 1/Device3) the database will have to break the table apart and insert the last one between entries 2 and 3. For many records, this becomes very problematic as it involves shuffling hundreds, thousands, or millions of records every time one is added. By contrast, an autoincrementing PK allows the new records to be tacked on to the end.
I'm asking because I've always been inclined to use the composite primary key with no surrogate auto-increment column, but I'm not sure whether the surrogate key actually performs better.
4 Answers
#1
62
With a simple two-column many-to-many mapping, I see no real advantage to having a surrogate key. A primary key on (col1,col2) is guaranteed unique (assuming the col1 and col2 values in the referenced tables are unique), and a separate index on (col2,col1) will catch those cases where the opposite order would execute faster. The surrogate is a waste of space.
You won't need indexes on the individual columns since the table should only ever be used to join the two referenced tables together.
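To make the two-index idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names follow the question's PartDevice table; the index name idx_device_part and the helper plan are made up for this example. It asks SQLite's planner how it would run a lookup in each direction. Other engines word their plans differently, so treat the output as illustrative rather than as a statement about any particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PartDevice (
        PartID   INTEGER NOT NULL,
        DeviceID INTEGER NOT NULL,
        PRIMARY KEY (PartID, DeviceID)           -- serves lookups by PartID
    );
    -- Reverse-order index (name is an assumption) for lookups by DeviceID.
    CREATE INDEX idx_device_part ON PartDevice (DeviceID, PartID);
""")


def plan(sql):
    """Return the planner's description of how SQLite would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


by_part = plan("SELECT DeviceID FROM PartDevice WHERE PartID = 1")
by_device = plan("SELECT PartID FROM PartDevice WHERE DeviceID = 1")
print(by_part)
print(by_device)
```

Each direction is answered from an index whose leading column matches the filter, which is why separate single-column indexes add nothing here.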
That comment you refer to in the question is not worth the electrons it uses, in my opinion. It sounds like the author thinks the table is stored in an array rather than an extremely high performance balanced multi-way tree structure.
For a start, it's never necessary to store or get at the table sorted, just the index. And the index won't be stored sequentially; it'll be stored in an efficient manner so that it can be retrieved quickly.
In addition, the vast majority of database tables are read far more often than written. That makes anything you do on the select side far more relevant than anything on the insert side.
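The insert order from the quoted comment can be tried directly. This is a small sketch with Python's built-in sqlite3 module, with Part1/Device1 and so on encoded as integer pairs (an assumption made for brevity). It only shows the logical behaviour: the out-of-order row is accepted and the index hands rows back in key order. It says nothing about physical page layout or insert cost, which vary by engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PartDevice (
        PartID   INTEGER NOT NULL,
        DeviceID INTEGER NOT NULL,
        PRIMARY KEY (PartID, DeviceID)
    )
""")

# Same order as in the quoted comment: the last pair sorts between
# the second and third pairs that are already in the table.
pairs = [(1, 1), (1, 2), (2, 3), (1, 3)]
conn.executemany(
    "INSERT INTO PartDevice (PartID, DeviceID) VALUES (?, ?)", pairs
)

in_key_order = conn.execute(
    "SELECT PartID, DeviceID FROM PartDevice ORDER BY PartID, DeviceID"
).fetchall()
print(in_key_order)  # [(1, 1), (1, 2), (1, 3), (2, 3)]
```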
#2
13
No surrogate key is needed for link tables.
One PK on (col1, col2) and another unique index on (col2, col1) are all you need.
Unless you use an ORM that can't cope and dictates your DB design for you...
Edit: I answered the same here: SQL: Do you need an auto-incremental primary key for Many-Many tables?
#3
8
An incremental primary key could be needed if the table is referenced. There might be details in the many-to-many table that need to be pulled up from another table using the incremental primary key.
For example:
PartDevice
----------
ID (PK/auto-increment)
PartID (FK)
DeviceID (FK)
Other Details
It's easy to pull the 'Other Details' using PartDevice.ID as the FK. Thus an incremental primary key is needed.
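A sketch of that situation, again with Python's built-in sqlite3 module. The child table PartDeviceNote, its columns, and the sample values are all hypothetical; they only illustrate another table pointing at one link row through the single-column ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE PartDevice (
        ID           INTEGER PRIMARY KEY AUTOINCREMENT,
        PartID       INTEGER NOT NULL,
        DeviceID     INTEGER NOT NULL,
        OtherDetails TEXT,
        UNIQUE (PartID, DeviceID)
    );
    -- Hypothetical child table referencing one link row by its surrogate key.
    CREATE TABLE PartDeviceNote (
        NoteID       INTEGER PRIMARY KEY AUTOINCREMENT,
        PartDeviceID INTEGER NOT NULL REFERENCES PartDevice (ID),
        Note         TEXT NOT NULL
    );
""")

# Made-up sample values.
cur = conn.execute(
    "INSERT INTO PartDevice (PartID, DeviceID, OtherDetails) VALUES (?, ?, ?)",
    (1, 3, "fitted at rev B"),
)
link_id = cur.lastrowid  # surrogate key of the new link row
conn.execute(
    "INSERT INTO PartDeviceNote (PartDeviceID, Note) VALUES (?, ?)",
    (link_id, "inspect yearly"),
)

# Pull the link row's details starting from the child table.
row = conn.execute("""
    SELECT pd.OtherDetails, n.Note
    FROM PartDeviceNote AS n
    JOIN PartDevice AS pd ON pd.ID = n.PartDeviceID
    WHERE n.PartDeviceID = ?
""", (link_id,)).fetchone()
print(row)
```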
#4
4
The shortest and most direct way I can answer your question is to say that there will be a performance impact if the two tables you are linking don't have sequential primary keys. As you stated/quoted, the index for the link table will either become fragmented, or the DBMS will work harder to insert records if the link table does not have its own sequential primary key. This is the reason most people put a sequentially incrementing primary key on link tables.