Why is it not good to have a primary key on a join table?

Time: 2022-10-05 12:17:07

I was watching a screencast where the author said it is not good to have a primary key on a join table but didn't explain why.

The join table in the example had two columns defined in a Rails migration and the author added an index to each of the columns but no primary key.

Why is it not good to have a primary key in this example?

create_table :categories_posts, :id => false do |t|
  t.column :category_id, :integer, :null => false
  t.column :post_id, :integer, :null => false
end
add_index :categories_posts, :category_id
add_index :categories_posts, :post_id

EDIT: As I mentioned to Cletus, I can understand the potential usefulness of an auto number field as a primary key even for a join table. However in the example I listed above, the author explicitly avoids creating an auto number field with the syntax ":id => false" in the "create table" statement. Normally Rails would automatically add an auto-number id field to a table created in a migration like this and this would become the primary key. But for this join table, the author specifically prevented it. I wasn't sure why he decided to follow this approach.

8 answers

#1


39  

Some notes:

  1. The combination of category_id and post_id is unique in and of itself, so an additional ID column is redundant and wasteful.
  2. The phrase "not good to have a primary key" in the screencast is incorrect. You still have a primary key; it is just made up of the two columns (e.g. CREATE TABLE foo( cid, pid, PRIMARY KEY( cid, pid ) )). For people who are used to tacking on ID values everywhere this may seem odd, but in relational theory it is quite correct and natural; the screencast author would better have said it is "not good to have an implicit integer attribute called 'ID' as the primary key".
  3. It is redundant to have the extra column because you will place a unique index on the combination of category_id and post_id anyway to ensure no duplicate rows are inserted.
  4. Finally, although common nomenclature is to call it a "composite key", this is also redundant. The term "key" in relational theory is actually the set of zero or more attributes that uniquely identify the row, so it is fine to say that the primary key is category_id, post_id.
  5. Place the MOST SELECTIVE column FIRST in the primary key declaration. A discussion of the construction of b(+/*) trees is out of the scope of this answer (for some lower-level discussion see: http://www.akadia.com/services/ora_index_selectivity.html ), but in your case you'd probably want it on post_id, category_id, since post_id will show up less often in the table and thus make the index more useful. Of course, since the table is so small and the index will be, essentially, the data rows, this is not very important. It would matter more in cases where the table is wider.
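
The notes above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in database, which is an assumption made purely so the example runs on its own; the screencast used Rails, and other engines may differ in detail. The helper names build_join_table and try_link are hypothetical. The table has no ID column, yet the two-column primary key is enough to reject a repeated pair.

```python
import sqlite3


def build_join_table():
    """Create an in-memory join table whose primary key is the column pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE categories_posts (
            category_id INTEGER NOT NULL,
            post_id     INTEGER NOT NULL,
            PRIMARY KEY (post_id, category_id)
        )
        """
    )
    return conn


def try_link(conn, category_id, post_id):
    """Insert one link; return False if that exact pair is already stored."""
    try:
        conn.execute(
            "INSERT INTO categories_posts (category_id, post_id) VALUES (?, ?)",
            (category_id, post_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key rejected a duplicate pair.
        return False


conn = build_join_table()
print(try_link(conn, 1, 10))  # True: first time this pair is seen
print(try_link(conn, 1, 10))  # False: same pair again
print(try_link(conn, 2, 10))  # True: same post, different category
```

The same post may appear with many categories and vice versa; only the exact pair is constrained.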

#2


3  

A DBA would tell you that the primary key in this case is actually the combination of the two FK columns. Since Rails/ActiveRecord doesn't play nice with composite PKs (by default, at least), that may be the reason.

#3


3  

The combination of foreign keys can be a primary key (called a composite primary key). Personally I favour using a technical primary key instead of that (auto number field, sequence, etc). Why? Well, it makes it much easier to identify the record, which you may need to do if you're going to delete it.

Think about it: if you're going to present a Webpage of all the linkages, having a primary key to identify the record makes it much easier.
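
To make the trade-off concrete, here is a minimal sketch of deleting one link both ways. It assumes SQLite through Python's sqlite3 module, and the table names links_with_id and links_by_pair, as well as the helper functions, are hypothetical names invented for the comparison. With a technical key the row is addressed by one value; with the composite key it is addressed by the pair.

```python
import sqlite3


def setup():
    """Build two hypothetical variants of the same join table with two links each."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links_with_id ("
        " id INTEGER PRIMARY KEY,"
        " category_id INTEGER NOT NULL,"
        " post_id INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE links_by_pair ("
        " category_id INTEGER NOT NULL,"
        " post_id INTEGER NOT NULL,"
        " PRIMARY KEY (category_id, post_id))"
    )
    for table in ("links_with_id", "links_by_pair"):
        conn.executemany(
            f"INSERT INTO {table} (category_id, post_id) VALUES (?, ?)",
            [(1, 10), (2, 10)],
        )
    return conn


def delete_by_id(conn, link_id):
    """Technical key: one value identifies the row."""
    cur = conn.execute("DELETE FROM links_with_id WHERE id = ?", (link_id,))
    return cur.rowcount


def delete_by_pair(conn, category_id, post_id):
    """Composite key: the pair of foreign keys identifies the row."""
    cur = conn.execute(
        "DELETE FROM links_by_pair WHERE category_id = ? AND post_id = ?",
        (category_id, post_id),
    )
    return cur.rowcount


conn = setup()
print(delete_by_id(conn, 1))        # number of rows removed via the id
print(delete_by_pair(conn, 1, 10))  # number of rows removed via the pair
```

Both calls remove exactly one row; the difference is whether the page listing the linkages has to carry one value or two per row.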

#4


3  

Basically because there's no need for it. The combination of the two foreign key fields already uniquely identifies any row.

But that merely says why it's not a Good Idea.... but why would it be a Bad Idea?

Consider the overhead that adding an identity column would bring. The table would take up 50% more disk space. The index situation is worse: with an identity field, you have to maintain the identity counter plus a second index. You'll be tripling the disk space and tripling the work that needs to be performed on every insert, and the only advantage is a slightly shorter WHERE clause in a DELETE command.

On the other hand, if the composite key fields are the entire table, then the index can be the table.
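
The 50% and "tripling" figures can be reproduced with back-of-the-envelope arithmetic. The sketch below is only that: it assumes 4-byte integers, counts column payload only, and ignores row headers, page overhead and the row pointers stored in secondary indexes, all of which vary by engine. The names INT_BYTES, composite_layout and surrogate_layout are hypothetical.

```python
INT_BYTES = 4  # assumption: every key column is a 4-byte integer


def composite_layout():
    """Bytes per link when the composite key index doubles as the table."""
    return 2 * INT_BYTES  # category_id + post_id, stored once


def surrogate_layout():
    """Bytes per link with an extra id column and the two indexes it implies."""
    row = 3 * INT_BYTES         # id + category_id + post_id
    id_index = 1 * INT_BYTES    # primary key index on id
    pair_index = 2 * INT_BYTES  # unique index on (category_id, post_id)
    return row, row + id_index + pair_index


base = composite_layout()
row, total = surrogate_layout()
print(row / base)    # 1.5 -> the row itself is 50% larger
print(total / base)  # 3.0 -> row plus both indexes is three times the data
```

Real numbers will differ once per-row and per-page overhead is included, but the ratio shows where the answer's estimate comes from.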

#5


3  

It is a bad idea not to have a primary key on any table, period (if the DBMS is a relational DBMS - or an SQL DBMS). Primary keys are a crucial part of the integrity of your database.

I suppose if you don't mind your database being inaccurate and providing incorrect answers every so often, then you could do without...but most people want accurate answers from their DBMS and for such people, primary keys are crucial.

#6


2  

Placing the most selective column first should only be relevant in the INDEX declaration. In the KEY declaration, it should not matter (because, as has been correctly pointed out, the KEY is a SET, and inside a set, order doesn't matter - the set {a1,a2} is the same set as {a2,a1}).

If a DBMS product is such that ordering of attributes inside a KEY declaration makes a difference, then that DBMS product is guilty of not properly distinguishing between the logical design of a database (the part where you do the KEY declaration) and the physical design of the database (the part where you do the INDEX declaration).
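
The logical/physical distinction can be observed in a small experiment. The sketch below assumes SQLite through Python's sqlite3 module, which happens to be one of the products where the PRIMARY KEY declaration also fixes the column order of the backing index; the helper names build and plan_for are hypothetical, and the exact wording of the plan text differs between SQLite versions. As a set the key is the same either way, but only lookups on the leading column can seek into the index.

```python
import sqlite3

# Logically, a key is a set of attributes: order is irrelevant.
print(frozenset(["category_id", "post_id"]) == frozenset(["post_id", "category_id"]))


def build():
    """Join table whose primary key is declared as (post_id, category_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categories_posts ("
        " category_id INTEGER NOT NULL,"
        " post_id INTEGER NOT NULL,"
        " PRIMARY KEY (post_id, category_id))"
    )
    return conn


def plan_for(conn, sql):
    """Return the textual query plan SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = build()
# Filter on the leading column of the key's index.
print(plan_for(conn, "SELECT category_id FROM categories_posts WHERE post_id = 10"))
# Filter on the trailing column only.
print(plan_for(conn, "SELECT post_id FROM categories_posts WHERE category_id = 1"))
```

SQLite typically reports a SEARCH for the first query and a SCAN for the second, which is the physical effect of column order that this answer argues belongs in an INDEX declaration rather than a KEY declaration.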

#7


2  

I wanted to comment on the following comment: "It is not correct to say zero or more".

I wanted to remark that the text to which this comment was added simply did not contain the text "zero or more", so the author of the comment I wanted to comment on was criticizing someone else for something that hadn't been said.

I also wanted to comment that it is not correct to say that it is not correct to say "zero or more". Relational theory, as commonly known today among the few people who still bother to study its details, actually REQUIRES the possibility of a key with no attributes.

But when I pressed the button "comment", the system responded to me that commenting requires a reputation score of 50 (or some such).

A sad illustration of how the world seems to have forgotten that science is not democracy, and that in science, the truth is not determined by whoever happens to be the majority, nor by whoever happens to have "enough reputation".

#8


1  

Pros of having a single-column PK

  • Uniquely identifies a row with a single value
  • Makes it easy to reference the relationship from elsewhere if needed
  • Some tools want you to have a single integer-valued PK

Cons of having a single-column PK

  • Uses more disk space
  • Needs 3 indexes rather than 1
  • Without a unique constraint you could end up with multiple rows for the same relationship

Notes

  • You need to define a unique constraint if you want to avoid duplicates
  • In my opinion, don't use the single-column PK if your table is going to be huge; otherwise trade off some disk space for the convenience. Yes, it's wasteful, but who cares about a few MB on disk in real-world applications?
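
The point about duplicates is easy to miss, so here is a minimal sketch of it. It assumes SQLite through Python's sqlite3 module, and the function name stored_copies is hypothetical. The same pair is inserted twice into a table that has a single-column id key, once without and once with a unique constraint on the pair.

```python
import sqlite3


def stored_copies(with_unique):
    """Insert the same (category_id, post_id) pair twice; return how many rows stick."""
    conn = sqlite3.connect(":memory:")
    constraint = ", UNIQUE (category_id, post_id)" if with_unique else ""
    conn.execute(
        "CREATE TABLE categories_posts ("
        " id INTEGER PRIMARY KEY,"
        " category_id INTEGER NOT NULL,"
        " post_id INTEGER NOT NULL"
        f"{constraint})"
    )
    stored = 0
    for _ in range(2):
        try:
            conn.execute(
                "INSERT INTO categories_posts (category_id, post_id) VALUES (1, 10)"
            )
            stored += 1
        except sqlite3.IntegrityError:
            # Only reachable when the unique constraint is present.
            pass
    return stored


print(stored_copies(with_unique=False))  # 2: the id alone does not stop duplicates
print(stored_copies(with_unique=True))   # 1: the unique constraint rejects the repeat
```

The id column guarantees that each row is distinct, not that each relationship is, which is why the unique constraint (and its extra index) is still needed.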
