One or two primary keys in a many-to-many table?

Time: 2021-05-26 09:57:27

I have the following tables in my database that have a many-to-many relationship, which is expressed by a connecting table that has foreign keys to the primary keys of each of the main tables:

  • Widget: WidgetID (PK), Title, Price
  • User: UserID (PK), FirstName, LastName

Assume that each User-Widget combination is unique. I can see two options for how to structure the connecting table that defines the data relationship:

  1. UserWidgets1: UserWidgetID (PK), WidgetID (FK), UserID (FK)
  2. UserWidgets2: WidgetID (PK, FK), UserID (PK, FK)

Option 1 has a single column for the primary key. However, this seems unnecessary, since the only data stored in the table is the relationship between the two main tables, and that relationship can itself form a unique key. This leads to option 2, which has a two-column primary key but loses the one-column unique identifier that option 1 has. I could also optionally add a two-column unique index (WidgetID, UserID) to the first table.

Is there any real difference between the two performance-wise, or any reason to prefer one approach over the other for structuring the UserWidgets many-to-many table?

9 answers

#1


24  

You only have one primary key in either case. The second one is what's called a compound key. There's no good reason for introducing a new column. In practice, you will have to keep a unique index on all candidate keys anyway, so adding a new column buys you nothing but maintenance overhead.

Go with option 2.
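
As a minimal sketch of option 2, the following uses Python's built-in sqlite3 module; the choice of SQLite and the helper names build_option2 and link are assumptions made for illustration, not part of the question. It shows the compound primary key rejecting a repeated (WidgetID, UserID) pair without any extra column.

```python
import sqlite3


def build_option2(conn):
    """Create the two entity tables and the option 2 connecting table."""
    conn.executescript("""
        CREATE TABLE Widget (WidgetID INTEGER PRIMARY KEY, Title TEXT, Price REAL);
        CREATE TABLE User (UserID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
        CREATE TABLE UserWidgets2 (
            WidgetID INTEGER NOT NULL REFERENCES Widget (WidgetID),
            UserID   INTEGER NOT NULL REFERENCES User (UserID),
            PRIMARY KEY (WidgetID, UserID)  -- the compound key is the only key
        );
    """)


def link(conn, widget_id, user_id):
    """Store one User-Widget pair; return False if the pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO UserWidgets2 (WidgetID, UserID) VALUES (?, ?)",
                (widget_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    build_option2(db)
    print(link(db, 1, 1))  # first insert of the pair
    print(link(db, 1, 1))  # same pair again, rejected by the compound key
```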

#2


5  

Personally, I would have the synthetic/surrogate key column in many-to-many tables for the following reasons:

  • If you've used numeric synthetic keys in your entity tables then having the same on the relationship tables maintains consistency in design and naming convention.
  • It may be the case in the future that the many-to-many table itself becomes a parent entity to a subordinate entity that needs a unique reference to an individual row.
  • It's not really going to use that much additional disk space.

The synthetic key is not a replacement for the natural/compound key, nor does it become the PRIMARY KEY for that table just because it is the first column in the table, so I partially agree with the Josh Berkus article. However, I don't agree that natural keys are always good candidates for PRIMARY KEYs, and they certainly should not be used as such if they are to be referenced as foreign keys in other tables.

#3


5  

Option 2 uses a simple compound key; option 1 uses a surrogate key. Option 2 is preferred in most scenarios and is close to the relational model in that it is a good candidate key.

There are situations where you may want to use a surrogate key (option 1):

  1. You are not sure that the compound key will remain a good candidate key over time, particularly with temporal data (data that changes over time). What if you wanted to add another row to the UserWidget table with the same UserId and WidgetId? Think of Employment(EmployeeId, EmployerId): it would work in most cases, except if someone went back to work for the same employer at a later date.
  2. You are creating messages/business transactions or something similar that require an easier key to use for integration. Replication, maybe?
  3. You want to create your own auditing mechanisms (or similar) and don't want keys to get too long.

As a rule of thumb, when modeling data you will find that most associative entities (many-to-many) are the result of an event: a person takes up employment, an item is added to a basket, and so on. Most events have a temporal dependency, where the date or time of the event is relevant, in which case a surrogate key may be the best alternative.

So, take option 2, but make sure that you have the complete model.

#4


3  

I agree with the previous answers but I have one remark to add. If you want to add more information to the relation and allow more relations between the same two entities you need option one.

For example, if you want to track every time user 1 has used widget 664 in the userwidget table, the combination of userid and widgetid is no longer unique.
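
A hedged sketch of that situation, again using Python's sqlite3 module: the WidgetUsage table, its UsedAt column, and the helper functions are hypothetical names chosen for illustration. Because the same user and widget appear in several rows, the row identity comes from a surrogate key instead.

```python
import sqlite3


def create_usage_table(conn):
    """One row per usage event, so (UserID, WidgetID) may repeat."""
    conn.execute("""
        CREATE TABLE WidgetUsage (
            UsageID  INTEGER PRIMARY KEY,  -- surrogate key assigned by SQLite
            UserID   INTEGER NOT NULL,
            WidgetID INTEGER NOT NULL,
            UsedAt   TEXT    NOT NULL      -- timestamp string supplied by the caller
        )
    """)


def record_use(conn, user_id, widget_id, used_at):
    """Insert one usage event and return its generated UsageID."""
    with conn:
        cur = conn.execute(
            "INSERT INTO WidgetUsage (UserID, WidgetID, UsedAt) VALUES (?, ?, ?)",
            (user_id, widget_id, used_at),
        )
    return cur.lastrowid


def count_uses(conn, user_id, widget_id):
    """Count how many times a user has used a widget."""
    row = conn.execute(
        "SELECT COUNT(*) FROM WidgetUsage WHERE UserID = ? AND WidgetID = ?",
        (user_id, widget_id),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_usage_table(db)
    record_use(db, 1, 664, "2021-05-26T09:00:00")
    record_use(db, 1, 664, "2021-05-26T10:30:00")
    print(count_uses(db, 1, 664))
```

An alternative under the same assumptions would be to keep a compound key and widen it to (UserID, WidgetID, UsedAt), which works only if two events can never share a timestamp.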

#5


2  

What is the benefit of a primary key in this scenario? Consider the option of no primary key: UserWidgets3: WidgetID (FK), UserID (FK)

If you want uniqueness then use either the compound key (UserWidgets2) or a uniqueness constraint.

The usual performance advantage of having a primary key is that you often query the table by the primary key, which is fast. In the case of many-to-many tables you don't usually query by the primary key so there is no performance benefit. Many-to-many tables are queried by their foreign keys, so you should consider adding indexes on WidgetID and UserID.
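
To illustrate the indexing advice, here is a small sketch using Python's sqlite3 module. It assumes the compound-key variant of the table: in that case the key's own index already leads with WidgetID, so only the reverse direction (by UserID) needs a separate index. The index name idx_userwidgets_user and the helper functions are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Connecting table with a compound key plus an index for reverse lookups."""
    conn.executescript("""
        CREATE TABLE UserWidgets (
            WidgetID INTEGER NOT NULL,
            UserID   INTEGER NOT NULL,
            PRIMARY KEY (WidgetID, UserID)  -- serves lookups that start from WidgetID
        );
        -- Serves lookups that start from UserID.
        CREATE INDEX idx_userwidgets_user ON UserWidgets (UserID);
    """)


def widgets_for_user(conn, user_id):
    """Return the WidgetIDs linked to one user, in ascending order."""
    rows = conn.execute(
        "SELECT WidgetID FROM UserWidgets WHERE UserID = ? ORDER BY WidgetID",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


def index_names(conn):
    """List the names of all indexes SQLite keeps for the connecting table."""
    return [row[1] for row in conn.execute("PRAGMA index_list(UserWidgets)")]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    with db:
        db.executemany(
            "INSERT INTO UserWidgets (WidgetID, UserID) VALUES (?, ?)",
            [(10, 1), (20, 1), (10, 2)],
        )
    print(widgets_for_user(db, 1))
    print(index_names(db))
```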

#6


2  

Option 2 is the correct answer, unless you have a really good reason to add a surrogate numeric key (which you have done in option 1).

Surrogate numeric key columns are not 'primary keys'. A primary key is technically one of the combinations of columns that uniquely identify a record within a table.

Anyone building a database should read this article http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327 by Josh Berkus to understand the difference between surrogate numeric key columns and primary keys.

In my experience the only real reason to add a surrogate numeric key to your table is if your primary key is a compound key and needs to be used as a foreign key reference in another table. Only then should you even think to add an extra column to the table.
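
The cost being weighed here can be sketched as follows, using Python's sqlite3 module. The child table UserWidgetNote and the helper names are hypothetical; the point is only that a table referencing a compound key has to carry and declare both columns, which is what a surrogate key would avoid.

```python
import sqlite3


def open_db():
    """Create a parent table with a compound key and a child that references it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript("""
        CREATE TABLE UserWidgets (
            WidgetID INTEGER NOT NULL,
            UserID   INTEGER NOT NULL,
            PRIMARY KEY (WidgetID, UserID)
        );
        CREATE TABLE UserWidgetNote (
            NoteID   INTEGER PRIMARY KEY,
            WidgetID INTEGER NOT NULL,  -- both parent key columns must be repeated
            UserID   INTEGER NOT NULL,
            Body     TEXT    NOT NULL,
            FOREIGN KEY (WidgetID, UserID)
                REFERENCES UserWidgets (WidgetID, UserID)
        );
    """)
    return conn


def add_note(conn, widget_id, user_id, body):
    """Attach a note to an existing pair; return False if the pair is unknown."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO UserWidgetNote (WidgetID, UserID, Body) VALUES (?, ?, ?)",
                (widget_id, user_id, body),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = open_db()
    with db:
        db.execute("INSERT INTO UserWidgets (WidgetID, UserID) VALUES (1, 1)")
    print(add_note(db, 1, 1, "pair exists"))
    print(add_note(db, 2, 1, "pair does not exist"))
```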

Whenever I see a database structure where every table has an 'id' column the chances are it has been designed by someone who doesn't appreciate the relational model and it will invariably display one or more of the problems identified in Josh's article.

#7


1  

I would go with both.

Hear me out:

The compound key is obviously the nice, correct way to go in so far as reflecting the meaning of your data goes. No question.

However: I have had all sorts of trouble making Hibernate work properly unless I use a single generated primary key, that is, a surrogate key.

So I would use a logical and physical data model. The logical one has the compound key. The physical model - which implements the logical model - has the surrogate key and foreign keys.

#8


0  

Since each User-Widget combination is unique, you should represent that in your table by making the combination unique. In other words, go with option 2. Otherwise you may have two entries with the same widget and user IDs but different user-widget IDs.
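
The duplicate-row risk can be shown with a short sketch in Python's sqlite3 module. The helper rows_stored is a hypothetical name; it builds option 1 either as written in the question or with an added UNIQUE (WidgetID, UserID) constraint, then tries to store the same pair twice.

```python
import sqlite3


def rows_stored(with_unique):
    """Insert the pair (1, 1) twice into option 1 and count how many rows stick."""
    conn = sqlite3.connect(":memory:")
    unique_clause = ", UNIQUE (WidgetID, UserID)" if with_unique else ""
    conn.execute(
        "CREATE TABLE UserWidgets1 ("
        "UserWidgetID INTEGER PRIMARY KEY, "  # surrogate key
        "WidgetID INTEGER NOT NULL, "
        "UserID INTEGER NOT NULL"
        f"{unique_clause})"
    )
    stored = 0
    for _ in range(2):
        try:
            with conn:
                conn.execute(
                    "INSERT INTO UserWidgets1 (WidgetID, UserID) VALUES (1, 1)"
                )
            stored += 1
        except sqlite3.IntegrityError:
            pass  # second insert is rejected only when the constraint exists
    return stored


if __name__ == "__main__":
    print(rows_stored(with_unique=False))  # surrogate key alone
    print(rows_stored(with_unique=True))   # surrogate key plus unique constraint
```

With the surrogate key alone, each insert gets a fresh UserWidgetID, so nothing stops the pair from repeating.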

#9


0  

The userwidgetid in the first table is not needed since, as you said, the uniqueness comes from the combination of the widgetid and the userid.

I would use the second table, keep the foreign keys and add a unique index on widgetid and userid.

So:

userwidgets( widgetid(fk), userid(fk),
             unique_index(widgetid, userid)
)

There is some performance gain in not having the extra primary key, since the database does not have to maintain an index for it. The model above still maintains one index (the unique index on widgetid and userid), but I believe it is easier to understand.
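
One way to check the index bookkeeping is to ask the database which indexes it keeps. The sketch below does this in SQLite through Python's sqlite3 module; the two DDL strings and the helper unique_index_count are assumptions written for illustration, and other database engines may organise their indexes differently.

```python
import sqlite3

# The model above, written out as runnable DDL (foreign keys omitted for brevity).
UNIQUE_INDEX_DDL = """
    CREATE TABLE userwidgets (widgetid INTEGER NOT NULL, userid INTEGER NOT NULL);
    CREATE UNIQUE INDEX ux_userwidgets ON userwidgets (widgetid, userid);
"""

# The same table declared with a compound primary key instead.
COMPOUND_KEY_DDL = """
    CREATE TABLE userwidgets (
        widgetid INTEGER NOT NULL,
        userid   INTEGER NOT NULL,
        PRIMARY KEY (widgetid, userid)
    );
"""


def unique_index_count(ddl):
    """Run the DDL in a fresh database and count unique indexes on userwidgets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    rows = conn.execute("PRAGMA index_list(userwidgets)").fetchall()
    # Column 2 of each index_list row is 1 when the index is unique.
    return sum(1 for row in rows if row[2] == 1)


if __name__ == "__main__":
    print(unique_index_count(UNIQUE_INDEX_DDL))
    print(unique_index_count(COMPOUND_KEY_DDL))
```

In SQLite both declarations end up with a single unique index over (widgetid, userid), so the choice between them is mostly about how the intent reads.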
