I often see database designs like this:
Case 1:
UserTable
--id [auto increment]
--UserName
--Password
--Email
Case 2:
UserTable
--UserName
--Password
--Email
RoleTable:
--RoleID
--RoleName
UserTableRole:
--id [auto increment]
--Username
--RoleID
I have the following questions:
In Case 1: Why not use the UserName field as the primary key (PK)? Why use another field like id [which is auto increment] as the PK? In the case of just UserName and Email, why not use Email as the PK? So, what is the best approach?
In Case 2: In UserTableRole, why not use both UserName and RoleID as the PK? Why use another field like id [which is auto increment] as the PK? So, what is the best approach in this case?
4 Answers
#1
3
In Case 1: Why not use the UserName field as the primary key (PK)? Why use another field like id [which is auto increment] as the PK?
UserTable.UserName has intrinsic meaning in this data model and is called a "natural key". UserTable.id, on the other hand, is a "surrogate key".
If there is a natural key in your model, you cannot eliminate it with a surrogate key; you can only supplement it. So the question is: do you use just the natural key, or both the natural and the surrogate key? Both strategies are valid and have their pros and cons.
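To make the two strategies concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types and the helper name `duplicate_name_rejected` are my own assumptions for illustration, not part of the question. The point it shows is that UserName still needs a uniqueness constraint in both variants, so the surrogate key adds a key rather than replacing one.

```python
import sqlite3

# Strategy A: the natural key (UserName) is the only key.
NATURAL_ONLY = """
CREATE TABLE UserTable (
    UserName TEXT PRIMARY KEY,
    Password TEXT NOT NULL,
    Email    TEXT NOT NULL
);
"""

# Strategy B: a surrogate key is added, but the natural key still needs
# its own UNIQUE constraint (and therefore its own index).
NATURAL_AND_SURROGATE = """
CREATE TABLE UserTable (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    UserName TEXT NOT NULL UNIQUE,
    Password TEXT NOT NULL,
    Email    TEXT NOT NULL
);
"""


def duplicate_name_rejected(schema: str) -> bool:
    """Return True if inserting the same UserName twice fails (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    insert = "INSERT INTO UserTable (UserName, Password, Email) VALUES (?, ?, ?)"
    conn.execute(insert, ("alice", "pw1", "a@example.com"))
    try:
        conn.execute(insert, ("alice", "pw2", "other@example.com"))
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False
```

Calling `duplicate_name_rejected` with either schema should report that the second "alice" is refused, which is the behavior you want regardless of which column is the PK.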
Typical reasons for a surrogate key:
- To keep FKs in child tables slimmer (integer vs. string in this case), for smaller storage and better caching.
- To avoid the need for ON UPDATE CASCADE.
- Friendliness toward ORM tools.
On the other hand:
- You now have two keys instead of one, requiring an extra index, making the parent table larger and less cache-friendly, and slowing down INSERT/UPDATE/DELETE due to index maintenance. [1]
- May require more JOIN-ing. [2]
- And may not play well with clustering. [3]
In the case of just UserName and Email, why not use Email as the PK?
The designer probably wanted to avoid the ON UPDATE CASCADE that would be necessary if a user changed their e-mail.
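As a rough illustration of what that cascade looks like when Email is the PK, here is a small sqlite3 sketch. The `Post` child table and its columns are hypothetical; they only stand in for "some table that references the user".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE UserTable (
    Email    TEXT PRIMARY KEY,
    UserName TEXT NOT NULL
);
CREATE TABLE Post (              -- hypothetical child table
    PostID INTEGER PRIMARY KEY,
    Email  TEXT NOT NULL REFERENCES UserTable(Email) ON UPDATE CASCADE,
    Title  TEXT NOT NULL
);
INSERT INTO UserTable VALUES ('old@example.com', 'alice');
INSERT INTO Post (Email, Title) VALUES ('old@example.com', 'first');
INSERT INTO Post (Email, Title) VALUES ('old@example.com', 'second');
""")

# Changing the natural PK forces every referencing child row to be rewritten.
conn.execute(
    "UPDATE UserTable SET Email = 'new@example.com' WHERE Email = 'old@example.com'"
)
post_emails = [row[0] for row in conn.execute("SELECT Email FROM Post ORDER BY PostID")]
```

After the UPDATE, both rows in `Post` carry the new address. That is correct behavior, but the DBMS had to touch every child row to get there, which is the cost the designer was presumably trying to avoid.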
In Case 2: In UserTableRole, why not use both UserName and RoleID as the PK?
If the same user/role pair must not appear more than once, you need a key on (UserName, RoleID) in any case.
Unless there are child tables with FKs referencing UserTableRole, or an unfriendly ORM is used, there is no reason for an additional surrogate PK.
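A minimal sketch of the junction table with a composite PK and no surrogate, again with sqlite3. The column types and the `assign_role` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE UserTable (UserName TEXT PRIMARY KEY);
CREATE TABLE RoleTable (RoleID INTEGER PRIMARY KEY, RoleName TEXT NOT NULL);
CREATE TABLE UserTableRole (
    UserName TEXT    NOT NULL REFERENCES UserTable(UserName) ON UPDATE CASCADE,
    RoleID   INTEGER NOT NULL REFERENCES RoleTable(RoleID),
    PRIMARY KEY (UserName, RoleID)   -- the pair itself is the key
);
INSERT INTO UserTable VALUES ('alice');
INSERT INTO RoleTable VALUES (1, 'admin'), (2, 'editor');
INSERT INTO UserTableRole VALUES ('alice', 1);
""")


def assign_role(user_name: str, role_id: int) -> bool:
    """Try to link a user to a role; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO UserTableRole VALUES (?, ?)", (user_name, role_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, assigning 'alice' a role she does not have yet succeeds, while repeating an existing pair is rejected by the composite PK without any extra id column.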
[1] And if clustering is used, the secondary index under the natural key may be extra "fat" (since it contains a copy of the clustering key, which is typically the PK) and may require a double lookup when querying (since rows in a clustered table don't have stable physical locations, they must be located through the clustering key, barring some DBMS-specific optimizations such as Oracle's "rowid guesses").
[2] E.g. you wouldn't be able to find UserName just by reading the junction table; you'd have to JOIN it with UserTable.
[3] Surrogates are typically ordered in a way that is not meaningful to client applications. The auto-increment surrogate key's order depends on the order of INSERTs, and querying is not typically done on a "range of users by their order of insertion". Some surrogates, such as GUIDs, may be more or less randomly ordered.
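To illustrate footnote [2], here is a sketch of the extra JOIN that appears once the junction table stores a surrogate user id instead of UserName. The `UserID` column name, the sample data, and the `user_names_with_role` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTable (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    UserName TEXT NOT NULL UNIQUE
);
CREATE TABLE UserTableRole (
    UserID INTEGER NOT NULL REFERENCES UserTable(id),
    RoleID INTEGER NOT NULL,
    PRIMARY KEY (UserID, RoleID)
);
INSERT INTO UserTable (UserName) VALUES ('alice'), ('bob');
INSERT INTO UserTableRole VALUES (1, 10), (2, 10), (2, 20);
""")


def user_names_with_role(role_id: int) -> list[str]:
    """The junction table only holds ids, so names require a JOIN."""
    rows = conn.execute(
        """
        SELECT u.UserName
        FROM UserTableRole AS ur
        JOIN UserTable AS u ON u.id = ur.UserID
        WHERE ur.RoleID = ?
        ORDER BY u.UserName
        """,
        (role_id,),
    )
    return [name for (name,) in rows]
```

Had the junction table stored UserName directly, the same question could be answered from UserTableRole alone.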
#2
0
One reason I can think of for not using things like UserName as the primary key is that they could be subject to change. Having anything that's exposed to the outside world as a primary key runs the risk of those things being changed, and it's best to have a stable primary key.
What if the user changes an email or username; do you really want to change your keys in all your relationships? IMO, it's best to have a stable key that never sees the outside world, about which everyone knows nothing, and therefore which can remain stable regardless of what changes may occur in your database.
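A small sqlite3 sketch of that stability argument: with a surrogate id as the PK, renaming a user is a single-row update and the referencing rows are never touched. The `Post` table and the `rename_user` helper are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE UserTable (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    UserName TEXT NOT NULL UNIQUE,
    Email    TEXT NOT NULL
);
CREATE TABLE Post (                 -- hypothetical child table
    PostID INTEGER PRIMARY KEY,
    UserID INTEGER NOT NULL REFERENCES UserTable(id),
    Title  TEXT NOT NULL
);
INSERT INTO UserTable (UserName, Email) VALUES ('alice', 'a@example.com');
INSERT INTO Post (UserID, Title) VALUES (1, 'first'), (1, 'second');
""")


def rename_user(user_id: int, new_name: str) -> int:
    """Rename a user and return the number of rows the UPDATE touched."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE UserTable SET UserName = ? WHERE id = ?", (new_name, user_id)
        )
    return cur.rowcount
```

The rename touches exactly one row in UserTable, and every `Post.UserID` keeps pointing at the same user.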
#3
0
Your question is essentially about the advantages and disadvantages of natural vs. surrogate keys.
Flexibility is the primary concern: with a surrogate key, users can change their username much more easily. You may also need to allow duplicate usernames in the future, e.g. after a merger.
Speed is another concern. On a frequently accessed table like the user table, it's generally faster to join on integers than on strings.
Another is table size. When a key is used as a foreign key, you have to store the whole key's value in every referencing row. Surrogates are very compact and much smaller than natural keys.
Most ORMs also require the use of a surrogate key because it provides consistency between tables.
Also, on many systems, it may not necessarily be safe to assume that email is unique.
I agree, though, that in a relationship table like UserRole, it's generally best to use a composite primary key made up of the foreign keys.
#4
0
Several reasons I can think of in your example for using a surrogate primary key (id) over the username:
- The id field would very rarely be subject to updates, if at all. If username were the primary key, you would have to cascade on update to all tables where username was used as a foreign key.
- Performance. An int comparison beats a string comparison.
- The id key would take up less storage space wherever it is a foreign key in other tables.
- The id field allows you to avoid exposing possibly sensitive data. E.g. consider a web app URL domain/posts/user/1242 vs. domain/posts/user/myusername.
For your second question, it would be better to use the userid than the username in UserTableRole. Whether or not it is better to then also include a surrogate key for this many-to-many table is a matter of opinion. I hate using surrogate id keys for many-to-many tables and usually just make a compound primary key of the two foreign key ids. The only time I would consider a surrogate key here is if I needed to use it as a foreign key in yet another table.