I'm designing a small SQL database to be used by a web application.
Let's say a particular table has a Name field for which no two rows will be allowed to have the same value. However, users will be able to change the Name field at any time.
The primary key from this table will be used as a foreign key in other tables. So if the Name field was used as the primary key, any changes would need to be propagated to those other tables. On the other hand, the uniqueness requirement would be handled automatically.
My instinct would be to add an integer field to act as the primary key, which could be automatically populated by the database. Is there any point in having this field or would it be a waste of time?
11 Answers
#1
25
I would use a generated PK myself, just for the reasons you mentioned. Also, indexing and comparing by integer is faster than comparing by strings. You can put a unique index on the name field too without making it a primary key.
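A minimal sketch of that layout, using Python's built-in sqlite3 module purely for illustration. The `account` and `post` tables and their columns are hypothetical names, not something from the question. It shows a generated integer key, a separate UNIQUE constraint on the name, and a rename that leaves the referencing rows untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: the name is unique, but the key other tables reference is the id.
conn.executescript("""
    CREATE TABLE account (
        id   INTEGER PRIMARY KEY,      -- surrogate key, assigned by the database
        name TEXT NOT NULL UNIQUE      -- uniqueness enforced separately from the key
    );
    CREATE TABLE post (
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES account(id),
        title      TEXT NOT NULL
    );
""")

account_id = conn.execute(
    "INSERT INTO account (name) VALUES (?)", ("alice",)
).lastrowid
conn.execute(
    "INSERT INTO post (account_id, title) VALUES (?, ?)", (account_id, "hello")
)

# Renaming touches one row in account; rows in post still point at the same id.
conn.execute("UPDATE account SET name = ? WHERE id = ?", ("alice2", account_id))

# A second row with the same name is rejected by the UNIQUE constraint.
try:
    conn.execute("INSERT INTO account (name) VALUES (?)", ("alice2",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Reading the owner's current name requires a join through the surrogate key.
owner = conn.execute(
    "SELECT a.name FROM post p JOIN account a ON a.id = p.account_id"
).fetchone()[0]
```

Other engines spell the generated key differently (identity columns, sequences, AUTO_INCREMENT), but the shape of the schema is the same.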
#2
10
What you are describing is called a surrogate key. See the Wikipedia article for the long answer.
#3
6
Though it's faster to search and join on an integer column (as many have pointed out), it's even faster to never join in the first place. By storing a natural key, you can often eliminate the need for a join.
For a smallish database, the CASCADE updates to the foreign key references wouldn't have much performance impact, unless they were changing extremely often.
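For comparison, here is a rough sketch of the natural-key variant with cascading updates, again in Python's sqlite3 and with hypothetical table names. The name itself is the primary key, the referencing table declares ON UPDATE CASCADE, and the name can be read from the child table without a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to apply the cascade

# Hypothetical schema: the name is the key, and child rows store it directly.
conn.executescript("""
    CREATE TABLE account (
        name TEXT PRIMARY KEY
    );
    CREATE TABLE post (
        id           INTEGER PRIMARY KEY,
        account_name TEXT NOT NULL
            REFERENCES account(name) ON UPDATE CASCADE,
        title        TEXT NOT NULL
    );
    INSERT INTO account (name) VALUES ('alice');
    INSERT INTO post (account_name, title) VALUES ('alice', 'hello');
""")

# Renaming the parent row makes the database rewrite every referencing row.
conn.execute("UPDATE account SET name = 'alice2' WHERE name = 'alice'")

# No join is needed to show the owner's name next to the post.
rows = conn.execute("SELECT account_name, title FROM post").fetchall()
```

The cost of the rename grows with the number of referencing rows, which is why this is usually only comfortable for small tables or rarely changed names.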
That being said, you should probably use an integer or GUID as a surrogate key in this case. An updateable-by-design primary key isn't the best idea, and unless your application has a very compelling business reason for names to be unique, you will inevitably have conflicts.
#4
2
Yes - and as a rule of thumb, always, for every table.
You should definitely not use a changeable field as a primary key and in the vast majority of circumstances you don't want to use a field that has any other purpose as a primary key.
This is basic good practice for db schemas.
#5
2
Having an integer primary key is always a good thing from a performance perspective. All of your relationships will be much more efficient with an integer primary key. For example, JOINs will be much faster (SQL Server).
It will also make future modifications of the database easier. Quite often you have a unique name column only to find out later that the name is not unique at all.
Right now, you could enforce the uniqueness of the Name column by putting a unique index on it as well.
#6
2
I would use an auto-generated ID field for the primary key. It's easier to join tables on integer IDs than on text. Also, if Name were the primary key and were updated often, the database would be put under stress from updating the index on that field much more often.
If the Name field is always unique, you should still mark it as unique in the database. However, there will often be a possibility (maybe not currently, but possibly in the future in your case) of two identical names, so I do not recommend it as the primary key.
Another advantage of using IDs arises if you need reporting on your database. If you want a report for a given set of names, the ID filter on the report stays consistent even when the names change.
#7
1
If you're living in the rarefied circles of theoretical mathematicians (like C. J. Date does in the-land-where-there-are-no-nulls, because all data values are known and correct), then primary keys can be built from the components of the data that identify the idealized platonic entity to which you are referring (e.g. name + birthday + place of birth + parents' names), but in the messy real world, "synthetic keys" that can identify your real-world entities within the context of your database are a much more practical way to do things. (And nullable fields can be very useful too. Take that, relational-design-theory people!)
#8
1
If your name column will be changing it isn't really a good candidate for a primary key. A primary key should define a unique row of a table. If it can be changed it's not really doing that. Without knowing more specifics about your system I can't say, but this might be a good time for a surrogate key.
I'll also add this in hopes of dispelling the myths of using auto-incrementing integers for all of your primary keys. It is NOT always a performance gain to use them. In fact, quite often it's the exact opposite. If you have an auto-incrementing column that means that every INSERT in the system now has that added overhead of generating a new value.
Also, as Mark points out, with surrogate IDs on all of your tables if you have a chain of tables that are related, to get from one to another you might have to join all of those tables together to traverse them. With natural primary keys that is usually not the case. Joining 6 tables with integers is going to usually be slower than joining 2 tables with a string.
You also often lose the ability to do set-based operations when you have auto-incrementing IDs on all of your tables. Instead of inserting 1000 rows into a parent table, then inserting 5000 rows into a child table, you now have to insert the parent rows one at a time in a cursor or some other loop just to get the generated IDs so that you can assign them to the related children. I've seen a 30-second process turned into a 20-minute process because someone insisted on using auto-incrementing IDs on all of the tables in a database.
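Whether the row-by-row loop is unavoidable depends on the engine and on the data. As a hedged sketch in Python's sqlite3 (the `parent`, `child`, and `staging` tables are hypothetical), the example below assumes the incoming rows carry a unique natural value. In that case both inserts can stay set-based, with the generated IDs looked up by joining back on that value. Without such a value, the loop described above is harder to avoid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (
        id   INTEGER PRIMARY KEY,      -- auto-generated surrogate key
        name TEXT NOT NULL UNIQUE      -- natural value used to find the id again
    );
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(id),
        label     TEXT NOT NULL
    );
    -- Hypothetical staging table holding incoming rows keyed by the natural value.
    CREATE TABLE staging (
        parent_name TEXT NOT NULL,
        label       TEXT NOT NULL
    );
""")

# 1000 parents with 5 children each, mirroring the numbers in the answer.
incoming = [(f"parent{p}", f"child{p}-{c}") for p in range(1000) for c in range(5)]
conn.executemany("INSERT INTO staging (parent_name, label) VALUES (?, ?)", incoming)

# One statement inserts all parents; the database generates the ids.
conn.execute("INSERT INTO parent (name) SELECT DISTINCT parent_name FROM staging")

# One statement inserts all children, resolving generated ids via the unique name.
conn.execute("""
    INSERT INTO child (parent_id, label)
    SELECT p.id, s.label
    FROM staging AS s
    JOIN parent AS p ON p.name = s.parent_name
""")

parent_count = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
child_count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

This only works because `name` is declared UNIQUE, which ties in with the point about keeping natural-key constraints even when a surrogate key is present.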
Finally (at least for reasons I'm listing here - there are certainly others), using auto-incrementing IDs on all of your tables promotes poor design. When the designer no longer has to think about what a natural key might be for a table it usually results in erroneous duplicates ending up in the data. You can try to avoid the problem with unique indexes, but in my experience developers and designers don't go through that extra effort and after a year of using their new system they find that the data is a mess because the database didn't have proper constraints on the data through natural keys.
There's certainly a time for using surrogate keys, but using them blindly on all tables is almost always a mistake.
#9
1
The primary key for a record must be unique and permanent. If a record naturally has a simple key which fulfills both of those, then use it. However, they don't come around very often. For a person record, the person's name is neither unique nor permanent, so you pretty much have to use an auto-increment.
The one place where natural keys do work is on a code table, for example, a table mapping a status value to its description. There is little sense in giving "Active" a primary key of 1, "Delayed" a primary key of 2, etc., when it is just as easy to give "Active" a primary key of "ACT"; "Delayed", "DLY"; "On Hold", "HLD"; and so on.
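A small sketch of such a code table, written with Python's sqlite3. The `status` codes come from the text above, while the `task` table and its rows are made up for illustration. Because the code is readable on its own, a query can filter on it without joining to the lookup table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE status (
        code        TEXT PRIMARY KEY,   -- short natural key
        description TEXT NOT NULL
    );
    INSERT INTO status (code, description) VALUES
        ('ACT', 'Active'),
        ('DLY', 'Delayed'),
        ('HLD', 'On Hold');

    -- Hypothetical table that references the code table.
    CREATE TABLE task (
        id          INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        status_code TEXT NOT NULL REFERENCES status(code)
    );
    INSERT INTO task (title, status_code) VALUES
        ('write report', 'ACT'),
        ('order parts', 'HLD');
""")

# The stored code is meaningful by itself, so no join to status is needed here.
on_hold = conn.execute(
    "SELECT title FROM task WHERE status_code = 'HLD'"
).fetchall()
```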
Note also, some say you should use integers over strings because they compare faster. Not really true. Comparing two 4-byte character fields will take exactly as long as comparing two 4-byte integer fields. Longer strings will, of course, take longer, but if you keep the codes short, there's no difference.
#10
0
The primary key must be unique for every row. An auto_increment integer is a very good idea, and if you don't have other ideas about populating the primary key then this is the best way.
#11
0
In addition to everything already said, consider using a UUID as the PK. It will allow you to create keys that are unique across multiple databases.
If you ever need to export data to, or merge data with, another database, the keys will stay unique and relationships can be easily maintained.
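A rough illustration of the merge scenario, using Python's uuid and sqlite3 modules. The `account` table and the two "site" databases are hypothetical, and the keys are random version-4 UUIDs stored as text. Two independently populated databases are copied into a third without any key clashes, whereas auto-increment keys would have started at 1 in both sources.

```python
import sqlite3
import uuid

# Hypothetical schema shared by every database in the example.
SCHEMA = """
    CREATE TABLE account (
        id   TEXT PRIMARY KEY,   -- UUID stored as text
        name TEXT NOT NULL
    );
"""

def make_db(names):
    """Create an in-memory database whose rows get client-generated UUID keys."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO account (id, name) VALUES (?, ?)",
        [(str(uuid.uuid4()), name) for name in names],
    )
    return conn

site_a = make_db(["alice", "bob"])
site_b = make_db(["carol", "dave"])

# Merge both sources into one database, keeping the original keys.
merged = sqlite3.connect(":memory:")
merged.executescript(SCHEMA)
for source in (site_a, site_b):
    rows = source.execute("SELECT id, name FROM account").fetchall()
    merged.executemany("INSERT INTO account (id, name) VALUES (?, ?)", rows)

merged_count = merged.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Random UUIDs are unique only with overwhelming probability, not by guarantee, and they are larger than integers, so this trades index size for easier merging.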