I realize this question is very likely to have been asked before, but I've searched around a little among questions on *, and I didn't really find an answer to mine, so here goes. If you find a duplicate, please link to it.
For some reason I prefer to use Guids (uniqueidentifier in MsSql) for my primary key fields, but I really don't know why this would be better. In many of the tutorials I've worked through lately, an automatically incremented int has been used. I can see pros and cons with both:
- A Guid is always of the same size and length, and there is no reason to worry about running out of them, whereas there is a limit to how many records you could have before you'd run out of numbers that fit in an int.
- int is (at least in C#) a nullable type, which opens for a couple of shortcuts when querying for data.
- And int is easier to read.
- I bet you could come up with at least a couple of more things here.
So, as simple as the title says it: What is the recommended data type for ID (primary key) columns in a database?
EDIT: After receiving a couple of short answers, I must also add this follow-up question. Without it, your answer is neither compelling nor educational... ;) Why do you think so, and what are the cons of the other option that make you not choose that instead?
8 Solutions
#1
Any integer type of sufficient size to store anticipated data ranges. Generally 32 bit ints are viewed as too small (rightly or wrongly) for tables with a lot of rows or changes. A 64 bit int is plenty. Many databases won't have or won't use that integer type but will use a NUMBER type with specified scale and precision. 10-15 digits is a fairly common size.
The reason for choosing integer types is twofold:
- Size; and
- Speed.
The size of an integer is:
- 32 bit: 4 bytes;
- 64 bit: 8 bytes;
- Binary coded decimal: two digits per byte plus as much as a byte for sign, scale and/or precision.
Compare that to a GUID, which is 128 bits (16 bytes), or a normal string, which is at least one byte per character (more in certain character encodings) plus an overhead that might be as little as one byte (terminating null) or could be much more in some cases.
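The size difference can be checked with a few lines of Python. This is only a rough sketch of raw value sizes; it assumes nothing about how a particular database lays out rows or indexes, and the variable names are made up for illustration.

```python
import struct
import uuid

# Fixed-width integers as they would be packed in binary form.
int32_size = struct.calcsize("<i")  # 32-bit signed integer
int64_size = struct.calcsize("<q")  # 64-bit signed integer

# A GUID/UUID is 128 bits in binary form, but larger when stored as text.
guid = uuid.uuid4()
guid_binary_size = len(guid.bytes)
guid_text_size = len(str(guid).encode("utf-8"))  # 32 hex digits plus 4 hyphens

print(int32_size, int64_size, guid_binary_size, guid_text_size)  # 4 8 16 36
```

So a binary GUID is twice the size of a 64-bit integer, and a GUID stored as a plain string is more than four times as large, before any per-value overhead.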
Sorting integers is trivial and, assuming they are unique and the range is sufficiently small, can actually be done in O(n) time, compared to, at best, O(n log n) for a general comparison-based sort.
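As a sketch of the O(n) case, unique non-negative integers from a small, known range can be sorted by marking each value in a presence table and then reading the table back in order. The function name and the max_value parameter are assumptions for illustration; the running time is O(n + k), where k is the size of the range.

```python
def sort_unique_ints(values, max_value):
    """Sort unique integers in the range 0..max_value without comparing them."""
    present = bytearray(max_value + 1)  # one flag per possible value
    for value in values:
        present[value] = 1
    # Reading the flags in index order yields the values in sorted order.
    return [i for i, flag in enumerate(present) if flag]


sorted_ids = sort_unique_ints([5, 3, 9, 0, 7], max_value=9)
print(sorted_ids)  # [0, 3, 5, 7, 9]
```

This only pays off when the range is small relative to the available memory, which is why it does not apply to 128-bit GUIDs.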
Also, just as importantly, most databases can generate unique IDs by means of auto-increment columns and/or sequences. Guaranteeing uniqueness in an application is otherwise actually quite hard and tends to result in bloated keys.
Plus, auto-generated integer keys are typically either loosely or absolutely ordered (depending on database and configuration), which is a useful quality. Randomly generated GUIDs are basically unordered, which is far less useful.
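A minimal sketch of database-generated keys, using SQLite through Python's sqlite3 module purely as a convenient stand-in (the item table and its columns are hypothetical, and other databases use IDENTITY columns or sequences instead of AUTOINCREMENT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)

# The database assigns each key; the application never has to pick one.
ids = []
for name in ("first", "second", "third"):
    cursor = conn.execute("INSERT INTO item (name) VALUES (?)", (name,))
    ids.append(cursor.lastrowid)

print(ids)  # [1, 2, 3]
```

The keys come back unique and in insertion order, which is the ordering property described above.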
#2
Popular databases have allowed larger autoincrement fields for years now, so running out of values is much less of an issue.
As for what to use, it's always a choice. One is not clearly better than the other; they have different characteristics and each is good in different scenarios. I have used both over time, and I'll consider both for the next schema I work with.
Pros for GUID:
- Should be unique across computers.
- Random, unmemorable goo means people are likely to use this only for its intended purpose of an opaque identifier.
Pros for autoincrement:
- Human understandable.
- Sequential assignment works well with a clustered index, which helps performance.
- Suitable for data partitioning.
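The clustered-index point can be illustrated with a toy model. The sketch below treats a sorted Python list as a stand-in for an ordered index and counts how many new keys have to be placed somewhere before the end. It is only an assumption-laden simplification: real engines use B-trees with pages, and random 128-bit numbers stand in here for random GUIDs.

```python
import bisect
import random


def count_mid_inserts(keys):
    """Count keys that land before the current end of the sorted index."""
    index = []
    mid_inserts = 0
    for key in keys:
        position = bisect.bisect_left(index, key)
        if position < len(index):
            mid_inserts += 1  # existing entries must make room for this key
        index.insert(position, key)
    return mid_inserts


rng = random.Random(42)  # fixed seed so the run is repeatable
sequential_keys = list(range(1, 1001))
random_keys = [rng.getrandbits(128) for _ in range(1000)]

sequential_mid = count_mid_inserts(sequential_keys)
random_mid = count_mid_inserts(random_keys)
print(sequential_mid, random_mid)
```

Sequential keys are always appended at the end, so the first count is 0, while nearly every random key lands in the middle of the index.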
#3
A big disadvantage of using GUID keys is that it is difficult to perform "ad-hoc" queries by hand. Sometimes it is very useful that you can do this:
SELECT * FROM User WHERE UserID = 452245
With GUID keys this can become very annoying.
I would recommend 64 bit integers.
#4
Tell me what criteria you think are important.
What's required is to be unique within the table.
A GUID is a global probabilistically-unique identifier. It's also big. If you need your indices to be unique to within epsilon over every other database installation in the universe, it's a good choice. Otherwise, it's using lots of space unnecessarily.
An autoincrement number is good; it's small, and sure to be unique within the table. On the other hand, it gives you no protection against duplication; two entries, identical except for the magic number, are easy to create.
Using some value that is tied to the entity being described avoids that, but you then have the problem of guaranteeing uniqueness yourself.
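The duplication problem, and one common way to address it, can be sketched with SQLite via Python's sqlite3 module. The table and column names are hypothetical, and the choice of an email address as the value tied to the entity is an assumption for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate key only: two rows identical except for the generated id are accepted.
conn.execute(
    "CREATE TABLE person_plain (id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT)"
)
for _ in range(2):
    conn.execute("INSERT INTO person_plain (email) VALUES (?)", ("ada@example.com",))
duplicate_rows = conn.execute(
    "SELECT COUNT(*) FROM person_plain WHERE email = ?", ("ada@example.com",)
).fetchone()[0]

# Surrogate key plus a UNIQUE constraint on the value tied to the entity.
conn.execute(
    "CREATE TABLE person_unique ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO person_unique (email) VALUES (?)", ("ada@example.com",))
try:
    conn.execute("INSERT INTO person_unique (email) VALUES (?)", ("ada@example.com",))
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True

print(duplicate_rows, second_insert_rejected)  # 2 True
```

Keeping the small autoincrement key and adding a separate uniqueness constraint gets the compact key without silently accepting duplicates.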
#5
If you use a long (a signed 64-bit integer), you could create 1,000 keys a second and not run out of primary keys for roughly 292 million years.
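The arithmetic is easy to check. The sketch below assumes a signed 64-bit key that starts at 1 and never reuses values; the function and constant names are made up for illustration. At exactly 1,000 keys per second the range lasts roughly 292 million years, and at 10,000 per second roughly 29 million years.

```python
MAX_INT64 = 2**63 - 1
MAX_INT32 = 2**31 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_until_exhausted(keys_per_second, max_key):
    """Years until a counter reaches max_key at a constant insert rate."""
    return max_key / (keys_per_second * SECONDS_PER_YEAR)


years_64_bit = years_until_exhausted(1_000, MAX_INT64)
days_32_bit = years_until_exhausted(1_000, MAX_INT32) * 365.25

print(f"64-bit: about {years_64_bit / 1e6:.0f} million years")  # about 292 million years
print(f"32-bit: about {days_32_bit:.1f} days")                  # about 24.9 days
```

The same insert rate exhausts a signed 32-bit key in under a month, which is why 32-bit keys are often considered too small for busy tables.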
Others have already mentioned some of the advantages of using an integer type instead of a UUID/GUID. One of the big advantages is the speed and compactness of the indexes.
In an application where I recently did the database design, I needed UUIDs but didn't want to give up the advantages of using longs for primary keys, so I had an "allIds" table that mapped every primary key in the system to a UUID. All my primary keys were generated from a single sequence, so they were unique across all tables.
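One possible shape for such a mapping table is sketched below with SQLite via Python's sqlite3 module. The answer does not show its schema, so everything here (the all_ids, customer and invoice tables, their columns, and the new_id helper) is a hypothetical reconstruction rather than the design actually used.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One shared source of integer keys, each paired with a UUID.
    CREATE TABLE all_ids (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        uuid TEXT NOT NULL UNIQUE,
        table_name TEXT NOT NULL
    );
    CREATE TABLE customer (id INTEGER PRIMARY KEY REFERENCES all_ids(id), name TEXT);
    CREATE TABLE invoice (id INTEGER PRIMARY KEY REFERENCES all_ids(id), total REAL);
    """
)


def new_id(table_name):
    """Reserve the next integer key and record its UUID in all_ids."""
    cursor = conn.execute(
        "INSERT INTO all_ids (uuid, table_name) VALUES (?, ?)",
        (str(uuid.uuid4()), table_name),
    )
    return cursor.lastrowid


customer_id = new_id("customer")
conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (customer_id, "Ada"))
invoice_id = new_id("invoice")
conn.execute("INSERT INTO invoice (id, total) VALUES (?, ?)", (invoice_id, 19.99))

# The UUID is looked up only when something outside the database needs it.
customer_uuid = conn.execute(
    "SELECT uuid FROM all_ids WHERE id = ?", (customer_id,)
).fetchone()[0]
print(customer_id, invoice_id, customer_uuid)
```

Joins and indexes use the compact integer keys, while the UUID stays available for external references.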
#6
If the database is distributed, where you could get records from other databases, the primary key needs to be unique within a table across all the databases. GUID solves this issue, albeit at the cost of space. A combination of autoincrement and namespace would be a good tradeoff.
It would be nice if databases could provide built-in support for autoincrements with "prefixes". So in one database, I get IDs like X1, X2, X3 and so on, whereas in the other database they could be Y1, Y2, Y3 and so on.
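The prefix idea can be sketched in application code. This is an illustration only, not a database feature: make_id_generator is a hypothetical helper, and a real system would need to persist the counter and handle concurrent writers.

```python
import itertools


def make_id_generator(prefix):
    """Return a function that yields prefix1, prefix2, ... for one database."""
    counter = itertools.count(1)

    def next_id():
        return f"{prefix}{next(counter)}"

    return next_id


next_x = make_id_generator("X")  # namespace for the first database
next_y = make_id_generator("Y")  # namespace for the second database

x_ids = [next_x() for _ in range(3)]
y_ids = [next_y() for _ in range(3)]
print(x_ids, y_ids)  # ['X1', 'X2', 'X3'] ['Y1', 'Y2', 'Y3']
```

Storing the namespace and the counter as two columns of a composite key, rather than one string, would keep the numeric part compact and sortable.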
#7
I asked a similar question which has a few answers that might help. Replication seems to be the biggest advantage of using GUIDs.
Reasons not to use an auto-incrementing number for a primary key
#8
Follow Cletus's advice, with the additional caveat that it largely depends on what you're storing. Never, ever, use a GUID. GUIDs have a whole bundle of downsides, and only one or two upsides.