I know it is better to use primary key when design tables.
我知道在设计表时使用主键更好。
But I want to know when design a primary key, it is needed to setting auto_increment?
但我想知道何时设计一个主键,需要设置auto_increment?
If done, what's the benefit?
如果完成,有什么好处?
I listen, that can keep b-tree's stable, but i don't know why?
我听,这可以保持b-tree的稳定,但我不知道为什么?
If table has a unique column, which's better that set the unique column as primary key or add a new column 'id' as auto_increment primary key?
如果table具有唯一列,那么将唯一列设置为主键或将新列'id'添加为auto_increment主键更好?
Can you help me? Thanks.
你可以帮我吗?谢谢。
4 个解决方案
#1
3
I want to know when design a primary key, it is needed to setting auto_increment?
我想知道何时设计主键,需要设置auto_increment?
No, it's not strictly necessary. There are cases when a natural key is fine.
不,这不是绝对必要的。有些情况下,自然键很好。
If done, what's the benefit?
如果完成,有什么好处?
Advantages of using an auto-increment surrogate key:
使用自动增量代理键的优点:
- Surrogate keys never need to change, even if all other columns in your table are possible to change.
- It's easier for the RDBMS to ensure uniqueness of an auto-increment key without locking and without race conditions, when you have multiple users inserting concurrently.
- Using an integer is the most compact data type you can use for a primary key, so it results in a smaller index than using a long string, for example.
- Efficiency of inserting into B-tree indexes (see below).
- It's a little easier and tidier to reference a row with a single column than multiple columns, when the only other candidate key consisted of several columns.
代理键永远不需要更改,即使表中的所有其他列都可以更改。
当您有多个用户同时插入时,RDBMS更容易确保自动增量键的唯一性,而无需锁定和没有竞争条件。
使用整数是可用于主键的最紧凑的数据类型,因此它导致的索引比使用长字符串的索引小。
插入B树索引的效率(见下文)。
当唯一的其他候选键由多列组成时,引用具有单列而不是多列的行更容易和更整洁。
Advantages of using a natural key:
使用自然键的优点:
- The column has some meaning for the entity, for example a phone number. You don't need to store an extra column for the surrogate key.
- Other tables using foreign keys to reference a natural primary key get a meaningful value, so they can avoid a join. For example, a table of
shoes
referencingcolors
would need to do a join if you wanted to get the color name. But if you use the color name as the primary key ofcolors
, then that value would already be part of theshoes
table.
该列对于实体具有一些含义,例如电话号码。您不需要为代理键存储额外的列。
使用外键引用自然主键的其他表获得有意义的值,因此它们可以避免连接。例如,如果您想获取颜色名称,则需要使用引用颜色的鞋子表来进行连接。但是如果你使用颜色名称作为颜色的主键,那么该值就已经成为了鞋子表的一部分。
Other cases when a surrogate auto-increment key is not needed:
不需要代理自动增量键的其他情况:
- You already have a combination of other columns (whether they are surrogate keys or natural keys) that provides a candidate key for the table. A good example is found in many-to-many tables. If a table maps movies to actors, even if both movies and actors are referenced by primary keys, then you already have a candidate key over those two columns, and you don't need yet another auto-increment column.
您已经拥有了为表提供候选键的其他列(无论是代理键还是自然键)的组合。在多对多表中可以找到一个很好的例子。如果表将电影映射到演员,即使主键引用了电影和演员,那么您已经拥有这两列的候选键,并且您不需要另一个自动增量列。
I listen, that can keep b-tree's stable, but i don't know why?
我听,这可以保持b-tree的稳定,但我不知道为什么?
Inserting a value into an arbitrary place in the middle of a B-tree may cause a costly restructuring of the index.
将值插入B树中间的任意位置可能会导致索引的重组成本高昂。
There's an animated example here: http://www.bluerwhite.org/btree/
这里有一个动画示例:http://www.bluerwhite.org/btree/
Look at the example "Inserting Key 33 into a B-Tree (w/ Split)" where it shows the steps of inserting a value into a B-tree node that overfills it, and what the B-tree does in response.
查看示例“将密钥33插入B树(带拆分)”,其中显示了将值插入到溢出它的B树节点中的步骤,以及B树响应的内容。
Now imagine that the example illustration only shows the bottom part of a B-tree that is much deeper (as would be in the case of an index B-tree has millions of entries), and filling the parent node can itself be an overflow, and force the splitting operation to continue up the the higher level in the tree. This can continue all the way to the very top of the tree if all the ancestor nodes to the top of the tree were already filled.
现在想象一下,示例说明只显示B树的底部更深(就像索引B树的情况一样有数百万个条目),填充父节点本身就是溢出,并强制拆分操作继续向上移动树中的更高级别。如果已经填充了树顶部的所有祖先节点,则这可以一直持续到树的最顶层。
As the nodes split and have to be restructured, they may require more space, but they're stored on some page of the database file where there's no spare space. So the storage engine has to relocate parts of the index to another part of the file, and potentially re-write a lot of pages of index just for a single INSERT.
由于节点分裂并且必须重新构建,它们可能需要更多空间,但它们存储在数据库文件的某些页面上,其中没有空余空间。因此,存储引擎必须将部分索引重定位到文件的另一部分,并且可能仅针对单个INSERT重写大量索引页。
Auto-increment values are naturally always inserted at the very rightmost edge of the B-tree. As @ BrankoDimitrijevic points out in a comment below, this does not make it less likely that they'll cause such laborious node-splitting and restructuring to the index. But the B-tree implementation code can optimize for this case in other ways, and some do.
自动增量值自然总是插入B树的最右边。正如@ BrankoDimitrijevic在下面的评论中指出的那样,这并不会降低它们导致如此费力的节点分裂和重组到索引的可能性。但是B树实现代码可以通过其他方式针对这种情况进行优化,而有些则可以。
If table has a unique column, which's better that set the unique column as primary key or add a new column 'id' as auto_increment primary key?
如果table具有唯一列,那么将唯一列设置为主键或将新列'id'添加为auto_increment主键更好?
If the unique column is also non-nullable, then you can use it as a primary key. Primary keys require that all of their columns are non-nullable.
如果唯一列也是非可空的,则可以将其用作主键。主键要求所有列都不可为空。
#2
6
I know it is better to use primary key when design tables.
我知道在设计表时使用主键更好。
In fact, a key-less table is a multiset (since it allows duplicates), and therefore not strictly a relation (which is a set), and therefore your database would not really be "relational".
事实上,无密钥表是多集(因为它允许重复),因此不是严格的关系(这是一个集合),因此您的数据库实际上不是“关系”。
Please note that "primary" (PRIMARY KEY) and "alternate" (UNIQUE constraint) keys are logically equivalent.
请注意,“primary”(PRIMARY KEY)和“alternate”(UNIQUE约束)键在逻辑上是等效的。
But I want to know when design a primary key, it is needed to setting auto_increment?
但我想知道何时设计一个主键,需要设置auto_increment?
You are actually asking multiple questions:
您实际上是在问多个问题:
- Should I create a key?
- If yes, should I create a surrogate key?
- If yes, should it be integer?
- If yes, should I make it auto-incremented?
我应该创建一个密钥吗?
如果是,我应该创建一个代理键吗?
如果是,它应该是整数吗?
如果是,我应该让它自动递增吗?
The answer to (1) is "almost always". There are some very rare cases when data is not "important" and you might skip it for performance reasons, but that's exceedingly rare.
(1)的答案是“几乎总是”。在数据不“重要”的情况下,有一些非常罕见的情况,出于性能原因,您可能会跳过它,但这种情况非常罕见。
The answer to (2) is "it depends" - main pros and cons can be found here.
(2)的答案是“它取决于” - 主要的优点和缺点可以在这里找到。
The answer to (3) depends on whether you need to generate keys independently from the database (e.g. while disconnected, or while connected to a different database). If yes, you could use GUIDs (which obviously cannot be auto-incremented, but can be uniquely generated in isolation). If no, then you can just use integers - they are more compact and generally faster.
(3)的答案取决于您是否需要独立于数据库生成密钥(例如,在断开连接时,或在连接到不同的数据库时)。如果是,您可以使用GUID(显然不能自动递增,但可以独立生成)。如果不是,那么你可以使用整数 - 它们更紧凑,通常更快。
And finally, if you reached (4), then you'll almost certainly want to make it auto-incremented, for the reasons discussed below.
最后,如果你达到(4),那么你几乎肯定会想让它自动递增,原因如下所述。
If done, what's the benefit?
如果完成,有什么好处?
- The main benefit of making an integer surrogate key auto-incremented is that multiple concurrent clients will never receive the same generated value. If you just try to
SELECT MAX(ID) + 1 FROM ...
, there is no guarantee some other client will not try to do the same thing at the same time, and end-up with the same result (subsequently causing a key violation). - The other benefit is that the DBMS will use a highly optimized code path for generating new unique values.
- The drawback is that auto-increment mechanism is typically not transaction-aware: if you generate a new ID value and then ROLLBACK the transaction, that value will not be generated again. That being said, surrogate keys don't have any meaning (if they had, they would not be surrogate), so such "holes" are inconsequential.
使整数代理键自动递增的主要好处是多个并发客户端永远不会收到相同的生成值。如果您只是尝试SELECT MAX(ID)+ 1 FROM ...,则无法保证其他客户端不会同时尝试执行相同的操作,并以相同的结果结束(随后导致密钥)违反)。
另一个好处是DBMS将使用高度优化的代码路径来生成新的唯一值。
缺点是自动增量机制通常不是事务感知的:如果生成新的ID值然后ROLLBACK事务,则不会再次生成该值。话虽如此,代理键没有任何意义(如果有的话,它们不会是代理),所以这样的“漏洞”是无关紧要的。
If table has a unique column, which's better that set the unique column as primary key or add a new column 'id' as auto_increment primary key?
如果table具有唯一列,那么将唯一列设置为主键或将新列'id'添加为auto_increment主键更好?
If the attribute is intrinsically unique at the "logical level", then the corresponding table column must be made unique (through PRIMARY KEY or UNIQUE constraint), whether you later decide to add a surrogate key or not.
如果属性在“逻辑级别”本质上是唯一的,则必须使相应的表列唯一(通过PRIMARY KEY或UNIQUE约束),无论您是否稍后决定添加代理键。
#3
0
To have an auto-increment PK makes it easy to create a key that never needs to change, which in turn makes it easy to reference in other tables.
要拥有自动增量PK,可以轻松创建永远不需要更改的密钥,从而可以轻松地在其他表中引用。
If your data is such that you have natural columns that are unique and can never change you can use them just as well. Mind you most things that "will never change" tends to do so anyway given enough time, like someones social security number...
如果您的数据具有唯一且无法更改的自然列,则您也可以使用它们。请注意,大多数“永远不会改变”的事情往往会在给定足够时间的情况下这样做,就像某人的社会保障号码一样......
For simplicity I always use auto-increment (identity) columns for PK.
为简单起见,我总是使用PK的自动增量(标识)列。
#4
-1
Support for Using Surrogate Keys in Table Definitions
Thanks to @Branko Dimitrijevic for opening the topic of relational database primary keys (PK's) by describing the role of SURROGATE KEYS
and getting to the center of the discussion. Surrogate keys are by definition devoid of any intrinsic meaning aside from their uniqueness between their values in each record of their table.
感谢@Branko Dimitrijevic通过描述SURROGATE KEYS的角色并进入讨论的中心来打开关系数据库主键(PK)的主题。根据定义,代理键除了它们在表中每个记录中的值之间的唯一性之外,没有任何内在含义。
Thanks also to @Mattias Åslund for your additional wisdom:
还要感谢@MattiasÅslund的额外智慧:
Mind you most things that "will never change" tends to do so anyway given enough time, like someones social security number...
请注意,大多数“永远不会改变”的事情往往会在给定足够时间的情况下这样做,就像某人的社会保障号码一样......
I add that even if the value chosen as "unchangeable" really does not change, it is also just as likely for the rules of the supported business or organization itself to drift and change over time in ways that may affect the core assumptions of given design.
我补充说,即使选择为“不可更改”的值确实没有改变,支持的业务或组织本身的规则也可能随着时间的推移而变化,从而可能影响给定设计的核心假设。 。
A useful discussion on the integration of demographic and biometric key values for tracking individuals can be found in this section on Choosing an Appropriate Key for New Databases put out by the Computer Professionals for Social Responsibility.
有关跟踪个人的人口统计和生物识别关键值的集成的有用讨论可以在计算机专业人员为社会责任选择的“为新数据库选择适当的密钥”一节中找到。
Cases of Alternate Unique Database Keys: A Sample Schema
I plan to approach the comments on this post with a discussion around a specific example design to explain the kinds of things that can go wrong with assigning a Primary Key that isn't a surrogate key. Many of these assumptions are taken from observations from real applications. They are remembered well because of the complexity their design introduced into other business processes as other systems and data sources became dependent on their assumptions.
我计划通过围绕特定示例设计的讨论来处理这篇文章的评论,以解释分配不是代理键的主键可能出错的事情。其中许多假设来自实际应用的观察。由于其他系统和数据源依赖于其假设而将其设计引入其他业务流程的复杂性,因此很好地记住了它们。
The design and sample data is below, borrowed loosely from Oracle's infamous Scott/TIGER database design.
设计和样本数据如下,从臭名昭着的臭名昭着的Scott / TIGER数据库设计中借用。
MySQL 5.5.32 Schema Setup:
MySQL 5.5.32架构设置:
CREATE TABLE employee
(
fake_ssn varchar(15) primary key,
last_name varchar(40),
first_name varchar(40),
dept_id varchar(15),
hire_date date,
salary int,
email varchar(100)
);
INSERT INTO employee
(fake_ssn, last_name, first_name, dept_id, hire_date, salary,
email)
VALUES
('130-60-0101', 'MARLOWE', 'JACOB', '1200-05', date('2009/01/25'),
8000, 'jacob@some-company.com'),
('967-22-5025', 'CRACHITT', 'BOB', '1200-05', date('2010/02/05'),
500, 'bobc@some-company.com'),
('040-36-5555', 'PERRY', 'VICTORIA', '1200-02', date('2011/05/25'),
2700, 'vperry@some-company.com'),
('203-89-1010', 'STEVENS', 'KEVIN', '2955-03', date('2007/04/25'),
1800, 'kevin.stevens@some-company.com'),
('409-99-1111', 'MCLANE', 'JONATHAN', '2955-03', date('2009/03/02'),
4200, 'jon.j.mclane@some-company.com');
CREATE TABLE department
(
dept_id varchar(15) primary key,
dept_manager varchar(40),
dept_title varchar(40)
);
INSERT INTO department
(dept_id, dept_manager, dept_title)
VALUES
('1200-05', 'MARLOWE', 'FINANCE'),
('1200-02', null, 'HR'),
('2955-03', 'JOHNM', 'MARKETING');
COMMIT;
Query 1:
SELECT fake_ssn, last_name, first_name, dept_id, hire_date,
salary, email
FROM employee
| FAKE_SSN | LAST_NAME | FIRST_NAME | DEPT_ID | HIRE_DATE | SALARY | EMAIL |
|-------------|-----------|------------|---------|---------------------------------|--------|--------------------------------|
| 040-36-5555 | PERRY | VICTORIA | 1200-02 | May, 25 2011 00:00:00+0000 | 2700 | vperry@some-company.com |
| 130-60-0101 | MARLOWE | JACOB | 1200-05 | January, 25 2009 00:00:00+0000 | 8000 | jacob@some-company.com |
| 203-89-1010 | STEVENS | KEVIN | 2955-03 | April, 25 2007 00:00:00+0000 | 1800 | kevin.stevens@some-company.com |
| 409-99-1111 | MCLANE | JONATHAN | 2955-03 | March, 02 2009 00:00:00+0000 | 4200 | jon.j.mclane@some-company.com |
| 967-22-5025 | CRACHITT | BOB | 1200-05 | February, 05 2010 00:00:00+0000 | 500 | bobc@some-company.com |
Query 2:
SELECT dept_id, dept_manager, dept_title
FROM department
| DEPT_ID | DEPT_MANAGER | DEPT_TITLE |
|---------|--------------|------------|
| 1200-02 | (null) | HR |
| 1200-05 | MARLOWE | FINANCE |
| 2955-03 | JOHNM | MARKETING |
Fake Social Security Number
(The FAKE
name is just a reminder that these are all randomly generated values.) Although this is often a popular "unqiue" value in personnel records and databases, according to the U.S. Social Security Administration, this value is not unique. It is also problematic because this value and its transfer is subject to strict regulation because of recently passed privacy laws.
(FAKE名称只是提醒人们,这些都是随机生成的值。)虽然这在人事记录和数据库中通常是一种流行的“unqiue”值,但根据美国社会保障局的说法,这个值并不是唯一的。这也是有问题的,因为这个价值及其转让受到最近通过的隐私法的严格监管。
Name Combinations
Even with the additional combinations created by including the middle initial, somehow there are still just too many people with the same name. A look at what the Social Security Administration has to say about the registered names of babies born in 2012:
即使使用包含中间首字母创建的其他组合,某种程度上仍然只有太多具有相同名称的人。看看社会保障管理局对2012年出生的婴儿的注册名称所说的话:
Two decades from now, when the JACOB's and SOPHIA's of 2012 graduate from school, they will flood into the workforce beside thousands of others like them...
二十年后,当JACOB和2012年的SOPHIA从学校毕业时,他们将涌入劳动力市场,成千上万像他们一样......
Name changes by marriage or legal reasons are also threats to the referential integrity of database records relying on their values as business key values.
通过婚姻或法律原因更改名称也是对依赖于其值作为业务键值的数据库记录的引用完整性的威胁。
Department ID
Some companies will try to derive keys from other values to make SMART KEYS
. These types of keys when observed in practice are NOT smart at all. The values in the example: 1200-02
, 1200-05
, 2955-03
were intended to resemble a "smart key". The first value might be a street address or building number for a company campus or multiple-location business. The second value, ("-02", "-03", "-05") might be the floor of the building where the department is located.
有些公司会尝试从其他值派生出密钥来制作SMART KEYS。在实践中观察这些类型的钥匙根本不聪明。示例中的值:1200-02,1200-05,2955-03旨在类似于“智能钥匙”。第一个值可能是公司园区或多地点业务的街道地址或建筑物编号。第二个值(“-02”,“ - 03”,“ - 05”)可能是部门所在建筑物的楼层。
Changing buildings, moving a department or relocating a business completely would render this location dependency of the DEPARTMENT ID
useless.
更改建筑物,移动部门或完全重新定位业务会使DEPARTMENT ID的此位置依赖性变得无用。
Department Manager
This one is subtle, but there is a hole in this relational connection. A MANAGER
is ALSO an employee, which makes the relational join between EMPLOYEE
and DEPARTMENT
a circular one:
这个很微妙,但这种关系连接有一个漏洞。经理也是一名员工,它使雇员和部门之间的关系联合成为循环:
- Is the
MANAGER
(fromDEPARTMENT
) a foreign key constraint on theEMPLOYEE
table, or - Is
DEPT_ID
(FROMEMPLOYEE
) a foreign key constraint on theDEPARTMENT
table?
MANAGER(来自DEPARTMENT)是EMPLOYEE表上的外键约束,还是
DEPTMENT表上的DEPT_ID(FROM EMPLOYEE)是外键约束吗?
If you give up the Foreign Key constraint between MANAGER
and some key column on EMPLOYEE
(LAST_NAME
or FIRST_NAME
+ LAST_NAME
), you risk non uniform values for MANAGER
.
如果您放弃MANAGER与EMPLOYEE(LAST_NAME或FIRST_NAME + LAST_NAME)上的某个键列之间的外键约束,则您可能面临MANAGER的非统一值。
... Looking at
... 看着
Query of The DEPARTMENT
Table:
查询部门表:
| DEPT_ID | DEPT_MANAGER | DEPT_TITLE |
|---------|--------------|------------|
| 1200-02 | (null) | HR |
| 1200-05 | MARLOWE | FINANCE |
| 2955-03 | JOHNM | MARKETING |
The misplacement of DEPT_MANAGER
within the DEPARTMENT
table, as there are three different ways the name of the Department Manager are represented: none (null), ALL-CAPS Last Name, ALL-CAPS First Name, Last Initial.
DEPARTMENT表中DEPT_MANAGER的错位,因为部门管理器的名称有三种不同的表示方式:none(null),ALL-CAPS Last Name,ALL-CAPS First Name,Last Initial。
Conclusions
An important lesson to take from this posting is that making a key more than a key either by integrating derived values, making values based on assumptions about business rules tie down the flexibility of a database design because if the business rules change, so will the values such as Primary Keys or Joining Key values.
从这篇文章中得出的一个重要教训是,通过整合派生值来创建密钥而不是密钥,根据业务规则的假设创建值会降低数据库设计的灵活性,因为如果业务规则发生变化,那么值也会变化例如主键或连接键值。
As the developer or maintainer of the business application, you (or your team) are better able to support the prevailing business conditions if you have taken control and own the parts that represent the internal structure of the business application itself. The primary key might never actually present itself in a customer or user-facing situation, but it should be protected so that the relations it represents does not change with time.
作为业务应用程序的开发人员或维护人员,如果您已控制并拥有代表业务应用程序本身内部结构的部分,则您(或您的团队)能够更好地支持当前的业务条件。主键实际上可能永远不会出现在客户或面向用户的情况中,但它应该受到保护,以便它所代表的关系不会随时间而变化。
Special Thanks:
Image Credit from the popular Baby Names page of 2012:
来自2012年流行婴儿名字页面的图片来源:
#1
3
I want to know when design a primary key, it is needed to setting auto_increment?
我想知道何时设计主键,需要设置auto_increment?
No, it's not strictly necessary. There are cases when a natural key is fine.
不,这不是绝对必要的。有些情况下,自然键很好。
If done, what's the benefit?
如果完成,有什么好处?
Advantages of using an auto-increment surrogate key:
使用自动增量代理键的优点:
- Surrogate keys never need to change, even if all other columns in your table are possible to change.
- It's easier for the RDBMS to ensure uniqueness of an auto-increment key without locking and without race conditions, when you have multiple users inserting concurrently.
- Using an integer is the most compact data type you can use for a primary key, so it results in a smaller index than using a long string, for example.
- Efficiency of inserting into B-tree indexes (see below).
- It's a little easier and tidier to reference a row with a single column than multiple columns, when the only other candidate key consisted of several columns.
代理键永远不需要更改,即使表中的所有其他列都可以更改。
当您有多个用户同时插入时,RDBMS更容易确保自动增量键的唯一性,而无需锁定和没有竞争条件。
使用整数是可用于主键的最紧凑的数据类型,因此它导致的索引比使用长字符串的索引小。
插入B树索引的效率(见下文)。
当唯一的其他候选键由多列组成时,引用具有单列而不是多列的行更容易和更整洁。
Advantages of using a natural key:
使用自然键的优点:
- The column has some meaning for the entity, for example a phone number. You don't need to store an extra column for the surrogate key.
- Other tables using foreign keys to reference a natural primary key get a meaningful value, so they can avoid a join. For example, a table of
shoes
referencingcolors
would need to do a join if you wanted to get the color name. But if you use the color name as the primary key ofcolors
, then that value would already be part of theshoes
table.
该列对于实体具有一些含义,例如电话号码。您不需要为代理键存储额外的列。
使用外键引用自然主键的其他表获得有意义的值,因此它们可以避免连接。例如,如果您想获取颜色名称,则需要使用引用颜色的鞋子表来进行连接。但是如果你使用颜色名称作为颜色的主键,那么该值就已经成为了鞋子表的一部分。
Other cases when a surrogate auto-increment key is not needed:
不需要代理自动增量键的其他情况:
- You already have a combination of other columns (whether they are surrogate keys or natural keys) that provides a candidate key for the table. A good example is found in many-to-many tables. If a table maps movies to actors, even if both movies and actors are referenced by primary keys, then you already have a candidate key over those two columns, and you don't need yet another auto-increment column.
您已经拥有了为表提供候选键的其他列(无论是代理键还是自然键)的组合。在多对多表中可以找到一个很好的例子。如果表将电影映射到演员,即使主键引用了电影和演员,那么您已经拥有这两列的候选键,并且您不需要另一个自动增量列。
I listen, that can keep b-tree's stable, but i don't know why?
我听,这可以保持b-tree的稳定,但我不知道为什么?
Inserting a value into an arbitrary place in the middle of a B-tree may cause a costly restructuring of the index.
将值插入B树中间的任意位置可能会导致索引的重组成本高昂。
There's an animated example here: http://www.bluerwhite.org/btree/
这里有一个动画示例:http://www.bluerwhite.org/btree/
Look at the example "Inserting Key 33 into a B-Tree (w/ Split)" where it shows the steps of inserting a value into a B-tree node that overfills it, and what the B-tree does in response.
查看示例“将密钥33插入B树(带拆分)”,其中显示了将值插入到溢出它的B树节点中的步骤,以及B树响应的内容。
Now imagine that the example illustration only shows the bottom part of a B-tree that is much deeper (as would be in the case of an index B-tree has millions of entries), and filling the parent node can itself be an overflow, and force the splitting operation to continue up the the higher level in the tree. This can continue all the way to the very top of the tree if all the ancestor nodes to the top of the tree were already filled.
现在想象一下,示例说明只显示B树的底部更深(就像索引B树的情况一样有数百万个条目),填充父节点本身就是溢出,并强制拆分操作继续向上移动树中的更高级别。如果已经填充了树顶部的所有祖先节点,则这可以一直持续到树的最顶层。
As the nodes split and have to be restructured, they may require more space, but they're stored on some page of the database file where there's no spare space. So the storage engine has to relocate parts of the index to another part of the file, and potentially re-write a lot of pages of index just for a single INSERT.
由于节点分裂并且必须重新构建,它们可能需要更多空间,但它们存储在数据库文件的某些页面上,其中没有空余空间。因此,存储引擎必须将部分索引重定位到文件的另一部分,并且可能仅针对单个INSERT重写大量索引页。
Auto-increment values are naturally always inserted at the very rightmost edge of the B-tree. As @ BrankoDimitrijevic points out in a comment below, this does not make it less likely that they'll cause such laborious node-splitting and restructuring to the index. But the B-tree implementation code can optimize for this case in other ways, and some do.
自动增量值自然总是插入B树的最右边。正如@ BrankoDimitrijevic在下面的评论中指出的那样,这并不会降低它们导致如此费力的节点分裂和重组到索引的可能性。但是B树实现代码可以通过其他方式针对这种情况进行优化,而有些则可以。
If table has a unique column, which's better that set the unique column as primary key or add a new column 'id' as auto_increment primary key?
如果table具有唯一列,那么将唯一列设置为主键或将新列'id'添加为auto_increment主键更好?
If the unique column is also non-nullable, then you can use it as a primary key. Primary keys require that all of their columns are non-nullable.
如果唯一列也是非可空的,则可以将其用作主键。主键要求所有列都不可为空。
#2
6
I know it is better to use primary key when design tables.
我知道在设计表时使用主键更好。
In fact, a key-less table is a multiset (since it allows duplicates), and therefore not strictly a relation (which is a set), and therefore your database would not really be "relational".
事实上,无密钥表是多集(因为它允许重复),因此不是严格的关系(这是一个集合),因此您的数据库实际上不是“关系”。
Please note that "primary" (PRIMARY KEY) and "alternate" (UNIQUE constraint) keys are logically equivalent.
请注意,“primary”(PRIMARY KEY)和“alternate”(UNIQUE约束)键在逻辑上是等效的。
But I want to know when design a primary key, it is needed to setting auto_increment?
但我想知道何时设计一个主键,需要设置auto_increment?
You are actually asking multiple questions:
您实际上是在问多个问题:
- Should I create a key?
- If yes, should I create a surrogate key?
- If yes, should it be integer?
- If yes, should I make it auto-incremented?
我应该创建一个密钥吗?
如果是,我应该创建一个代理键吗?
如果是,它应该是整数吗?
如果是,我应该让它自动递增吗?
The answer to (1) is "almost always". There are some very rare cases when data is not "important" and you might skip it for performance reasons, but that's exceedingly rare.
(1)的答案是“几乎总是”。在数据不“重要”的情况下,有一些非常罕见的情况,出于性能原因,您可能会跳过它,但这种情况非常罕见。
The answer to (2) is "it depends" - main pros and cons can be found here.
(2)的答案是“它取决于” - 主要的优点和缺点可以在这里找到。
The answer to (3) depends on whether you need to generate keys independently from the database (e.g. while disconnected, or while connected to a different database). If yes, you could use GUIDs (which obviously cannot be auto-incremented, but can be uniquely generated in isolation). If no, then you can just use integers - they are more compact and generally faster.
(3)的答案取决于您是否需要独立于数据库生成密钥(例如,在断开连接时,或在连接到不同的数据库时)。如果是,您可以使用GUID(显然不能自动递增,但可以独立生成)。如果不是,那么你可以使用整数 - 它们更紧凑,通常更快。
And finally, if you reached (4), then you'll almost certainly want to make it auto-incremented, for the reasons discussed below.
最后,如果你达到(4),那么你几乎肯定会想让它自动递增,原因如下所述。
If done, what's the benefit?
如果完成,有什么好处?
- The main benefit of making an integer surrogate key auto-incremented is that multiple concurrent clients will never receive the same generated value. If you just try to
SELECT MAX(ID) + 1 FROM ...
, there is no guarantee some other client will not try to do the same thing at the same time, and end-up with the same result (subsequently causing a key violation). - The other benefit is that the DBMS will use a highly optimized code path for generating new unique values.
- The drawback is that auto-increment mechanism is typically not transaction-aware: if you generate a new ID value and then ROLLBACK the transaction, that value will not be generated again. That being said, surrogate keys don't have any meaning (if they had, they would not be surrogate), so such "holes" are inconsequential.
使整数代理键自动递增的主要好处是多个并发客户端永远不会收到相同的生成值。如果您只是尝试SELECT MAX(ID)+ 1 FROM ...,则无法保证其他客户端不会同时尝试执行相同的操作,并以相同的结果结束(随后导致密钥)违反)。
另一个好处是DBMS将使用高度优化的代码路径来生成新的唯一值。
缺点是自动增量机制通常不是事务感知的:如果生成新的ID值然后ROLLBACK事务,则不会再次生成该值。话虽如此,代理键没有任何意义(如果有的话,它们不会是代理),所以这样的“漏洞”是无关紧要的。
If table has a unique column, which's better that set the unique column as primary key or add a new column 'id' as auto_increment primary key?
如果table具有唯一列,那么将唯一列设置为主键或将新列'id'添加为auto_increment主键更好?
If the attribute is intrinsically unique at the "logical level", then the corresponding table column must be made unique (through PRIMARY KEY or UNIQUE constraint), whether you later decide to add a surrogate key or not.
如果属性在“逻辑级别”本质上是唯一的,则必须使相应的表列唯一(通过PRIMARY KEY或UNIQUE约束),无论您是否稍后决定添加代理键。
#3
0
To have an auto-increment PK makes it easy to create a key that never needs to change, which in turn makes it easy to reference in other tables.
要拥有自动增量PK,可以轻松创建永远不需要更改的密钥,从而可以轻松地在其他表中引用。
If your data is such that you have natural columns that are unique and can never change you can use them just as well. Mind you most things that "will never change" tends to do so anyway given enough time, like someones social security number...
如果您的数据具有唯一且无法更改的自然列,则您也可以使用它们。请注意,大多数“永远不会改变”的事情往往会在给定足够时间的情况下这样做,就像某人的社会保障号码一样......
For simplicity I always use auto-increment (identity) columns for PK.
为简单起见,我总是使用PK的自动增量(标识)列。
#4
-1
Support for Using Surrogate Keys in Table Definitions
Thanks to @Branko Dimitrijevic for opening the topic of relational database primary keys (PK's) by describing the role of SURROGATE KEYS
and getting to the center of the discussion. Surrogate keys are by definition devoid of any intrinsic meaning aside from their uniqueness between their values in each record of their table.
感谢@Branko Dimitrijevic通过描述SURROGATE KEYS的角色并进入讨论的中心来打开关系数据库主键(PK)的主题。根据定义,代理键除了它们在表中每个记录中的值之间的唯一性之外,没有任何内在含义。
Thanks also to @Mattias Åslund for your additional wisdom:
还要感谢@MattiasÅslund的额外智慧:
Mind you most things that "will never change" tends to do so anyway given enough time, like someones social security number...
请注意,大多数“永远不会改变”的事情往往会在给定足够时间的情况下这样做,就像某人的社会保障号码一样......
I add that even if the value chosen as "unchangeable" really does not change, it is also just as likely for the rules of the supported business or organization itself to drift and change over time in ways that may affect the core assumptions of given design.
我补充说,即使选择为“不可更改”的值确实没有改变,支持的业务或组织本身的规则也可能随着时间的推移而变化,从而可能影响给定设计的核心假设。 。
A useful discussion on the integration of demographic and biometric key values for tracking individuals can be found in this section on Choosing an Appropriate Key for New Databases put out by the Computer Professionals for Social Responsibility.
有关跟踪个人的人口统计和生物识别关键值的集成的有用讨论可以在计算机专业人员为社会责任选择的“为新数据库选择适当的密钥”一节中找到。
Cases of Alternate Unique Database Keys: A Sample Schema
I plan to approach the comments on this post with a discussion around a specific example design to explain the kinds of things that can go wrong with assigning a Primary Key that isn't a surrogate key. Many of these assumptions are taken from observations from real applications. They are remembered well because of the complexity their design introduced into other business processes as other systems and data sources became dependent on their assumptions.
我计划通过围绕特定示例设计的讨论来处理这篇文章的评论,以解释分配不是代理键的主键可能出错的事情。其中许多假设来自实际应用的观察。由于其他系统和数据源依赖于其假设而将其设计引入其他业务流程的复杂性,因此很好地记住了它们。
The design and sample data is below, borrowed loosely from Oracle's infamous Scott/TIGER database design.
设计和样本数据如下,从臭名昭着的臭名昭着的Scott / TIGER数据库设计中借用。
MySQL 5.5.32 Schema Setup:
MySQL 5.5.32架构设置:
CREATE TABLE employee
(
fake_ssn varchar(15) primary key,
last_name varchar(40),
first_name varchar(40),
dept_id varchar(15),
hire_date date,
salary int,
email varchar(100)
);
INSERT INTO employee
(fake_ssn, last_name, first_name, dept_id, hire_date, salary,
email)
VALUES
('130-60-0101', 'MARLOWE', 'JACOB', '1200-05', date('2009/01/25'),
8000, 'jacob@some-company.com'),
('967-22-5025', 'CRACHITT', 'BOB', '1200-05', date('2010/02/05'),
500, 'bobc@some-company.com'),
('040-36-5555', 'PERRY', 'VICTORIA', '1200-02', date('2011/05/25'),
2700, 'vperry@some-company.com'),
('203-89-1010', 'STEVENS', 'KEVIN', '2955-03', date('2007/04/25'),
1800, 'kevin.stevens@some-company.com'),
('409-99-1111', 'MCLANE', 'JONATHAN', '2955-03', date('2009/03/02'),
4200, 'jon.j.mclane@some-company.com');
CREATE TABLE department
(
dept_id varchar(15) primary key,
dept_manager varchar(40),
dept_title varchar(40)
);
INSERT INTO department
(dept_id, dept_manager, dept_title)
VALUES
('1200-05', 'MARLOWE', 'FINANCE'),
('1200-02', null, 'HR'),
('2955-03', 'JOHNM', 'MARKETING');
COMMIT;
Query 1:
SELECT fake_ssn, last_name, first_name, dept_id, hire_date,
salary, email
FROM employee
| FAKE_SSN | LAST_NAME | FIRST_NAME | DEPT_ID | HIRE_DATE | SALARY | EMAIL |
|-------------|-----------|------------|---------|---------------------------------|--------|--------------------------------|
| 040-36-5555 | PERRY | VICTORIA | 1200-02 | May, 25 2011 00:00:00+0000 | 2700 | vperry@some-company.com |
| 130-60-0101 | MARLOWE | JACOB | 1200-05 | January, 25 2009 00:00:00+0000 | 8000 | jacob@some-company.com |
| 203-89-1010 | STEVENS | KEVIN | 2955-03 | April, 25 2007 00:00:00+0000 | 1800 | kevin.stevens@some-company.com |
| 409-99-1111 | MCLANE | JONATHAN | 2955-03 | March, 02 2009 00:00:00+0000 | 4200 | jon.j.mclane@some-company.com |
| 967-22-5025 | CRACHITT | BOB | 1200-05 | February, 05 2010 00:00:00+0000 | 500 | bobc@some-company.com |
Query 2:
SELECT dept_id, dept_manager, dept_title
FROM department
| DEPT_ID | DEPT_MANAGER | DEPT_TITLE |
|---------|--------------|------------|
| 1200-02 | (null) | HR |
| 1200-05 | MARLOWE | FINANCE |
| 2955-03 | JOHNM | MARKETING |
Fake Social Security Number
(The FAKE
name is just a reminder that these are all randomly generated values.) Although this is often a popular "unqiue" value in personnel records and databases, according to the U.S. Social Security Administration, this value is not unique. It is also problematic because this value and its transfer is subject to strict regulation because of recently passed privacy laws.
(FAKE名称只是提醒人们,这些都是随机生成的值。)虽然这在人事记录和数据库中通常是一种流行的“unqiue”值,但根据美国社会保障局的说法,这个值并不是唯一的。这也是有问题的,因为这个价值及其转让受到最近通过的隐私法的严格监管。
Name Combinations
Even with the additional combinations created by including the middle initial, somehow there are still just too many people with the same name. A look at what the Social Security Administration has to say about the registered names of babies born in 2012:
即使使用包含中间首字母创建的其他组合,某种程度上仍然只有太多具有相同名称的人。看看社会保障管理局对2012年出生的婴儿的注册名称所说的话:
Two decades from now, when the JACOB's and SOPHIA's of 2012 graduate from school, they will flood into the workforce beside thousands of others like them...
二十年后,当JACOB和2012年的SOPHIA从学校毕业时,他们将涌入劳动力市场,成千上万像他们一样......
Name changes by marriage or legal reasons are also threats to the referential integrity of database records relying on their values as business key values.
通过婚姻或法律原因更改名称也是对依赖于其值作为业务键值的数据库记录的引用完整性的威胁。
Department ID
Some companies will try to derive keys from other values to make SMART KEYS
. These types of keys when observed in practice are NOT smart at all. The values in the example: 1200-02
, 1200-05
, 2955-03
were intended to resemble a "smart key". The first value might be a street address or building number for a company campus or multiple-location business. The second value, ("-02", "-03", "-05") might be the floor of the building where the department is located.
有些公司会尝试从其他值派生出密钥来制作SMART KEYS。在实践中观察这些类型的钥匙根本不聪明。示例中的值:1200-02,1200-05,2955-03旨在类似于“智能钥匙”。第一个值可能是公司园区或多地点业务的街道地址或建筑物编号。第二个值(“-02”,“ - 03”,“ - 05”)可能是部门所在建筑物的楼层。
Changing buildings, moving a department or relocating a business completely would render this location dependency of the DEPARTMENT ID
useless.
更改建筑物,移动部门或完全重新定位业务会使DEPARTMENT ID的此位置依赖性变得无用。
Department Manager
This one is subtle, but there is a hole in this relational connection. A MANAGER
is ALSO an employee, which makes the relational join between EMPLOYEE
and DEPARTMENT
a circular one:
这个很微妙,但这种关系连接有一个漏洞。经理也是一名员工,它使雇员和部门之间的关系联合成为循环:
- Is the
MANAGER
(fromDEPARTMENT
) a foreign key constraint on theEMPLOYEE
table, or - Is
DEPT_ID
(FROMEMPLOYEE
) a foreign key constraint on theDEPARTMENT
table?
MANAGER(来自DEPARTMENT)是EMPLOYEE表上的外键约束,还是
DEPTMENT表上的DEPT_ID(FROM EMPLOYEE)是外键约束吗?
If you give up the Foreign Key constraint between MANAGER
and some key column on EMPLOYEE
(LAST_NAME
or FIRST_NAME
+ LAST_NAME
), you risk non uniform values for MANAGER
.
如果您放弃MANAGER与EMPLOYEE(LAST_NAME或FIRST_NAME + LAST_NAME)上的某个键列之间的外键约束,则您可能面临MANAGER的非统一值。
... Looking at
... 看着
Query of The DEPARTMENT
Table:
查询部门表:
| DEPT_ID | DEPT_MANAGER | DEPT_TITLE |
|---------|--------------|------------|
| 1200-02 | (null) | HR |
| 1200-05 | MARLOWE | FINANCE |
| 2955-03 | JOHNM | MARKETING |
The misplacement of DEPT_MANAGER
within the DEPARTMENT
table, as there are three different ways the name of the Department Manager are represented: none (null), ALL-CAPS Last Name, ALL-CAPS First Name, Last Initial.
DEPARTMENT表中DEPT_MANAGER的错位,因为部门管理器的名称有三种不同的表示方式:none(null),ALL-CAPS Last Name,ALL-CAPS First Name,Last Initial。
Conclusions
An important lesson to take from this posting is that making a key more than a key either by integrating derived values, making values based on assumptions about business rules tie down the flexibility of a database design because if the business rules change, so will the values such as Primary Keys or Joining Key values.
从这篇文章中得出的一个重要教训是,通过整合派生值来创建密钥而不是密钥,根据业务规则的假设创建值会降低数据库设计的灵活性,因为如果业务规则发生变化,那么值也会变化例如主键或连接键值。
As the developer or maintainer of the business application, you (or your team) are better able to support the prevailing business conditions if you have taken control and own the parts that represent the internal structure of the business application itself. The primary key might never actually present itself in a customer or user-facing situation, but it should be protected so that the relations it represents does not change with time.
作为业务应用程序的开发人员或维护人员,如果您已控制并拥有代表业务应用程序本身内部结构的部分,则您(或您的团队)能够更好地支持当前的业务条件。主键实际上可能永远不会出现在客户或面向用户的情况中,但它应该受到保护,以便它所代表的关系不会随时间而变化。
Special Thanks:
Image Credit from the popular Baby Names page of 2012:
来自2012年流行婴儿名字页面的图片来源: