What are the pros and cons, from a performance, indexing, and data-management perspective, of modeling a one-to-one relationship by making the child's primary key also a foreign key to the parent, versus giving the child a pure surrogate primary key? The first approach seems to reduce redundancy and enforces the one-to-one constraint implicitly, while the second seems to be favored by DBAs even though it requires a second index. Primary key as foreign key:
create table parent (
    id integer primary key,
    data varchar(50)
);

create table child (
    -- child.id is both the primary key and the foreign key to parent
    id integer primary key references parent(id),
    data varchar(50)
);
Pure surrogate key:
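create table parent (
    id integer primary key,
    data varchar(50)
);

create table child (
    -- surrogate key, unrelated to parent.id
    id integer primary key,
    -- unique keeps the relationship one-to-one and is backed by a second index
    parent_id integer unique references parent(id),
    data varchar(50)
);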
The platforms of interest here are PostgreSQL and Microsoft SQL Server.
Edit:
Here is the basic argument, as given by an actual DBA. The main concern is index fragmentation on the child table. Suppose records with primary keys 1 through 1000000 are inserted into the parent table and nothing into the child table. Later, ad-hoc operations begin to populate the child table with rows that correspond to rows in the parent table, but in random order. The concern is that this will cause page splits on insert, fragment the index, and produce the "swiss cheese" effect on delete. I will admit that these are not terms I am deeply familiar with, and when I google them, the hits all seem to be related to Microsoft SQL Server. Are these MS-specific concerns (i.e., do PG's ANALYZE and the like mitigate the issue on PG)? If so, that is yet another reason to use a database like PostgreSQL.
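One way to check whether the random-order scenario described above matters on a given system is to reproduce it and measure the index. The following is only a minimal sketch for PostgreSQL, assuming the first schema (so the child's primary-key index has the default name child_pkey), empty tables to start with, and permission to install the pgstattuple extension; the 'p' and 'c' filler values are placeholders. On SQL Server, comparable figures should be available from sys.dm_db_index_physical_stats (the avg_fragmentation_in_percent column).

```sql
-- PostgreSQL sketch; assumes the first schema and the default index name child_pkey.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Populate parent in key order.
INSERT INTO parent (id, data)
SELECT g, 'p' FROM generate_series(1, 1000000) AS g;

-- Populate child with the same keys, but in random order.
INSERT INTO child (id, data)
SELECT id, 'c' FROM parent ORDER BY random();

-- Inspect the leaf pages of the child's primary-key B-tree.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('child_pkey');

-- Rebuild the index and measure again for comparison.
REINDEX INDEX child_pkey;

SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('child_pkey');
```

Comparing the two result rows gives a rough idea of how much the random insert order cost on that particular installation; the numbers will vary with version and settings.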
2 solutions
#1
5
If it's a strict 1-1 relationship, I see no reason not to use the first option.
The second option does leave room to turn it into a one-to-many relationship later, though, which is probably why DBAs might favor it.
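To illustrate the flexibility mentioned here, this is a rough sketch of the second schema's child table written with explicitly named constraints, followed by the change that would relax it to one-to-many. The constraint and index names (child_parent_id_uq, child_parent_id_fk, child_parent_id_ix) and the NOT NULL on parent_id are assumptions made for the example, not part of the original question. It assumes the parent table from the question already exists.

```sql
-- Variant of the second schema with named constraints (names are hypothetical).
CREATE TABLE child (
    id        integer PRIMARY KEY,
    parent_id integer NOT NULL,   -- NOT NULL is an assumption for this sketch
    data      varchar(50),
    CONSTRAINT child_parent_id_uq UNIQUE (parent_id),
    CONSTRAINT child_parent_id_fk FOREIGN KEY (parent_id) REFERENCES parent (id)
);

-- Later: allow several child rows per parent by dropping the unique constraint.
ALTER TABLE child DROP CONSTRAINT child_parent_id_uq;

-- Dropping the constraint also drops its index, so add a plain one for lookups by parent.
CREATE INDEX child_parent_id_ix ON child (parent_id);
```

With the first option the same change is more invasive, because child.id is both the primary key and the foreign key: the child would need a new key column before it could hold more than one row per parent.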
#2
1
First, in a 1:1 relationship there is no problem with a table's primary key also serving as a foreign key to another table; in fact, I would suggest that this is the preferred approach.
Second, with any 1:1 relationship, the first question should be whether the separate table is needed at all, since you can typically just move the child table's columns into the main table. That said, there are times when a 1:1 relationship clearly makes sense.
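As a sketch of what folding the child into the main table could look like, assuming the first schema from the question (child.id = parent.id) and a hypothetical new column name child_data:

```sql
-- Add a nullable column on parent to hold the child's value (child_data is a made-up name).
ALTER TABLE parent ADD child_data varchar(50);

-- Copy each child's value onto its parent row.
-- Parents that have no child row end up with NULL.
UPDATE parent
SET child_data = (SELECT c.data FROM child c WHERE c.id = parent.id);

-- Once the copy has been verified, the child table is no longer needed.
DROP TABLE child;
```

The correlated subquery is used here because it should run unchanged on both PostgreSQL and SQL Server; each platform also has its own UPDATE ... FROM form.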