复合主键中的可空列有什么问题?

时间:2022-09-10 12:32:26

ORACLE does not permit NULL values in any of the columns that comprise a primary key. It appears that the same is true of most other "enterprise-level" systems.

ORACLE不允许包含主键的任何列中有空值。似乎大多数其他“企业级”系统也是如此。

At the same time, most systems also allow unique contraints on nullable columns.

与此同时,大多数系统还允许对可空列进行独特的限制。

Why is it that unique constraints can have NULLs but primary keys can not? Is there a fundamental logical reason for this, or is this more of a technical limitation?

为什么唯一的约束可以有null而主键不能?这是否有一个基本的逻辑原因,或者这更多的是技术限制?

6 个解决方案

#1


172  

Primary keys are for uniquely identifying rows. This is done by comparing all parts of a key to the input.

主键用于唯一标识行。这是通过比较输入的键的所有部分来完成的。

Per definition, NULL cannot be part of a successful comparison. Even a comparison to itself (NULL = NULL) will fail. This means a key containing NULL would not work.

对于每个定义,NULL都不是成功比较的一部分。甚至与自身的比较(NULL = NULL)也会失败。这意味着包含NULL的键不能工作。

Additonally, NULL is allowed in a foreign key, to mark an optional relationship.(*) Allowing it in the PK as well would break this.

另外,在外键中可以使用NULL来标记一个可选的关系。


(*)A word of caution: Having nullable foreign keys is not clean relational database design.

(*)注意:具有可空外键并不是干净的关系数据库设计。

If there are two entities A and B where A can optionally be related to B, the clean solution is to create a resolution table (let's say AB). That table would link A with B: If there is a relationship then it would contain a record, if there isn't then it would not.

如果有两个实体A和B,其中A可选地与B相关,那么干净的解决方案是创建一个解析表(比如AB)。该表将A与B连接:如果有关系,那么它将包含一条记录,如果没有,那么它将不会。

#2


53  

A primary key defines a unique identifier for every row in a table: when a table has a primary key, you have a guranteed way to select any row from it.

主键为表中的每一行定义一个唯一的标识符:当一个表有一个主键时,您可以使用一种有保证的方式从中选择任何一行。

A unique constraint does not necessarily identify every row; it just specifies that if a row has values in its columns, then they must be unique. This is not sufficient to uniquely identify every row, which is what a primary key must do.

唯一约束不一定标识每一行;它只是指定如果一行在其列中有值,那么它们必须是惟一的。这不足以唯一标识每一行,这是主键必须做的事情。

#3


39  

Fundamentally speaking nothing is wrong with a NULL in a multi-column primary key. But having one has implications the designer likely did not intend, which is why many systems throw an error when you try this.

从根本上说,在多列主键中NULL是错误的。但是,如果有一个这样的系统,设计者很可能并不打算这样做,这就是为什么当您尝试这个系统时,会抛出一个错误。

Consider the case of module/package versions stored as a series of fields:

考虑将模块/包版本存储为一系列字段的情况:

CREATE TABLE module
  (name        varchar(20) PRIMARY KEY,
   description text DEFAULT '' NOT NULL);

CREATE TABLE version
  (module      varchar(20) REFERENCES module,
   major       integer NOT NULL,
   minor       integer DEFAULT 0 NOT NULL,
   patch       integer DEFAULT 0 NOT NULL,
   release     integer DEFAULT 1 NOT NULL,
   ext         varchar(20),
   notes       text DEFAULT '' NOT NULL,
   PRIMARY KEY (module, major, minor, patch, release, ext));

The first 5 elements of the primary key are regularly defined parts of a release version, but some packages have a customized extension that is usually not an integer (like "rc-foo" or "vanilla" or "beta" or whatever else someone for whom four fields is insufficient might dream up). If a package does not have an extension, then it is NULL in the above model, and no harm would be done by leaving things that way.

主键的前5个元素通常被定义为发布版本的部分,但是有些包有一个自定义的扩展,通常不是整数(比如“rc-foo”或“vanilla”或“beta”,或者其他任何一个对四个字段不足够的人可能会想到的东西)。如果一个包没有扩展,那么在上面的模型中它是空的,这样就不会有任何伤害。

But what is a NULL? It is supposed to represent a lack of information, an unknown. That said, perhaps this makes more sense:

但什么是零呢?它被认为是缺乏信息,一个未知的东西。也就是说,也许这更有意义:

CREATE TABLE version
  (module      varchar(20) REFERENCES module,
   major       integer NOT NULL,
   minor       integer DEFAULT 0 NOT NULL,
   patch       integer DEFAULT 0 NOT NULL,
   release     integer DEFAULT 1 NOT NULL,
   ext         varchar(20) DEFAULT '' NOT NULL,
   notes       text DEFAULT '' NOT NULL,
   PRIMARY KEY (module, major, minor, patch, release, ext));

In this version the "ext" part of the tuple is NOT NULL but defaults to an empty string -- which is semantically (and practically) different from a NULL. A NULL is an unknown, whereas an empty string is a deliberate definition of "something not being present". In other words, "empty" and "null" are different things. Its the difference between "I don't have a value here" and "I don't know what the value here is."

在这个版本中,tuple的“ext”部分不是NULL,而是默认为空字符串——这在语义上(实际上)与NULL是不同的。空字符串是未知的,而空字符串是“不存在的东西”的故意定义。换句话说,“空”和“空”是不同的东西。这是“我这里没有值”和“我不知道这里的值是多少”的区别。

When you register a package that lacks a version extension you know it lacks an extension, so an empty string is actually the correct value. A NULL would only be correct if you didn't know whether it had an extension or not. This situation is easier to deal with in systems where string values are the norm, because there is no way to represent an "empty integer" other than inserting 0 or 1, which will wind up being rolled up in any comparisons made later (which has its own implications).

当您注册一个缺少版本扩展名的包时,您就知道它没有扩展名,所以一个空字符串实际上是正确的值。只有当你不知道它是否有扩展时,NULL才会是正确的。在以字符串值为范数的系统中,这种情况更容易处理,因为除了插入0或1之外,没有其他方法来表示“空整数”,这将导致在以后进行的任何比较中卷起来(这有它自己的含义)。

Incidentally, both ways are valid in Postgres (since we're discussing "enterprise" RDMBSs), but comparison results can vary quite a bit when you throw a NULL into the mix -- because NULL == "don't know" so all results of a comparison involving a NULL wind up being NULL since you can't know something that is unknown. So this can be a source of subtle bugs when sorting, comparing, etc. Postgres assumes you're an adult and can make this decision for yourself. Oracle and DB2 assume you didn't realize you were doing something silly and throw an error. This is usually the right thing, but not always -- you might actually not know and have a NULL in some cases and therefore leaving a row with an unknown element against which meaningful comparisons are impossible is correct behavior.

Postgres顺便说一下,两种方法都是有效的(因为我们讨论RDMBSs“企业”),但比较的结果可以相当多样化,当你把一个空的——因为NULL = =“不知道”所有的结果比较涉及零风是零,因为你不知道是未知的东西。因此,这可能是在排序、比较等时产生的一个微妙的bug。Postgres假设你是一个成年人,可以自己做这个决定。Oracle和DB2假设您没有意识到您在做一些愚蠢的事情并抛出错误。这通常是正确的,但也不总是正确的——在某些情况下,您可能实际上不知道并且拥有一个NULL,因此留下一行包含一个未知元素,与之进行有意义的比较是不可能的,这就是正确的行为。

In any case you should strive to eliminate the number of NULL fields you permit across the entire schema and doubly so when it comes to fields that are part of a primary key. In the vast majority of cases the presence of NULL columns is an indication of un-normalized (as opposed to deliberately de-normalized) schema design and should be thought very hard about before being accepted.

在任何情况下,您都应该努力消除跨整个模式允许的空字段的数量,当涉及到作为主键一部分的字段时更是如此。在绝大多数情况下,空列的存在表明了非规范化(而不是有意的非规范化)模式设计,在被接受之前应该非常仔细地考虑。

#4


17  

NULL == NULL -> false (at least in DBMSs)

空== NULL -> false(至少在DBMSs中)

So you wouldn't be able to retrieve any relationships using a NULL value even with additional columns with real values.

因此,即使有实值的附加列,也不能使用空值检索任何关系。

#5


3  

The answer by Tony Andrews is a decent one. But the real answer is that this has been a convention used by relational database community and is NOT a necessity. Maybe it is a good convention, maybe not.

托尼·安德鲁斯的答案很不错。但真正的答案是,这是关系数据库社区使用的约定,不是必需的。也许这是一个很好的惯例,也许不是。

Comparing anything to NULL results in UNKNOWN (3rd truth value). So as has been suggested with nulls all traditional wisdom concerning equality goes out the window. Well that's how it seems at first glance.

将任何东西与空值进行比较会导致未知(第3真值)。因此,正如人们所建议的那样,所有关于平等的传统智慧都被打破了。乍一看似乎是这样。

But I don't think this is necessarily so and even SQL databases don't think that NULL destroys all possibility for comparison.

但我不认为这是必然的,甚至SQL数据库也不认为NULL会破坏所有的比较可能性。

Run in your database the query SELECT * FROM VALUES(NULL) UNION SELECT * FROM VALUES(NULL)

在数据库中运行查询SELECT * FROM VALUES(NULL) UNION SELECT * FROM VALUES(NULL)

What you see is just one tuple with one attribute that has the value NULL. So the union recognized here the two NULL values as equal.

您看到的只是一个带有一个属性值为NULL的元组。所以这里的union认出这两个零值是相等的。

When comparing a composite key that has 3 components to a tuple with 3 attributes (1, 3, NULL) = (1, 3, NULL) <=> 1 = 1 AND 3 = 3 AND NULL = NULL The result of this is UNKNOWN.

当将具有3个组件的复合键与具有3个属性的元组(1、3、NULL) =(1、3、NULL) <=> 1 = 1和3 = 3,NULL = NULL时,结果是未知的。

But we could define a new kind of comparison operator eg. ==. X == Y <=> X = Y OR (X IS NULL AND Y IS NULL)

但是我们可以定义一种新的比较算子。= =。X = Y <=> X = Y或(X为空,Y为空)

Having this kind of equality operator would make composite keys with null components or non-composite key with null value unproblematic.

具有这种等式运算符将使具有空组件的组合键或具有空值的非组合键变得没有问题。

#6


0  

I still believe this is a fundamental / functional flaw brought about by a technicality. If you have an optional field by which you can identify a customer you now have to hack a dummy value into it, just because NULL != NULL, not particularly elegant yet it is an "industry standard"

我仍然认为这是一个技术上的基础/功能缺陷。如果您有一个可选字段,您可以通过该字段来标识客户,那么您现在必须将一个虚拟值插入其中,因为NULL != NULL,虽然不是特别优雅,但它是一个“行业标准”

#1


172  

Primary keys are for uniquely identifying rows. This is done by comparing all parts of a key to the input.

主键用于唯一标识行。这是通过比较输入的键的所有部分来完成的。

Per definition, NULL cannot be part of a successful comparison. Even a comparison to itself (NULL = NULL) will fail. This means a key containing NULL would not work.

对于每个定义,NULL都不是成功比较的一部分。甚至与自身的比较(NULL = NULL)也会失败。这意味着包含NULL的键不能工作。

Additonally, NULL is allowed in a foreign key, to mark an optional relationship.(*) Allowing it in the PK as well would break this.

另外,在外键中可以使用NULL来标记一个可选的关系。


(*)A word of caution: Having nullable foreign keys is not clean relational database design.

(*)注意:具有可空外键并不是干净的关系数据库设计。

If there are two entities A and B where A can optionally be related to B, the clean solution is to create a resolution table (let's say AB). That table would link A with B: If there is a relationship then it would contain a record, if there isn't then it would not.

如果有两个实体A和B,其中A可选地与B相关,那么干净的解决方案是创建一个解析表(比如AB)。该表将A与B连接:如果有关系,那么它将包含一条记录,如果没有,那么它将不会。

#2


53  

A primary key defines a unique identifier for every row in a table: when a table has a primary key, you have a guranteed way to select any row from it.

主键为表中的每一行定义一个唯一的标识符:当一个表有一个主键时,您可以使用一种有保证的方式从中选择任何一行。

A unique constraint does not necessarily identify every row; it just specifies that if a row has values in its columns, then they must be unique. This is not sufficient to uniquely identify every row, which is what a primary key must do.

唯一约束不一定标识每一行;它只是指定如果一行在其列中有值,那么它们必须是惟一的。这不足以唯一标识每一行,这是主键必须做的事情。

#3


39  

Fundamentally speaking nothing is wrong with a NULL in a multi-column primary key. But having one has implications the designer likely did not intend, which is why many systems throw an error when you try this.

从根本上说,在多列主键中NULL是错误的。但是,如果有一个这样的系统,设计者很可能并不打算这样做,这就是为什么当您尝试这个系统时,会抛出一个错误。

Consider the case of module/package versions stored as a series of fields:

考虑将模块/包版本存储为一系列字段的情况:

CREATE TABLE module
  (name        varchar(20) PRIMARY KEY,
   description text DEFAULT '' NOT NULL);

CREATE TABLE version
  (module      varchar(20) REFERENCES module,
   major       integer NOT NULL,
   minor       integer DEFAULT 0 NOT NULL,
   patch       integer DEFAULT 0 NOT NULL,
   release     integer DEFAULT 1 NOT NULL,
   ext         varchar(20),
   notes       text DEFAULT '' NOT NULL,
   PRIMARY KEY (module, major, minor, patch, release, ext));

The first 5 elements of the primary key are regularly defined parts of a release version, but some packages have a customized extension that is usually not an integer (like "rc-foo" or "vanilla" or "beta" or whatever else someone for whom four fields is insufficient might dream up). If a package does not have an extension, then it is NULL in the above model, and no harm would be done by leaving things that way.

主键的前5个元素通常被定义为发布版本的部分,但是有些包有一个自定义的扩展,通常不是整数(比如“rc-foo”或“vanilla”或“beta”,或者其他任何一个对四个字段不足够的人可能会想到的东西)。如果一个包没有扩展,那么在上面的模型中它是空的,这样就不会有任何伤害。

But what is a NULL? It is supposed to represent a lack of information, an unknown. That said, perhaps this makes more sense:

但什么是零呢?它被认为是缺乏信息,一个未知的东西。也就是说,也许这更有意义:

CREATE TABLE version
  (module      varchar(20) REFERENCES module,
   major       integer NOT NULL,
   minor       integer DEFAULT 0 NOT NULL,
   patch       integer DEFAULT 0 NOT NULL,
   release     integer DEFAULT 1 NOT NULL,
   ext         varchar(20) DEFAULT '' NOT NULL,
   notes       text DEFAULT '' NOT NULL,
   PRIMARY KEY (module, major, minor, patch, release, ext));

In this version the "ext" part of the tuple is NOT NULL but defaults to an empty string -- which is semantically (and practically) different from a NULL. A NULL is an unknown, whereas an empty string is a deliberate definition of "something not being present". In other words, "empty" and "null" are different things. Its the difference between "I don't have a value here" and "I don't know what the value here is."

在这个版本中,tuple的“ext”部分不是NULL,而是默认为空字符串——这在语义上(实际上)与NULL是不同的。空字符串是未知的,而空字符串是“不存在的东西”的故意定义。换句话说,“空”和“空”是不同的东西。这是“我这里没有值”和“我不知道这里的值是多少”的区别。

When you register a package that lacks a version extension you know it lacks an extension, so an empty string is actually the correct value. A NULL would only be correct if you didn't know whether it had an extension or not. This situation is easier to deal with in systems where string values are the norm, because there is no way to represent an "empty integer" other than inserting 0 or 1, which will wind up being rolled up in any comparisons made later (which has its own implications).

当您注册一个缺少版本扩展名的包时,您就知道它没有扩展名,所以一个空字符串实际上是正确的值。只有当你不知道它是否有扩展时,NULL才会是正确的。在以字符串值为范数的系统中,这种情况更容易处理,因为除了插入0或1之外,没有其他方法来表示“空整数”,这将导致在以后进行的任何比较中卷起来(这有它自己的含义)。

Incidentally, both ways are valid in Postgres (since we're discussing "enterprise" RDMBSs), but comparison results can vary quite a bit when you throw a NULL into the mix -- because NULL == "don't know" so all results of a comparison involving a NULL wind up being NULL since you can't know something that is unknown. So this can be a source of subtle bugs when sorting, comparing, etc. Postgres assumes you're an adult and can make this decision for yourself. Oracle and DB2 assume you didn't realize you were doing something silly and throw an error. This is usually the right thing, but not always -- you might actually not know and have a NULL in some cases and therefore leaving a row with an unknown element against which meaningful comparisons are impossible is correct behavior.

Postgres顺便说一下,两种方法都是有效的(因为我们讨论RDMBSs“企业”),但比较的结果可以相当多样化,当你把一个空的——因为NULL = =“不知道”所有的结果比较涉及零风是零,因为你不知道是未知的东西。因此,这可能是在排序、比较等时产生的一个微妙的bug。Postgres假设你是一个成年人,可以自己做这个决定。Oracle和DB2假设您没有意识到您在做一些愚蠢的事情并抛出错误。这通常是正确的,但也不总是正确的——在某些情况下,您可能实际上不知道并且拥有一个NULL,因此留下一行包含一个未知元素,与之进行有意义的比较是不可能的,这就是正确的行为。

In any case you should strive to eliminate the number of NULL fields you permit across the entire schema and doubly so when it comes to fields that are part of a primary key. In the vast majority of cases the presence of NULL columns is an indication of un-normalized (as opposed to deliberately de-normalized) schema design and should be thought very hard about before being accepted.

在任何情况下,您都应该努力消除跨整个模式允许的空字段的数量,当涉及到作为主键一部分的字段时更是如此。在绝大多数情况下,空列的存在表明了非规范化(而不是有意的非规范化)模式设计,在被接受之前应该非常仔细地考虑。

#4


17  

NULL == NULL -> false (at least in DBMSs)

空== NULL -> false(至少在DBMSs中)

So you wouldn't be able to retrieve any relationships using a NULL value even with additional columns with real values.

因此,即使有实值的附加列,也不能使用空值检索任何关系。

#5


3  

The answer by Tony Andrews is a decent one. But the real answer is that this has been a convention used by relational database community and is NOT a necessity. Maybe it is a good convention, maybe not.

托尼·安德鲁斯的答案很不错。但真正的答案是,这是关系数据库社区使用的约定,不是必需的。也许这是一个很好的惯例,也许不是。

Comparing anything to NULL results in UNKNOWN (3rd truth value). So as has been suggested with nulls all traditional wisdom concerning equality goes out the window. Well that's how it seems at first glance.

将任何东西与空值进行比较会导致未知(第3真值)。因此,正如人们所建议的那样,所有关于平等的传统智慧都被打破了。乍一看似乎是这样。

But I don't think this is necessarily so and even SQL databases don't think that NULL destroys all possibility for comparison.

但我不认为这是必然的,甚至SQL数据库也不认为NULL会破坏所有的比较可能性。

Run in your database the query SELECT * FROM VALUES(NULL) UNION SELECT * FROM VALUES(NULL)

在数据库中运行查询SELECT * FROM VALUES(NULL) UNION SELECT * FROM VALUES(NULL)

What you see is just one tuple with one attribute that has the value NULL. So the union recognized here the two NULL values as equal.

您看到的只是一个带有一个属性值为NULL的元组。所以这里的union认出这两个零值是相等的。

When comparing a composite key that has 3 components to a tuple with 3 attributes (1, 3, NULL) = (1, 3, NULL) <=> 1 = 1 AND 3 = 3 AND NULL = NULL The result of this is UNKNOWN.

当将具有3个组件的复合键与具有3个属性的元组(1、3、NULL) =(1、3、NULL) <=> 1 = 1和3 = 3,NULL = NULL时,结果是未知的。

But we could define a new kind of comparison operator eg. ==. X == Y <=> X = Y OR (X IS NULL AND Y IS NULL)

但是我们可以定义一种新的比较算子。= =。X = Y <=> X = Y或(X为空,Y为空)

Having this kind of equality operator would make composite keys with null components or non-composite key with null value unproblematic.

具有这种等式运算符将使具有空组件的组合键或具有空值的非组合键变得没有问题。

#6


0  

I still believe this is a fundamental / functional flaw brought about by a technicality. If you have an optional field by which you can identify a customer you now have to hack a dummy value into it, just because NULL != NULL, not particularly elegant yet it is an "industry standard"

我仍然认为这是一个技术上的基础/功能缺陷。如果您有一个可选字段,您可以通过该字段来标识客户,那么您现在必须将一个虚拟值插入其中,因为NULL != NULL,虽然不是特别优雅,但它是一个“行业标准”