Multiple nullable columns in a PostgreSQL unique constraint

Time: 2022-04-03 04:28:04

We have a legacy database schema that has some interesting design decisions. Until recently, we have only supported Oracle and SQL Server, but we are trying to add support for PostgreSQL, which has brought up an interesting problem. I have searched Stack Overflow and the rest of the internet and I don't believe this particular situation is a duplicate.

Oracle and SQL Server both behave the same when it comes to nullable columns in a unique constraint, which is to essentially ignore the columns that are NULL when performing the unique check.

Let's say I have the following table and constraint:

CREATE TABLE EXAMPLE
(
    ID TEXT NOT NULL PRIMARY KEY,
    FIELD1 TEXT NULL,
    FIELD2 TEXT NULL,
    FIELD3 TEXT NULL,
    FIELD4 TEXT NULL,
    FIELD5 TEXT NULL,
    ...
);

CREATE UNIQUE INDEX EXAMPLE_INDEX ON EXAMPLE
(
    FIELD1 ASC,
    FIELD2 ASC,
    FIELD3 ASC,
    FIELD4 ASC,
    FIELD5 ASC
);

On both Oracle and SQL Server, leaving any of the nullable columns NULL will result in only performing a uniqueness check on the non-null columns. So the following inserts can only be done once:

INSERT INTO EXAMPLE VALUES ('1','FIELD1_DATA', NULL, NULL, NULL, NULL );
INSERT INTO EXAMPLE VALUES ('2','FIELD1_DATA','FIELD2_DATA', NULL, NULL,'FIELD5_DATA');
-- These will succeed when they should violate the unique constraint:
INSERT INTO EXAMPLE VALUES ('3','FIELD1_DATA', NULL, NULL, NULL, NULL );
INSERT INTO EXAMPLE VALUES ('4','FIELD1_DATA','FIELD2_DATA', NULL, NULL,'FIELD5_DATA');

However, because PostgreSQL (correctly) adheres to the SQL standard, those insertions (and any other combination of values, as long as at least one of them is NULL) do not raise an error and are inserted without complaint. Unfortunately, because of our legacy schema and the supporting code, we need PostgreSQL to behave the same as SQL Server and Oracle.

I am aware of the following Stack Overflow question and its answers: Create unique constraint with null columns. From my understanding, there are two strategies to solve this problem:

  1. Create partial indexes that describe the index in cases where the nullable columns are both NULL and NOT NULL (which results in exponential growth of the number of partial indexes)
  2. Use COALESCE with a sentinel value on the nullable columns in the index.

The problem with (1) is that the number of partial indexes we'd need to create grows exponentially with each additional nullable column we'd like to add to the constraint (2^N if I am not mistaken). The problems with (2) are that sentinel values reduce the number of available values for that column, plus all of the potential performance problems.

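To make the two strategies concrete, here is a minimal sketch for a hypothetical two-column table. The table `demo`, its columns, and all index names are made up for illustration and are not part of our schema. With two nullable columns, strategy (1) already needs 2^2 = 4 partial indexes, while strategy (2) needs a single expression index. In practice only one of the two strategies would be created on a given table.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE demo (
    id text PRIMARY KEY,
    a  text,
    b  text
);

-- Strategy (1): one partial unique index per NULL / NOT NULL combination.
CREATE UNIQUE INDEX demo_uq_a_b  ON demo (a, b)   WHERE a IS NOT NULL AND b IS NOT NULL;
CREATE UNIQUE INDEX demo_uq_a    ON demo (a)      WHERE a IS NOT NULL AND b IS NULL;
CREATE UNIQUE INDEX demo_uq_b    ON demo (b)      WHERE a IS NULL     AND b IS NOT NULL;
-- Constant expression: allows at most one row where both columns are NULL.
CREATE UNIQUE INDEX demo_uq_none ON demo ((true)) WHERE a IS NULL     AND b IS NULL;

-- Strategy (2): a single expression index with '' as the sentinel.
-- Assumption: '' never occurs as real data, otherwise it collides with NULL.
CREATE UNIQUE INDEX demo_uq_coalesce ON demo (COALESCE(a, ''), COALESCE(b, ''));
```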

My question: are these the only two solutions to this problem? If so, what are the tradeoffs between them for this particular use case? A good answer would discuss the performance of each solution, the maintainability, how PostgreSQL would utilize these indexes in simple SELECT statements, and any other "gotchas" or things to be aware of. Keep in mind that 5 nullable columns was only for an example; we have some tables in our schema with up to 10 (yes, I cry every time I see it, but it is what it is).

4 solutions

#1


6  

You are striving for compatibility with your existing Oracle and SQL Server implementations.
Here is a presentation comparing physical row storage formats of the three involved RDBMS.

Since Oracle does not implement NULL values at all in row storage, it can't tell the difference between an empty string and NULL anyway. So wouldn't it be prudent to use empty strings ('') instead of NULL values in Postgres as well - for this particular use case?

Define columns included in the unique constraint as NOT NULL DEFAULT '', problem solved:

CREATE TABLE example (
   example_id serial PRIMARY KEY
 , field1 text NOT NULL DEFAULT ''
 , field2 text NOT NULL DEFAULT ''
 , field3 text NOT NULL DEFAULT ''
 , field4 text NOT NULL DEFAULT ''
 , field5 text NOT NULL DEFAULT ''
 , CONSTRAINT example_index UNIQUE (field1, field2, field3, field4, field5)
);

Notes

  • What you demonstrate in the question is a unique index:

    CREATE UNIQUE INDEX ...
    

    not the unique constraint you keep talking about. There are subtle, important differences!

    I changed that to an actual constraint like you made it the subject of the post.

  • The keyword ASC is just noise, since that is the default sort order. I left it out.

  • Using a serial PK column for simplicity which is totally optional but typically better than numbers stored as text.

Working with it

Just omit empty / null fields from the INSERT:

INSERT INTO example(field1) VALUES ('F1_DATA');
INSERT INTO example(field1, field2, field5) VALUES ('F1_DATA', 'F2_DATA', 'F5_DATA');

Repeating any of these inserts would violate the unique constraint.

Or if you insist on omitting target columns (which is a bit of an antipattern in persisted INSERT statements):
Or for bulk inserts where all columns need to be listed:

INSERT INTO example VALUES
  ('1', 'F1_DATA', DEFAULT, DEFAULT, DEFAULT, DEFAULT)
, ('2', 'F1_DATA','F2_DATA', DEFAULT, DEFAULT,'F5_DATA');

Or simply:

INSERT INTO example VALUES
  ('1', 'F1_DATA', '', '', '', '')
, ('2', 'F1_DATA','F2_DATA', '', '','F5_DATA');

Or you can write a trigger BEFORE INSERT OR UPDATE that converts NULL to ''.

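Such a trigger could look roughly like the following sketch. The function and trigger names are hypothetical; the columns are those of the example table above. A BEFORE ROW trigger runs before the NOT NULL constraints are checked, so an explicit NULL in an INSERT or UPDATE is replaced by '' in time.

```sql
-- Hypothetical names; adjust to the real table and columns.
CREATE FUNCTION example_null_to_empty() RETURNS trigger AS $func$
BEGIN
    -- Replace NULL with the empty string in every column of the unique constraint.
    NEW.field1 := COALESCE(NEW.field1, '');
    NEW.field2 := COALESCE(NEW.field2, '');
    NEW.field3 := COALESCE(NEW.field3, '');
    NEW.field4 := COALESCE(NEW.field4, '');
    NEW.field5 := COALESCE(NEW.field5, '');
    RETURN NEW;
END;
$func$ LANGUAGE plpgsql;

CREATE TRIGGER example_null_to_empty
BEFORE INSERT OR UPDATE ON example
FOR EACH ROW EXECUTE PROCEDURE example_null_to_empty();
```

With this in place, legacy code that still passes NULL explicitly should not need to change, at the cost of a trigger call per row.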

Alternative solutions

If you need to use actual NULL values I would suggest the unique index with COALESCE like you mentioned as option (2) and @wildplasser provided as his last example.

The index on an array like @Rudolfo presented is simple, but considerably more expensive. Array handling isn't very cheap in Postgres and there is an array overhead similar to that of a row (24 bytes).

Arrays are limited to columns of the same data type. You could cast all columns to text if some are not, but it will typically further increase storage requirements. Or you could use a well-known row type for heterogeneous data types ...

A corner case: array (or row) types with all NULL values are considered equal (!), so there can only be 1 row with all involved columns NULL. May or may not be as desired. If you want to disallow all columns NULL:

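One way to express that restriction, as a sketch: a CHECK constraint built on a row-wise IS NULL test, which is true only when every field of the row is NULL. The constraint name is hypothetical, and the sketch assumes the variant of the table where the columns are still nullable.

```sql
-- Hypothetical constraint name.
-- ROW(...) IS NULL is true only if all listed columns are NULL,
-- so the check rejects exactly the "everything is NULL" rows.
ALTER TABLE example
    ADD CONSTRAINT example_not_all_null
    CHECK (NOT (ROW(field1, field2, field3, field4, field5) IS NULL));
```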

#2


5  

Third method: use IS NOT DISTINCT FROM instead of = for comparing the key columns. (This could make use of the existing index on the candidate natural key.) Example (look at the last column):

SELECT *
    , EXISTS (SELECT * FROM example x
     WHERE x.FIELD1 IS NOT DISTINCT FROM e.FIELD1
     AND x.FIELD2 IS NOT DISTINCT FROM e.FIELD2
     AND x.FIELD3 IS NOT DISTINCT FROM e.FIELD3
     AND x.FIELD4 IS NOT DISTINCT FROM e.FIELD4
     AND x.FIELD5 IS NOT DISTINCT FROM e.FIELD5
     AND x.ID <> e.ID
    ) other_exists
FROM example e
    ;

Next step would be to put that into a trigger function, and put a trigger on it. (don't have the time now, maybe later)


And here is the trigger-function (which is not perfect yet, but appears to work):


CREATE FUNCTION example_check() RETURNS trigger AS $func$
BEGIN
    -- Reject the row if another row has the same key, treating NULLs as equal
    IF EXISTS (
     SELECT 666 FROM example x
     WHERE x.FIELD1 IS NOT DISTINCT FROM NEW.FIELD1
     AND x.FIELD2 IS NOT DISTINCT FROM NEW.FIELD2
     AND x.FIELD3 IS NOT DISTINCT FROM NEW.FIELD3
     AND x.FIELD4 IS NOT DISTINCT FROM NEW.FIELD4
     AND x.FIELD5 IS NOT DISTINCT FROM NEW.FIELD5
     AND x.ID <> NEW.ID
            ) THEN
        RAISE EXCEPTION 'MultiLul BV';
    END IF;


    RETURN NEW;
END;
$func$ LANGUAGE plpgsql;

CREATE TRIGGER example_check BEFORE INSERT OR UPDATE ON example
  FOR EACH ROW EXECUTE PROCEDURE example_check();

UPDATE: a unique index can sometimes be wrapped into a constraint (see postgres-9.4 docs, final example). You do need to invent a sentinel value; I used the empty string '' here.


CREATE UNIQUE INDEX ex_12345 ON example
        (coalesce(FIELD1, '')
        , coalesce(FIELD2, '')
        , coalesce(FIELD3, '')
        , coalesce(FIELD4, '')
        , coalesce(FIELD5, '')
        )
        ;

ALTER TABLE example
        ADD CONSTRAINT con_ex_12345
        UNIQUE USING INDEX ex_12345;

But the "functional" index on coalesce() is not allowed in this construct. The unique index (OP's option 2) still works, though:


ERROR:  index "ex_12345" contains expressions
LINE 2:  ADD CONSTRAINT con_ex_12345
             ^
DETAIL:  Cannot create a primary key or unique constraint using such an index.
INSERT 0 1
INSERT 0 1
ERROR:  duplicate key value violates unique constraint "ex_12345"
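
For readers on PostgreSQL 15 or later (an assumption; the answers here were written against older versions), the NULLS NOT DISTINCT clause lets a plain unique constraint treat NULLs as equal, without a sentinel value or an expression index. A minimal sketch, with a hypothetical constraint name:

```sql
-- Requires PostgreSQL 15+. Hypothetical constraint name.
-- NULLs in the listed columns are treated as equal for the uniqueness check.
ALTER TABLE example
    ADD CONSTRAINT example_uq_nulls_equal
    UNIQUE NULLS NOT DISTINCT (field1, field2, field3, field4, field5);
```

Because this is a real constraint on plain columns rather than an expression index, the restriction shown in the error above does not apply to it.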

#3


3  

This actually worked well for me:

CREATE UNIQUE INDEX index_name ON table_name ((
   ARRAY[field1, field2, field3, field4]
));

I don't know how performance is affected, but it should be close to ideal (depending on how well optimized arrays are in Postgres).

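Applied to the table from the question, the same idea might look like the sketch below. The index name is hypothetical, and the INSERT statements assume that any columns not listed are nullable. Array comparison treats NULL elements in the same position as equal, which is what makes the second insert collide with the first.

```sql
-- Hypothetical index name; table and columns are from the question.
CREATE UNIQUE INDEX example_array_uq ON example ((
   ARRAY[field1, field2, field3, field4, field5]
));

INSERT INTO example (id, field1) VALUES ('1', 'FIELD1_DATA');
-- Same key with NULLs in the same positions:
-- expected to be rejected as a duplicate by the index above.
INSERT INTO example (id, field1) VALUES ('3', 'FIELD1_DATA');
```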

#4


0  

You can create a rule that redirects rows containing NULL values away from the original table and into partitions such as partition_field1_nullable, partition_field2_nullable, etc. This way you create a unique index on the original table only (which then holds no NULLs). The original table accepts only rows without NULLs (and enforces uniqueness), while the "nullable partitions" accept any number of rows with NULLs (which are accordingly not unique). You can then apply the COALESCE or trigger method to the nullable partitions only, which avoids many scattered partial indexes and a trigger firing on every DML statement against the original table...
