How to emulate "insert ignore" and "on duplicate key update" (SQL merge) with PostgreSQL?

Time: 2021-12-14 09:26:40

Some SQL servers have a feature where INSERT is skipped if it would violate a primary/unique key constraint. For instance, MySQL has INSERT IGNORE.

What's the best way to emulate INSERT IGNORE and ON DUPLICATE KEY UPDATE with PostgreSQL?

11 answers

#1


28  

Try to do an UPDATE. If it doesn't modify any row that means it didn't exist, so do an insert. Obviously, you do this inside a transaction.

You can of course wrap this in a function if you don't want to put the extra code on the client side. You also need a retry loop to handle the very rare race condition in this approach, where another session inserts the same row between your UPDATE and your INSERT.

There's an example of this in the documentation: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html, example 40-2 right at the bottom.

That's usually the easiest way. You can do some magic with rules, but it's likely going to be a lot messier. I'd recommend the wrap-in-function approach over that any day.

This works for a single row or a few rows. If you're dealing with a large number of rows, for example from a subquery, you're best off splitting it into two queries, one for INSERT and one for UPDATE (as an appropriate join/subselect of course; there's no need to write your main filter twice).
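The update-then-insert idea can be sketched from the client side roughly as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for a PostgreSQL connection so the snippet runs anywhere, and the `user_logins` table and the `upsert_login` helper are hypothetical names. With a single connection the race never actually occurs, so the retry branch only shows the shape of the loop.

```python
import sqlite3

def upsert_login(conn, username, logins):
    """Try UPDATE first; INSERT only if no row was modified (hypothetical helper)."""
    for _ in range(3):  # retry in case another writer inserts the key in between
        cur = conn.execute(
            "UPDATE user_logins SET logins = ? WHERE username = ?",
            (logins, username),
        )
        if cur.rowcount > 0:
            return "updated"
        try:
            conn.execute(
                "INSERT INTO user_logins (username, logins) VALUES (?, ?)",
                (username, logins),
            )
            return "inserted"
        except sqlite3.IntegrityError:
            continue  # lost the race: loop back and UPDATE the row that now exists
    raise RuntimeError("upsert did not succeed after retries")

# SQLite in memory is an assumption made only to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logins (username TEXT PRIMARY KEY, logins INTEGER)")
first = upsert_login(conn, "Naomi", 1)
second = upsert_login(conn, "Naomi", 5)
rows = conn.execute("SELECT username, logins FROM user_logins").fetchall()
```

The first call finds nothing to update and inserts; the second call updates the existing row.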

#2


94  

With PostgreSQL 9.5, this is now native functionality (like MySQL has had for several years):

INSERT ... ON CONFLICT DO NOTHING/UPDATE ("UPSERT")

9.5 brings support for "UPSERT" operations. INSERT is extended to accept an ON CONFLICT DO UPDATE/NOTHING clause. This clause specifies an alternative action to take in the event of a would-be duplicate violation.

...

Further example of new syntax:

INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1) 
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
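To see what that statement does to existing and new rows, here is a small runnable sketch. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.24 or later accepts the same ON CONFLICT ... DO UPDATE form, including the `excluded` pseudo-table), and the starting row for 'Naomi' is made up for the demonstration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logins (username TEXT PRIMARY KEY, logins INTEGER)")
conn.execute("INSERT INTO user_logins VALUES ('Naomi', 3)")  # pre-existing row

upsert_sql = """
INSERT INTO user_logins (username, logins)
VALUES ('Naomi', 1), ('James', 1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + excluded.logins
"""
conn.execute(upsert_sql)

# 'Naomi' already existed, so her count is incremented; 'James' is inserted.
rows = conn.execute(
    "SELECT username, logins FROM user_logins ORDER BY username"
).fetchall()
```

Here `user_logins.logins` is the value already stored and `excluded.logins` is the value the INSERT tried to write.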

#3


91  

Edit: in case you missed warren's answer, PG9.5 now has this natively; time to upgrade!

Building on Bill Karwin's answer, to spell out what a rule based approach would look like (transferring from another schema in the same DB, and with a multi-column primary key):

CREATE RULE "my_table_on_duplicate_ignore" AS ON INSERT TO "my_table"
  WHERE EXISTS(SELECT 1 FROM my_table 
                WHERE (pk_col_1, pk_col_2)=(NEW.pk_col_1, NEW.pk_col_2))
  DO INSTEAD NOTHING;
INSERT INTO my_table SELECT * FROM another_schema.my_table WHERE some_cond;
DROP RULE "my_table_on_duplicate_ignore" ON "my_table";

Note: The rule applies to all INSERT operations until the rule is dropped, so not quite ad hoc.

#4


22  

To get the insert ignore logic you can do something like below. I found simply inserting from a select statement of literal values worked best, then you can mask out the duplicate keys with a NOT EXISTS clause. To get the update on duplicate logic I suspect a pl/pgsql loop would be necessary.

INSERT INTO manager.vin_manufacturer
(SELECT * FROM( VALUES
  ('935',' Citroën Brazil','Citroën'),
  ('ABC', 'Toyota', 'Toyota'),
  ('ZOM',' OM','OM')
  ) as tmp (vin_manufacturer_id, manufacturer_desc, make_desc)
  WHERE NOT EXISTS (
    --ignore anything that has already been inserted
    SELECT 1 FROM manager.vin_manufacturer m where m.vin_manufacturer_id = tmp.vin_manufacturer_id)
)
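The NOT EXISTS guard can also be driven from client code with bound parameters. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 module in place of PostgreSQL, and the `mytable` schema and the `insert_if_absent` helper are hypothetical. Note that the guard alone does not protect against two sessions inserting the same key at the same moment; a unique constraint is still needed for that.

```python
import sqlite3

def insert_if_absent(conn, col1, col2):
    """Insert the row only when no row with the same col1 exists (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO mytable (col1, col2) "
        "SELECT ?, ? "
        "WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE col1 = ?)",
        (col1, col2, col1),
    )
    return cur.rowcount == 1  # True when a row was actually inserted

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT PRIMARY KEY, col2 TEXT)")
first = insert_if_absent(conn, "val1", "val2")
second = insert_if_absent(conn, "val1", "other")  # skipped, key already present
rows = conn.execute("SELECT col1, col2 FROM mytable").fetchall()
```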

#5


18  

INSERT INTO mytable(col1,col2) 
    SELECT 'val1','val2' 
    WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE col1='val1')

#6


12  

Looks like PostgreSQL supports a schema object called a rule.

http://www.postgresql.org/docs/current/static/rules-update.html

You could create a rule ON INSERT for a given table, making it do NOTHING if a row exists with the given primary key value, or else making it do an UPDATE instead of the INSERT if a row exists with the given primary key value.

I haven't tried this myself, so I can't speak from experience or offer an example.

#7


12  

For those of you that have Postgres 9.5 or higher, the new ON CONFLICT DO NOTHING syntax should work:

INSERT INTO target_table (field_one, field_two, field_three ) 
SELECT field_one, field_two, field_three
FROM source_table
ON CONFLICT (field_one) DO NOTHING;

For those of us who have an earlier version, this left join (an anti-join that keeps only the source rows with no match in the target) will work instead:

INSERT INTO target_table (field_one, field_two, field_three )
SELECT source_table.field_one, source_table.field_two, source_table.field_three
FROM source_table 
LEFT JOIN target_table ON source_table.field_one = target_table.field_one
WHERE target_table.field_one IS NULL;
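A quick way to check the anti-join behaviour is to run it against two tiny tables. The following sketch assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the sample rows are invented for the demonstration. One caveat that applies either way: duplicates inside source_table itself are not filtered by this join.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table (field_one INTEGER, field_two TEXT, field_three TEXT);
CREATE TABLE target_table (field_one INTEGER PRIMARY KEY, field_two TEXT, field_three TEXT);
INSERT INTO source_table VALUES (1, 'new', 'new'), (2, 'b', 'y');
INSERT INTO target_table VALUES (1, 'old', 'old');
""")

# Only source rows without a matching key in target_table survive the WHERE.
conn.execute("""
INSERT INTO target_table (field_one, field_two, field_three)
SELECT source_table.field_one, source_table.field_two, source_table.field_three
FROM source_table
LEFT JOIN target_table ON source_table.field_one = target_table.field_one
WHERE target_table.field_one IS NULL
""")

rows = conn.execute("SELECT * FROM target_table ORDER BY field_one").fetchall()
```

Row 1 keeps its old values because it was skipped, and row 2 is copied over.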

#8


2  

This solution avoids using rules:

DO $$
BEGIN
   INSERT INTO tableA (unique_column,c2,c3) VALUES (1,2,3);
EXCEPTION
   WHEN unique_violation THEN
     -- the key already exists, so update the existing row instead
     UPDATE tableA SET c2 = 2, c3 = 3 WHERE unique_column = 1;
END;
$$;

but it has a performance drawback (see PostgreSQL.org):

A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one. Therefore, don't use EXCEPTION without need.

#9


1  

In bulk, you can always delete the row before the insert. Deleting a row that doesn't exist doesn't cause an error, so it's safely skipped.
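A minimal sketch of the delete-then-insert idea, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL; the `tableA` schema and the `replace_rows` helper are hypothetical. Both steps run in one transaction so a failure does not leave rows deleted but not reinserted. Keep in mind that this replaces existing rows rather than ignoring the new ones.

```python
import sqlite3

def replace_rows(conn, rows):
    """Delete any rows with the incoming keys, then insert the batch (hypothetical helper)."""
    keys = [(r[0],) for r in rows]
    with conn:  # commit on success, roll back if anything raises
        conn.executemany("DELETE FROM tableA WHERE unique_column = ?", keys)
        conn.executemany(
            "INSERT INTO tableA (unique_column, c2, c3) VALUES (?, ?, ?)", rows
        )

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (unique_column INTEGER PRIMARY KEY, c2 INTEGER, c3 INTEGER)"
)
conn.execute("INSERT INTO tableA VALUES (1, 0, 0)")
conn.commit()

# Key 1 exists and is replaced; key 4 does not exist, so its DELETE is a no-op.
replace_rows(conn, [(1, 2, 3), (4, 5, 6)])
result = conn.execute("SELECT * FROM tableA ORDER BY unique_column").fetchall()
```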

#10


1  

As @hanmari mentioned in his comment, when inserting into a Postgres table, ON CONFLICT (..) DO NOTHING is the best code to use for not inserting duplicate data:

query = """INSERT INTO db_table_name(column_name)
         VALUES(%s) ON CONFLICT (column_name) DO NOTHING;"""

The ON CONFLICT clause allows the INSERT statement to still insert the rows that are not duplicates. The query and values code is an example of inserting data from an Excel file into a Postgres table. I have constraints added to the Postgres table to make sure the ID field is unique. Instead of running a delete on rows of data that are the same, I add a line of SQL that restarts the ID column's sequence at 1. Example:

# assumes the default sequence name that serial creates: <table>_<column>_seq
q = 'ALTER SEQUENCE db_table_name_id_column_seq RESTART WITH 1'

If my data has an ID field, I do not use it as the primary/serial ID; I create a separate ID column and set it to serial. I hope this information is helpful to everyone. *I have no college degree in software development/coding. Everything I know about coding, I have studied on my own.
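The ON CONFLICT ... DO NOTHING query from this answer can be tried end to end with a batch of values. This is a sketch under assumptions: Python's built-in sqlite3 module stands in for a PostgreSQL driver, so the placeholder is `?` rather than `%s`, and the sample values are invented.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL (needs SQLite 3.24+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_table_name (column_name TEXT UNIQUE)")

query = (
    "INSERT INTO db_table_name (column_name) "
    "VALUES (?) ON CONFLICT (column_name) DO NOTHING"
)
values = [("a",), ("b",), ("a",)]  # the second "a" is a duplicate and is skipped
conn.executemany(query, values)

rows = conn.execute(
    "SELECT column_name FROM db_table_name ORDER BY column_name"
).fetchall()
```

The duplicate does not raise an error and does not stop the remaining rows from being inserted.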

#11


-1  

For data import scripts, to replace "IF NOT EXISTS", in a way, there's a slightly awkward formulation that nevertheless works:

DO
$do$
BEGIN
   PERFORM id
   FROM whatever_table;

   IF NOT FOUND THEN
      -- INSERT stuff
   END IF;
END
$do$;
