Upserting into MySQL, but with multiple columns and a unique index as the duplicate check?

Time: 2022-10-15 04:25:26

I saw a lot of people have asked about upserting (this, this, this, this, this, this, and more and even the official doc).

However, something that is not explained well enough for newbies to understand is how to create the duplicate key using primary key or unique indexes.

What I need:
If a table1's unique combination of 3 columns (attributeId, entityId, carId) has a duplicate in table2, then update the value column. Else take table1's row and insert it into table2.

The attributeId, entityId, carId combination will be unique for every row.
i.e. if a row has the columns as 1,2,5, then no other row will have 1,2,5. But another row might have 5,1,2 or 3,4,2 etc.

The dilemma here is about creating the unique index. Is it sufficient to just do it like this:

CREATE INDEX PIndex ON table1 (attributeId, entityId, carId);

or is it necessary to delete all other indexes and then create this index and then run a query like this? (pseudocode below):

    INSERT INTO table1 (attributeId, entityId, carId, value, name)
    VALUES (table2.attributeId, table2.entityId, table2.carId, table2.value, table2.name)
    ON DUPLICATE KEY UPDATE value = VALUES(value);

The basic logic being:
If for a row in table2, there is a corresponding row in table1 with exactly the same values for attributeId, entityId and carId, then update the value column in table1 with the value of the value column in table2. If there is no corresponding row, then take the row of table2 and append it to table1.

2 solutions

#1


1  

Seems like the specification is for two different operations: 1) an UPDATE of existing rows in table1, and 2) an INSERT of new rows into table2.

The specification says "update the value column"... we take that to mean update the value column in the row of table1.

The specification also says "insert ... into table2".

Confusingly, the specification also shows an example pseudo-code INSERT INTO table1.


To perform an UPDATE of table1 based on values in table2, assuming we are going to ignore rows that have a NULL value in any of the three columns...

 -- the equality join never matches NULLs, so rows with a NULL key column are skipped
 UPDATE table1 t
   JOIN table2 s
     ON t.attributeid = s.attributeid
    AND t.entityid    = s.entityid
    AND t.carid       = s.carid
    SET t.value  = s.value;

If there are "duplicates" in table2 (i.e. multiple rows in table2 with the same values in the three columns attributeid, entityid and carid), it is indeterminate which of those rows the value will be taken from.
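
If that indeterminacy matters, one option is to collapse table2 to a single row per key before joining. The following is only a sketch: the rule "the largest value wins" is an assumption made for illustration, not something the question specifies, so substitute whatever tie-break fits the data.

```sql
-- Assumption (not from the question): when table2 holds several rows
-- for the same key, the largest value is the one to keep.
UPDATE table1 t
  JOIN ( SELECT attributeid
              , entityid
              , carid
              , MAX(value) AS value
           FROM table2
          GROUP
             BY attributeid
              , entityid
              , carid
       ) s
    ON t.attributeid = s.attributeid
   AND t.entityid    = s.entityid
   AND t.carid       = s.carid
   SET t.value = s.value;
```

Because the derived table returns at most one row per key, each row of table1 is matched at most once and the result no longer depends on row order.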


To insert a row that is found in table2 but "missing" from table1 (again assuming those three columns may not be unique in table2), we can use an anti-join pattern to eliminate rows which already have a "match" in table1.

For example:

 INSERT INTO table1 (attributeid, entityid, carid, value)
 SELECT v.*
   FROM ( SELECT s.attributeid
               , s.entityid
               , s.carid
               , s.value        -- with ONLY_FULL_GROUP_BY enabled, use ANY_VALUE(s.value) AS value
            FROM table2 s
            LEFT
            JOIN table1 r
              ON r.attributeid = s.attributeid
             AND r.entityid    = s.entityid
             AND r.carid       = s.carid
           WHERE r.attributeid IS NULL      -- anti-join: keep only rows with no match in table1
             AND s.attributeid IS NOT NULL
             AND s.entityid    IS NOT NULL
             AND s.carid       IS NOT NULL
           GROUP
              BY s.attributeid
               , s.entityid
               , s.carid
        ) v;

If there are "duplicates" in table2 (i.e. multiple rows in table2 with the same values in the three columns attributeid, entityid and carid), it is indeterminate which row the value will be taken from.

If there are other UNIQUE constraints defined on other columns, or combinations of columns, the statement has the potential to throw a "duplicate key" error. (Without knowing the key definitions, we're kinda flying blind.) We could add the IGNORE keyword if we want the statement to succeed, just ignoring rows that fail to insert due to "unique key" violations.
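
For illustration, here is roughly what the IGNORE variant could look like. This is a sketch under the same assumptions as above (table1 is the target, table2 the source); it writes the anti-join with NOT EXISTS instead of LEFT JOIN, which is just an alternative spelling of the same idea.

```sql
-- IGNORE turns duplicate-key errors into warnings and skips the offending rows.
-- Note: there is no GROUP BY here, so if table2 holds several rows for the
-- same key, all of them are inserted unless a unique key on table1 rejects them.
INSERT IGNORE INTO table1 (attributeid, entityid, carid, value)
SELECT s.attributeid
     , s.entityid
     , s.carid
     , s.value
  FROM table2 s
 WHERE s.attributeid IS NOT NULL
   AND s.entityid    IS NOT NULL
   AND s.carid       IS NOT NULL
   AND NOT EXISTS ( SELECT 1
                      FROM table1 r
                     WHERE r.attributeid = s.attributeid
                       AND r.entityid    = s.entityid
                       AND r.carid       = s.carid
                  );
```

Running SHOW WARNINGS right after the statement lists the rows that were skipped.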

Again, if there are rows in table2 with the same values in the three columns (no indication is given that this combination of columns is unique in table2), it's indeterminate which of those rows the value will be taken from.

The same operations can be performed in the opposite direction, swapping all occurrences of the table references table1 and table2 in the queries.


It's not necessary to add a UNIQUE KEY to either of the tables to perform these operations. There would (likely) be a performance benefit to having a suitable index defined, with those three columns as the leading (first) columns in the index. (That doesn't necessarily need to be a UNIQUE index for this operation.)
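
As a rough sketch of such an index (the index name is made up for the example), a plain non-unique index on the source table is enough to help the join lookups:

```sql
-- Hypothetical index name; non-unique, so it tolerates duplicate keys in table2.
CREATE INDEX ix_table2_attr_entity_car
    ON table2 (attributeid, entityid, carid);
```

Whether the optimizer actually uses it depends on the data, so it is worth checking the plan with EXPLAIN before and after.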

If that combination of columns should be unique, then by all means add a UNIQUE KEY on that combination of columns. But the specified operations can be performed without a UNIQUE KEY defined.

The MySQL INSERT ... ON DUPLICATE KEY syntax does require at least one PRIMARY KEY or UNIQUE KEY to operate. If there are multiple UNIQUE KEY constraints on the target table, and an INSERT would violate two or more of the unique key constraints, I believe it's indeterminate which of those keys will be used in the UPDATE action. Personally, I'd tend to steer clear of using that syntax on a table with more than one UNIQUE KEY defined.
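
If you do want the single-statement upsert from the question, a sketch might look like the following. It assumes table1 is the target and table2 the source (as in the question's pseudocode), that table1 has no other unique keys, and that the three-column combination is already free of duplicates in table1; the key name is invented for the example.

```sql
-- A plain CREATE INDEX is not enough: ON DUPLICATE KEY only reacts to
-- PRIMARY KEY or UNIQUE violations. (Hypothetical key name.)
ALTER TABLE table1
  ADD UNIQUE KEY uq_attr_entity_car (attributeId, entityId, carId);

-- The pseudocode's VALUES (table2. ...) becomes an INSERT ... SELECT.
INSERT INTO table1 (attributeId, entityId, carId, value, name)
SELECT s.attributeId
     , s.entityId
     , s.carId
     , s.value
     , s.name
  FROM table2 s
    ON DUPLICATE KEY UPDATE value = VALUES(value);
-- VALUES(col) is deprecated as of MySQL 8.0.20; referencing s.value
-- directly in the UPDATE clause is an alternative.
```

There is no need to drop the other (non-unique) indexes first. Also keep in mind that a UNIQUE key treats NULLs as distinct, so rows with a NULL in any of the three columns are always inserted, never updated.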

#2


0  

You can use the syntax

ALTER IGNORE TABLE table1 ADD UNIQUE INDEX PIndex (attributeId, entityId, carId);

According to the documentation:

If IGNORE is specified, only one row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

Unfortunately, it does not specify which row will be kept. Some quick tests suggest it keeps the first occurrence, but you can never be sure.

If you don't care which entry gets dropped, this is the easiest solution. Otherwise, if you want more control, it would be better to go through a temporary table.
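
The temporary-table route could be sketched as below. It is also the fallback on newer servers, since ALTER IGNORE TABLE was deprecated in MySQL 5.6.17 and removed in 5.7. Everything here is an assumption for illustration: the staging table name, the "largest value wins" rule, and the idea that only these four columns need to be carried over (add the others, e.g. name, as your schema requires). A regular staging table is used rather than a true TEMPORARY table so that it can be swapped in at the end.

```sql
-- Hypothetical staging table with the same structure as table1.
CREATE TABLE table1_dedup LIKE table1;

ALTER TABLE table1_dedup
  ADD UNIQUE INDEX PIndex (attributeId, entityId, carId);

-- Assumption: keep the largest value per key; pick your own rule here.
INSERT INTO table1_dedup (attributeId, entityId, carId, value)
SELECT attributeId
     , entityId
     , carId
     , MAX(value)
  FROM table1
 GROUP
    BY attributeId
     , entityId
     , carId;

-- Swap the tables in one atomic step; table1_old stays around as a backup.
RENAME TABLE table1 TO table1_old, table1_dedup TO table1;
```

Once the result looks right, table1_old can be dropped.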

The command CREATE UNIQUE INDEX PIndex ON table1 (attributeId, entityId, carId); (note the added UNIQUE) will simply fail on the first duplicate key, and offers no option to manage duplicates.
