I have a script that accidentally inserted the same data into the database twice. Is there a good way to remove the duplicates (without scanning through the table, inserting every record into an array, and then deleting duplicate array entries)?
4 solutions
#1
MySQL supports multi-table DELETE, which is really cool and can help here. You can do a self-join on the equality of all columns except the id, and then delete the matching row with the greater id.
DELETE t2
FROM mytable t1 JOIN mytable t2
USING (column1, column2, column3) -- this is an equi-join
WHERE t1.id < t2.id;
#2
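-- Deletes the highest-id row of each duplicate group.
-- With three or more copies per group, rerun until no rows are affected.
DELETE
FROM t
WHERE ID IN (
  -- The extra derived table avoids MySQL error 1093
  -- (selecting from the DELETE target table in a subquery).
  SELECT max_id FROM (
    SELECT MAX(ID) AS max_id
    FROM t
    GROUP BY {Your Group Criteria Here}
    HAVING COUNT(*) > 1
  ) AS dup
);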
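One thing to watch with the MAX(ID) approach: each run removes only the highest-id row per duplicate group, so groups with three or more copies need repeated runs. Here is a minimal sketch of that loop, using Python's sqlite3 as a stand-in for MySQL. The table layout and the columns a and b are hypothetical. SQLite accepts the subquery on the target table directly, which MySQL may not.

```python
import sqlite3


def delete_duplicates(conn):
    """Repeat the MAX(id)-per-group delete until no duplicate groups remain."""
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM t
            WHERE id IN (
                SELECT MAX(id)
                FROM t
                GROUP BY a, b      -- hypothetical columns that define a duplicate
                HAVING COUNT(*) > 1
            )
            """
        )
        if cur.rowcount == 0:      # nothing deleted: every group is down to one row
            return total
        total += cur.rowcount


# Throwaway in-memory table: three copies of one row, two of another, one unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO t (a, b) VALUES (?, ?)",
    [("x", "1"), ("x", "1"), ("x", "1"), ("y", "2"), ("y", "2"), ("z", "3")],
)

removed = delete_duplicates(conn)
remaining = conn.execute("SELECT id, a, b FROM t ORDER BY id").fetchall()
print(removed, remaining)
```

The lowest id in each group is the one that survives, since every pass only ever takes the current maximum.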
#3
Or use the old simple way. I'd be surprised if it's not the fastest, and in particular I'd expect it to be faster than matching against a GROUP BY aggregate.
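-- "fields" stands for the columns that define a duplicate.
-- Note: MySQL rejects a subquery on the DELETE target table (error 1093),
-- so there the self-join DELETE is the form to use.
DELETE FROM mytable m1
WHERE EXISTS
( SELECT 1 FROM mytable m2
WHERE m2.fields = m1.fields
AND m2.id < m1.id );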
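To make the EXISTS form concrete, here is a small sketch run through Python's sqlite3, again as a stand-in and not MySQL itself. The columns name and email are hypothetical placeholders for "fields", and the inner alias older is only there to tell the two copies of the table apart.

```python
import sqlite3

# Throwaway in-memory table with repeated (name, email) pairs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO mytable (name, email) VALUES (?, ?)",
    [
        ("ann", "ann@example.com"),
        ("bob", "bob@example.com"),
        ("ann", "ann@example.com"),
        ("ann", "ann@example.com"),
        ("bob", "bob@example.com"),
    ],
)

# Delete every row for which an identical row with a smaller id exists.
cur = conn.execute(
    """
    DELETE FROM mytable
    WHERE EXISTS (
        SELECT 1
        FROM mytable AS older
        WHERE older.name = mytable.name
          AND older.email = mytable.email
          AND older.id < mytable.id
    )
    """
)
deleted = cur.rowcount
survivors = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(deleted, survivors)
```

Unlike the MAX(ID) variant, this removes all later copies in a single statement. Rows with NULL in a compared column are never matched by =, so they would be left alone.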
#4
If nothing currently references the table by key, I'd dump it with mysqldump --complete-insert, strip the primary key values from the dump, change the table definition to enforce a unique key that would catch your duplicates, change the INSERTs to REPLACEs, and load the data back in. That gets you a nice clean table without holes in the PK sequence or deleted rows.