How do I delete duplicates on a MySQL table?

Time: 2022-10-06 04:23:37

I need to delete duplicate rows for a specified SID on a MySQL table.

How can I do this with an SQL query?

DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"

Something like this, but I don't know how to do it.

22 solutions

#1


195  

This removes duplicates in place, without making a new table:

ALTER IGNORE TABLE `table_name` ADD UNIQUE (title, SID)

Note: this only works well if the index fits in memory. ALTER IGNORE TABLE was removed in MySQL 5.7.4, so it applies only to older versions.

#2


101  

Suppose you have a table employee, with the following columns:

employee (id, first_name, last_name, start_date)

To delete the rows with a duplicate first_name, keeping the row with the lowest id in each group:

delete
from employee using employee,
    employee e1
where employee.id > e1.id
    and employee.first_name = e1.first_name  
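
The self-join delete above can be tried on a toy table. This is a minimal sketch, assuming Python's bundled sqlite3 module as a stand-in for MySQL. SQLite has no multi-table DELETE ... USING, so the same condition is written as a correlated EXISTS. The sample rows and the helper name delete_later_duplicates are hypothetical.

```python
import sqlite3


def delete_later_duplicates(conn):
    """Delete every employee row that has a lower-id row with the same first_name."""
    cur = conn.execute(
        """
        DELETE FROM employee
        WHERE EXISTS (
            SELECT 1 FROM employee AS e1
            WHERE e1.first_name = employee.first_name
              AND e1.id < employee.id
        )
        """
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, start_date TEXT)"
)
# Hypothetical sample data: three rows share the first_name 'Ann'.
conn.executemany(
    "INSERT INTO employee (first_name, last_name, start_date) VALUES (?, ?, ?)",
    [
        ("Ann", "Lee", "2020-01-01"),
        ("Ann", "Ray", "2021-01-01"),
        ("Bob", "Kim", "2020-06-01"),
        ("Ann", "Poe", "2022-01-01"),
    ],
)

removed = delete_later_duplicates(conn)
remaining = conn.execute("SELECT id, first_name FROM employee ORDER BY id").fetchall()
print(removed, remaining)
```

Because the row with the lowest id in a group never satisfies the condition, it is the one that survives.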

#3


47  

The following removes duplicates for all SIDs, not only a single one.

With a temp table:

CREATE TABLE table_temp AS
SELECT * FROM table GROUP BY title, SID;

DROP TABLE table;
RENAME TABLE table_temp TO table;

Since table_temp is freshly created, it has no indexes. You'll need to recreate them after removing duplicates. You can check which indexes the table has with SHOW INDEXES IN table.
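
The copy-and-swap approach, including the loss of indexes, can be sketched as follows. This assumes Python's bundled sqlite3 module as a stand-in for MySQL (RENAME TABLE becomes ALTER TABLE ... RENAME TO). The table name items, the index name idx_items_sid and the sample rows are hypothetical, and MIN(id) is selected explicitly so that the surviving row of each group is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, sid INTEGER);
    CREATE INDEX idx_items_sid ON items (sid);
    INSERT INTO items (title, sid) VALUES
        ('a', 1), ('a', 1), ('b', 1), ('a', 2), ('a', 2), ('a', 2);

    -- Copy one row per (title, sid) group into a fresh table, then swap it in.
    CREATE TABLE items_temp AS
        SELECT MIN(id) AS id, title, sid FROM items GROUP BY title, sid;
    DROP TABLE items;
    ALTER TABLE items_temp RENAME TO items;
    """
)

rows = conn.execute("SELECT id, title, sid FROM items ORDER BY id").fetchall()
# The index on the old table was dropped with it; the new table has none.
indexes = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'items'"
).fetchall()
print(rows, indexes)
```

The empty index list is the reason the indexes have to be recreated by hand after the swap.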

Without a temp table:

DELETE FROM `table` WHERE id IN (
  SELECT all_duplicates.id FROM (
    SELECT id FROM `table` WHERE (`title`, `SID`) IN (
      SELECT `title`, `SID` FROM `table` GROUP BY `title`, `SID` having count(*) > 1
    )
  ) AS all_duplicates 
  LEFT JOIN (
    SELECT id FROM `table` GROUP BY `title`, `SID` having count(*) > 1
  ) AS grouped_duplicates 
  ON all_duplicates.id = grouped_duplicates.id 
  WHERE grouped_duplicates.id IS NULL
)

#4


42  

Deleting duplicate rows in MySQL, walkthrough

Create the table and insert some rows:

dev-db> create table penguins(foo int, bar varchar(15), baz datetime);
Query OK, 0 rows affected (0.07 sec)
dev-db> insert into penguins values(1, 'skipper', now());
dev-db> insert into penguins values(1, 'skipper', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(4, 'rico', now());
Query OK, 6 rows affected (0.07 sec)
dev-db> select * from penguins;
+------+----------+---------------------+
| foo  | bar      | baz                 |
+------+----------+---------------------+
|    1 | skipper  | 2014-08-25 14:21:54 |
|    1 | skipper  | 2014-08-25 14:21:59 |
|    3 | kowalski | 2014-08-25 14:22:09 |
|    3 | kowalski | 2014-08-25 14:22:13 |
|    3 | kowalski | 2014-08-25 14:22:15 |
|    4 | rico     | 2014-08-25 14:22:22 |
+------+----------+---------------------+
6 rows in set (0.00 sec)

Then remove the duplicates:

dev-db> delete a
    -> from penguins a
    -> left join(
    -> select max(baz) maxtimestamp, foo, bar
    -> from penguins
    -> group by foo, bar) b
    -> on a.baz = maxtimestamp and
    -> a.foo = b.foo and
    -> a.bar = b.bar
    -> where b.maxtimestamp IS NULL;
Query OK, 3 rows affected (0.01 sec)

Result:

dev-db> select * from penguins;
+------+----------+---------------------+
| foo  | bar      | baz                 |
+------+----------+---------------------+
|    1 | skipper  | 2014-08-25 14:21:59 |
|    3 | kowalski | 2014-08-25 14:22:15 |
|    4 | rico     | 2014-08-25 14:22:22 |
+------+----------+---------------------+
3 rows in set (0.00 sec)

What is that delete statement doing?

Pseudocode: Group the rows by the two columns you want to remove duplicates of. Choose the one row of each group to keep by using the max aggregate. A left join returns all rows from the left table, along with the matching rows from the right table. In this case the left table has every row in the table, and the right side is NULL for every row except the one row per group you want to keep. Delete the rows where the right side is NULL and you are left with only the unique one per group.

A more technical explanation of how to read that SQL delete statement:

Table penguins, aliased as 'a', is left joined to a subset of table penguins aliased as 'b'. The right-hand subset 'b' holds the max timestamp for each group of foo and bar. This is matched against the left-hand table 'a'. The (foo, bar, baz) on the left covers every row in the table. The (maxtimestamp, foo, bar) on the right matches a left row only when that row IS the max of its group.

Every row that is not that max gets a maxtimestamp of NULL. Filter on those NULL rows and you have the set of all rows, grouped by foo and bar, that do not carry the latest timestamp baz. Delete those.

Make a backup of the table before you run this.

Prevent this problem from ever happening again on this table:

If you got this to work and it put out your "duplicate rows" fire, great. Your work isn't done yet. Define a new composite unique key on your table (on those two columns) to prevent more duplicates from being added in the first place. Like a good immune system, the table shouldn't even let the bad rows in at insert time. All the programs that were adding duplicates will then broadcast their protest, and once you fix them, this issue never comes up again.
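
Both halves of this answer, keeping the newest row per group and then locking the table with a composite unique key, can be sketched together. This assumes Python's bundled sqlite3 module as a stand-in for MySQL: the multi-table DELETE with LEFT JOIN is rewritten as a correlated subquery because SQLite lacks that syntax, and the index name uq_penguins_foo_bar is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (foo INTEGER, bar TEXT, baz TEXT)")
conn.executemany(
    "INSERT INTO penguins VALUES (?, ?, ?)",
    [
        (1, "skipper", "2014-08-25 14:21:54"),
        (1, "skipper", "2014-08-25 14:21:59"),
        (3, "kowalski", "2014-08-25 14:22:09"),
        (3, "kowalski", "2014-08-25 14:22:13"),
        (3, "kowalski", "2014-08-25 14:22:15"),
        (4, "rico", "2014-08-25 14:22:22"),
    ],
)

# Delete every row that is older than the newest row of its (foo, bar) group.
conn.execute(
    """
    DELETE FROM penguins
    WHERE baz < (
        SELECT MAX(p.baz) FROM penguins AS p
        WHERE p.foo = penguins.foo AND p.bar = penguins.bar
    )
    """
)

# With the duplicates gone, a composite unique index can be created.
conn.execute("CREATE UNIQUE INDEX uq_penguins_foo_bar ON penguins (foo, bar)")
conn.commit()

# A later attempt to insert a duplicate (foo, bar) pair is now rejected.
try:
    conn.execute("INSERT INTO penguins VALUES (1, 'skipper', '2014-08-26 09:00:00')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT foo, bar, baz FROM penguins ORDER BY foo").fetchall()
print(rows, duplicate_rejected)
```

The rejected insert is the "protest" the answer describes: the program adding duplicates now gets an error instead of silently succeeding.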

#5


11  

This always seems to work for me:

CREATE TABLE NoDupeTable LIKE DupeTable; 
INSERT NoDupeTable SELECT * FROM DupeTable group by CommonField1,CommonFieldN;

This keeps the lowest ID of each set of dupes, along with all the non-dupe records.

I've also taken to doing the following so that the dupe issue no longer occurs after the removal:

CREATE TABLE NoDupeTable LIKE DupeTable; 
Alter table NoDupeTable Add Unique `Unique` (CommonField1,CommonField2);
INSERT IGNORE NoDupeTable SELECT * FROM DupeTable;

In other words, I create a copy of the first table, add a unique index on the fields I don't want duplicates of, and then do an INSERT IGNORE. Unlike a normal INSERT, which would fail the first time it tried to add a duplicate record based on the two fields, INSERT IGNORE simply skips any such records.

Going forward, it becomes impossible to create any duplicate records based on those two fields.
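
The copy-with-unique-index idea can be sketched like this, assuming Python's bundled sqlite3 module as a stand-in for MySQL. SQLite spells INSERT IGNORE as INSERT OR IGNORE and has no CREATE TABLE ... LIKE, so the copy is declared by hand. The table, column and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dupe_table (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
    INSERT INTO dupe_table (field1, field2) VALUES
        ('x', 'y'), ('x', 'y'), ('x', 'z'), ('x', 'y');

    -- Empty copy with a unique index on the columns that must not repeat.
    CREATE TABLE no_dupe_table (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
    CREATE UNIQUE INDEX uq_no_dupe ON no_dupe_table (field1, field2);

    -- Rows that would violate the unique index are skipped instead of raising an error.
    INSERT OR IGNORE INTO no_dupe_table
        SELECT id, field1, field2 FROM dupe_table ORDER BY id;
    """
)

kept = conn.execute(
    "SELECT id, field1, field2 FROM no_dupe_table ORDER BY id"
).fetchall()
print(kept)
```

Reading the source rows in id order means the first row of each group is the one that gets in, and the unique index stays in place for future inserts.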

#6


7  

Here is a simple answer:

delete a from target_table a left JOIN (select max(id_field) as id_field, field_being_repeated
    from target_table GROUP BY field_being_repeated) b
    on a.field_being_repeated = b.field_being_repeated
      and a.id_field = b.id_field
    where b.id_field is null;

#7


5  

After running into this issue myself on a huge database, I wasn't completely impressed with the performance of any of the other answers. I wanted to keep only the latest duplicate row and delete the rest.

In a one-query statement, without a temp table, this worked best for me:

DELETE e.*
FROM employee e
WHERE id IN
 (SELECT id
   FROM (SELECT MIN(id) as id
          FROM employee e2
          GROUP BY first_name, last_name
          HAVING COUNT(*) > 1) x);

The only caveat is that I have to run the query multiple times, because each run removes only one row (the lowest id) from every group that still has duplicates. Even so, I found it worked better for me than the other options.
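
Why several runs are needed can be seen by looping until a pass deletes nothing. This is a minimal sketch, assuming Python's bundled sqlite3 module as a stand-in for MySQL. SQLite does not need the extra derived-table wrapper that MySQL requires when a DELETE reads from its own target table. The helper name delete_oldest_duplicates_once and the sample rows are hypothetical.

```python
import sqlite3


def delete_oldest_duplicates_once(conn):
    """Remove the lowest-id row from every (first_name, last_name) group that still has duplicates."""
    cur = conn.execute(
        """
        DELETE FROM employee
        WHERE id IN (
            SELECT MIN(id) FROM employee
            GROUP BY first_name, last_name
            HAVING COUNT(*) > 1
        )
        """
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employee (first_name, last_name) VALUES (?, ?)",
    [("Ann", "Lee"), ("Ann", "Lee"), ("Ann", "Lee"), ("Bob", "Kim")],
)

# Repeat until a pass removes nothing; count the passes that did delete rows.
passes = 0
while delete_oldest_duplicates_once(conn) > 0:
    passes += 1

remaining = conn.execute(
    "SELECT id, first_name, last_name FROM employee ORDER BY id"
).fetchall()
print(passes, remaining)
```

A group with three copies needs two deleting passes, so the number of runs is one less than the size of the largest duplicate group.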

#8


4  

This procedure removes all duplicates (including multiples) in a table, keeping the last duplicate. It is an extension of "Retrieving last record in each group".

Hope this is useful to someone.

DROP TABLE IF EXISTS UniqueIDs;
CREATE Temporary table UniqueIDs (id Int(11));

INSERT INTO UniqueIDs
    (SELECT T1.ID FROM Table T1 LEFT JOIN Table T2 ON
    (T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 #Comparison Fields 
    AND T1.ID < T2.ID)
    WHERE T2.ID IS NULL);

DELETE FROM Table WHERE id NOT IN (SELECT ID FROM UniqueIDs);
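
The anti-join used above (a LEFT JOIN to a "later" row of the same group, kept only where no such row exists) can be run end to end. This is a sketch assuming Python's bundled sqlite3 module as a stand-in for MySQL; the table name records and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE records (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
    INSERT INTO records (field1, field2) VALUES
        ('a', 'b'), ('a', 'b'), ('c', 'd'), ('a', 'b');

    -- A row is the last of its group when no row with the same fields has a higher id.
    CREATE TEMPORARY TABLE unique_ids AS
        SELECT t1.id AS id
        FROM records AS t1
        LEFT JOIN records AS t2
            ON t1.field1 = t2.field1 AND t1.field2 = t2.field2 AND t1.id < t2.id
        WHERE t2.id IS NULL;

    DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_ids);
    """
)

kept_ids = [row[0] for row in conn.execute("SELECT id FROM records ORDER BY id")]
print(kept_ids)
```

Flipping the comparison to t1.id > t2.id would keep the first row of each group instead of the last.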

#9


3  

This works for me to remove old records:

delete from table where id in 
(select min(e.id)
    from (select * from table) e 
    group by column1, column2
    having count(*) > 1
); 

You can replace min(e.id) with max(e.id) to remove the newest records instead.

#10


3  

delete p from 
product p
inner join (
    select max(id) as id, url from product 
    group by url 
    having count(*) > 1
) unik on unik.url = p.url and unik.id != p.id;

#11


3  

The following works for any table, provided the duplicate rows are identical in every column:

CREATE TABLE `noDup` LIKE `Dup` ;
INSERT `noDup` SELECT DISTINCT * FROM `Dup` ;
DROP TABLE `Dup` ;
ALTER TABLE `noDup` RENAME `Dup` ;

#12


2  

Another easy way... using UPDATE IGNORE:

You have to use an index on one or more columns (type index). Create a new temporary reference column (not part of the index). In this column, you mark the unique rows by updating it with the IGNORE clause. Step by step:

Add a temporary reference column to mark the uniques:

ALTER TABLE `yourtable` ADD `unique` VARCHAR(3) NOT NULL AFTER `lastcolname`;

=> This will add a column to your table.

Update the table and try to mark everything as unique, but ignore possible errors due to duplicate keys (those records will be skipped):

UPDATE IGNORE `yourtable` SET `unique` = 'Yes' WHERE 1;

=> You will find that your duplicate records are not marked as unique = 'Yes'; in other words, only one of each set of duplicate records is marked as unique.

Delete everything that's not unique:

DELETE FROM `yourtable` WHERE `unique` <> 'Yes';

=> This will remove all duplicate records.

Drop the column...

ALTER TABLE `yourtable` DROP `unique`;

#13


0  

delete from `table` where `table`.`SID` in 
    (
    select t.SID from table t join table t1 on t.title = t1.title  where t.SID > t1.SID
)

#14


0  

Love @eric's answer, but it doesn't seem to work if you have a really big table (when I try to run it I get "The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay"). So I limited the join query to only consider the duplicate rows, and I ended up with:

DELETE a FROM penguins a
    LEFT JOIN (SELECT COUNT(baz) AS num, MIN(baz) AS keepBaz, foo
        FROM penguins
        GROUP BY foo HAVING num > 1) b
        ON a.baz != b.keepBaz
        AND a.foo = b.foo
    WHERE b.foo IS NOT NULL;

In this case the WHERE clause lets MySQL ignore any row that doesn't have a duplicate, and the join condition also skips the first instance of each duplicate, so only the subsequent duplicates are deleted. Change MIN(baz) to MAX(baz) to keep the last instance instead of the first.

#15


0  

This works for large tables:

 CREATE Temporary table duplicates AS select max(id) as id, url from links group by url having count(*) > 1;

 DELETE l from links l inner join duplicates ld on ld.id = l.id WHERE ld.id IS NOT NULL;

To delete the oldest row instead, change max(id) to min(id).

#16


0  

This makes column_name the primary key and ignores any errors along the way, so rows with a duplicate value for column_name are deleted.

ALTER IGNORE TABLE `table_name` ADD PRIMARY KEY (`column_name`);

#17


0  

Deleting duplicates on MySQL tables is a common issue that usually comes with specific needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how to use a temporary table to delete MySQL duplicates in a reliable and fast way, which also handles big data sources (with examples for different use cases).

Ali, in your case, you can run something like this:

-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;

-- add a unique constraint    
ALTER TABLE tmp_table1 ADD UNIQUE(sid, title);

-- scan over the table to insert entries
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY sid;

-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;

#18


0  

I find Werner's solution above to be the most convenient because it works regardless of the presence of a primary key, doesn't mess with tables, uses future-proof plain SQL, and is very understandable.

As I stated in my comment, that solution hasn't been properly explained though. So this is mine, based on it.

1) add a new boolean column

alter table mytable add tokeep boolean;

2) add a constraint on the duplicated columns AND the new column

alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);

3) set the boolean column to true. This will succeed only on one of the duplicated rows because of the new constraint

update ignore mytable set tokeep = true;

4) delete rows that have not been marked as tokeep

delete from mytable where tokeep is null;

5) drop the added column

alter table mytable drop tokeep;

I suggest keeping the constraint you added, so that new duplicates are prevented in the future. (When step 5 drops tokeep, MySQL removes that column from the index, which then covers only mycol1 and mycol2.)
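
Steps 1 to 4 can be sketched as follows, assuming Python's bundled sqlite3 module as a stand-in for MySQL. SQLite spells UPDATE IGNORE as UPDATE OR IGNORE and the constraint is created as a unique index. The sample rows are hypothetical, and the sketch stops before step 5 because dropping an indexed column works differently in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (mycol1 TEXT, mycol2 TEXT);
    INSERT INTO mytable VALUES ('a', 'b'), ('a', 'b'), ('a', 'c'), ('a', 'b');

    -- Steps 1 and 2: nullable marker column plus a unique index that includes it.
    -- Rows whose marker is NULL never collide, because NULLs count as distinct.
    ALTER TABLE mytable ADD COLUMN tokeep INTEGER;
    CREATE UNIQUE INDEX preventdupe ON mytable (mycol1, mycol2, tokeep);

    -- Step 3: only one row of each (mycol1, mycol2) group can take the value 1;
    -- the update is skipped for the others instead of failing.
    UPDATE OR IGNORE mytable SET tokeep = 1;

    -- Step 4: whatever is still NULL is a surplus duplicate.
    DELETE FROM mytable WHERE tokeep IS NULL;
    """
)

rows = sorted(conn.execute("SELECT mycol1, mycol2, tokeep FROM mytable").fetchall())
print(rows)
```

The trick depends on the marker column being nullable: the unmarked rows coexist under the unique index only because their marker is NULL.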

#19


0  

I think this will work by basically copying the table, emptying it, and then putting only the distinct values back into it, but please double-check it before running it on large amounts of data.

Creates a carbon copy of your table

create table temp_table like oldtablename; insert temp_table select * from oldtablename;

Empties your original table

DELETE FROM oldtablename;

Copies all distinct values from the copied table back to your original table

INSERT oldtablename SELECT * from temp_table group by firstname,lastname,dob

Deletes your temp table.

Drop Table temp_table

You need to group by all the fields that you want to keep distinct.

#20


-2  

You could just use a DISTINCT clause to select the "cleaned up" list (and here is a very easy example on how to do that).

#21


-3  

Could it work if you count them, and then add a limit to your delete query leaving just one?

For example, if you have two or more, write your query like this:

DELETE FROM table WHERE SID = 1 LIMIT 1;

#22


-5  

There are just a few basic steps when removing duplicate data from your table:

  • Back up your table!
  • Find the duplicate rows
  • Remove the duplicate rows
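
The three steps above can be sketched in order. This assumes Python's bundled sqlite3 module as a stand-in for MySQL; the table name items, the backup name items_backup and the sample rows are hypothetical, and the delete keeps the lowest id of each group as one possible choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, sid INTEGER);
    INSERT INTO items (title, sid) VALUES ('a', 1), ('a', 1), ('b', 1), ('a', 2);

    -- Step 1: keep a full copy before touching anything.
    CREATE TABLE items_backup AS SELECT * FROM items;
    """
)

# Step 2: list each duplicated (title, sid) pair with its row count.
duplicates = conn.execute(
    """
    SELECT title, sid, COUNT(*) AS copies
    FROM items
    GROUP BY title, sid
    HAVING COUNT(*) > 1
    """
).fetchall()

# Step 3: keep the lowest id of every group and delete the rest.
conn.execute(
    "DELETE FROM items WHERE id NOT IN "
    "(SELECT MIN(id) FROM items GROUP BY title, sid)"
)
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
backed_up = conn.execute("SELECT COUNT(*) FROM items_backup").fetchone()[0]
print(duplicates, remaining, backed_up)
```

Looking at the output of step 2 before running step 3 is a cheap way to confirm that the grouping columns really identify what counts as a duplicate.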

Here is the full tutorial: https://blog.teamsql.io/deleting-duplicate-data-3541485b3473
