Possible Duplicate:
How can I remove duplicate rows?
Remove duplicates using only a MySQL query?
I have a large table with ~14M entries. The table type is MyISAM and not InnoDB.
Unfortunately, I have some duplicate entries in this table, which I found with the following query:
SELECT device_serial, temp, tstamp, COUNT(*) c FROM up_logs GROUP BY device_serial, temp, tstamp HAVING c > 1
To avoid these duplicates in the future, I want to convert my current index to a unique constraint with this SQL statement:
ALTER TABLE up_logs DROP INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL,
ADD UNIQUE INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL ( `tstamp` , `device_serial` );
But before that, I need to clean up my duplicates!
My question is: how can I keep only one row from each set of duplicates? Keep in mind that my table contains 14M entries, so I would like to avoid loops if possible.
Any comments are welcome!
3 solutions
#1
4
Creating a new unique key on the columns that need to be unique, using ALTER IGNORE, will automatically clean the table of any duplicates.
ALTER IGNORE TABLE `table_name`
ADD UNIQUE KEY `key_name`(`column_1`,`column_2`);
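The IGNORE keyword keeps the statement from aborting at the first duplicate-key error. Instead, MySQL keeps the first row for each key value and deletes the other conflicting rows. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so this only works on older versions.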
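On MySQL versions that no longer accept ALTER IGNORE, a similar effect can probably be had by copying the rows into a new table that already has the unique key. The sketch below is only an illustration under stated assumptions: it runs against an in-memory SQLite database from Python rather than MySQL, the table name up_logs_dedup and the sample rows are made up, and SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column names follow the question,
# the sample rows and the up_logs_dedup table name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE up_logs (device_serial TEXT, temp REAL, tstamp INTEGER);
    INSERT INTO up_logs VALUES
        ('A', 20.5, 100), ('A', 20.5, 100),
        ('A', 21.0, 200),
        ('B', 19.0, 100), ('B', 19.0, 100), ('B', 19.0, 100);

    -- New table with the unique key already in place.
    CREATE TABLE up_logs_dedup (
        device_serial TEXT, temp REAL, tstamp INTEGER,
        UNIQUE (tstamp, device_serial)
    );

    -- SQLite spells this INSERT OR IGNORE; MySQL spells it INSERT IGNORE.
    -- Rows that would violate the unique key are silently skipped.
    INSERT OR IGNORE INTO up_logs_dedup
        SELECT device_serial, temp, tstamp FROM up_logs;
""")

kept = conn.execute(
    "SELECT device_serial, temp, tstamp FROM up_logs_dedup "
    "ORDER BY device_serial, tstamp"
).fetchall()
print(kept)
```

If the result looks right, the two tables could then be swapped (in MySQL, RENAME TABLE can do that), but check the row counts before dropping anything.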
#2
4
Since MySQL allows subqueries in UPDATE/DELETE statements, but not if they refer to the table being modified, I'd create a copy of the original table first. Then:
DELETE FROM original_table
WHERE id NOT IN(
-- keep the lowest id in each group of duplicates
SELECT MIN(id) FROM copy_table
GROUP BY column1, column2, ...
);
But I could imagine that copying a table with 14M entries takes some time. Selecting only the rows to keep while copying might make it faster:
INSERT INTO copy_table
SELECT * FROM original_table
GROUP BY column1, column2, ...;
and then
DELETE FROM original_table
WHERE id NOT IN(
SELECT id FROM copy_table
);
It has been a while since I last used MySQL, or SQL in general, so I'm quite sure there is something with better performance, but this should work ;)
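For what it's worth, here is a small runnable sketch of the same keep-one-row-per-group idea without a copy table. It is an assumption-laden illustration, not a tested MySQL recipe: it runs on an in-memory SQLite database from Python, it assumes the table has an auto-increment id column (the question does not say so), and the sample rows are made up. Wrapping the subquery in a derived table (here called keepers) is the workaround commonly used in MySQL for the "can't refer to the table being modified" restriction; here it is only checked against SQLite.

```python
import sqlite3

# In-memory SQLite stands in for MySQL. The id column and the sample rows
# are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE up_logs (
        id INTEGER PRIMARY KEY,
        device_serial TEXT, temp REAL, tstamp INTEGER
    );
    INSERT INTO up_logs (device_serial, temp, tstamp) VALUES
        ('A', 20.5, 100), ('A', 20.5, 100),
        ('A', 21.0, 200),
        ('B', 19.0, 100), ('B', 19.0, 100), ('B', 19.0, 100);
""")

# Keep the lowest id of every (device_serial, temp, tstamp) group and
# delete every other row. The inner query is wrapped in a derived table
# so the DELETE does not read up_logs directly in its subquery.
cur = conn.execute("""
    DELETE FROM up_logs
    WHERE id NOT IN (
        SELECT keep_id FROM (
            SELECT MIN(id) AS keep_id
            FROM up_logs
            GROUP BY device_serial, temp, tstamp
        ) AS keepers
    )
""")
deleted = cur.rowcount

remaining = conn.execute(
    "SELECT id, device_serial, temp, tstamp FROM up_logs ORDER BY id"
).fetchall()
print(deleted, remaining)
```

On 14M rows a NOT IN over a grouped subquery may well be slow, so try it on a copy of the table first.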
#3
1
This is how you can delete duplicate rows. I'll show you my example, and you'll need to adapt it to your own table. I have an actors table (actor_2) with an ID column, and I want to delete the rows with a repeated first_name.
mysql> select actor_id, first_name from actor_2;
+----------+-------------+
| actor_id | first_name  |
+----------+-------------+
|        1 | PENELOPE    |
|        2 | NICK        |
|        3 | ED          |
....
|      199 | JULIA       |
|      200 | THORA       |
+----------+-------------+
200 rows in set (0.00 sec)
- Now I use a variable called @a to return the ID when a row has the same first_name as the previous row (i.e. it is a repeat), and NULL when it does not.
mysql> select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name;
+---------------+----------------+
| first_names   | @a:=first_name |
+---------------+----------------+
|          NULL | ADAM           |
|            71 | ADAM           |
|          NULL | AL             |
|          NULL | ALAN           |
|          NULL | ALBERT         |
|           125 | ALBERT         |
|          NULL | ALEC           |
|          NULL | ANGELA         |
|           144 | ANGELA         |
...
|          NULL | WILL           |
|          NULL | WILLIAM        |
|          NULL | WOODY          |
|            28 | WOODY          |
|          NULL | ZERO           |
+---------------+----------------+
200 rows in set (0.00 sec)
- Now we can select just the column holding the duplicate IDs:
mysql> select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1;
+-------------+
| first_names |
+-------------+
|        NULL |
|          71 |
|        NULL |
...
|          28 |
|        NULL |
+-------------+
200 rows in set (0.00 sec)
- The final step: let's DELETE!
mysql> delete from actor_2 where actor_id in (select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1);
Query OK, 72 rows affected (0.01 sec)
- Now let's check our table:
mysql> select count(*) from actor_2 group by first_name;
+----------+
| count(*) |
+----------+
|        1 |
|        1 |
|        1 |
...
|        1 |
+----------+
128 rows in set (0.00 sec)
It works. If you have any questions, write back.
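To make the previous-row comparison explicit, here is a rough Python sketch of the same logic on a tiny made-up actor_2 table in an in-memory SQLite database. The Python variable previous_name plays the role of @a, and the sample rows are invented for the illustration. It loops over rows in the client, so it only shows what the @a trick computes and is not something to run on 14M rows. One caveat about the SQL version: the MySQL manual says the order of evaluation for expressions involving user variables is undefined, so relying on @a being assigned after the IF() is read may not be safe on every version.

```python
import sqlite3

# Tiny made-up sample; only the table and column names come from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor_2 (actor_id INTEGER PRIMARY KEY, first_name TEXT);
    INSERT INTO actor_2 VALUES
        (1, 'PENELOPE'), (2, 'NICK'), (3, 'ED'),
        (4, 'NICK'), (5, 'ED'), (6, 'ED');
""")

# Walk the rows in first_name order and remember the previous name,
# which is the job @a does in the MySQL queries.
previous_name = None
duplicate_ids = []
for actor_id, first_name in conn.execute(
    "SELECT actor_id, first_name FROM actor_2 ORDER BY first_name, actor_id"
):
    if first_name == previous_name:
        # Same name as the row before it: this row is a repeat.
        duplicate_ids.append(actor_id)
    previous_name = first_name

# Delete the repeats, keeping the first row of each name.
conn.executemany(
    "DELETE FROM actor_2 WHERE actor_id = ?",
    [(actor_id,) for actor_id in duplicate_ids],
)

remaining = conn.execute(
    "SELECT actor_id, first_name FROM actor_2 ORDER BY actor_id"
).fetchall()
print(duplicate_ids, remaining)
```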