I've seen a number of variations on this but nothing quite matches what I'm trying to accomplish.
I have a table, TableA, which contains the answers given by users to configurable questionnaires. The columns are member_id, quiz_num, question_num, answer_num.
Somehow a few members got their answers submitted twice. So I need to remove the duplicated records, but make sure that one row is left behind.
There is no primary key, so there could be two or three rows with exactly the same data.
Is there a query to remove all the duplicates?
8 Answers
#1
100
Add a unique index to your table:
ALTER IGNORE TABLE `TableA`
ADD UNIQUE INDEX (`member_id`, `quiz_num`, `question_num`, `answer_num`);
Another way is to add a primary key to your table. You can then remove duplicates with the following query:
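-- Assumes an auto-increment primary key column named id has been added to TableA.
-- Each run removes one row from every duplicated group; repeat until no rows are affected.
DELETE FROM TableA
WHERE id IN (SELECT *
             FROM (SELECT MIN(id) FROM TableA
                   GROUP BY member_id, quiz_num, question_num, answer_num HAVING (COUNT(*) > 1)
                  ) AS A
            );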
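The query above removes only one row per duplicated group on each run. Here is a minimal sketch of a variant that keeps exactly one row per group in a single pass. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL and uses SQLite's implicit rowid in place of the added id column; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (member_id INT, quiz_num INT, question_num INT, answer_num INT)"
)

# Made-up sample data: one answer stored three times, one twice, one once.
rows = [(1, 1, 1, 2)] * 3 + [(1, 1, 2, 4)] * 2 + [(2, 1, 1, 3)]
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?, ?)", rows)

# Keep the lowest rowid in each group of identical rows and delete the rest.
conn.execute(
    """
    DELETE FROM TableA
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM TableA
        GROUP BY member_id, quiz_num, question_num, answer_num
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT member_id, quiz_num, question_num, answer_num FROM TableA "
    "ORDER BY member_id, quiz_num, question_num"
).fetchall()
print(remaining)
```

In MySQL the inner SELECT would still need the derived-table wrapper shown in the answer, because MySQL does not let a subquery read directly from the table being deleted from.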
#2
13
Instead of dropping TableA, you could delete all of its rows (delete from TableA;) and then repopulate the original table with the rows from TableA_Verify (insert into TableA select * from TableA_Verify). This way you won't lose everything attached to the original table (indexes, ...).
CREATE TABLE TableA_Verify AS SELECT DISTINCT * FROM TableA;
DELETE FROM TableA;
INSERT INTO TableA SELECT * FROM TableA_Verify;
DROP TABLE TableA_Verify;
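To see that the original table really keeps its indexes with this approach, here is a small sketch. It assumes SQLite (through Python's sqlite3 module) instead of MySQL; the index name idx_member and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (member_id INT, quiz_num INT, question_num INT, answer_num INT)"
)
# Hypothetical index, only here to show that it survives the clean-up.
conn.execute("CREATE INDEX idx_member ON TableA (member_id)")

# Made-up sample data containing exact duplicates.
rows = [(1, 1, 1, 2), (1, 1, 1, 2), (1, 1, 2, 4), (2, 1, 1, 3), (2, 1, 1, 3)]
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?, ?)", rows)

# Same four steps as the answer: copy distinct rows, empty, refill, drop the copy.
conn.execute("CREATE TABLE TableA_Verify AS SELECT DISTINCT * FROM TableA")
conn.execute("DELETE FROM TableA")
conn.execute("INSERT INTO TableA SELECT * FROM TableA_Verify")
conn.execute("DROP TABLE TableA_Verify")
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM TableA").fetchone()[0]
index_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'TableA'"
    )
]
print(row_count, index_names)
```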
#3
12
This doesn't use temporary tables, but real tables instead. If the problem is only with temp tables and not with creating or dropping tables, this will work:
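-- MySQL has no SELECT ... INTO new_table; CREATE TABLE ... AS SELECT does the same job.
CREATE TABLE TableA_Verify AS SELECT DISTINCT * FROM TableA;
DROP TABLE TableA;
RENAME TABLE TableA_Verify TO TableA;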
#4
6
Thanks to jveirasv for the answer above.
If you need to remove duplicates based on a specific set of columns, you can use this (for example, if the table has a timestamp that varies between otherwise identical rows):
-- SELECT * with GROUP BY only runs when ONLY_FULL_GROUP_BY is disabled; it keeps an arbitrary row per group.
CREATE TABLE TableA_Verify AS SELECT * FROM TableA WHERE 1 GROUP BY [COLUMN TO remove duplicates BY];
DELETE FROM TableA;
INSERT INTO TableA SELECT * FROM TableA_Verify;
DROP TABLE TableA_Verify;
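For the timestamp case, one way to avoid relying on an arbitrary row per group is to choose the timestamp explicitly. This is a rough sketch, assuming SQLite (through Python's sqlite3 module) rather than MySQL, a hypothetical submitted_at column, and made-up rows; it keeps the earliest submission of each answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (member_id INT, quiz_num INT, question_num INT, "
    "answer_num INT, submitted_at TEXT)"  # submitted_at is a hypothetical column
)

# Made-up data: the first answer was submitted twice, five seconds apart.
rows = [
    (1, 1, 1, 2, "2024-01-01 10:00:00"),
    (1, 1, 1, 2, "2024-01-01 10:00:05"),
    (2, 1, 1, 3, "2024-01-02 09:30:00"),
]
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?, ?, ?)", rows)

# Group by the columns that define a duplicate and keep the earliest timestamp.
conn.execute(
    """
    CREATE TABLE TableA_Verify AS
    SELECT member_id, quiz_num, question_num, answer_num,
           MIN(submitted_at) AS submitted_at
    FROM TableA
    GROUP BY member_id, quiz_num, question_num, answer_num
    """
)
conn.execute("DELETE FROM TableA")
conn.execute("INSERT INTO TableA SELECT * FROM TableA_Verify")
conn.execute("DROP TABLE TableA_Verify")
conn.commit()

deduped = conn.execute("SELECT * FROM TableA ORDER BY member_id").fetchall()
print(deduped)
```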
#5
6
Add a unique index to your table:
ALTER IGNORE TABLE TableA
ADD UNIQUE INDEX (member_id, quiz_num, question_num, answer_num);
This works very well.
#6
2
If your table has no primary key, run the following queries in one go, replacing these values:
# table_name - Your Table Name
# column_name_of_duplicates - Name of column where duplicate entries are found
create table table_name_temp like table_name;
insert into table_name_temp select column_name_of_duplicates, value, type from table_name group by column_name_of_duplicates;
delete from table_name;
insert into table_name select * from table_name_temp;
drop table table_name_temp;
- create a temporary table and store the distinct (non-duplicate) values in it
- empty the original table
- insert the values from the temp table back into the original table
- drop the temp table
It is always advisable to back up the database before you experiment with it.
#7
0
As noted in the comments, the query in Saharsh Shah's answer must be run multiple times if items are duplicated more than once.
Here's a solution that never empties the table and keeps the data in the original table the entire time, so duplicates can be deleted while the table stays 'live':
alter table tableA add column duplicate tinyint(1) not null default '0';
update tableA set
duplicate=if(@member_id=member_id
and @quiz_num=quiz_num
and @question_num=question_num
and @answer_num=answer_num,1,0),
member_id=(@member_id:=member_id),
quiz_num=(@quiz_num:=quiz_num),
question_num=(@question_num:=question_num),
answer_num=(@answer_num:=answer_num)
order by member_id, quiz_num, question_num, answer_num;
delete from tableA where duplicate=1;
alter table tableA drop column duplicate;
This checks whether the current row is the same as the previous row and, if it is, marks it as a duplicate (the ORDER BY clause ensures that duplicates end up next to each other). Then you delete the marked rows. I drop the duplicate column at the end to bring the table back to its original state.
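The mark-then-delete idea can also be tried without MySQL user variables. The sketch below assumes SQLite (through Python's sqlite3 module) and does the row-to-row comparison in Python rather than inside the UPDATE statement; the sample rows are made up and deliberately inserted out of order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (member_id INT, quiz_num INT, question_num INT, answer_num INT)"
)
# Made-up sample data, deliberately unsorted.
rows = [(1, 1, 1, 2), (2, 1, 1, 3), (1, 1, 1, 2), (1, 1, 2, 4), (1, 1, 1, 2), (1, 1, 2, 4)]
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?, ?)", rows)

# Helper flag column, as in the answer.
conn.execute("ALTER TABLE tableA ADD COLUMN duplicate INTEGER NOT NULL DEFAULT 0")

# Walk the rows in sorted order so identical rows are adjacent,
# and flag every row that equals the one before it.
ordered = conn.execute(
    "SELECT rowid, member_id, quiz_num, question_num, answer_num FROM tableA "
    "ORDER BY member_id, quiz_num, question_num, answer_num"
).fetchall()
previous = None
to_flag = []
for rowid, *values in ordered:
    if values == previous:
        to_flag.append((rowid,))
    previous = values

conn.executemany("UPDATE tableA SET duplicate = 1 WHERE rowid = ?", to_flag)
conn.execute("DELETE FROM tableA WHERE duplicate = 1")
conn.commit()

remaining = conn.execute(
    "SELECT member_id, quiz_num, question_num, answer_num FROM tableA "
    "ORDER BY member_id, quiz_num, question_num, answer_num"
).fetchall()
print(len(to_flag), remaining)
```

Dropping the helper column is left out of the sketch because ALTER TABLE ... DROP COLUMN needs SQLite 3.35 or newer.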
It looks like ALTER IGNORE TABLE is also going away (the IGNORE clause was deprecated in MySQL 5.6.17 and removed in 5.7.4): http://dev.mysql.com/worklog/task/?id=7395
#8
0
An alternative way would be to create a new temporary table with the same structure.
CREATE TABLE temp_table AS SELECT * FROM original_table LIMIT 0;
Then create the primary key in the table.
ALTER TABLE temp_table ADD PRIMARY KEY (primary-key-field);
Finally, copy all records from the original table while ignoring the duplicate records.
INSERT IGNORE INTO temp_table SELECT * FROM original_table;
Now you can delete the original table and rename the new table.
DROP TABLE original_table;
RENAME TABLE temp_table TO original_table;
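For the table in the question, the key would presumably have to span all four columns. Here is a minimal sketch of the whole sequence under that assumption, using SQLite (through Python's sqlite3 module) in place of MySQL, so INSERT OR IGNORE stands in for INSERT IGNORE and ALTER TABLE ... RENAME TO for RENAME TABLE; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE original_table "
    "(member_id INT, quiz_num INT, question_num INT, answer_num INT)"
)
# Made-up sample data containing exact duplicates.
rows = [(1, 1, 1, 2), (1, 1, 1, 2), (1, 1, 1, 2), (1, 1, 2, 4), (2, 1, 1, 3)]
conn.executemany("INSERT INTO original_table VALUES (?, ?, ?, ?)", rows)

# New table with the same columns plus a composite primary key
# (assumption: the four columns together identify a row).
conn.execute(
    "CREATE TABLE temp_table "
    "(member_id INT, quiz_num INT, question_num INT, answer_num INT, "
    "PRIMARY KEY (member_id, quiz_num, question_num, answer_num))"
)

# Rows that would violate the primary key are skipped silently.
conn.execute("INSERT OR IGNORE INTO temp_table SELECT * FROM original_table")

# Swap the de-duplicated copy in under the original name.
conn.execute("DROP TABLE original_table")
conn.execute("ALTER TABLE temp_table RENAME TO original_table")
conn.commit()

final_rows = conn.execute(
    "SELECT * FROM original_table ORDER BY member_id, quiz_num, question_num"
).fetchall()
print(final_rows)
```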