I have the following SQL to delete duplicate values from a table:
DELETE p1
FROM `ProgramsList` p1, `ProgramsList` p2
WHERE p1.CustId = p2.CustId
AND p1.CustId = 1
AND p1.`Id`>p2.`Id`
AND p1.`ProgramName` = p2.`ProgramName`;
Id is auto-incremented. For a given CustId, ProgramName must be unique (currently it is not).
The above SQL takes about 4 to 5 hours to complete on a table of about 1,000,000 records.
Could anyone suggest a quicker way of deleting duplicates from a table?
2 solutions
#1
First, you might try adding indexes on the ProgramName and CustID fields if you don't already have them.
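As a rough sketch of that suggestion, a single composite index covering both columns might look like this in MySQL. The index name idx_cust_program is made up for illustration, and the statement assumes ProgramName is a column type that can be indexed in full (a long TEXT column would need a prefix length).

```sql
-- Hypothetical index name; table and column names are taken from the question.
ALTER TABLE ProgramsList
    ADD INDEX idx_cust_program (CustId, ProgramName);
```

With an index like this, the self-join and the GROUP BY can look rows up by (CustId, ProgramName) rather than scanning the whole table, though how much it helps depends on the data.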
De-Duping
You can group your records to identify dupes and, as you do that, grab the minimum ID value for each group. Then just delete all records whose ID is not one of those minimum IDs.
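Before deleting anything, it may be worth previewing the groups that actually contain duplicates. The query below is only a sketch of the grouping step described above; the aliases Copies and MinID are illustrative names, not part of the original schema.

```sql
-- List each (CustId, ProgramName) group that has more than one row,
-- along with the lowest Id in the group (the row that would be kept).
SELECT CustId, ProgramName, COUNT(*) AS Copies, MIN(Id) AS MinID
FROM ProgramsList
GROUP BY CustId, ProgramName
HAVING COUNT(*) > 1;
```

If this returns no rows, there is nothing to de-duplicate.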
In-Clause Method
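-- MySQL rejects a subquery that reads the table being deleted from (error 1093),
-- so the grouped ids are wrapped in a derived table.
delete from
ProgramsList
where
id not in
(select MinID from
(select min(id) as MinID
from ProgramsList
group by ProgramName, CustID) as Keepers)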
Join-Method
You may have to run this more than once if a group has more than two members, since each run deletes only the row with the highest id from each duplicated group.
DELETE P
FROM ProgramsList as P
INNER JOIN
(select count(*) as Count, max(id) as MaxID
from ProgramsList
group by ProgramName, CustID) as A on A.MaxID = P.id
WHERE A.Count >= 2
Some people have performance issues with the In-Clause, some don't. It depends a lot on your indexes and such. If one is too slow, try the other.
Related: https://*.com/a/4192849/127880
#2
This will remove all the duplicates in one go.
The inner query finds, for each ProgramName and CustID group, the one ID that is kept (the highest); all other rows in the group are deleted.
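-- Join on CustID as well as ProgramName so rows are only compared within their own group.
delete p from ProgramsList as p
INNER JOIN (select ProgramName as Pname, CustID as Cid, max(id) as MaxID
from ProgramsList
group by ProgramName, CustID order by null) as A on A.Pname = p.ProgramName and A.Cid = p.CustID
where A.MaxID != p.id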
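Since the question says ProgramName must be unique for a given CustId, one possible follow-up once the duplicates are gone is a unique index, so that new duplicates are rejected at insert time. This is a sketch only: the index name uq_cust_program is hypothetical, and it again assumes ProgramName can be indexed in full.

```sql
-- Hypothetical index name. Run this only after de-duplicating:
-- MySQL refuses to build the index (duplicate-key error) while duplicates remain.
ALTER TABLE ProgramsList
    ADD UNIQUE INDEX uq_cust_program (CustId, ProgramName);
```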