I have a MySQL table with a lot of data, about 8 million rows. I need to delete all the duplicates and keep only the entry with the earliest value in the date column, but there is an additional condition. I have put some sample data below:
The columns are source, destination and date. I need to delete all rows that repeat the same source and destination, keeping only the one with the earliest date. However, if the destination changes and then, a few minutes later, goes back to the earlier value, that later row should not be deleted.
Source  Destination  Datetime
1       2            2017-01-01 23:45:46
1       2            2017-01-01 23:46:46  (this should be deleted)
1       3            2017-01-01 23:47:46
1       2            2017-01-01 23:48:46  (but not this one, as the destination changed from 3 to 2)
So, although the destination in the last entry is the same as in the first and second entries, that row should not be deleted, because the destination changed in the third row in between.
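To pin the rule down, here is a minimal Python sketch of it applied to the sample data: a row is marked for deletion only when the row immediately before it in Datetime order has the same Source and Destination. The function name rows_to_delete is made up for illustration, and the sketch assumes rows are compared in one global time order rather than separately per Source.

```python
from datetime import datetime

def rows_to_delete(rows):
    """Return the rows whose (source, destination) repeats the previous row's pair."""
    ordered = sorted(rows, key=lambda r: r[2])  # oldest first
    doomed = []
    prev = None
    for row in ordered:
        key = (row[0], row[1])
        if key == prev:
            # Same pair as the row just before it: a consecutive duplicate.
            doomed.append(row)
        prev = key
    return doomed

sample = [
    (1, 2, datetime(2017, 1, 1, 23, 45, 46)),
    (1, 2, datetime(2017, 1, 1, 23, 46, 46)),
    (1, 3, datetime(2017, 1, 1, 23, 47, 46)),
    (1, 2, datetime(2017, 1, 1, 23, 48, 46)),
]
print(rows_to_delete(sample))
```

On the sample, only the 23:46:46 row is marked; the 23:48:46 row survives because the row before it has destination 3.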
1 solution
#1
1
Use user-defined variables to hold the Source and Destination from the previous row. When they're the same as the current row, output the Datetime in the result of the subquery. Then join this with the original table to get the rows to delete.
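DELETE t1.*
FROM yourTable AS t1
JOIN (
    -- dupDatetime holds the row's datetime when it repeats the previous row's pair, else NULL.
    -- It gets its own alias so that ORDER BY datetime sorts by the real column, not the IF() result.
    SELECT source, destination,
           IF(@prevSource = source AND @prevDest = destination, datetime, NULL) AS dupDatetime,
           @prevSource := source, @prevDest := destination
    FROM yourTable
    ORDER BY datetime
) AS t2
ON t1.source = t2.source AND t1.destination = t2.destination AND t1.datetime = t2.dupDatetime
CROSS JOIN (SELECT @prevSource := NULL, @prevDest := NULL) AS vars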
If your table has a unique ID column, you could return that from the subquery instead of source, destination, and datetime, and join on that instead, which should be more efficient.
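As a way to try the ID-based idea on the sample data, here is a sketch that runs the same "compare with the previous row" logic against an in-memory SQLite database from Python. It uses the LAG() window function instead of user variables. The id column, the table layout, and the choice of SQLite are all assumptions for illustration; MySQL 8.0 also has LAG(), but its DELETE syntax differs, so treat this as a check of the logic rather than a statement to run as-is on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: the question's three columns plus an assumed unique id.
conn.execute(
    "CREATE TABLE yourTable ("
    " id INTEGER PRIMARY KEY, source INTEGER, destination INTEGER, datetime TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable (source, destination, datetime) VALUES (?, ?, ?)",
    [
        (1, 2, "2017-01-01 23:45:46"),
        (1, 2, "2017-01-01 23:46:46"),
        (1, 3, "2017-01-01 23:47:46"),
        (1, 2, "2017-01-01 23:48:46"),
    ],
)

# Mark rows whose pair equals the previous row's pair (by datetime), then delete by id.
conn.execute(
    """
    DELETE FROM yourTable
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id, source, destination,
                   LAG(source) OVER (ORDER BY datetime, id) AS prev_source,
                   LAG(destination) OVER (ORDER BY datetime, id) AS prev_dest
            FROM yourTable
        ) AS marked
        WHERE source = prev_source AND destination = prev_dest
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT source, destination, datetime FROM yourTable ORDER BY datetime"
).fetchall()
print(remaining)
```

The first row has no predecessor, so its LAG() values are NULL and the comparison never matches it. If duplicates should be detected separately for each source, adding PARTITION BY source to both OVER clauses would be the natural change.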