MySQL: deleting duplicates in consecutive rows

Time: 2023-02-14 04:18:21

Suppose this table:

ID ColA ColB
1   7    8
2   7    9
3   7    9
4   5    8
5   6    9
6   6    9
7   5    4
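
To try the answers against this data, the table can be loaded as follows. This is only a minimal sketch: the table name MyTable and the INT column types are assumptions, since the question gives only the column names.

```sql
-- Assumed table name and column types; the question states only the column names.
CREATE TABLE MyTable (
    ID   INT NOT NULL PRIMARY KEY,
    ColA INT NOT NULL,
    ColB INT NOT NULL
);

-- The seven sample rows from the question.
INSERT INTO MyTable (ID, ColA, ColB) VALUES
    (1, 7, 8),
    (2, 7, 9),
    (3, 7, 9),
    (4, 5, 8),
    (5, 6, 9),
    (6, 6, 9),
    (7, 5, 4);
```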

The PK is the ID column. Now I want to delete all rows that duplicate both ColA and ColB of a consecutive row.

In this example, rows 2, 3 and rows 5, 6 contain duplicates. From each such pair, the row with the lower ID should be removed so that the higher ID remains.

The output should be:

ID ColA ColB
1   7    8

3   7    9
4   5    8

6   6    9
7   5    4

How can this be done in MySQL?

Thanks, Juergen

4 solutions

#1


1  

select ID from MyTable m1 where 0 < (select count(*) from MyTable m2 where m2.ID = m1.ID + 1 and m2.ColA = m1.ColA and m2.ColB = m1.ColB)

and then you can use a

delete from MyTable where ID in ...

query. Since it uses only basic SQL, it should work in any version.
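
A sketch of what the complete statement could look like, assuming a table named MyTable and IDs without gaps. MySQL rejects a DELETE whose subquery reads the table being deleted from (error 1093), so the subquery is wrapped in a derived table here, which is a commonly used workaround; the alias doomed is made up for illustration. The comparison uses m1.ID + 1 so that the lower ID of each pair is removed and the higher one stays, as the question asks.

```sql
DELETE FROM MyTable
WHERE ID IN (
    SELECT ID
    FROM (
        -- rows whose immediate successor (ID + 1) has the same ColA and ColB
        SELECT m1.ID
        FROM MyTable m1
        WHERE EXISTS (
            SELECT 1
            FROM MyTable m2
            WHERE m2.ID = m1.ID + 1
              AND m2.ColA = m1.ColA
              AND m2.ColB = m1.ColB
        )
    ) AS doomed   -- hypothetical alias; a derived table must be named
);
```

Running the innermost SELECT on its own first shows which rows would be deleted; on the sample data from the question it picks IDs 2 and 5.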

#2


2  

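-- MyTable stands in for the real table name (TABLE itself is a reserved word)
CREATE TEMPORARY TABLE duplicates (id INT PRIMARY KEY);

-- collect the lower ID of every pair of adjacent rows with equal ColA and ColB
INSERT INTO duplicates (id)
    SELECT t1.id
      FROM MyTable t1
      JOIN MyTable t2 ON t2.id = t1.id + 1
     WHERE t1.ColA = t2.ColA
       AND t1.ColB = t2.ColB;

-- SELECT * FROM duplicates; --> are you happy with that? => delete
DELETE MyTable
  FROM MyTable
  JOIN duplicates ON MyTable.id = duplicates.id;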
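
If the intermediate check is not needed, the same join can drive the delete directly using MySQL's multi-table DELETE syntax. This is a sketch rather than a tested drop-in: it assumes a table named MyTable and IDs without gaps, and it skips the temporary table, so there is no chance to inspect the rows before they are removed.

```sql
-- Delete the row with the lower ID (t1) whenever the row with the next ID (t2)
-- carries the same ColA and ColB.
DELETE t1
FROM MyTable AS t1
JOIN MyTable AS t2
  ON t2.ID   = t1.ID + 1
 AND t2.ColA = t1.ColA
 AND t2.ColB = t1.ColB;
```

Replacing DELETE t1 with SELECT t1.* gives a preview of the affected rows before anything is deleted.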

#3


0  

I've called the first table 'test'.

First, create a table that holds every distinct combination of ColA and ColB:

create temporary table tmpTable (ColA int, ColB int);
insert into tmpTable select ColA,ColB from test group by ColA, ColB;

Now select the maximum ID in the original table for each combination of ColA and ColB, and put it into a new table (called idsToKeep, because these are the rows we do not want to delete):

create temporary table idsToKeep (ID int);
insert into idsToKeep select (select max(ID) from test where test.ColA=tmpTable.ColA and test.ColB=tmpTable.ColB) from tmpTable;

Finally, delete every row from the original table whose ID is not in the idsToKeep table:

delete from test where ID <> all (select ID from idsToKeep);

#4


-1  

Depending on how many records you have, this might not be the most efficient approach:

-- MyTable stands in for the real table name; MySQL uses LIMIT 1 where MSSQL uses TOP 1
SELECT (SELECT id FROM MyTable WHERE colA = m.colA AND colB = m.colB ORDER BY id DESC LIMIT 1) AS id, m.*
FROM (SELECT DISTINCT colA, colB
      FROM MyTable) m

There might be syntax errors because I usually use MSSQL, but the idea should be similar.
