How to delete duplicate rows from a table

Date: 2021-08-07 09:20:48

I have a table with, say, 3 columns. There's no primary key, so there can be duplicate rows. I need to keep just one of each and delete the others. Any idea how to do this in SQL Server?

13 answers

#1


23  

I'd SELECT DISTINCT the rows into a temporary table, then empty the source table and copy the data back from the temp table. EDIT: now with code snippet!

INSERT INTO TABLE_2 
SELECT DISTINCT * FROM TABLE_1
GO
DELETE FROM TABLE_1
GO
INSERT INTO TABLE_1
SELECT * FROM TABLE_2
GO

#2


7  

Add an identity column to act as a surrogate primary key, and use it to identify which rows in each set of duplicates should be deleted.

I would consider leaving the identity column in place afterwards, or if this is some kind of link table, create a compound primary key on the other columns.
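
A minimal sketch of the identity-column approach, assuming a table called MyTable with columns Col1, Col2 and Col3 and a helper column called RowId (all of these names are placeholders, not taken from the question). The GO separators are there because the new column cannot be referenced in the same batch that adds it.

```sql
-- Assumed names: MyTable, Col1..Col3, RowId. Adjust to your schema.
ALTER TABLE MyTable ADD RowId INT IDENTITY(1,1) NOT NULL;
GO

-- Keep the lowest RowId in each group of identical rows and delete the rest.
DELETE FROM MyTable
WHERE RowId NOT IN (
    SELECT MIN(RowId)
    FROM MyTable
    GROUP BY Col1, Col2, Col3
);
GO

-- Optional: remove the helper column again once the table is clean.
-- ALTER TABLE MyTable DROP COLUMN RowId;
```

Try it on a copy of the table first; on a large table the ALTER TABLE step alone can take a while.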

#3


7  

The following example works as well when your PK is just a subset of all table columns.

(Note: I like the approach of adding a surrogate id column more. But maybe this solution comes in handy as well.)

First find the duplicate rows:

SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1

If there are only a few, you can delete them manually:

set rowcount 1
delete from t1
where col1=1 and col2=1
set rowcount 0

The value of "rowcount" should be the number of duplicates minus one (n-1 for n identical rows). In this example there are 2 duplicates, therefore rowcount is 1. If several keys have duplicate rows, you have to do this for each duplicated key.

If you have many duplicates, then copy every duplicated key once into another table:

SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1

Then copy the duplicated rows into a second table, keeping one copy of each:

SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2

holddups should now hold each key only once. Check that the following query returns no rows:

SELECT col1, col2, count(*)
FROM holddups
GROUP BY col1, col2
HAVING count(*) > 1

Delete the duplicates from the original table:

DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2

Insert the de-duplicated rows back into the original table:

INSERT t1 SELECT * FROM holddups

By the way, and for completeness: in Oracle there is a hidden pseudocolumn (rowid) you could use:

DELETE FROM our_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM our_table
GROUP BY column1, column2, column3...);

see: Microsoft Knowledge Site

#4


4  

Here's the method I used when I asked this question:

DELETE MyTable 
FROM MyTable
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, Col1, Col2, Col3 
   FROM MyTable 
   GROUP BY Col1, Col2, Col3
) as KeepRows ON
   MyTable.RowId = KeepRows.RowId
WHERE
   KeepRows.RowId IS NULL

#5


4  

This is a way to do it with a common table expression (CTE). It involves no loops and no new columns, and it won't cause any unwanted triggers to fire (as a delete followed by re-inserts would).

Inspired by this article.

CREATE TABLE #temp (i INT)

INSERT INTO #temp VALUES (1)
INSERT INTO #temp VALUES (1)
INSERT INTO #temp VALUES (2)
INSERT INTO #temp VALUES (3)
INSERT INTO #temp VALUES (3)
INSERT INTO #temp VALUES (4)

SELECT * FROM #temp

-- Number the rows within each group of equal values; every row after the first is a duplicate.
;WITH [#temp+rowid] AS
(SELECT ROW_NUMBER() OVER (PARTITION BY i ORDER BY i ASC) AS ROWID, * FROM #temp)
DELETE FROM [#temp+rowid] WHERE rowid > 1

SELECT * FROM #temp

DROP TABLE #temp   

#6


2  

This is a tough situation to be in. Without knowing your particular situation (table size etc.), I think your best shot is to add an identity column, populate it, and then delete according to it. You may remove the column later, but I would suggest keeping it, as it is a really good thing to have in the table.

#7


0  

After you clean up the current mess you could add a primary key that includes all the fields in the table. That will keep you from getting into the mess again. Of course this solution could very well break existing code. That will have to be handled as well.
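
A sketch of what that could look like, assuming a table called MyTable with columns Col1, Col2 and Col3 (placeholder names) and a constraint name I made up. It only succeeds once the duplicates are gone, and every column in the key has to be declared NOT NULL beforehand.

```sql
-- Assumed names: MyTable, Col1..Col3, PK_MyTable.
-- Fails if duplicate rows still exist or if any key column allows NULL.
ALTER TABLE MyTable
ADD CONSTRAINT PK_MyTable PRIMARY KEY (Col1, Col2, Col3);
```

If the columns must stay nullable, a UNIQUE constraint over the same columns might be an alternative worth testing.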

#8


0  

Can you add a primary key identity field to the table?

#9


0  

Manrico Corazzi - I specialize in Oracle, not MS SQL, so you'll have to tell me if this is possible as a performance boost:

  1. Leave the first step as it is: insert the distinct values from TABLE1 into TABLE2.
  2. Drop TABLE1. (I assume drop is faster than delete, much as truncate is faster than delete.)
  3. Rename TABLE2 to TABLE1 (this saves time, as you're renaming an object rather than copying data from one table to another).
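
In SQL Server the three steps above could be sketched roughly like this, using sp_rename for the last one. TABLE1 and TABLE2 are the names from the steps; this version assumes TABLE2 does not exist yet, because SELECT ... INTO creates it.

```sql
-- Step 1: copy the distinct rows into a new table (created by SELECT ... INTO).
SELECT DISTINCT * INTO TABLE2 FROM TABLE1;
GO

-- Step 2: drop the original table.
DROP TABLE TABLE1;
GO

-- Step 3: give the new table the old name.
EXEC sp_rename 'TABLE2', 'TABLE1';
GO
```

Keep in mind that SELECT ... INTO does not carry over indexes, constraints, triggers or permissions, so those would have to be recreated, and the DROP fails if a foreign key references TABLE1.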

#10


0  

Here's another way, with test data:

create table #table1 (colWithDupes1 int, colWithDupes2 int)
insert into #table1
(colWithDupes1, colWithDupes2)
Select 1, 2 union all
Select 1, 2 union all
Select 2, 2 union all
Select 3, 4 union all
Select 3, 4 union all
Select 3, 4 union all
Select 4, 2 union all
Select 4, 2 


select * from #table1

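-- Limit every statement to one affected row, so each DELETE removes a single duplicate.
set rowcount 1
-- Prime @@rowcount with 1 so the loop body runs at least once.
select 1

-- Keep deleting one row at a time until no row has a duplicate left.
while @@rowcount > 0
delete #table1  where 1 < (select count(*) from #table1 a2 
   where #table1.colWithDupes1 = a2.colWithDupes1
and #table1.colWithDupes2 = a2.colWithDupes2
)

-- Restore the default (no row limit).
set rowcount 0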

select * from #table1

#11


0  

What about this solution:

First you execute the following query:

  select 'set rowcount ' + convert(varchar,COUNT(*)-1) + ' delete from MyTable where field=''' + field +'''' + ' set rowcount 0'  from mytable group by field having COUNT(*)>1

And then you just have to execute the returned result set:

set rowcount 3 delete from Mytable where field='foo' set rowcount 0
....
....
set rowcount 5 delete from Mytable where field='bar' set rowcount 0

I've handled the case where you've got only one column, but it's pretty easy to adapt the same approach to more than one column. Let me know if you want me to post the code.

#12


0  

How about:

select distinct * into #t from duplicates_tbl

truncate table duplicates_tbl

insert duplicates_tbl select * from #t

drop table #t

#13


-1  

I'm not sure if this works with DELETE statements, but this is a way to find duplicate rows:

 SELECT *
 FROM myTable t1, myTable t2
 WHERE t1.field = t2.field AND t1.id > t2.id

I'm not sure if you can just change the "SELECT" to a "DELETE" (someone wanna let me know?), but even if you can't, you could just make it into a subquery.
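
For what it's worth, T-SQL does accept a join in a DELETE, so the query above can be turned into a delete with a small change. A sketch, assuming myTable really has a unique id column as the SELECT implies (the original question's table does not, so one would have to be added first):

```sql
-- Delete every row for which a row with the same field and a lower id exists.
-- The row with the lowest id in each group is kept.
DELETE t1
FROM myTable AS t1
JOIN myTable AS t2
  ON t1.field = t2.field
 AND t1.id > t2.id;
```

Running the SELECT version first is a cheap way to see which rows would be affected; note that it can list the same row several times when a group has more than two members.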
