T-SQL: delete duplicates based on columns

The screenshot shows the first 8 rows of a table. For each id (each id has thousands of rows), among rows with the same "updatetime" I only want to keep the first row and delete the rest. In this example, I want to delete the 3rd, 5th and 8th rows. Two rows can be identical in every column (here, when updatetime is the same, UpdateMillisec differs, but that is not guaranteed). The screenshot is the result of a query, and the table has no primary key (the left-most column in the screenshot does not exist in the table). What SQL should I write? Thanks in advance!

3 solutions

#1

There is an easy way to delete duplicate rows.

First, sort the records and number them within each group of duplicates.
Second, delete every row whose row number is greater than 1.

WITH CTE AS
(
SELECT  *
       ,ROW_NUMBER() OVER 
                (PARTITION BY id, updatetime
                     ORDER BY id, updatetime, UpdateMillisec ASC
                     ) AS RowNum
  FROM yourtable

)
SELECT * FROM CTE                    -- for checking the result before deleting
-- DELETE FROM CTE WHERE RowNum > 1  -- uncomment this row for the final DELETE

Attention:
To decide which record is the first and which ones follow it (second, third, ...), we have to sort the data.
Before deleting anything, always check the result set with SELECT * FROM CTE first.
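When two rows are identical in every column, as the question says can happen, there is no meaningful "first" row and any one copy may be kept. A minimal sketch, assuming a hypothetical temp table #dupes with only id and updatetime columns: SQL Server accepts ORDER BY (SELECT NULL) inside OVER() when no ordering is needed, so which physical copy survives is unspecified.

```sql
-- Hypothetical table #dupes: two columns only, containing exact duplicate rows.
IF OBJECT_ID('tempdb..#dupes') IS NOT NULL DROP TABLE #dupes;

CREATE TABLE #dupes
(
    id         int     NOT NULL,
    updatetime time(7) NOT NULL
);

INSERT INTO #dupes (id, updatetime)
VALUES (1, '09:30:00'),
       (1, '09:30:00'),   -- exact copy of the previous row
       (1, '09:30:01'),
       (2, '09:30:00'),
       (2, '09:30:00'),   -- two more exact copies
       (2, '09:30:00');

-- No column distinguishes the copies, so ORDER BY (SELECT NULL) just
-- numbers them in whatever order the engine reads them.
WITH CTE AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY id, updatetime
                              ORDER BY (SELECT NULL)) AS RowNum
    FROM #dupes
)
DELETE FROM CTE WHERE RowNum > 1;

SELECT id, updatetime FROM #dupes ORDER BY id, updatetime;
```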

In your case, I checked the result set of the above query:

id      lastprice  updatetime        UpdateMillisec  RowNum
211709  51370      09:30:00.0000000  500             1
211709  51380      09:30:01.0000000  0               1
211709  51370      09:30:01.0000000  500             2
211709  51370      09:30:02.0000000  0               1
211709  51370      09:30:02.0000000  500             2
211709  51370      09:30:03.0000000  0               1
211709  51370      09:30:04.0000000  0               1
211709  51370      09:30:04.0000000  500             2

As we can see, exactly the records you want to delete have RowNum = 2. So finally we can replace the SELECT * with the DELETE and execute the query again.
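To reproduce this end to end, the sketch below loads the eight sample rows into a temp table and runs the DELETE. The name #prices and the column types (int, time(7)) are assumptions inferred from the output above; id and updatetime are dropped from the ORDER BY because they are constant within each partition.

```sql
-- Hypothetical repro table; column types are assumed from the sample output.
IF OBJECT_ID('tempdb..#prices') IS NOT NULL DROP TABLE #prices;

CREATE TABLE #prices
(
    id             int     NOT NULL,
    lastprice      int     NOT NULL,
    updatetime     time(7) NOT NULL,
    UpdateMillisec int     NOT NULL
);

INSERT INTO #prices (id, lastprice, updatetime, UpdateMillisec)
VALUES (211709, 51370, '09:30:00', 500),
       (211709, 51380, '09:30:01',   0),
       (211709, 51370, '09:30:01', 500),
       (211709, 51370, '09:30:02',   0),
       (211709, 51370, '09:30:02', 500),
       (211709, 51370, '09:30:03',   0),
       (211709, 51370, '09:30:04',   0),
       (211709, 51370, '09:30:04', 500);

-- Number rows inside each (id, updatetime) group; the smallest
-- UpdateMillisec gets RowNum = 1 and is kept.
WITH CTE AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id, updatetime
                              ORDER BY UpdateMillisec ASC) AS RowNum
    FROM #prices
)
DELETE FROM CTE WHERE RowNum > 1;   -- removes the 3rd, 5th and 8th sample rows

SELECT id, lastprice, updatetime, UpdateMillisec
FROM #prices
ORDER BY id, updatetime, UpdateMillisec;
```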

#2

Assign a row number partitioned by the key columns and ordered by the time columns, then delete the unwanted rows.

Query

;with cte as(
    select [rn] = row_number() over(
        partition by [id], [lastprice], [updatetime]
        order by [id], [updatetime], [updateMillisec]
    ), *
    from [your_table_name]
)
select * from cte -- first select and check whether these are the rows that have to be deleted
where [rn] > 1;

If they are, delete the rows with [rn] greater than 1.

delete from cte
where [rn] > 1;
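Because [lastprice] is also part of the PARTITION BY, two rows only count as duplicates when id, lastprice and updatetime all match. The sketch below (a hypothetical #prices table holding the eight sample rows, with assumed column types) shows the effect: the 09:30:01 row with lastprice 51370 lands in a different partition from the 51380 row, so only the 5th and 8th sample rows are deleted. Dropping [lastprice] from the partition makes updatetime alone define a duplicate.

```sql
-- Hypothetical repro table; column types are assumed from the sample output.
IF OBJECT_ID('tempdb..#prices') IS NOT NULL DROP TABLE #prices;

CREATE TABLE #prices
(
    id             int     NOT NULL,
    lastprice      int     NOT NULL,
    updatetime     time(7) NOT NULL,
    UpdateMillisec int     NOT NULL
);

INSERT INTO #prices (id, lastprice, updatetime, UpdateMillisec)
VALUES (211709, 51370, '09:30:00', 500),
       (211709, 51380, '09:30:01',   0),
       (211709, 51370, '09:30:01', 500),
       (211709, 51370, '09:30:02',   0),
       (211709, 51370, '09:30:02', 500),
       (211709, 51370, '09:30:03',   0),
       (211709, 51370, '09:30:04',   0),
       (211709, 51370, '09:30:04', 500);

-- Same partitioning as the answer: [lastprice] is part of the key.
WITH cte AS
(
    SELECT [rn] = ROW_NUMBER() OVER (
               PARTITION BY [id], [lastprice], [updatetime]
               ORDER BY [UpdateMillisec]), *
    FROM #prices
)
DELETE FROM cte WHERE [rn] > 1;

-- 09:30:01 still has two rows: 51380 and 51370 fall into different partitions.
SELECT [updatetime], COUNT(*) AS rows_left
FROM #prices
GROUP BY [updatetime]
ORDER BY [updatetime];
```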

#3

I like @Estban P.'s solution, and I was tempted to go further. It turns out this also works:

DELETE seq
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY id, updatetime
                                ORDER BY id, updatetime, updatems ASC) AS RowNum
      FROM tbl) seq
WHERE RowNum > 1;

So you don't even need a CTE; see the demo here: http://rextester.com/VLZOD12591
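The derived-table form can also be tried safely inside a transaction, committing only when the number of deleted rows matches what the preview showed. A sketch against a hypothetical #prices table holding the eight sample rows (column UpdateMillisec instead of the demo's updatems, types assumed); the expected count of 3 comes from those sample rows.

```sql
-- Hypothetical repro table; column types are assumed from the sample output.
IF OBJECT_ID('tempdb..#prices') IS NOT NULL DROP TABLE #prices;

CREATE TABLE #prices
(
    id             int     NOT NULL,
    lastprice      int     NOT NULL,
    updatetime     time(7) NOT NULL,
    UpdateMillisec int     NOT NULL
);

INSERT INTO #prices (id, lastprice, updatetime, UpdateMillisec)
VALUES (211709, 51370, '09:30:00', 500),
       (211709, 51380, '09:30:01',   0),
       (211709, 51370, '09:30:01', 500),
       (211709, 51370, '09:30:02',   0),
       (211709, 51370, '09:30:02', 500),
       (211709, 51370, '09:30:03',   0),
       (211709, 51370, '09:30:04',   0),
       (211709, 51370, '09:30:04', 500);

BEGIN TRANSACTION;

-- Delete through the derived table, as in the answer.
DELETE seq
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY id, updatetime
                                ORDER BY UpdateMillisec ASC) AS RowNum
      FROM #prices) AS seq
WHERE seq.RowNum > 1;

-- @@ROWCOUNT still holds the DELETE's count here; keep the change only
-- if exactly the expected number of rows was removed.
IF @@ROWCOUNT = 3
BEGIN
    COMMIT TRANSACTION;
END
ELSE
BEGIN
    ROLLBACK TRANSACTION;
END;
```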
