How do I write a statement to accomplish the following?
Let's say a table has 2 columns (both nvarchar) with the following data:

col1   col2
10000  10
10000  20
10001  10
10002  30
10002  40
10002  50
I'd like to keep only the following data:

col1   col2
10000  10
10001  10
10002  30
That is, I want to remove the rows that duplicate a col1 value, using the second column to decide which one survives: for each col1 value, keep only the record with the minimal col2 value. (Neither column is a primary key.)
How can I accomplish this?
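Because both columns are nvarchar, "minimal value" means the smallest string, compared as text, not the smallest number. The sample data gives the same answer either way, but values of different lengths may not. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for SQL Server (TEXT values also compare as strings there; the table name t is hypothetical):

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("10000", "9"), ("10000", "10")])

# Text comparison: '10' sorts before '9' because '1' < '9'.
as_text = conn.execute("SELECT MIN(col2) FROM t").fetchone()[0]

# Numeric comparison after an explicit cast.
as_int = conn.execute("SELECT MIN(CAST(col2 AS INTEGER)) FROM t").fetchone()[0]

print(as_text, as_int)  # prints: 10 9
```

If the values are always numeric, MIN(CAST(col2 AS int)) gives numeric order in T-SQL as well; the cast raises an error on any value that is not a valid integer, and every comparison in the DELETE would need the same cast.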
3 solutions
#1 (4 votes)
This should work for you:
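-- WITH must follow a semicolon-terminated statement, hence the leading ;
;WITH NotMin AS
(
    -- TheMin = smallest Col2 within each Col1 group, repeated on every row
    SELECT Col1, Col2, MIN(Col2) OVER (PARTITION BY Col1) AS TheMin
    FROM Table1
)
DELETE Table1
--SELECT *   -- to preview, comment out the DELETE line and uncomment this one
FROM Table1
INNER JOIN NotMin
    ON  Table1.Col1 = NotMin.Col1
    AND Table1.Col2 = NotMin.Col2
    AND Table1.Col2 <> NotMin.TheMin;  -- match (and delete) every row that is not its group's minimum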
This uses a CTE (like a derived table, but cleaner) and the OVER clause, which computes each group's minimum alongside every row without a separate GROUP BY subquery. I also added a commented-out SELECT so you can see the matching rows (verify them before deleting). This works in SQL Server 2005/2008 and later.
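To check the logic outside SQL Server, here is a minimal sketch using Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (window functions; row values need 3.15 or newer). SQLite has no DELETE ... FROM ... JOIN, so the join is replaced by a row-value IN against the same CTE; Table1, Col1 and Col2 are kept from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Col1 TEXT, Col2 TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("10000", "10"), ("10000", "20"), ("10001", "10"),
     ("10002", "30"), ("10002", "40"), ("10002", "50")],
)

# Same idea as the T-SQL CTE: tag each row with its group's minimum Col2,
# then delete every (Col1, Col2) pair that is not that minimum.
conn.execute("""
    WITH NotMin AS (
        SELECT Col1, Col2, MIN(Col2) OVER (PARTITION BY Col1) AS TheMin
        FROM Table1
    )
    DELETE FROM Table1
    WHERE (Col1, Col2) IN (
        SELECT Col1, Col2 FROM NotMin WHERE Col2 <> TheMin
    )
""")

remaining = conn.execute(
    "SELECT Col1, Col2 FROM Table1 ORDER BY Col1"
).fetchall()
print(remaining)  # [('10000', '10'), ('10001', '10'), ('10002', '30')]
```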
Thanks, Eric
#2 (0 votes)
Ideally, you'd like to be able to say:
DELETE
FROM tbl
WHERE (col1, col2) NOT IN (SELECT col1, MIN(col2) AS col2 FROM tbl GROUP BY col1)
Unfortunately, that's not allowed in T-SQL, but there is a proprietary extension with a double FROM (using EXCEPT for clarity):
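DELETE
FROM tbl
FROM tbl
INNER JOIN (
    -- rows to remove: every pair except each group's minimum
    SELECT col1, col2 FROM tbl
    EXCEPT
    SELECT col1, MIN(col2) AS col2 FROM tbl GROUP BY col1
) AS extra
    ON extra.col1 = tbl.col1 AND extra.col2 = tbl.col2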
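The EXCEPT half produces exactly the rows to delete: every (col1, col2) pair minus each group's minimum. SQLite lacks the double-FROM extension but, unlike T-SQL, accepts row values, so a minimal sketch in Python's sqlite3 module can match that set with the tuple comparison from the "ideal" query (assumes SQLite 3.15 or newer; tbl is the answer's placeholder name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("10000", "10"), ("10000", "20"), ("10001", "10"),
     ("10002", "30"), ("10002", "40"), ("10002", "50")],
)

# All pairs EXCEPT the per-group minimums = the rows to remove.
extra_sql = """
    SELECT col1, col2 FROM tbl
    EXCEPT
    SELECT col1, MIN(col2) FROM tbl GROUP BY col1
"""

# Preview before deleting.
to_delete = sorted(conn.execute(extra_sql).fetchall())

# Constant SQL (no user input), so string interpolation is safe here.
conn.execute(f"DELETE FROM tbl WHERE (col1, col2) IN ({extra_sql})")

remaining = conn.execute("SELECT col1, col2 FROM tbl ORDER BY col1").fetchall()
print(to_delete)  # [('10000', '20'), ('10002', '40'), ('10002', '50')]
print(remaining)  # [('10000', '10'), ('10001', '10'), ('10002', '30')]
```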
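More generally, you can compare a single concatenated key instead of a pair:

-- Assumes '|' never appears in the data and neither column is NULL
-- (a NULL key in the subquery makes NOT IN match nothing).
DELETE
FROM tbl
WHERE col1 + '|' + col2 NOT IN (SELECT col1 + '|' + MIN(col2) FROM tbl GROUP BY col1)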
Or other workarounds.
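The concatenated key carries over to other engines by swapping the concatenation operator. A minimal sketch in Python's sqlite3 module, where || replaces T-SQL's + (the table name tbl and the rows are from the question and answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("10000", "10"), ("10000", "20"), ("10001", "10"),
     ("10002", "30"), ("10002", "40"), ("10002", "50")],
)

# SQLite concatenates with || rather than T-SQL's +.
conn.execute("""
    DELETE
    FROM tbl
    WHERE col1 || '|' || col2 NOT IN
          (SELECT col1 || '|' || MIN(col2) FROM tbl GROUP BY col1)
""")

remaining = conn.execute("SELECT col1, col2 FROM tbl ORDER BY col1").fetchall()
print(remaining)  # [('10000', '10'), ('10001', '10'), ('10002', '30')]
```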
#3 (0 votes)
Sorry, I misunderstood the question.
SELECT col1, MIN(col2) AS col2
FROM test
GROUP BY col1
This, of course, returns the rows you want to keep, but assuming you can't alter the table to add a unique identifier, you would need to do something like:
DELETE FROM test
WHERE col1 + '|' + col2 NOT IN
(SELECT col1 + '|' + MIN(col2)
FROM test
GROUP BY col1)
This should work as long as the pipe character never appears in your data.
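The delimiter exists so that different pairs cannot produce the same key, and it only works if it never occurs in the data. A small Python sketch that mimics the T-SQL expression col1 + '|' + col2 with plain string concatenation (the helper key() and its sample values are hypothetical) shows both failure modes:

```python
def key(col1: str, col2: str, sep: str) -> str:
    """Mimic the T-SQL expression col1 + sep + col2."""
    return col1 + sep + col2

# Without a separator, different pairs can collide: both give "1000010".
no_sep_collides = key("1000", "010", "") == key("10000", "10", "")

# With '|' the same two pairs stay distinct...
pipe_distinct = key("1000", "010", "|") != key("10000", "10", "|")

# ...but data that itself contains '|' can collide again: both give "a|b|c".
pipe_collides = key("a|b", "c", "|") == key("a", "b|c", "|")

print(no_sep_collides, pipe_distinct, pipe_collides)  # True True True
```

A collision can make a row that should be deleted match a kept key, so NOT IN silently skips it.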