It's easy to find duplicates with one field:
SELECT name, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1
So if we have a table:
ID   NAME   EMAIL
1    John   asd@asd.com
2    Sam    asd@asd.com
3    Tom    asd@asd.com
4    Bob    bob@asd.com
5    Tom    asd@asd.com
This query will give us John, Sam, Tom, Tom because they all have the same email.
However, what I want is to get duplicates with the same email and name.
That is, I want to get "Tom", "Tom".
The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove/change the duplicates, so I need to find them first.
23 Answers
#1
2058
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
Simply group on both of the columns.
Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY, but this has changed with the idea of "functional dependency" (a short sketch follows the list below):
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as of SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need sql_mode=only_full_group_by:
  - GROUP BY lname ORDER BY showing wrong results;
  - Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
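A short sketch of what that functional-dependency relaxation means in practice, assuming users has a primary key id (the question's sample data suggests one but does not say so): on engines that recognise the dependency, such as recent PostgreSQL, you can group by the key alone and still select the dependent columns.
-- Accepted where functional dependency is recognised (e.g. recent PostgreSQL),
-- because name and email are determined by the primary key id;
-- SQL Server 2017 would reject this and require them in the GROUP BY.
SELECT id, name, email
FROM users
GROUP BY id
For the duplicate search itself, grouping on both name and email as shown above works on all of these engines.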
#2
265
try this:
declare @YourTable table (id int, name varchar(10), email varchar(50))
INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')
SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
OUTPUT:
name email CountOf
---------- ----------- -----------
John John-email 2
sam sam-email 2
(2 row(s) affected)
if you want the IDs of the dups, use this:
SELECT
y.id,y.name,y.email
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
OUTPUT:
id name email
----------- ---------- ------------
1 John John-email
2 John John-email
5 sam sam-email
6 sam sam-email
(4 row(s) affected)
to delete the duplicates try:
DELETE d
FROM @YourTable d
INNER JOIN (SELECT
y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
) dt2 ON d.id=dt2.id
WHERE dt2.RowRank!=1
SELECT * FROM @YourTable
OUTPUT:
id name email
----------- ---------- --------------
1 John John-email
3 fred John-email
4 fred fred-email
5 sam sam-email
(4 row(s) affected)
#3
88
Try this:
SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )
#4
42
If you want to delete the duplicates, here's a much simpler way to do it than having to find even/odd rows with a triple sub-select:
SELECT u.id, u.name, u.email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
And so to delete:
DELETE FROM users
WHERE id IN (
SELECT u.id/*, u.name, u.email*/
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)
Much easier to read and understand IMHO.
Note: The only issue is that you have to execute the request until no rows are deleted, since you delete only one of each set of duplicates each time.
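A hedged aside: on MySQL this DELETE form is rejected with error 1093, because the subquery reads the same table that is being deleted from; a common workaround is to wrap the subquery in a derived table, for example:
DELETE FROM users
WHERE id IN (
    SELECT id FROM (
        -- same self-join as above, collecting the ids to remove
        SELECT u.id
        FROM users u, users u2
        WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
    ) AS dups
)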
#5
29
Try the following:
SELECT * FROM
(
SELECT Id, Name, Age, Comments, Row_Number() OVER(PARTITION BY Name, Age ORDER By Name)
AS Rank
FROM Customers
) AS B WHERE Rank>1
#6
21
SELECT name, email
FROM users
WHERE email in
(SELECT email FROM users
GROUP BY email
HAVING COUNT(*)>1)
#7
17
A little late to the party, but I found a really cool way of finding all duplicate IDs:
SELECT GROUP_CONCAT( id )
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
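Since the question wants duplicates on both columns, the same idea grouped on name and email (GROUP_CONCAT is MySQL-specific) would look like this sketch:
SELECT name, email, GROUP_CONCAT(id) AS duplicate_ids
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1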
#8
15
try this code
WITH CTE AS
( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER() OVER(PARTITION BY Name,Age ORDER BY ccn)
FROM ccnmaster )
select * from CTE where RN > 1
#9
13
In case you work with Oracle, this way would be preferable:
create table my_users(id number, name varchar2(100), email varchar2(100));
insert into my_users values (1, 'John', 'asd@asd.com');
insert into my_users values (2, 'Sam', 'asd@asd.com');
insert into my_users values (3, 'Tom', 'asd@asd.com');
insert into my_users values (4, 'Bob', 'bob@asd.com');
insert into my_users values (5, 'Tom', 'asd@asd.com');
commit;
select *
from my_users
where rowid not in (select min(rowid) from my_users group by name, email);
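If the goal is to remove the duplicates rather than just list them, the same predicate can drive a DELETE; a sketch of that obvious extension (try it on a copy of the data first):
DELETE FROM my_users
WHERE rowid not in (select min(rowid) from my_users group by name, email);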
#10
9
This selects/deletes all duplicate records except one record from each group of duplicates. So, the delete leaves all unique records + one record from each group of the duplicates.
Select duplicates:
SELECT *
FROM table
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY column1, column2
);
Delete duplicates:
DELETE FROM table
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY column1, column2
);
Be aware that with larger numbers of records this can cause performance problems.
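One mitigation, sketched with placeholder names since the answer's table and column names are themselves placeholders, is a composite index covering the grouped columns and the id, so the MIN(id)-per-group subquery can be answered from the index rather than a full scan:
-- hypothetical names for illustration only
CREATE INDEX ix_dup_check ON your_table (column1, column2, id);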
#11
8
select id,name,COUNT(*) from India group by Id,Name having COUNT(*)>1
#12
7
If you wish to see whether there are any duplicate rows in your table, I used the query below:
create table my_table(id int, name varchar(100), email varchar(100));
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (2, 'Aman', 'aman@rms.com');
insert into my_table values (3, 'Tom', 'tom@rms.com');
insert into my_table values (4, 'Raj', 'raj@rms.com');
Select COUNT(1) As Total_Rows from my_table
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc
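A sketch of the same check as a single statement, using only the answer's own table: if the two counts differ, the table contains at least one fully duplicated row.
Select
    (Select COUNT(1) from my_table) As Total_Rows,
    (Select Count(1) from (Select Distinct * from my_table) abc) As Distinct_Rows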
#13
7
How can we count the duplicated values, whether a value is repeated 2 times or more than 2? Just count them, not group-wise. It's as simple as:
select COUNT(distinct col_01) from Table_01
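Note that COUNT(distinct col_01) by itself gives the number of distinct values, not the number of duplicates; if what you actually want is how many surplus duplicate rows there are for that column, a hedged variant subtracts it from the plain count:
select COUNT(col_01) - COUNT(distinct col_01) as duplicate_rows from Table_01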
#14
6
This is the easy thing I've come up with. It uses a common table expression (CTE) and a partition window (I think these features are in SQL 2008 and later).
This example finds all students with duplicate name and dob. The fields you want to check for duplication go in the OVER clause. You can include any other fields you want in the projection.
with cte (StudentId, Fname, LName, DOB, RowCnt)
as (
SELECT StudentId, FirstName, LastName, DateOfBirth as DOB, SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
FROM tblStudent
)
SELECT * from CTE where RowCnt > 1
ORDER BY DOB, LName
#15
5
SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;
I think this will work properly to search repeated values in a particular column.
#16
5
This should also work, maybe give it a try.
Select * from Users a
where EXISTS (Select * from Users b
where ( a.name = b.name
OR a.email = b.email)
and a.ID != b.id)
Especially good in your case if you search for duplicates that have some kind of prefix or a general change, e.g. a new domain in the email; then you can use replace() on these columns.
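A minimal sketch of that replace() idea, with made-up domain names just for illustration: normalise the email before comparing, so addresses that differ only in a migrated domain still count as duplicates.
Select * from Users a
where EXISTS (Select * from Users b
              where a.name = b.name
                and replace(a.email, '@newdomain.com', '@olddomain.com')
                  = replace(b.email, '@newdomain.com', '@olddomain.com')
                and a.ID != b.id)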
#17
5
select name, email
, case
when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes'
else 'No'
end "duplicated ?"
from users
#18
4
select emp.ename, emp.empno, dept.loc
from emp
inner join dept
on dept.deptno=emp.deptno
inner join
(select ename, count(*) from
emp
group by ename, deptno
having count(*) > 1)
t on emp.ename=t.ename order by emp.ename
/
#19
4
If you want to find duplicate data (by one or several criteria) and select the actual rows:
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/
#20
4
Using a CTE, we can also find duplicate values like this:
with MyCTE
as
(
select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]
)
select * from MyCTE where Duplicate>1
#21
3
SELECT * FROM users u where rowid = (select max(rowid) from users u1 where
u.email=u1.email);
#22
1
SELECT column_name, COUNT(*) FROM TABLE_NAME GROUP BY column_name HAVING COUNT(*) > 1;
#23
-1
SELECT
FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM
CUSTOMER
GROUP BY
FirstName, LastName, MobileNo
HAVING
COUNT(1) > 1;