I have a table that I use to store some systematically chosen "serial numbers" for each product that is bought...
The problem is, a CSV was uploaded that I believe contained some duplicate "serial numbers", which means that when the application tries to modify a row, it may not be modifying the correct one.
I need to be able to query the database and get all rows whose serial_number value appears more than once. It should look something like this:
ID, serial_number, meta1, meta2, meta3
3, 123456, 0, 2, 4
55, 123456, 0, 0, 0
6, 345678, 0, 1, 2
99, 345678, 0, 1, 2
So as you can see, I need to be able to see both the original row and the duplicate row, with all of their columns, so I can compare them and determine what data is now inconsistent.
2 Solutions
#1
Some versions of MySQL implement IN with a subquery very inefficiently. A safe alternative is a join:
SELECT t.*
FROM t join
(select serial_number, count(*) as cnt
from t
group by serial_number
) tsum
on tsum.serial_number = t.serial_number and cnt > 1
order by t.serial_number;
Another alternative is to use an EXISTS clause:
select t.*
from t
where exists (select * from t t2 where t2.serial_number = t.serial_number and t2.id <> t.id)
order by t.serial_number;
Both of these queries (as well as the one proposed by @fthiella) are standard SQL. Both would benefit from an index on (serial_number, id).
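To try the join and EXISTS queries without touching the real database, here is a minimal sketch that runs them against the sample rows from the question. It makes several assumptions: an in-memory SQLite database stands in for MySQL, the table is called t as in the queries above, the index name idx_t_serial_id is made up, the row with serial number 999999 is invented to show that non-duplicates are filtered out, and t.id is added to the ORDER BY only to make the output order deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, serial_number INTEGER, "
    "meta1 INTEGER, meta2 INTEGER, meta3 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (3, 123456, 0, 2, 4),
        (55, 123456, 0, 0, 0),
        (6, 345678, 0, 1, 2),
        (99, 345678, 0, 1, 2),
        (7, 999999, 1, 1, 1),  # invented row with a unique serial_number
    ],
)

# Composite index suggested in the answer (the index name is hypothetical).
conn.execute("CREATE INDEX idx_t_serial_id ON t (serial_number, id)")

# Join against a per-serial_number count and keep only counts above 1.
JOIN_QUERY = """
SELECT t.*
FROM t JOIN
     (SELECT serial_number, COUNT(*) AS cnt
      FROM t
      GROUP BY serial_number
     ) tsum
     ON tsum.serial_number = t.serial_number AND cnt > 1
ORDER BY t.serial_number, t.id
"""

# Keep a row if another row with the same serial_number but a different id exists.
EXISTS_QUERY = """
SELECT t.*
FROM t
WHERE EXISTS (SELECT * FROM t t2
              WHERE t2.serial_number = t.serial_number AND t2.id <> t.id)
ORDER BY t.serial_number, t.id
"""

join_rows = conn.execute(JOIN_QUERY).fetchall()
exists_rows = conn.execute(EXISTS_QUERY).fetchall()

for row in join_rows:
    print(row)
```

On this sample data both queries return the same four rows (ids 3, 55, 6, 99) and skip the row with the unique serial number. How much the index helps on a real MySQL table depends on the server version and data size, so it is worth checking with EXPLAIN there.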
#2
SELECT *
FROM
yourtable
WHERE
serial_number IN (SELECT serial_number
FROM yourtable
GROUP BY serial_number
HAVING COUNT(*)>1)
ORDER BY
serial_number, id
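As a rough illustration of how this query's result could feed the comparison the question asks for, the sketch below runs it on the question's sample rows and then flags the serial numbers whose duplicate rows disagree in their meta columns. Assumptions: an in-memory SQLite database stands in for MySQL, the table is named yourtable as in the query, and the grouping step in Python is an addition for illustration, not part of the answer.

```python
import sqlite3
from itertools import groupby

# Assumption: in-memory SQLite stands in for the real MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, serial_number INTEGER, "
    "meta1 INTEGER, meta2 INTEGER, meta3 INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)",
    [
        (3, 123456, 0, 2, 4),
        (55, 123456, 0, 0, 0),
        (6, 345678, 0, 1, 2),
        (99, 345678, 0, 1, 2),
    ],
)

# The IN query from the answer, unchanged apart from whitespace.
IN_QUERY = """
SELECT *
FROM yourtable
WHERE serial_number IN (SELECT serial_number
                        FROM yourtable
                        GROUP BY serial_number
                        HAVING COUNT(*) > 1)
ORDER BY serial_number, id
"""
dup_rows = conn.execute(IN_QUERY).fetchall()

# Rows arrive sorted by serial_number (column index 1), so groupby can be
# applied directly. A group is inconsistent if its meta columns differ.
inconsistent = []
for serial, group in groupby(dup_rows, key=lambda row: row[1]):
    distinct_meta = {row[2:] for row in group}
    if len(distinct_meta) > 1:
        inconsistent.append(serial)

print(inconsistent)
```

With the sample rows, serial number 123456 is reported because its two rows carry different meta values, while 345678 is a plain duplicate with identical data.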