I'm trying to create a MySQL query that will return all individual rows (not grouped) containing duplicate values from within a group of related records. By 'groups of related records' I mean those with the same account number (per the sample below).
Basically, within each group of related records that share the same account number, select just those rows whose values for the date or amount columns are the same as another row's values within that account's group of records. Values should only be considered duplicates within that account's group. The sample table and ideal output below should clear things up.
Also, I'm not concerned with any records with a status of X being returned, even if they have duplicate values.
Small sample table with relevant data:
id account invoice date amount status
1 1 1 2012-04-01 0 X
2 1 2 2012-04-01 120 P
3 1 2 2012-05-01 120 U
4 1 3 2012-05-01 117 U
5 2 4 2012-04-01 82 X
6 2 4 2012-05-01 82 U
7 2 5 2012-03-01 81 P
8 2 6 2012-05-01 80 U
9 3 7 2012-03-01 80 P
10 3 8 2012-04-01 79 U
11 3 9 2012-04-01 78 U
Ideal output returned from desired SQL query:
id account invoice date amount status
2 1 2 2012-04-01 120 P
3 1 2 2012-05-01 120 U
4 1 3 2012-05-01 117 U
6 2 4 2012-05-01 82 U
8 2 6 2012-05-01 80 U
10 3 8 2012-04-01 79 U
11 3 9 2012-04-01 78 U
Thus, the pairs 7/9 (same date) and 8/9 (same amount) do not count as duplicates, because the rows in each pair belong to different accounts. Row 8 is still returned, however, because it shares a date with row 6 within account 2.
Later, I may want to further refine the selection by grabbing only duplicate rows that have matching statuses. Row 2 would then be excluded because its status doesn't match the other two rows found within that account's group of records. How much more difficult would that make the query? Would it just be a matter of adding a WHERE or HAVING clause, or is it more complicated than that?
I hope my explanation of what I'm trying to accomplish makes sense. I've tried using INNER JOIN but that returns each desired row more than once. I don't want duplicates of duplicates.
Table Structure and Sample Values:
CREATE TABLE payment (
id int(11) NOT NULL auto_increment,
account int(10) NOT NULL default '0',
invoice int(10) NOT NULL default '0',
date date NOT NULL default '0000-00-00',
amount int(10) NOT NULL default '0',
status char(1) NOT NULL default '',
PRIMARY KEY (id)
);
INSERT INTO payment VALUES (1, 1, 1, '2012-04-01', 0, 'X');
INSERT INTO payment VALUES (2, 1, 2, '2012-04-01', 120, 'P');
INSERT INTO payment VALUES (3, 1, 2, '2012-05-01', 120, 'U');
INSERT INTO payment VALUES (4, 1, 3, '2012-05-01', 117, 'U');
INSERT INTO payment VALUES (5, 2, 4, '2012-04-01', 82, 'X');
INSERT INTO payment VALUES (6, 2, 4, '2012-05-01', 82, 'U');
INSERT INTO payment VALUES (7, 2, 5, '2012-03-01', 81, 'P');
INSERT INTO payment VALUES (8, 2, 6, '2012-05-01', 80, 'U');
INSERT INTO payment VALUES (9, 3, 7, '2012-03-01', 80, 'P');
INSERT INTO payment VALUES (10, 3, 8, '2012-04-01', 79, 'U');
INSERT INTO payment VALUES (11, 3, 9, '2012-04-01', 78, 'U');
2 Solutions
#1
10
This type of query can be implemented as a semi join.
Semijoins are used to select rows from one of the tables in the join.
For example:
select distinct l.*
from payment l
inner join payment r
  on l.id != r.id
  and l.account = r.account
  and (l.date = r.date or l.amount = r.amount)
where l.status != 'X' and r.status != 'X'
order by l.id asc;
Note the use of distinct, and that I'm only selecting columns from the left table. This ensures that there are no duplicates.
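To see what distinct is doing here, the sketch below runs the join with and without it. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, loads the sample rows from the question's table, and selects only l.id to keep the output short. The names ROWS, make_db and JOIN are made up for this example.

```python
import sqlite3

# Sample rows from the question: (id, account, invoice, date, amount, status).
ROWS = [
    (1, 1, 1, "2012-04-01", 0, "X"),
    (2, 1, 2, "2012-04-01", 120, "P"),
    (3, 1, 2, "2012-05-01", 120, "U"),
    (4, 1, 3, "2012-05-01", 117, "U"),
    (5, 2, 4, "2012-04-01", 82, "X"),
    (6, 2, 4, "2012-05-01", 82, "U"),
    (7, 2, 5, "2012-03-01", 81, "P"),
    (8, 2, 6, "2012-05-01", 80, "U"),
    (9, 3, 7, "2012-03-01", 80, "P"),
    (10, 3, 8, "2012-04-01", 79, "U"),
    (11, 3, 9, "2012-04-01", 78, "U"),
]


def make_db():
    """Build an in-memory SQLite copy of the payment table (assumption: SQLite, not MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table payment (id integer primary key, account integer, "
        "invoice integer, date text, amount integer, status text)"
    )
    conn.executemany("insert into payment values (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


# The self-join from the answer, shared by both queries below.
JOIN = """
    from payment l
    inner join payment r
      on l.id != r.id and l.account = r.account
     and (l.date = r.date or l.amount = r.amount)
    where l.status != 'X' and r.status != 'X'
    order by l.id
"""

conn = make_db()
plain = [row[0] for row in conn.execute("select l.id" + JOIN)]
unique = [row[0] for row in conn.execute("select distinct l.id" + JOIN)]

print(plain)   # [2, 3, 3, 4, 6, 8, 10, 11]
print(unique)  # [2, 3, 4, 6, 8, 10, 11]
```

Row 3 appears twice in the plain join because it matches row 2 on amount and row 4 on date; distinct collapses those two matches into one row.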
The join condition checks that:
- it's not joining a row to itself (l.id != r.id)
- rows are in the same account (l.account = r.account)
- and either the date or the amount is the same (l.date = r.date or l.amount = r.amount)
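The same three conditions can also be written with an exists subquery, which only tests whether a matching row is present and therefore never multiplies the outer rows. This is an alternative formulation rather than the query given above, sketched with Python's sqlite3 module and an in-memory SQLite database standing in for MySQL; the aliases p and other, and the names ROWS and QUERY, are made up for the example.

```python
import sqlite3

# Sample rows from the question: (id, account, invoice, date, amount, status).
ROWS = [
    (1, 1, 1, "2012-04-01", 0, "X"),
    (2, 1, 2, "2012-04-01", 120, "P"),
    (3, 1, 2, "2012-05-01", 120, "U"),
    (4, 1, 3, "2012-05-01", 117, "U"),
    (5, 2, 4, "2012-04-01", 82, "X"),
    (6, 2, 4, "2012-05-01", 82, "U"),
    (7, 2, 5, "2012-03-01", 81, "P"),
    (8, 2, 6, "2012-05-01", 80, "U"),
    (9, 3, 7, "2012-03-01", 80, "P"),
    (10, 3, 8, "2012-04-01", 79, "U"),
    (11, 3, 9, "2012-04-01", 78, "U"),
]

# Keep a row when some *other* non-X row in the same account shares its
# date or its amount. No distinct is needed: exists yields at most one
# result per outer row.
QUERY = """
    select p.id
    from payment p
    where p.status != 'X'
      and exists (
        select 1
        from payment other
        where other.id != p.id
          and other.account = p.account
          and other.status != 'X'
          and (other.date = p.date or other.amount = p.amount)
      )
    order by p.id
"""

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute(
    "create table payment (id integer primary key, account integer, "
    "invoice integer, date text, amount integer, status text)"
)
conn.executemany("insert into payment values (?, ?, ?, ?, ?, ?)", ROWS)

ids = [row[0] for row in conn.execute(QUERY)]
print(ids)  # [2, 3, 4, 6, 8, 10, 11]
```

On the sample data this returns the same ids as the ideal output in the question.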
For the second part of your question, you would need to update the on clause in the query.
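One way that update might look is to add l.status = r.status to the on clause, so that two rows only count as duplicates of each other when their statuses also match. The exact condition is an assumption, since the answer does not spell it out. The sketch below checks it against the sample rows using Python's sqlite3 module with an in-memory SQLite database standing in for MySQL; ROWS and QUERY are names made up for the example.

```python
import sqlite3

# Sample rows from the question: (id, account, invoice, date, amount, status).
ROWS = [
    (1, 1, 1, "2012-04-01", 0, "X"),
    (2, 1, 2, "2012-04-01", 120, "P"),
    (3, 1, 2, "2012-05-01", 120, "U"),
    (4, 1, 3, "2012-05-01", 117, "U"),
    (5, 2, 4, "2012-04-01", 82, "X"),
    (6, 2, 4, "2012-05-01", 82, "U"),
    (7, 2, 5, "2012-03-01", 81, "P"),
    (8, 2, 6, "2012-05-01", 80, "U"),
    (9, 3, 7, "2012-03-01", 80, "P"),
    (10, 3, 8, "2012-04-01", 79, "U"),
    (11, 3, 9, "2012-04-01", 78, "U"),
]

# Same self-join as in the answer, plus one extra condition
# (l.status = r.status) in the on clause. That extra line is the assumed change.
QUERY = """
    select distinct l.id
    from payment l
    inner join payment r
      on l.id != r.id and l.account = r.account
     and l.status = r.status
     and (l.date = r.date or l.amount = r.amount)
    where l.status != 'X' and r.status != 'X'
    order by l.id
"""

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute(
    "create table payment (id integer primary key, account integer, "
    "invoice integer, date text, amount integer, status text)"
)
conn.executemany("insert into payment values (?, ?, ?, ?, ?, ?)", ROWS)

ids = [row[0] for row in conn.execute(QUERY)]
print(ids)  # [3, 4, 6, 8, 10, 11]
```

Row 2 drops out, as the question expects: its only partner is row 3, whose status is U rather than P.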
#2
3
This seems to work
-- select only p1's columns; select * would also return the matched p2 row
select p1.* from payment p1
join payment p2 on
(p1.id != p2.id
and p1.status != 'X'
and p1.account = p2.account
and (p1.amount = p2.amount or p1.date = p2.date))
group by p1.id