I need to retrieve all rows from a table where the combination of 2 columns is unique. So I want all the sales that do not have any other sale that happened on the same day for the same price. The sales that are unique based on day and price will get updated to an active status.
So I'm thinking:
UPDATE sales
SET status = 'ACTIVE'
WHERE id IN (SELECT DISTINCT (saleprice, saledate), id, count(id)
FROM sales
HAVING count = 1)
But my brain hurts going any farther than that.
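A minimal PostgreSQL test setup for trying the answers below. The table layout is an assumption: only the column names id, saleprice, saledate and status come from the question; the types and sample rows are made up for illustration.

```sql
-- Hypothetical test table: column names from the question, types assumed.
CREATE TEMP TABLE sales (
   id        int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- Postgres 10+
   saleprice numeric(10,2),
   saledate  date,
   status    text
);

-- Made-up sample rows; in a fresh table the ids are 1..5 in insertion order.
INSERT INTO sales (saleprice, saledate, status) VALUES
   (10.00, '2024-01-01', 'NEW')      -- id 1: same day and price as id 2
 , (10.00, '2024-01-01', 'NEW')      -- id 2
 , (10.00, '2024-01-02', 'NEW')      -- id 3: same price, other day -> unique
 , (12.50, '2024-01-01', 'NEW')      -- id 4: same day, other price -> unique
 , (15.00, '2024-01-03', 'ACTIVE');  -- id 5: unique and already active

-- Inspect the table before and after running one of the UPDATE statements.
SELECT id, saleprice, saledate, status FROM sales ORDER BY id;
```

With this data, a correct solution should leave ids 3, 4 and 5 as 'ACTIVE' and ids 1 and 2 untouched.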
4 answers
#1
SELECT DISTINCT a,b,c FROM t
is roughly equivalent to:
SELECT a,b,c FROM t GROUP BY a,b,c
It's a good idea to get used to the GROUP BY syntax, as it's more powerful.
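To illustrate the equivalence, and what "more powerful" means here, a PostgreSQL sketch on an inline VALUES list (the relation t and its rows are hypothetical):

```sql
-- Both queries return the same two rows: (1, 'x', true) and (2, 'y', false).
WITH t(a, b, c) AS (VALUES (1, 'x', true), (1, 'x', true), (2, 'y', false))
SELECT DISTINCT a, b, c FROM t ORDER BY a;

WITH t(a, b, c) AS (VALUES (1, 'x', true), (1, 'x', true), (2, 'y', false))
SELECT a, b, c FROM t GROUP BY a, b, c ORDER BY a;

-- GROUP BY can also aggregate per group, which DISTINCT cannot:
-- count(*) is 2 for (1, 'x', true) and 1 for (2, 'y', false).
WITH t(a, b, c) AS (VALUES (1, 'x', true), (1, 'x', true), (2, 'y', false))
SELECT a, b, c, count(*) FROM t GROUP BY a, b, c ORDER BY a;
```

The per-group count is exactly what the question needs: a (saleprice, saledate) pair with count(*) = 1 is a unique sale.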
For your query, I'd do it like this:
UPDATE sales
SET status='ACTIVE'
WHERE id IN
(
SELECT id
FROM sales S
INNER JOIN
(
SELECT saleprice, saledate
FROM sales
GROUP BY saleprice, saledate
HAVING COUNT(*) = 1
) T
ON S.saleprice = T.saleprice AND S.saledate = T.saledate
)
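Before running an UPDATE like this, it can help to preview the affected rows. A sketch that runs the same join as a read-only SELECT against the question's sales table:

```sql
-- Same join as in the UPDATE, but read-only: lists the rows that would be set to ACTIVE.
SELECT S.id, S.saleprice, S.saledate, S.status
FROM   sales S
INNER  JOIN (
   SELECT saleprice, saledate
   FROM   sales
   GROUP  BY saleprice, saledate
   HAVING COUNT(*) = 1           -- only (price, date) pairs that occur once
   ) T ON S.saleprice = T.saleprice AND S.saledate = T.saledate
ORDER  BY S.id;
```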
#2
If you put the answers so far together, clean them up and improve them, you arrive at this superior query:
UPDATE sales
SET status = 'ACTIVE'
WHERE (saleprice, saledate) IN (
SELECT saleprice, saledate
FROM sales
GROUP BY saleprice, saledate
HAVING count(*) = 1
);
It is much faster than either of them: it beats the currently accepted answer by a factor of 10 to 15 (in my tests on PostgreSQL 8.4 and 9.1).
But this is still far from optimal. Use a NOT EXISTS (anti-)semi-join for even better performance. EXISTS is standard SQL, has been around forever (at least since PostgreSQL 7.2, long before this question was asked) and fits the presented requirements perfectly:
UPDATE sales s
SET    status = 'ACTIVE'
WHERE  NOT EXISTS (
   SELECT 1
   FROM   sales s1
   WHERE  s.saleprice = s1.saleprice
   AND    s.saledate = s1.saledate
   AND    s.id <> s1.id                   -- except for row itself
   )
AND    s.status IS DISTINCT FROM 'ACTIVE';  -- avoid empty updates. see below
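In PostgreSQL, appending a RETURNING clause to the same statement reports exactly which rows were changed, which is a convenient way to check the result. A sketch based on the query above, against the question's sales table:

```sql
UPDATE sales s
SET    status = 'ACTIVE'
WHERE  NOT EXISTS (
   SELECT 1
   FROM   sales s1
   WHERE  s1.saleprice = s.saleprice
   AND    s1.saledate  = s.saledate
   AND    s1.id <> s.id                     -- any other row with the same pair?
   )
AND    s.status IS DISTINCT FROM 'ACTIVE'   -- skip rows that are already active
RETURNING s.id, s.saleprice, s.saledate;    -- one output row per updated row
```

Rows that were already 'ACTIVE' do not show up in the output, because the last WHERE condition excludes them from the update.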
Unique key to identify row
If you don't have a primary or unique key for the table (id in the example), you can substitute the system column ctid for the purpose of this query (but not for some other purposes):
AND s1.ctid <> s.ctid
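Put together, a sketch of the full statement for a table without a usable key, using ctid only to tell rows apart within this single statement:

```sql
-- ctid is the physical location of a row version. It changes when a row is
-- updated or the table is rewritten (e.g. VACUUM FULL), so it is no substitute
-- for a real key, but it is good enough to distinguish rows inside one query.
UPDATE sales s
SET    status = 'ACTIVE'
WHERE  NOT EXISTS (
   SELECT 1
   FROM   sales s1
   WHERE  s1.saleprice = s.saleprice
   AND    s1.saledate  = s.saledate
   AND    s1.ctid <> s.ctid             -- system column instead of id
   )
AND    s.status IS DISTINCT FROM 'ACTIVE';
```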
Every table should have a primary key. Add one if you don't have one yet. I suggest a serial or, in Postgres 10+, an IDENTITY column.
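A sketch of adding such a key to an existing table, assuming a table named sales that does not have an id column yet:

```sql
-- Postgres 10+: identity column; existing rows are numbered automatically.
ALTER TABLE sales
   ADD COLUMN id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY;

-- Older versions: the serial pseudo-type does the same via a sequence.
-- ALTER TABLE sales ADD COLUMN id serial PRIMARY KEY;
```

Either form rewrites the whole table under an exclusive lock, so on a big table it is worth scheduling for a quiet moment.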
Related:
- In-order sequence generation
- Auto increment table column
How is this faster?
The subquery in the EXISTS (anti-)semi-join can stop evaluating as soon as the first duplicate is found (there is no point in looking further). For a base table with few duplicates this is only mildly more efficient. With lots of duplicates it becomes much more efficient.
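To check the performance difference on your own data, a sketch comparing execution plans with EXPLAIN (ANALYZE, BUFFERS). EXPLAIN ANALYZE actually executes the UPDATE, hence the surrounding transaction that is rolled back:

```sql
BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE sales s
SET    status = 'ACTIVE'
WHERE  NOT EXISTS (
   SELECT 1
   FROM   sales s1
   WHERE  s1.saleprice = s.saleprice
   AND    s1.saledate  = s.saledate
   AND    s1.id <> s.id
   );

ROLLBACK;  -- undo the changes made while measuring
```

Repeating the same with the IN / GROUP BY variant and comparing the reported execution times shows how much the early exit helps for your particular distribution of duplicates.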
Exclude empty updates
If some or many rows already have status = 'ACTIVE', your update would not change anything for them, but would still insert a new row version at full cost (minor exceptions apply). Normally, you do not want this. Add another WHERE condition, as demonstrated in the query above, to make this even faster.
If status is defined NOT NULL, you can simplify to:
AND status <> 'ACTIVE';
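Why the NOT NULL condition matters: comparing NULL with <> yields NULL, not true, so with a nullable status column, rows whose status is NULL would be silently skipped by status <> 'ACTIVE'. A quick PostgreSQL check:

```sql
SELECT NULL::text <> 'ACTIVE'               AS not_equal,    -- NULL: such a row would be skipped
       NULL::text IS DISTINCT FROM 'ACTIVE' AS is_distinct;  -- true: such a row is updated

-- To allow the simpler form, the column can be made NOT NULL. SET NOT NULL fails
-- while any NULL remains; the 'NEW' fill value is a hypothetical choice.
-- UPDATE sales SET status = 'NEW' WHERE status IS NULL;
-- ALTER TABLE sales ALTER COLUMN status SET NOT NULL;
```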
Subtle difference in NULL handling
This query (unlike the currently accepted answer by Joel) does not treat NULL values as equal. These two rows for (saleprice, saledate) would qualify as "distinct" (though looking identical to the human eye):
(123, NULL)
(123, NULL)
Such rows also pass a unique index and almost anything else, since NULL values do not compare equal according to the SQL standard. See:
- Create unique constraint with null columns
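A sketch that demonstrates this in PostgreSQL with a throwaway table (null_demo is hypothetical). Since PostgreSQL 15, unique constraints and indexes also accept a NULLS NOT DISTINCT option to change this behavior; the sketch uses the default.

```sql
CREATE TEMP TABLE null_demo (saleprice int, saledate date);
CREATE UNIQUE INDEX ON null_demo (saleprice, saledate);

-- Succeeds: by default, a unique index treats NULLs as distinct, so the rows do not collide.
INSERT INTO null_demo VALUES (123, NULL), (123, NULL);

-- The row comparison yields NULL (unknown), not true, so "=" never matches these rows.
SELECT (123, NULL::date) = (123, NULL::date) AS row_equals;
```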
On the other hand, GROUP BY, DISTINCT and DISTINCT ON () treat NULL values as equal. Use an appropriate query style depending on what you want to achieve. You can still use this faster query style with IS NOT DISTINCT FROM instead of = for any or all comparisons to make NULL compare equal. More:
- How to delete duplicate rows without unique identifier
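A sketch of the NOT EXISTS query with IS NOT DISTINCT FROM, so that rows like (123, NULL) count as duplicates of each other, matching what GROUP BY does:

```sql
UPDATE sales s
SET    status = 'ACTIVE'
WHERE  NOT EXISTS (
   SELECT 1
   FROM   sales s1
   WHERE  s1.saleprice IS NOT DISTINCT FROM s.saleprice  -- NULL matches NULL
   AND    s1.saledate  IS NOT DISTINCT FROM s.saledate
   AND    s1.id <> s.id
   )
AND    s.status IS DISTINCT FROM 'ACTIVE';
```

IS NOT DISTINCT FROM may not be able to use a plain B-tree index the way = can, so if only one of the columns is nullable, it may pay to use it for that column only and keep = for the other.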
If all columns being compared are defined NOT NULL, there is no room for disagreement.
#3
The problem with your query is that when using a GROUP BY clause (which you essentially do by using DISTINCT), you can only select columns that you group by, or aggregate functions. You cannot select the column id because it can have different values within a group. In your case there is always only one value because of the HAVING clause, but most RDBMS are not smart enough to recognize that.
This should work however (and doesn't need a join):
UPDATE sales
SET status='ACTIVE'
WHERE id IN (
SELECT MIN(id) FROM sales
GROUP BY saleprice, saledate
HAVING COUNT(id) = 1
)
You could also use MAX or AVG instead of MIN; what matters is to use a function that returns the column's value when there is only one matching row.
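A sketch showing why the choice of aggregate does not matter for single-row groups, using made-up inline data in PostgreSQL syntax (the CTE named sales is hypothetical and shadows any real table):

```sql
WITH sales(id, saleprice, saledate) AS (
   VALUES (1, 10.00, date '2024-01-01')
        , (2, 10.00, date '2024-01-01')   -- duplicate pair: group is filtered out
        , (3, 12.50, date '2024-01-01')   -- single-row group
)
SELECT MIN(id), MAX(id), AVG(id), COUNT(id)
FROM   sales
GROUP  BY saleprice, saledate
HAVING COUNT(id) = 1;   -- one result row: MIN = MAX = 3, AVG = 3 as numeric
```

MIN and MAX keep the column's type, while AVG of an integer column returns numeric, so MIN or MAX is the cleaner choice for the IN comparison.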
#4
I want to select the distinct values from one column, 'GrondOfLucht', but they should be sorted in the order given by the column 'sortering'. I cannot get the distinct values of just one column using:
Select distinct GrondOfLucht,sortering
from CorWijzeVanAanleg
order by sortering
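This also returns the column 'sortering', and because DISTINCT then applies to the combination of 'GrondOfLucht' and 'sortering' instead of 'GrondOfLucht' alone, the result contains a row for every distinct combination rather than one row per 'GrondOfLucht' value.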
Instead, use GROUP BY to select the values of 'GrondOfLucht' in the order given by 'sortering':
SELECT GrondOfLucht
FROM dbo.CorWijzeVanAanleg
GROUP BY GrondOfLucht
ORDER BY MIN(sortering)
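A self-contained sketch of the GROUP BY approach with made-up rows (the values are hypothetical; the VALUES-in-FROM form works in both SQL Server and PostgreSQL):

```sql
WITH CorWijzeVanAanleg AS (
   SELECT *
   FROM  (VALUES ('Lucht', 2), ('Grond', 1), ('Lucht', 3), ('Grond', 4))
          AS v(GrondOfLucht, sortering)
)
SELECT GrondOfLucht
FROM   CorWijzeVanAanleg
GROUP  BY GrondOfLucht        -- one row per distinct value
ORDER  BY MIN(sortering);     -- ordered by each value's smallest sortering
-- Returns 'Grond' (min 1), then 'Lucht' (min 2).
```

Grouping by 'sortering' as well would bring back one row per combination, which is the same problem as the DISTINCT query.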