Apologies, this is certainly a duplicate but I don't know the right words to google for.
I've got a table of purchasing decisions that looks like this:
org_id  item_id  spend
----------------------
123     AAB      2
123     AAC      4
124     AAB      10
124     AAD      5
I want to find all the items that were only bought by three or fewer organisations. Then I want to order them by summed spend.
How would I do this in SQL? NB I'm using BigQuery SQL.
So far I've got:
SELECT *
FROM
  (SELECT ??(org_id) AS org_count, -- How do I get the count of different org_ids?
          item_id,
          SUM(spend) AS total_spend
   FROM mytable
   GROUP BY item_id) t
WHERE org_count < 4
ORDER BY total_spend DESC
2 solutions
#1
4
SELECT
  item_id,
  EXACT_COUNT_DISTINCT(org_id) AS org_count,
  SUM(spend) AS total_spend  -- alias must match the ORDER BY below
FROM mytable
GROUP BY item_id
HAVING org_count < 4
ORDER BY total_spend DESC
Please note, in BigQuery:
If you use the COUNT with DISTINCT keyword, the function returns the number of distinct values for the specified field. Note that the returned value for DISTINCT is a statistical approximation and is not guaranteed to be exact.
To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using GROUP EACH BY on the relevant field(s) and then applying COUNT(*). The GROUP EACH BY approach is more scalable but might incur a slight up-front performance penalty.
See more on COUNT and DISTINCT in the Syntax section of https://cloud.google.com/bigquery/query-reference#aggfunctions
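The "group first, then COUNT(*)" idea from the quote can be tried out locally. The following is only a sketch: it uses Python's built-in sqlite3 as a stand-in for BigQuery, so it uses a plain GROUP BY rather than GROUP EACH BY, and the names `items_by_two_step_grouping`, `per_org` and `org_spend` are made up for illustration. The inner query collapses the table to one row per (item_id, org_id), so counting rows per item in the outer query gives the number of distinct organisations.

```python
import sqlite3

# Sample rows from the question: (org_id, item_id, spend).
ROWS = [(123, "AAB", 2), (123, "AAC", 4), (124, "AAB", 10), (124, "AAD", 5)]


def items_by_two_step_grouping(rows, max_orgs=3):
    """Return (item_id, org_count, total_spend) for items bought by at most max_orgs orgs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (org_id INTEGER, item_id TEXT, spend INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    query = """
        SELECT item_id, COUNT(*) AS org_count, SUM(org_spend) AS total_spend
        FROM (SELECT item_id, org_id, SUM(spend) AS org_spend
              FROM mytable
              GROUP BY item_id, org_id) AS per_org  -- one row per item/org pair
        GROUP BY item_id
        HAVING COUNT(*) <= ?
        ORDER BY total_spend DESC
    """
    result = conn.execute(query, (max_orgs,)).fetchall()
    conn.close()
    return result


print(items_by_two_step_grouping(ROWS))
```

On the four sample rows every item passes the filter, because no item was bought by more than two organisations.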
#2
0
Might be slightly different in your brand of SQL, but this is how you would do it in SQL Server:
select item_id, sum(spend) as total_spend, count(distinct org_id) as num_orgs
from myTable
group by item_id
-- SQL Server does not accept a select alias in HAVING, so repeat the aggregate
having count(distinct org_id) <= 3
order by total_spend desc
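To check the logic without a database server, here is a rough sketch that runs the same kind of query through Python's built-in sqlite3. SQLite is only a stand-in here (it is neither SQL Server nor BigQuery), and the function name `rarely_bought_items` and the extra item "ZZZ" in the usage note are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (org_id, item_id, spend).
SAMPLE = [(123, "AAB", 2), (123, "AAC", 4), (124, "AAB", 10), (124, "AAD", 5)]


def rarely_bought_items(rows, max_orgs=3):
    """Return (item_id, total_spend, num_orgs) for items bought by at most max_orgs orgs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (org_id INTEGER, item_id TEXT, spend INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    query = """
        SELECT item_id, SUM(spend) AS total_spend, COUNT(DISTINCT org_id) AS num_orgs
        FROM mytable
        GROUP BY item_id
        HAVING COUNT(DISTINCT org_id) <= ?  -- aggregate repeated instead of using the alias
        ORDER BY total_spend DESC
    """
    result = conn.execute(query, (max_orgs,)).fetchall()
    conn.close()
    return result


print(rarely_bought_items(SAMPLE))
```

Adding a hypothetical item bought by four different organisations, e.g. `[(o, "ZZZ", 100) for o in (1, 2, 3, 4)]`, shows the filter at work: that item is dropped from the result even though its spend is the highest.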