How do I get a count of distinct values per group in SQL?

Apologies, this is certainly a duplicate but I don't know the right words to google for.

I've got a table of purchasing decisions that looks like this:

org_id    item_id    spend
--------------------------
123        AAB         2
123        AAC         4
124        AAB        10
124        AAD         5

I want to find all the items that were only bought by three or fewer organisations. Then I want to order them by summed spend.

How would I do this in SQL? NB I'm using BigQuery SQL.

So far I've got:

SELECT * 
FROM 
  (SELECT ??(org_id) as org_count, -- How do I get the count of different org_ids? 
         item_id, 
         SUM(spend) AS total_spend
  FROM mytable 
  GROUP BY item_id) t
WHERE org_count < 4
ORDER BY total_spend DESC

2 solutions

#1

SELECT 
  item_id, 
  EXACT_COUNT_DISTINCT(org_id) AS org_count, 
  SUM(spend) AS total_spend
FROM mytable
GROUP BY item_id
HAVING org_count < 4
ORDER BY total_spend DESC

Please note that in BigQuery's legacy SQL dialect:

If you use the COUNT with DISTINCT keyword, the function returns the number of distinct values for the specified field. Note that the returned value for DISTINCT is a statistical approximation and is not guaranteed to be exact.

To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using GROUP EACH BY on the relevant field(s) and then applying COUNT(*). The GROUP EACH BY approach is more scalable but might incur a slight up-front performance penalty.
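
The "group first, then COUNT(*)" idea above can be sketched as a two-level GROUP BY: the inner query collapses the table to one row per (item_id, org_id) pair, so the outer COUNT(*) is an exact distinct count. This is only a minimal sketch, and it runs against SQLite through Python's sqlite3 module as a local stand-in for BigQuery, so GROUP EACH BY is written as a plain GROUP BY. The names per_org, org_spend and items_bought_by_few_orgs are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (org_id, item_id, spend)
ROWS = [
    (123, "AAB", 2),
    (123, "AAC", 4),
    (124, "AAB", 10),
    (124, "AAD", 5),
]

# Inner query: one row per (item_id, org_id) pair.
# Outer query: COUNT(*) over those rows is the exact number of distinct orgs.
# Assumption: plain GROUP BY stands in for BigQuery's GROUP EACH BY here.
QUERY = """
SELECT item_id,
       COUNT(*)       AS org_count,
       SUM(org_spend) AS total_spend
FROM (
    SELECT item_id, org_id, SUM(spend) AS org_spend
    FROM mytable
    GROUP BY item_id, org_id
) AS per_org
GROUP BY item_id
HAVING COUNT(*) < 4
ORDER BY total_spend DESC, item_id
"""


def items_bought_by_few_orgs(rows):
    """Load rows into an in-memory SQLite table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (org_id INTEGER, item_id TEXT, spend INTEGER)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = items_bought_by_few_orgs(ROWS)
for item_id, org_count, total_spend in result:
    print(item_id, org_count, total_spend)
```

On the four sample rows this prints AAB 2 12, then AAD 1 5, then AAC 1 4. An organisation that bought the same item more than once is still counted once, because its rows are merged by the inner GROUP BY before the outer COUNT(*) runs.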

See more on COUNT and DISTINCT in the Syntax section of https://cloud.google.com/bigquery/query-reference#aggfunctions

#2

This might be slightly different in your dialect of SQL, but this is how you would do it in SQL Server:

select item_id, sum(spend) as total_spend, count(distinct org_id) as num_orgs
from myTable
group by item_id
-- SQL Server does not accept a column alias in HAVING, so repeat the aggregate
having count(distinct org_id) <= 3
order by total_spend desc
