I have a table like this:
Column | Type | Modifiers
---------+------+-----------
country | text |
food_id | int |
eaten | date |
And for each country, I want to get the food that is eaten most often. The best I can think of (I'm using postgres) is:
CREATE TEMP TABLE counts AS
SELECT country, food_id, count(*) as count FROM munch GROUP BY country, food_id;
CREATE TEMP TABLE max_counts AS
SELECT country, max(count) as max_count FROM counts GROUP BY country;
SELECT country, max(food_id) FROM counts
WHERE (country, count) IN (SELECT * from max_counts) GROUP BY country;
In that last statement, the GROUP BY and max() are needed to break ties, where two different foods have the same count.
This seems like a lot of work for something conceptually simple. Is there a more straightforward way to do it?
9 solutions
#1
13
PostgreSQL introduced support for window functions in 8.4, the year after this question was asked. It's worth noting that it might be solved today as follows:
SELECT country, food_id
FROM (SELECT country, food_id, ROW_NUMBER() OVER (PARTITION BY country ORDER BY freq DESC) AS rn
FROM ( SELECT country, food_id, COUNT('x') AS freq
FROM country_foods
GROUP BY 1, 2) food_freq) ranked_food_req
WHERE rn = 1;
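The above breaks ties by keeping one arbitrary food per country, because ROW_NUMBER() gives tied rows distinct numbers. If you don't want to break ties and would rather keep every tied food, you could use RANK() or DENSE_RANK() instead.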
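To make the tie behaviour concrete, here is a minimal sketch that runs the same query shape with ROW_NUMBER() and with RANK(). It is wrapped in Python's sqlite3 module only so that it is self-contained; the sample rows are made up, the table is assumed to be called munch as in the question, and an SQLite build with window functions (3.25 or later) is assumed. The SQL itself should carry over to PostgreSQL 8.4+ unchanged.

```python
import sqlite3

# Hypothetical sample data: in FR, foods 4 and 5 are tied with two meals each.
ROWS = [
    ("US", 1), ("US", 1), ("US", 2),
    ("FR", 4), ("FR", 4), ("FR", 5), ("FR", 5),
]

# {fn} is replaced by the window function under test.
QUERY = """
SELECT country, food_id
FROM (SELECT country, food_id,
             {fn}() OVER (PARTITION BY country ORDER BY freq DESC) AS rn
      FROM (SELECT country, food_id, COUNT(*) AS freq
            FROM munch
            GROUP BY country, food_id) AS food_freq) AS ranked
WHERE rn = 1
ORDER BY country, food_id
"""


def top_foods(fn):
    """Run the ranking query with the given window function name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE munch (country TEXT, food_id INTEGER)")
    conn.executemany("INSERT INTO munch VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY.format(fn=fn)).fetchall()
    conn.close()
    return rows


row_number_result = top_foods("ROW_NUMBER")  # exactly one row per country
rank_result = top_foods("RANK")              # every tied food is kept
```

With ROW_NUMBER() which of the two tied FR foods survives is not defined; add a second ORDER BY key inside the OVER clause if you need a deterministic pick.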
#2
8
SELECT DISTINCT
"F1"."food",
"F1"."country"
FROM "foo" "F1"
WHERE
"F1"."food" =
(SELECT "food" FROM
(
SELECT "food", COUNT(*) AS "count"
FROM "foo" "F2"
WHERE "F2"."country" = "F1"."country"
GROUP BY "F2"."food"
ORDER BY "count" DESC
) AS "F5"
LIMIT 1
)
Well, I wrote this in a hurry and didn't check it very carefully. The sub-select might be pretty slow, but this is the shortest and simplest SQL statement I could think of. I'll probably say more when I'm less drunk.
PS: Oh well, "foo" is the name of my table, "food" contains the name of the food and "country" the name of the country. Sample output:
food | country
-----------+------------
Bratwurst | Germany
Fisch | Frankreich
#3
5
Try this:
Select Country, Food_id
From Munch T1
Where Food_id=
(Select Food_id
from Munch T2
where T1.Country= T2.Country
group by Food_id
order by count(Food_id) desc
limit 1)
group by Country, Food_id
#4
5
It is now even simpler: PostgreSQL 9.4 introduced the mode() function:
select country, mode() within group (order by food_id)
from munch
group by country
This returns (using the sample data from user2247323's example):
country | mode
--------+------
GB      |    3
US      |    1
See documentation here: https://wiki.postgresql.org/wiki/Aggregate_Mode
https://www.postgresql.org/docs/current/static/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE
#5
3
Here's how to do it without any temp tables:
Edit: simplified
select nf.country, nf.food_id as most_frequent_food_id
from national_foods nf
group by country, food_id
having
  (country, count(*)) in (
      select country, max(cnt)
      from
        (
          select country, food_id, count(*) as cnt
          from national_foods nf1
          group by country, food_id
        ) as counts  -- a subquery in FROM needs an alias before PostgreSQL 16
      group by country
      having country = nf.country
  )
#6
3
SELECT country, MAX( food_id )
FROM( SELECT m1.country, m1.food_id
FROM munch m1
INNER JOIN ( SELECT country
, food_id
, COUNT(*) as food_counts
FROM munch m2
GROUP BY country, food_id ) as m3
ON m1.country = m3.country
GROUP BY m1.country, m1.food_id
HAVING COUNT(*) / COUNT(DISTINCT m3.food_id) = MAX(food_counts) ) AS max_foods
GROUP BY country
I don't like the MAX(.) GROUP BY to break ties... There's gotta be a way to incorporate eaten date into the JOIN in some way to arbitrarily select the most recent one...
I'm interested in the query plan for this thing if you run it on your live data!
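One possible way to bring the eaten date into the tie-break, sketched here with a window function rather than the JOIN used above: rank each country's foods by count, then by the most recent eaten date. This is only an illustration under assumptions. The sample rows are invented, the dates are stored as ISO strings so that they sort correctly in SQLite, and Python's sqlite3 (SQLite 3.25 or later) is used purely to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE munch (country TEXT, food_id INTEGER, eaten TEXT)")
conn.executemany(
    "INSERT INTO munch VALUES (?, ?, ?)",
    [
        # FR: foods 4 and 5 are tied at two meals; 5 was eaten more recently.
        ("FR", 4, "2017-01-01"), ("FR", 4, "2017-01-02"),
        ("FR", 5, "2017-01-03"), ("FR", 5, "2017-01-04"),
        # US: food 1 wins on count alone.
        ("US", 1, "2017-01-01"), ("US", 1, "2017-01-02"),
        ("US", 2, "2017-01-05"),
    ],
)

latest_wins = conn.execute("""
    SELECT country, food_id
    FROM (SELECT country, food_id,
                 ROW_NUMBER() OVER (
                     PARTITION BY country
                     ORDER BY freq DESC, last_eaten DESC
                 ) AS rn
          FROM (SELECT country, food_id,
                       COUNT(*)   AS freq,
                       MAX(eaten) AS last_eaten
                FROM munch
                GROUP BY country, food_id) AS food_freq) AS ranked
    WHERE rn = 1
    ORDER BY country
""").fetchall()
conn.close()
```

Two foods with the same count and the same latest date would still be picked arbitrarily; adding food_id as a third sort key would make the result fully deterministic.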
#7
3
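select country, food_id, count(*) as ne
from food f1
group by country, food_id
-- PostgreSQL rejects nested aggregates such as max(count(*)),
-- so the per-food counts are computed in a derived table first
having count(*) = (select max(cnt)
                   from (select count(*) as cnt
                         from food f2
                         where f2.country = f1.country
                         group by food_id) per_food)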
#8
2
Try something like this
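-- T-SQL (SQL Server) syntax; in PostgreSQL use CREATE TEMP TABLE ... AS instead
select country, food_id, count(*) cnt
into #tempTbl
from mytable
group by country, food_id;

-- compare each count with the highest count for the same country
select country, food_id
from #tempTbl as x
where cnt =
  (select max(cnt)
   from #tempTbl
   where country = x.country);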
This could all be put into a single select, but I don't have time to muck around with it right now.
Good luck.
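For what it's worth, a single-statement version might look like the sketch below, with common table expressions taking the place of the temp tables. The CTE names counts and max_counts are borrowed from the question, the table is assumed to be munch, and the rows are the sample data from answer #9. It is run through Python's sqlite3 only so that it can be tried without a server; the SQL is plain enough that it should also work in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE munch (country TEXT, food_id INTEGER)")
conn.executemany(
    "INSERT INTO munch VALUES (?, ?)",
    [("US", 1), ("US", 1), ("US", 2), ("US", 3),
     ("GB", 3), ("GB", 3), ("GB", 2)],
)

single_statement = conn.execute("""
    WITH counts AS (
        -- how often each food was eaten in each country
        SELECT country, food_id, COUNT(*) AS cnt
        FROM munch
        GROUP BY country, food_id
    ),
    max_counts AS (
        -- the highest count per country
        SELECT country, MAX(cnt) AS max_cnt
        FROM counts
        GROUP BY country
    )
    -- MAX(food_id) breaks ties, as in the question
    SELECT c.country, MAX(c.food_id) AS food_id
    FROM counts c
    JOIN max_counts m
      ON m.country = c.country AND m.max_cnt = c.cnt
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
conn.close()
```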
#9
1
Here is a statement which I believe gives you what you want and is simple and concise:
select distinct on (country) country, food_id
from munch
group by country, food_id
order by country, count(*) desc
Please let me know what you think.
BTW, DISTINCT ON is a PostgreSQL-specific feature.
Example, source data:
country | food_id | eaten
--------+---------+----------
US      |       1 | 2017-1-1
US      |       1 | 2017-1-1
US      |       2 | 2017-1-1
US      |       3 | 2017-1-1
GB      |       3 | 2017-1-1
GB      |       3 | 2017-1-1
GB      |       2 | 2017-1-1
output:
country | food_id
--------+---------
US      |       1
GB      |       3