I have a table that looks like this...
id  city      date
1   chicago   5/1
1   chicago   5/2
1   new york  5/1
2   new york  5/3
2   seattle   .
3   chicago   .
4   seattle   .
4   seattle   .
I want to add a new column that takes the value of 'city' when that city makes up the majority (>51%) of the entries a single ID has. For example, id #1 would have favorite_city = 'chicago'. I'm not sure where to even start...
Help is much appreciated. Thanks!
3 solutions
#1
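WITH
  summary AS
(
  SELECT
    your_table.*,
    -- total number of rows for this id
    COUNT(*) OVER (PARTITION BY id)       AS id_count,
    -- number of rows for this id and city
    COUNT(*) OVER (PARTITION BY id, city) AS id_city_count
  FROM
    your_table
)
SELECT
  summary.*,
  -- a city qualifies only if it has more than half of the id's rows
  MAX(
    CASE WHEN id_city_count * 2 > id_count THEN city ELSE NULL END
  )
  OVER (PARTITION BY id) AS favorite_city
FROM
  summary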
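To check the query above against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is my assumption (the question does not name a database, and window functions need SQLite 3.25 or later), the helper name favorite_city_by_id is made up for illustration, and the outer SELECT is trimmed to one row per id so the result is easy to read.

```python
import sqlite3

# Sample rows copied from the question; the '.' (missing date) is stored as NULL.
SAMPLE_ROWS = [
    (1, "chicago", "5/1"),
    (1, "chicago", "5/2"),
    (1, "new york", "5/1"),
    (2, "new york", "5/3"),
    (2, "seattle", None),
    (3, "chicago", None),
    (4, "seattle", None),
    (4, "seattle", None),
]

# Same logic as the answer's query, reduced to one row per id.
QUERY = """
WITH summary AS (
    SELECT
        your_table.*,
        COUNT(*) OVER (PARTITION BY id)       AS id_count,
        COUNT(*) OVER (PARTITION BY id, city) AS id_city_count
    FROM your_table
)
SELECT DISTINCT
    id,
    MAX(CASE WHEN id_city_count * 2 > id_count THEN city END)
        OVER (PARTITION BY id) AS favorite_city
FROM summary
ORDER BY id
"""


def favorite_city_by_id(rows):
    """Hypothetical helper: load rows into SQLite and return {id: favorite_city}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER, city TEXT, date TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(favorite_city_by_id(SAMPLE_ROWS))
```

With the sample data, id 2 is split evenly between two cities, so no city has more than half of its rows and its favorite_city comes back as NULL (None in Python).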
#2
This works, but for an id whose top cities are tied on count it returns all of the tied cities rather than a single one:
with a as (
    select *
    from (
        select id, city, nb,
               rank() over (partition by id order by nb desc) as rnk
        from (
            select id, city, count(city) as nb
            from test
            group by id, city
        ) as t
    ) as tt
    where rnk = 1
)
select test.id as id, test.city as city, a.city as favcity
from test
join a on test.id = a.id
Live demo and output HERE
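To make the tie behaviour concrete, the sketch below runs the ranking part of this answer on the question's sample data in an in-memory SQLite database (SQLite 3.25 or later is my assumption, since the question does not name a database). The second query is not part of the answer: it is one possible way to keep only cities with a strict majority, which removes tied ids from the result. Both function names are made up for illustration.

```python
import sqlite3

# Sample rows copied from the question; the '.' (missing date) is stored as NULL.
SAMPLE_ROWS = [
    (1, "chicago", "5/1"),
    (1, "chicago", "5/2"),
    (1, "new york", "5/1"),
    (2, "new york", "5/3"),
    (2, "seattle", None),
    (3, "chicago", None),
    (4, "seattle", None),
    (4, "seattle", None),
]

# The answer's CTE body: the most frequent city (or cities) per id.
RANK_QUERY = """
SELECT id, city
FROM (
    SELECT id, city, nb,
           RANK() OVER (PARTITION BY id ORDER BY nb DESC) AS rnk
    FROM (
        SELECT id, city, COUNT(city) AS nb
        FROM test
        GROUP BY id, city
    ) AS t
) AS tt
WHERE rnk = 1
ORDER BY id, city
"""

# Assumed variant: keep a city only if it has more than half of the id's rows.
MAJORITY_QUERY = """
SELECT t.id, t.city
FROM (SELECT id, city, COUNT(*) AS nb FROM test GROUP BY id, city) AS t
JOIN (SELECT id, COUNT(*) AS total FROM test GROUP BY id) AS s
  ON s.id = t.id
WHERE t.nb * 2 > s.total
ORDER BY t.id
"""


def _run(query, rows):
    """Load rows into a fresh in-memory table named test and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, city TEXT, date TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


def top_ranked_cities(rows):
    return _run(RANK_QUERY, rows)


def strict_majority_cities(rows):
    return _run(MAJORITY_QUERY, rows)


if __name__ == "__main__":
    print(top_ranked_cities(SAMPLE_ROWS))       # id 2 appears twice (tie)
    print(strict_majority_cities(SAMPLE_ROWS))  # id 2 does not appear at all
```

The ranking query returns both 'new york' and 'seattle' for id 2, because they are tied at one row each. The majority variant returns nothing for id 2, so a LEFT JOIN back to the table would be needed to show such ids with a NULL favorite.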
#3
Assuming you've already added the new column to your table (which is named test in my example), you can run:
update test t
set favorite_city =
  case
    when
      -- row count of the most frequent city for this id, doubled to avoid integer division
      (select c.cnt from (select count(1) as cnt from test t_freq where t_freq.id = t.id group by city) as c order by c.cnt desc limit 1) * 2
      > (select count(1) from test t_all where t_all.id = t.id)
    then
      (select c.city from (select count(1) as cnt, city from test t_freq where t_freq.id = t.id group by city) as c order by c.cnt desc limit 1)
    else
      null
  end;
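Whether the statement above runs as written depends on the database, since rules for subqueries that read from the table being updated differ between systems. As a sanity check of the idea, here is a minimal sketch of the same update in SQLite through Python's sqlite3 module. SQLite is my assumption, and the statement is a simplified rewrite rather than this answer's exact SQL: a correlated subquery with HAVING returns the city holding more than half of the id's rows, or NULL when there is none. The helper name apply_favorite_city is made up.

```python
import sqlite3

# Sample rows copied from the question; the '.' (missing date) is stored as NULL.
SAMPLE_ROWS = [
    (1, "chicago", "5/1"),
    (1, "chicago", "5/2"),
    (1, "new york", "5/1"),
    (2, "new york", "5/3"),
    (2, "seattle", None),
    (3, "chicago", None),
    (4, "seattle", None),
    (4, "seattle", None),
]

# At most one city can hold a strict majority, so the scalar subquery
# yields one row or none (which SQLite treats as NULL).
UPDATE_SQL = """
UPDATE test
SET favorite_city = (
    SELECT f.city
    FROM test AS f
    WHERE f.id = test.id
    GROUP BY f.city
    HAVING COUNT(*) * 2 > (SELECT COUNT(*) FROM test AS a WHERE a.id = test.id)
)
"""


def apply_favorite_city(rows):
    """Hypothetical helper: fill favorite_city and return {id: favorite_city}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER, city TEXT, date TEXT, favorite_city TEXT)"
    )
    conn.executemany(
        "INSERT INTO test (id, city, date) VALUES (?, ?, ?)", rows
    )
    conn.execute(UPDATE_SQL)
    result = dict(
        conn.execute(
            "SELECT DISTINCT id, favorite_city FROM test ORDER BY id"
        ).fetchall()
    )
    conn.close()
    return result


if __name__ == "__main__":
    print(apply_favorite_city(SAMPLE_ROWS))
```

Comparing COUNT(*) * 2 with the total keeps the arithmetic in integers, which sidesteps the integer-division pitfall of dividing two counts and comparing with 0.5.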