How do I create a flag column that takes a column's value when that value makes up the majority of the count?

Time: 2021-09-04 19:14:50

I have a table that looks like this...

id    city        date
1     chicago     5/1
1     chicago     5/2
1     new york    5/1
2     new york    5/3
2     seattle     .
3     chicago     .
4     seattle     .
4     seattle     .

I want to create a new column that takes the value of 'city' when that city makes up the majority (more than 50%) of the entries a single ID has. For example, id #1 would have favorite_city = 'chicago'. I'm not sure where to even start...

Help is much appreciated. Thanks!

3 solutions

#1


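WITH
  summary AS
(
  SELECT
    your_table.*,
    COUNT(*) OVER (PARTITION BY id)        AS id_count,       -- rows per id
    COUNT(*) OVER (PARTITION BY id, city)  AS id_city_count   -- rows per id and city
  FROM
    your_table
)
SELECT
  summary.*,
  -- Only a city holding more than half of the id's rows survives the CASE,
  -- so MAX() yields that city for every row of the id, or NULL if none qualifies.
  MAX(
    CASE WHEN id_city_count * 2 > id_count THEN city ELSE NULL END
  )
  OVER (PARTITION BY id)  AS favorite_city
FROM
  summary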
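
To check the query above against the sample data from the question, here is a minimal sketch that loads the rows into an in-memory SQLite database through Python's sqlite3 module. The harness, the `favorite_cities` helper, the `favorite_city` alias, and the `SELECT DISTINCT` that collapses the output to one row per id are assumptions added for illustration. Rows shown with "." in the question are stored as NULL dates. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question; "." dates are treated as missing (None).
ROWS = [
    (1, "chicago", "5/1"),
    (1, "chicago", "5/2"),
    (1, "new york", "5/1"),
    (2, "new york", "5/3"),
    (2, "seattle", None),
    (3, "chicago", None),
    (4, "seattle", None),
    (4, "seattle", None),
]

# Same counting logic as the answer, reduced to one row per id.
QUERY = """
WITH summary AS (
  SELECT
    your_table.*,
    COUNT(*) OVER (PARTITION BY id)       AS id_count,
    COUNT(*) OVER (PARTITION BY id, city) AS id_city_count
  FROM your_table
)
SELECT DISTINCT
  id,
  MAX(CASE WHEN id_city_count * 2 > id_count THEN city END)
    OVER (PARTITION BY id) AS favorite_city
FROM summary
ORDER BY id
"""


def favorite_cities(rows):
    """Return (id, favorite_city) pairs; favorite_city is None without a majority."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER, city TEXT, date TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in favorite_cities(ROWS):
        print(row)
```

Id 2 has one row for each of two cities, so neither exceeds half and its favorite_city stays NULL. Comparing `id_city_count * 2 > id_count` keeps the test in integer arithmetic, which avoids integer-division surprises in dialects where dividing two counts truncates.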

#2


This works, but for an id whose cities are tied on count it returns every tied city rather than a single one. It also picks the most frequent city without checking that it exceeds half of the id's rows:

WITH a AS (
  SELECT *
  FROM (
    -- rank each id's cities by how often they occur
    SELECT id, city, nb,
           RANK() OVER (PARTITION BY id ORDER BY nb DESC) AS rnk
    FROM (
      SELECT id, city, COUNT(city) AS nb
      FROM test
      GROUP BY id, city
    ) AS t
  ) AS tt
  WHERE rnk = 1
)
SELECT test.id AS id, test.city AS city, a.city AS favcity
FROM test
JOIN a ON test.id = a.id

Live demo and output HERE
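
One way to avoid the tied-city duplicates is to filter on a strict majority instead of rank: at most one city per id can hold more than half of the rows. The following is only a sketch under that assumption, run against an in-memory SQLite database through Python's sqlite3 module. The CTE names `counts` and `majority` and the helper `rows_with_favorite` are hypothetical, and `test` is the table name used in this answer.

```python
import sqlite3

# Sample rows from the question (date column omitted, it is not needed here).
ROWS = [
    (1, "chicago"), (1, "chicago"), (1, "new york"),
    (2, "new york"), (2, "seattle"),
    (3, "chicago"),
    (4, "seattle"), (4, "seattle"),
]

QUERY = """
WITH counts AS (
  -- rows per (id, city)
  SELECT id, city, COUNT(*) AS nb
  FROM test
  GROUP BY id, city
),
majority AS (
  -- keep a city only if it holds more than half of the id's rows
  SELECT c.id, c.city
  FROM counts c
  JOIN (SELECT id, SUM(nb) AS total FROM counts GROUP BY id) t
    ON t.id = c.id
  WHERE c.nb * 2 > t.total
)
SELECT test.id, test.city, m.city AS favcity
FROM test
LEFT JOIN majority m ON m.id = test.id
ORDER BY test.id, test.city
"""


def rows_with_favorite(rows):
    """Return every input row with its id's majority city, or None if there is none."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, city TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rows_with_favorite(ROWS):
        print(row)
```

The LEFT JOIN keeps ids without a majority city (id 2 in the sample) in the output with a NULL favcity, and each input row appears exactly once.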

#3


Assuming you have already added the new column to your table (named test in my example), you can run:

update test t
    set favorite_city =
        case
            when
                -- row count of this id's most frequent city, doubled so the
                -- comparison stays in integers (count / count can truncate to 0)
                (select c.cnt from (select count(1) as cnt from test t_freq where t_freq.id = t.id group by city) as c order by c.cnt desc limit 1) * 2
                > (select count(1) from test t_all where t_all.id = t.id)
            then
                (select c.city from (select count(1) as cnt, city from test t_freq where t_freq.id = t.id group by city) as c order by c.cnt desc limit 1)
            else
                null
        end;
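
The same update can be written more compactly with a single correlated subquery and a HAVING clause. This is a sketch rather than a drop-in replacement: it is run here against an in-memory SQLite database through Python's sqlite3 module, the helper `update_favorites` is hypothetical, and other engines may restrict subqueries that read from the table being updated.

```python
import sqlite3

# Sample rows from the question (date column omitted, it is not needed here).
ROWS = [
    (1, "chicago"), (1, "chicago"), (1, "new york"),
    (2, "new york"), (2, "seattle"),
    (3, "chicago"),
    (4, "seattle"), (4, "seattle"),
]

# For each row, look up the city that holds more than half of the rows
# sharing its id. At most one city can qualify, so the subquery yields
# zero or one row; zero rows leaves favorite_city as NULL.
UPDATE = """
UPDATE test
SET favorite_city = (
  SELECT f.city
  FROM test AS f
  WHERE f.id = test.id
  GROUP BY f.city
  HAVING COUNT(*) * 2 > (SELECT COUNT(*) FROM test AS a WHERE a.id = test.id)
)
"""


def update_favorites(rows):
    """Fill favorite_city in place and return distinct (id, favorite_city) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, city TEXT, favorite_city TEXT)")
    conn.executemany("INSERT INTO test (id, city) VALUES (?, ?)", rows)
    conn.execute(UPDATE)
    result = conn.execute(
        "SELECT DISTINCT id, favorite_city FROM test ORDER BY id"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in update_favorites(ROWS):
        print(row)
```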
