Hive sql 每天场景题26-28

时间:2023-01-13 16:02:36

26、从商品信息表(sku_info)求出各分类商品价格的中位数,如果一个分类下的商品个数为偶数则输出中间两个值的平均值,如果是奇数,则输出中间数即可

结果如下:

category_id
<string>
(品类id)

medprice
<decimal(16,2)>
(中位数)

1

3500.00

2

550.00

3

75.00

需要用到的表:

商品信息表:sku_info

sku_id(商品id)

name(商品名称)

category_id(分类id)

from_date(上架日期)

price(商品价格)

1

xiaomi 10

1

2020-01-01

2000

6

洗碗机

2

2020-02-01

2000

9

自行车

3

2020-01-01

1000

代码

with t1 as (
-- 类别 商品 排名 各类别商品总数
select 
category_id
,sku_id
,price
,row_number() over (partition by category_id order by price ) as rn
,count(1) over (partition by category_id rows between unbounded preceding and unbounded following ) as cnt
from sku_info
)

select 
category_id
,cast(avg(price) as decimal(10,2) )  as medprice
from t1
-- 偶数取中间, 奇数取中间值
where if(cnt %2 = 0,rn in(cnt/2,cnt/2+1), rn = (cnt+1)/2)
group by category_id

27、从订单详情表(order_detail)中找出销售额连续3天超过100的商品

结果如下:

sku_id
<string>
(商品id)

1

10

11

12

2

3

4

5

6

7

8

9

需要用到的表:

订单明细表:order_detail

order_detail_id(订单明细id)

order_id(订单id)

sku_id(商品id)

create_date(下单日期)

price(商品单价)

sku_num(商品件数)

1

1

1

2021-09-30

2000.00

2

2

1

3

2021-09-30

5000.00

5

22

10

4

2020-10-02

6000.00

1

23

10

5

2020-10-02

500.00

24

24

10

6

2020-10-02

2000.00

5

with t as (
-- 商品 日期 销售额 下一行日期 下下一行日期
select 
sku_id
,create_date
,sum(price*sku_num) as sale_all
,lead(create_date,1,'null') over(partition by sku_id order by create_date) as next_date
,lead(create_date,2,'null') over(partition by sku_id order by create_date) as next_next_date
from order_detail
group by 
sku_id
,create_date
)
select distinct sku_id
from t
where sale_all >=100 and datediff(next_date,create_date)=1 and datediff(next_next_date,next_date)=1

28、从用户登录明细表(user_login_detail)中首次登录算作当天新增,第二天也登录了算作一日留存

结果如下:

first_login
<string>
(注册时间)

register
<string>
(新增用户数)

retention
<decimal(16,2)>
(留存率)

2021-09-21

1

0.00

2021-09-22

1

0.00

2021-09-23

1

0.00

2021-09-24

1

0.00

2021-09-25

1

0.00

2021-09-26

1

0.00

2021-09-27

1

0.00

2021-10-04

2

0.50

2021-10-06

1

0.00

需要用到的表:

用户登录明细表:user_login_detail

user_id(用户id)

ip_address(ip地址)

login_ts(登录时间)

logout_ts(登出时间)

101

180.149.130.161

2021-09-21 08:00:00

2021-09-27 08:30:00

102

120.245.11.2

2021-09-22 09:00:00

2021-09-27 09:30:00

103

27.184.97.3

2021-09-23 10:00:00

2021-09-27 10:30:00

代码

-- 用户 登录时间 注册日期  注册时间第二天 
with t1 as (
select 
  user_id
  ,substr(login_ts,1,10) as login_ts
  ,min(login_ts) over(partition by user_id) as date1
  ,lead(login_ts,1,'null') over(partition by user_id order by login_ts) as date2
  from 
  (select distinct user_id,substr(login_ts,1,10) as login_ts 
   from user_login_detail)a
)
,t2 as (
  -- 用户 登录时间 注册日期  注册时间第二天  是否新增 是否留存
select 
*
,if(login_ts=date1,1,0) as is_new 
,if(datediff(date2,date1)=1,1,0) as is_retention
from t1  
)
-- 注册日期 注册期内新增数  注册期内留存数的用户/注册期内注册的所有用户

select 
date1 as first_login
,sum(is_new) as register 
,cast (sum(is_retention)/count(distinct user_id) as decimal(10,2))as retention
from t2
group by date1