Hive notes: unifying time formats

Date: 2021-07-10 15:19:15

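I. Unifying the format of string-typed time fields that contain a year-month-day part:

Source format (time field is a string)    Expression for filtering on a unified yyyyMMdd value
2018-11-01 00:12:49.0                     substr(regexp_replace(created_at,'-',''),1,8) >= '20181101'
month=201809, day=01                      concat(month,day) >= '20180901'
dt=181101                                 concat('20',a.dt) >= '20181101'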
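
As a quick check of the expressions above, the following sketch applies them to literal values instead of table columns. The literals stand in for the created_at, month, day, and dt fields, and the query assumes Hive 0.13 or later, where SELECT works without a FROM clause.

```sql
-- Each expression normalizes a differently formatted string to yyyyMMdd,
-- so all three can be compared against the same kind of literal.
select substr(regexp_replace('2018-11-01 00:12:49.0', '-', ''), 1, 8) as from_datetime,   -- '20181101'
       concat('201809', '01')                                         as from_month_day,  -- '20180901'
       concat('20', '181101')                                         as from_short_dt;   -- '20181101'
```

Comparing these strings with >= and <= works because yyyyMMdd is fixed-width and zero-padded, so lexicographic order matches chronological order.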

II. Date functions (timestamps) and methods for truncating and converting times in various formats

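1. from_unixtime(bigint unixtime[, string format]): converts a timestamp to a date

Converts a time given in seconds since the Unix epoch into a string in the given format, for example "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd HH", or "yyyy-MM-dd HH:mm" (HH is the 24-hour clock, whereas hh is the 12-hour clock). For example, from_unixtime(1250111000,"yyyy-MM-dd") returns 2009-08-12 when the session time zone is UTC; the result depends on the time zone.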
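
The format string uses Java-style date pattern letters, which are case-sensitive. The sketch below contrasts the letters that are most often mixed up. The timestamp is the one used in the examples in these notes, the column aliases are made up for illustration, and no outputs are shown because they depend on the session time zone.

```sql
-- Pattern letters are case-sensitive:
--   MM = month, mm = minute, HH = hour (0-23), hh = hour (1-12)
select from_unixtime(1445391280, 'yyyy-MM-dd HH:mm:ss') as month_and_24h,    -- intended form
       from_unixtime(1445391280, 'yyyy-mm-dd HH:mm:ss') as minute_as_month,  -- minutes printed where the month should be
       from_unixtime(1445391280, 'yyyy-MM-dd hh:mm:ss') as hour_12h;         -- 12-hour clock, ambiguous without an am/pm marker
```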

(1) 13-digit timestamps:

In Hive, from_unixtime converts a timestamp into a formatted time string, for example:
hive> select from_unixtime(1445391280,'yyyy-MM-dd HH:mm:ss');
2015-10-21 09:34:40
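Problem:
The first argument is a bigint, normally a 10-digit value in seconds. A 13-digit timestamp is in milliseconds, so its last three digits have to be dropped first. Dividing with / returns a double, which from_unixtime does not accept, so the value has to be converted back to bigint explicitly.
For example, converting a 13-digit timestamp directly gives a meaningless date: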
hive> select from_unixtime(1445391280000,'yyyy-MM-dd HH:mm:ss');
47772-08-17 01:46:40
There are two ways to handle this:
a. Cast the bigint to double, divide by 1000, and cast the result back to bigint:
   hive> select from_unixtime(cast(cast(1445391280000 as double)/1000 as bigint),'yyyy-MM-dd HH:mm:ss');
   2015-10-21 09:34:40
b. Cast the bigint to string, keep the first 10 characters, and cast back to bigint:
   hive> select from_unixtime(cast(substr(cast(1445391280000 as string),1,10) as bigint),'yyyy-MM-dd HH:mm:ss');
   2015-10-21 09:34:40
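
A third option, not part of the original notes, is Hive's integer-division operator div, which keeps the value as an integer type and avoids the round trip through double or string. This is only a sketch: it assumes a Hive version that supports SELECT without FROM, and the names ms_col and some_table in the commented line are hypothetical.

```sql
-- 13-digit (millisecond) timestamp -> seconds via integer division
select from_unixtime(1445391280000 div 1000, 'yyyy-MM-dd HH:mm:ss') as ts_seconds;

-- For a string column holding milliseconds, cast to bigint first
-- (ms_col and some_table are placeholder names):
-- select from_unixtime(cast(t.ms_col as bigint) div 1000, 'yyyy-MM-dd HH:mm:ss') from some_table t;
```

Since 1445391280000 div 1000 equals 1445391280, the first query should print the same time as the two variants above when run in the same session.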

(2) Example: a column holding 13-digit timestamps

 %jdbc(hive)
 -- MM is the month (lowercase mm would print minutes); substr positions start at 1
 select from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yy-MM-dd') as dt1,
      from_unixtime(cast(substr(pc.ttl,1,10) as bigint),'yy-MM-dd HH:mm:ss')  as dt2,
      from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yyyy-MM-dd') as dt3,
      from_unixtime(cast(substr(pc.ttl,1,10) as bigint),'yyyy-MM-dd HH:mm:ss')  as dt4
 from xxxx  pc

(3) Truncating to different precisions in the yy-MM-dd and yyMMdd formats (note: ttl in this table is a string)

a.yyMMdd:

       %jdbc(hive)
       select from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yyMMdd HH:mm:ss') as dt1,
       from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yyMMdd HH:mm') as dt2,
       from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yyMMdd HH') as dt3,
       from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yyMMdd') as dt4,
       from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yyMM') as dt5,
       from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yy') as dt6
       from xxxx  pc

b.yy-MM-dd:

%jdbc(hive)
select from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yy-MM-dd HH:mm:ss') as dt1,
       from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yy-MM-dd HH:mm') as dt2,
       from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yy-MM-dd HH') as dt3,
       from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yy-MM-dd') as dt4,
       from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yy-MM') as dt5,
       from_unixtime(cast(cast(pc.ttl as bigint) / 1000 as bigint), 'yy') as dt6
       from xxxx  pc

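2. unix_timestamp: get a Unix timestamp (converts a date to a timestamp)

(1) unix_timestamp()

Return type: bigint
Description: returns the current Unix timestamp in seconds.

(2) unix_timestamp(string date)

Return type: bigint
Description: converts a date string in the format "yyyy-MM-dd HH:mm:ss" to a Unix timestamp. The Hive documentation says it returns 0 if the conversion fails; in practice Hive typically returns NULL instead.

(3) unix_timestamp(string date, string pattern)

Return type: bigint
Description: converts a date string written in the given pattern to a Unix timestamp. As with form (2), a failed conversion is documented to return 0 but typically returns NULL.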
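
The three forms can be compared side by side as in the following sketch. The date literals and column aliases are illustrative, and the coalesce wrapper is only one possible way to flag a failed conversion. No outputs are shown because the results depend on the current time and the session time zone.

```sql
select unix_timestamp()                                       as now_ts,       -- form (1): current time
       unix_timestamp('2018-11-01 00:12:49')                  as default_fmt,  -- form (2): string already in yyyy-MM-dd HH:mm:ss
       unix_timestamp('181101', 'yyMMdd')                     as short_fmt,    -- form (3): pattern given explicitly
       unix_timestamp('20181101', 'yyyyMMdd')                 as compact_fmt,  -- form (3) with a four-digit year
       coalesce(unix_timestamp('not a date', 'yyyyMMdd'), -1) as failed_fmt;   -- maps a NULL result to -1; a 0 result passes through unchanged
```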

(4) Examples for two time formats, yyMMdd and yyyy-MM-dd. Prefer unix_timestamp(string date, string pattern), which states the time format explicitly.

       a.yyMMdd

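%jdbc(hive)
select  a.dt as time,
           unix_timestamp(a.dt) as time1,                       -- no pattern: a.dt is not in yyyy-MM-dd HH:mm:ss, so this is expected to fail
           unix_timestamp(a.dt,'yyMMdd') as time2,              -- pattern matches the stored format
           concat('20',a.dt) as dt0,
           unix_timestamp(concat('20',a.dt)) as dt1,            -- still no pattern, so still expected to fail
           unix_timestamp(concat('20',a.dt),'yyyyMMdd') as dt2  -- pattern matches the padded value
from track.click a
where concat('20',a.dt)>='20181101' and concat('20',a.dt)<='20181103'
limit 100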

b.yyyy-MM-dd

%jdbc(hive)
select created_at,
     unix_timestamp(created_at) as created_at1,
     substr(created_at,1,10) as dt0,
     unix_timestamp(substr(created_at,1,10)) as dt1,
     unix_timestamp(substr(created_at,1,10),'yyyy-MM-dd') as dt2
from trial_sdk.device
where created_at>='2018-11-01' and created_at<='2018-11-03'
limit 100

3. Switching between the yyyyMMdd and yyyy-MM-dd date formats

Method 1: from_unixtime + unix_timestamp

a. Convert 20171205 to 2017-12-05 (MM is the month; lowercase mm means minutes)
select from_unixtime(unix_timestamp('20171205','yyyyMMdd'),'yyyy-MM-dd');
b. Convert 2017-12-05 to 20171205
select from_unixtime(unix_timestamp('2017-12-05','yyyy-MM-dd'),'yyyyMMdd');
For example: from_unixtime(unix_timestamp(ts),'yyMMdd'),
where ts is of type timestamp, e.g. 2018-10-11 04:05:29.028

Method 2: substr + concat

a. Convert 20171205 to 2017-12-05
select concat(substr('20171205',1,4),'-',substr('20171205',5,2),'-',substr('20171205',7,2));
b. Convert 2017-12-05 to 20171205
select concat(substr('2017-12-05',1,4),substr('2017-12-05',6,2),substr('2017-12-05',9,2));
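
To check that the two methods agree, both can be run on the same literal, as sketched below. The month is written as MM here because lowercase mm denotes minutes in the pattern syntax. The column aliases are made up for illustration, and the query assumes a Hive version that allows SELECT without FROM.

```sql
select from_unixtime(unix_timestamp('20171205', 'yyyyMMdd'), 'yyyy-MM-dd') as via_timestamp,  -- method 1
       concat(substr('20171205', 1, 4), '-',
              substr('20171205', 5, 2), '-',
              substr('20171205', 7, 2))                                    as via_substr;     -- method 2
```

Both columns are expected to read 2017-12-05. Method 2 only rearranges characters and performs no validation, while method 1 goes through date parsing, whose handling of invalid values depends on the Hive version.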
