spark sql中的first函数在多个字段使用实例

时间:2022-09-16 19:53:55

1.建立hive表如下:

CREATE EXTERNAL TABLE `newsapp.test_first`(
`userkey` string,
`publish_id` string,
`data_type` string,
`soft_version` string,
`ua` string,
`mos` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\t',
'serialization.format'='\t')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://ifhoffl/user/hive/warehouse/newsapp.db/test_first'
TBLPROPERTIES (
'transient_lastDdlTime'='1489126869')

  2.插入数据 

从一个表中查出数据 插入表中 数据是:

867463027018201 2006 2017012420 5.2.4 mi_4lte android_4.4.4
8516c9ebd4073a5281e74ecbf2af6297e159b98d400220170124205.4.1iphone8_1iphone_10.2
861600033349692610220170124205.5.0oppo_r9mandroid_5.1
867822021100820600120170124205.5.0hm_note_1sandroid_4.4.4
99000872898896600120170124205.5.0redmi_note_3android_6.0.1
860482037942991610920170124205.5.0huawei_nxt-al10android_7.0
862051031463029332920170124205.5.0huawei_tit-al00android_5.1
861545034465692188420170124205.5.0le_x620android_6.0
862095023392928610320170124205.5.0mx4android_5.1
e04b9d165757e4ef4d3f66dbb02bdfe1f689ad7b400220170124205.4.1iphone8_2iphone_10.2
869609021094139332920170124205.3.0huawei_tit-al00android_5.1
861575030737703610920170124205.5.0huawei_tag-al00android_5.1
860671021446080600120170124205.4.12013022android_4.2.2
353254060442122201120170124205.5.0l50uandroid_4.4.2
d5889858814c200b2d8815a8e5845ad2dab468b5400220170124205.4.1iphone8_1iphone_10.2
869573021015355610920170124205.5.0huawei_crr-ul00android_6.0
c9d4113fce3987ff199420170124205.4.1gn5001sandroid_5.1
864181027315061610220170124205.5.0n5117android_4.3
d18d7cf2899b5dd98fdadfbeb604cb32dc90a002400220170124205.4.1iphone6_2iphone_10.2
869511022231761202420170124205.4.0huawei_gra-tl00android_6.0

插入表中:

insert into table newsapp.test_first
select userkey, publish_id ,date ,soft_version, ua, mos from newsapp.stats_log where date like '%' and data_type = 'newsapp' and action in ( 'page','v','action') limit 20

 

3.在scala_shell中使用fist()函数用于多个字段,也是只选出一条

 

hive (newsapp)> select * from test_first ;
OK
863792031896895610920170214225.5.0cam-al00android_6.0
863730034740413600120170214225.5.0redmi_note_4android_6.0
A100004E58DD35610220170214225.4.1oppo_r7smandroid_5.1.1
862534037030736200220170214225.5.0redmi_note_3android_5.1.1
867628023774987610920170214225.5.0plk-tl01handroid_6.0
863410031072801600120170214225.5.0mi_5sandroid_6.0.1
357390061934547260420170214225.5.0sm-n9100android_6.0.1
867905022664884610320170214225.5.0pro_5android_5.1
862979030425377610120170214225.5.0vivo_x7android_5.1.1
A000005E4F58D1610920170214225.5.0scl-cl00android_5.1.1
868486029009174610220170214225.5.0oppo_a33android_5.1.1
869158022640854610920170214225.4.1eva-al00android_6.0
861533036737193610920170214225.2.4eva-al10android_7.0

E7NDU15A20002276610920170214225.3.1s8-701wandroid_4.3
A0000059C57038610920170214225.5.0kiw-cl00android_5.1.1
352107069757323202420170214224.4.6sm-g9008vandroid_4.4.2
A00000597295D2332920170214225.4.1huawei_tag-al00android_5.1
860777033176332201120170214225.4.1f100android_5.1
868979029557051610220170214225.5.0oppo_r7android_4.4.4
866486022346468325620170214225.1.02014813android_4.4.4
Time taken: 0.042 seconds, Fetched: 20 row(s)

  

scala>  var df = spark.sql(s"select publish_id, first(userkey) as userkey,first(data_type) as date,first(soft_version) as soft_version,first(ua) as ua,first(mos) as mos  from newsapp.test_first group by publish_id")
df: org.apache.spark.sql.DataFrame = [publish_id: string, userkey: string ... 4 more fields]

scala> df.show()
+----------+---------------+----------+------------+---------------+-------------+
|publish_id| userkey| date|soft_version| ua| mos|
+----------+---------------+----------+------------+---------------+-------------+
| 6109|863792031896895|2017021422| 5.5.0| cam-al00| android_6.0|
| 6101|862979030425377|2017021422| 5.5.0| vivo_x7|android_5.1.1|
| 2604|357390061934547|2017021422| 5.5.0| sm-n9100|android_6.0.1|
| 2002|862534037030736|2017021422| 5.5.0| redmi_note_3|android_5.1.1|
| 3329| A00000597295D2|2017021422| 5.4.1|huawei_tag-al00| android_5.1|
| 6001|863730034740413|2017021422| 5.5.0| redmi_note_4| android_6.0|
| 2011|860777033176332|2017021422| 5.4.1| f100| android_5.1|
| 3256|866486022346468|2017021422| 5.1.0| 2014813|android_4.4.4|
| 2024|352107069757323|2017021422| 4.4.6| sm-g9008v|android_4.4.2|
| 6102| A100004E58DD35|2017021422| 5.4.1| oppo_r7sm|android_5.1.1|
| 6103|867905022664884|2017021422| 5.5.0| pro_5| android_5.1|
+----------+---------------+----------+------------+---------------+-------------+

  还可以group by多个字段

scala>  var df = spark.sql(s"select soft_version, first(userkey) as userkey,first(data_type) as date,first(publish_id) as publish_id,first(ua) as ua,mos  from newsapp.test_first group by soft_version,mos");
df: org.apache.spark.sql.DataFrame = [soft_version: string, userkey: string ... 4 more fields]

scala> df.show()
17/03/16 13:23:11 WARN DFSClient: Slow ReadProcessor read fields took 177140ms (threshold=30000ms); ack: seqno: 131 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 635983, targets: [10.90.11.27:50010, 10.90.11.17:50010, 10.90.11.16:50010]
+------------+----------------+----------+----------+---------------+-------------+
|soft_version| userkey| date|publish_id| ua| mos|
+------------+----------------+----------+----------+---------------+-------------+
| 5.4.1| 869158022640854|2017021422| 6109| eva-al00| android_6.0|
| 5.4.1| A100004E58DD35|2017021422| 6102| oppo_r7sm|android_5.1.1|
| 5.2.4| 861533036737193|2017021422| 6109| eva-al10| android_7.0|
| 5.5.0| 868979029557051|2017021422| 6102| oppo_r7|android_4.4.4|
| 5.4.1| A00000597295D2|2017021422| 3329|huawei_tag-al00| android_5.1|
| 4.4.6| 352107069757323|2017021422| 2024| sm-g9008v|android_4.4.2|
| 5.3.1|E7NDU15A20002276|2017021422| 6109| s8-701w| android_4.3|
| 5.5.0| 863792031896895|2017021422| 6109| cam-al00| android_6.0|
| 5.5.0| 867905022664884|2017021422| 6103| pro_5| android_5.1|
| 5.1.0| 866486022346468|2017021422| 3256| 2014813|android_4.4.4|
| 5.5.0| 862534037030736|2017021422| 2002| redmi_note_3|android_5.1.1|
| 5.5.0| 863410031072801|2017021422| 6001| mi_5s|android_6.0.1|
+------------+----------------+----------+----------+---------------+-------------+