Spark+Hadoop+Hive集群上数据操作记录

时间:2022-08-22 10:23:46
[rc@vq18ptkh01 ~]$ hadoop fs -ls /
drwxr-xr-x+ - jc_rc supergroup 0 2016-11-03 11:46 /dt
[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1030.csv /dt
[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1031.csv /dt
[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1101.csv /dt [rc@vq18ptkh01 ~]$ hadoop fs -ls /dt
16/11/03 11:53:16 INFO hdfs.PeerCache: SocketCache disabled.
Found 3 items
-rw-r--r--+ 3 jc_rc supergroup 1548749 2016-11-03 11:48 /dt/wifi_phone_list_1030.csv
-rw-r--r--+ 3 jc_rc supergroup 1262964 2016-11-03 11:52 /dt/wifi_phone_list_1031.csv
-rw-r--r--+ 3 jc_rc supergroup 979619 2016-11-03 11:52 /dt/wifi_phone_list_1101.csv [rc@vq18ptkh01 ~]$ beeline
Connecting to jdbc:hive2://1.8.15.1:24002,10.78.152.24:24002,1.8.15.2:24002,1.8.12.42:24002,1.8.15.62:24002/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;sasl.qop=auth-conf;auth=KERBEROS;principal=hive/hadoop.hadoop.com@HADOOP.COM
Debug is true storeKey false useTicketCache true useKeyTab false doNotPrompt false ticketCache is null isInitiator true KeyTab is null refreshKrb5Config is false principal is null tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Acquire TGT from Cache
Principal is jc_rc@HADOOP.COM
Commit Succeeded Connected to: Apache Hive (version 1.3.0)
Driver: Hive JDBC (version 1.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.3.0 by Apache Hive
0: jdbc:hive2://1.8.15.2:21066/> use r_hive_db;
No rows affected (0.547 seconds) 0: jdbc:hive2://1.8.15.2:21066/> create table tmp_wifi1030(imisdn string,starttime string,endtime string) row format delimited fields terminated by ',' stored as textfile;
0: jdbc:hive2://1.8.15.2:21066/> show tables; [rc@vq18ptkh01 ~]$ wc wifi_phone_list_1030.csv -l
25390 wifi_phone_list_1030.csv
+---------------+--+
| tab_name |
+---------------+--+
| tmp_wifi1030 |
+---------------+--+
1 row selected (0.401 seconds)
0: jdbc:hive2://1.8.15.2:21066/> load data inpath 'hdfs:/dt/wifi_phone_list_1030.csv' into table tmp_wifi1030;
0: jdbc:hive2://1.8.15.2:21066/> select * from tmp_wifi1030;
| tmp_wifi1030.imisdn | tmp_wifi1030.starttime | tmp_wifi1030.endtime |
+----------------------+--------------------------+--------------------------+--+
| 18806503523 | 2016-10-30 23:58:56.000 | 2016-10-31 00:01:07.000 |
| 15700125216 | 2016-10-30 23:58:57.000 | 2016-10-31 00:01:49.000 |
+----------------------+--------------------------+--------------------------+--+
25,390 rows selected (5.649 seconds) 0: jdbc:hive2://1.8.15.2:21066/> select count(*) from tmp_wifi1030;
INFO : Number of reduce tasks determined at compile time: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:1
INFO : Submitting tokens for job: job_1475071482566_2471703
INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19416140 for jc_rc)
INFO : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 57 df 96 8a 01 58 4c 64 63 96 8d 0d 65 ff 8e 03 97
INFO : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2471703/
INFO : Starting Job = job_1475071482566_2471703, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2471703/
INFO : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job -kill job_1475071482566_2471703
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2016-11-03 12:04:58,351 Stage-1 map = 0%, reduce = 0%
INFO : 2016-11-03 12:05:04,702 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.72 sec
INFO : 2016-11-03 12:05:12,096 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.86 sec
INFO : MapReduce Total cumulative CPU time: 4 seconds 860 msec
INFO : Ended Job = job_1475071482566_2471703
+--------+--+
| _c0 |
+--------+--+
| 25390 |
+--------+--+
1 row selected (25.595 seconds) 0: jdbc:hive2://1.8.15.62:21066/> select * from default.d_s1mme limit 10;
+----------------------+--------------------+-------------------------+----------------------+-------------------+--------------------+--------------------+----------------------+------------------------------+------------------------------------+----------------------------------+--------------------------------+-----------------------------+-----------------------------+-----------------------+------------------------------+----------------------------+------------------------+----------------------+--------------------+------------------------------------+------------------------------------+--------------------------+--------------------------+------------------------+------------------------+-------------------+-----------------------+-------------------------+-------------------------+-------------------+---------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+----------------------+--+
| d_s1mme .length | d_s1mme .city 。。。。。。。。。。。。 | | 2016101714 |
| NULL | 579 | 5 | 130980097fb8c900 | 6 | 460006791248581 | 352093070081343 | 88888888888888888 | 20 | 2016-10-17 13:30:23.0 | 2016-10-17 13:30:23.0 | 0 | 20 | NULL | 0 | 209743848 | 419 | 32 | D5095073 | NULL | NULL | NULL | 100.67.254.45 | 100.111.211.166 | 36412 | 36412 | 589D | BAE6802 | NULL | NULL | NULL | 0 | NULL | NULL | NULL | NULL | NULL | NULL | | | | | | | | | | | 2016101714 |
+----------------------+--------------------+-------------------------+----------------------+-------------------+--------------------+--------------------+----------------------+------------------------------+------------------------------------+----------------------------------+--------------------------------+-----------------------------+-----------------------------+-----------------------+------------------------------+----------------------------+------------------------+----------------------+--------------------+------------------------------------+------------------------------------+--------------------------+--------------------------+------------------------+------------------------+-------------------+-----------------------+-------------------------+-------------------------+-------------------+---------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+----------------------+--+
10 rows selected (0.6 seconds)

create table tmp_mr_s1_mme1030 as
select a.length,a.city,a.interface,a.xdr_id,a.rat,a.imsi,a.imei,a.msisdn,a.procedure_start_time,a.procedure_end_time,a.mme_ue_s1ap_id,a.mme_group_id,a.mme_code,a.user_ipv4,a.tac,a.cell_id,a.other_tac,a.other_eci
from default.d_s1mme a join r_hive_db.tmp_wifi1030 b on a.msisdn=b.imisdn and a.p_hour>='20161030' and a.p_hour<'20161031'; 0: jdbc:hive2://1.8.15.2:21066/> create table tmp_mr_s1_mme_enbs1030 as
0: jdbc:hive2://1.8.15.2:21066/> select cell_id/256 from tmp_mr_s1_mme1030;
0: jdbc:hive2://1.8.15.62:21066/> create table tmp_mr_s1_mme_cellids1030 as select distinct cast(cell_id as bigint) as cellid from tmp_mr_s1_mme1030; 0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.mapfiles;
+---------------------------+--+
| set |
+---------------------------+--+
| hive.merge.mapfiles=true |
+---------------------------+--+
1 row selected (0.022 seconds)
0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.mapredfields;
+---------------------------------------+--+
| set |
+---------------------------------------+--+
| hive.merge.mapredfields is undefined |
+---------------------------------------+--+
1 row selected (0.022 seconds)
0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.size.per.task=1024000000;
No rows affected (0.012 seconds)
0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.smallfiles.avgsize=1024000000;
No rows affected (0.012 seconds)
0: jdbc:hive2://1.8.15.62:21066/> use r_hive_db;
No rows affected (0.031 seconds)
0: jdbc:hive2://1.8.15.62:21066/> insert overwrite directory '/dt/' row format delimited fields terminated by '|' select * from tmp_mr_s1_mme_cellids1030;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:17
INFO : Submitting tokens for job: job_1475071482566_2477152
INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19422634 for jc_rc)
INFO : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 d2 8f 0b 8a 01 58 4c df 13 0b 8d 0d 6c 4b 8e 03 98
INFO : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2477152/
INFO : Starting Job = job_1475071482566_2477152, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2477152/
INFO : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job -kill job_1475071482566_2477152
INFO : Hadoop job information for Stage-1: number of mappers: 17; number of reducers: 0
INFO : 2016-11-03 14:40:52,492 Stage-1 map = 0%, reduce = 0%
INFO : 2016-11-03 14:40:58,835 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 28.78 sec
INFO : 2016-11-03 14:40:59,892 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 33.55 sec
INFO : 2016-11-03 14:41:10,486 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 37.13 sec
INFO : 2016-11-03 14:41:11,549 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 41.13 sec
INFO : MapReduce Total cumulative CPU time: 41 seconds 130 msec
INFO : Ended Job = job_1475071482566_2477152
INFO : Stage-3 is filtered out by condition resolver.
INFO : Stage-2 is selected by condition resolver.
INFO : Stage-4 is filtered out by condition resolver.
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1475071482566_2477181
INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19422663 for jc_rc)
INFO : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 d2 8f 0b 8a 01 58 4c df 13 0b 8d 0d 6c 4b 8e 03 98
INFO : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2477181/
INFO : Starting Job = job_1475071482566_2477181, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2477181/
INFO : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job -kill job_1475071482566_2477181
INFO : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
INFO : 2016-11-03 14:41:22,190 Stage-2 map = 0%, reduce = 0%
INFO : 2016-11-03 14:41:28,571 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.2 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 200 msec
INFO : Ended Job = job_1475071482566_2477181
INFO : Moving data to directory /dt from hdfs://hacluster/dt/.hive-staging_hive_2016-11-03_14-40-43_774_4317869403646242426-140183/-ext-10000
No rows affected (46.604 seconds) [rc@vq18ptkh01 dt]$ hadoop fs -ls /dt
16/11/03 14:46:18 INFO hdfs.PeerCache: SocketCache disabled.
Found 1 items
-rwxrwxrwx+ 3 jc_rc supergroup 26819 2016-11-03 14:41 /dt/000000_0
[rc@vq18ptkh01 dt]$ hadoop fs -copyToLocal /dt/000000_0
16/11/03 14:46:33 INFO hdfs.PeerCache: SocketCache disabled.
[rc@vq18ptkh01 dt]$ ls
000000_0
[rc@vq18ptkh01 dt]$ [rc@vq18ptkh01 dt]$ ls
000000_0 000001_0 000002_0 000003_0 000004_0 000005_0
[rc@vq18ptkh01 dt]$ ftp 10.70.41.126 21
Connected to 10.70.41.126 (10.70.41.126).
220 10.70.41.126 FTP server ready
Name (10.70.41.126:rc): joy
331 Password required for joy.
Password:
230 User joy logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> put 000000_0 /Temp/a_dt/
local: 000000_0 remote: /Temp/a_dt/
227 Entering Passive Mode (10,70,41,126,168,163).
550 /Temp/a_dt/: Not a regular file
ftp> put
(local-file) 000000_0
(remote-file) /Temp/a_dt/000000_0
local: 000000_0 remote: /Temp/a_dt/000000_0
227 Entering Passive Mode (10,70,41,126,168,207).
150 Opening BINARY mode data connection for /Temp/a_dt/000000_0
226 Transfer complete.
1049905992 bytes sent in 33 secs (31787.20 Kbytes/sec)
ftp> put 000001_0 /Temp/a_dt/000001_0
local: 000001_0 remote: /Temp/a_dt/000001_0
227 Entering Passive Mode (10,70,41,126,168,255).
150 Opening BINARY mode data connection for /Temp/a_dt/000001_0
452 Transfer aborted. No space left on device
ftp> put 000002_0 /Temp/a_dt/000002_0
local: 000002_0 remote: /Temp/a_dt/000002_0
227 Entering Passive Mode (10,70,41,126,169,20).
150 Opening BINARY mode data connection for /Temp/a_dt/000002_0
452 Transfer aborted. No space left on device
ftp> put 000003_0 /Temp/a_dt/000003_0
local: 000003_0 remote: /Temp/a_dt/000003_0
227 Entering Passive Mode (10,70,41,126,169,40).
150 Opening BINARY mode data connection for /Temp/a_dt/000003_0
452 Transfer aborted. No space left on device
ftp> put 000004_0 /Temp/a_dt/000004_0
local: 000004_0 remote: /Temp/a_dt/000004_0
227 Entering Passive Mode (10,70,41,126,169,66).
150 Opening BINARY mode data connection for /Temp/a_dt/000004_0
452 Transfer aborted. No space left on device
ftp> put 000005_0 /Temp/a_dt/000005_0
local: 000005_0 remote: /Temp/a_dt/000005_0
227 Entering Passive Mode (10,70,41,126,169,85).
150 Opening BINARY mode data connection for /Temp/a_dt/000005_0
226 Transfer complete.
23465237 bytes sent in 0.747 secs (31391.79 Kbytes/sec)
ftp>

查询hdfs文件内容,如果文件过大时不能一次加载,可以使用:

hadoop fs -cat /user/my/ab.txt |more