Sqoop2: MySQL and HDFS Import/Export

This post shows how to import and export data with Sqoop2. The version used here is 1.99.5; for installation and configuration, see: http://blog.csdn.net/hg_harvey/article/details/77803520

Sqoop command-line client reference:
http://sqoop.apache.org/docs/1.99.5/CommandLineClient.html

Sqoop import/export quick-start guide:
https://sqoop.apache.org/docs/1.99.5/Sqoop5MinutesDemo.html
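
All of the commands below are run from the Sqoop2 shell. If your shell is not yet connected to the Sqoop server, point it at the server first. A minimal sketch, assuming a default install with the server listening on localhost:12000 (adjust the host and port to your environment):

sqoop2-shell                       # shell launcher in 1.99.x; path may vary by install
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
sqoop:000> show version --all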

Before importing or exporting data with Sqoop, check whether verbose mode is on (it defaults to false):

show option --name verbose

To make the client print more detail (useful for troubleshooting when something goes wrong), enable verbose mode:

set option --name verbose --value true

List the connectors that ship with Sqoop 1.99.5. In this install the generic JDBC connector has id 1 and the HDFS connector has id 3; both are used below:

show connector

List the existing links:

show link

List the existing jobs:

show job

At this point there are no links or jobs defined, so let's create them and start moving data with Sqoop2.

1. Create the link objects

  • (1) Create the MySQL link (the MySQL JDBC driver jar must already be on the Sqoop server's classpath)
sqoop:000> create link -c 1
Creating link for connector with id 1
Please fill following values to create new link object
Name: mysql-link

Link configuration

JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://192.168.242.1:3306/test
Username: root
Password: ****
JDBC Connection Properties:
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 1
  • (2) Create the HDFS link (the URI should match fs.defaultFS in your Hadoop configuration)
sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs-link

Link configuration

HDFS URI: hdfs://192.168.242.166:9000
New link was successfully created with validation status OK and persistent id 2
  • (3) List the links just created
sqoop:000> show link
+----+------------+-----------+---------+
| Id | Name       | Connector | Enabled |
+----+------------+-----------+---------+
| 1  | mysql-link | 1         | true    |
| 2  | hdfs-link  | 3         | true    |
+----+------------+-----------+---------+

Details:

sqoop:000> show link -a    
2 link(s) to show:
link with id 1 and name mysql-link (Enabled: true, Created by hadoop at 2017-09-04 2:16 PM, Updated by hadoop at 2017-09-04 2:16 PM)
Using Connector id 1
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://192.168.242.1:3306/test
Username: root
Password:
JDBC Connection Properties:
link with id 2 and name hdfs-link (Enabled: true, Created by hadoop at 2017-09-04 2:18 PM, Updated by hadoop at 2017-09-04 2:19 PM)
Using Connector id 3
Link configuration
HDFS URI: hdfs://192.168.242.166:9000
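
If any of these values were entered incorrectly, a link can be edited or deleted rather than recreated. A sketch, using the same id-flag style as the show job -jid command later in this post (update re-prompts for every field; a link cannot be deleted while a job still references it):

sqoop:000> update link -lid 1
sqoop:000> delete link -lid 1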

2. Import data from MySQL into HDFS

Goal: import the rows of the MySQL table tb_student into HDFS under /user/hadoop/sqoop/tb_student.
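
For reference, the source table looked roughly like this. This is a sketch reconstructed from the table schema and rows shown later in this post; the exact original DDL is an assumption:

mysql> -- reconstructed sketch, not the original DDL
mysql> CREATE TABLE `tb_student` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `name` varchar(50) DEFAULT NULL,
    ->   `sex` char(2) DEFAULT NULL,
    ->   `age` int(11) DEFAULT NULL,
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
mysql> INSERT INTO `tb_student` (`name`, `sex`, `age`) VALUES
    -> ('张三', '1', 20), ('李四', '1', 22), ('李丽', '0', 18),
    -> ('赵四', '1', 21), ('王小丫', '0', 22);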

  • (1) Create the job
sqoop:000> create job -f 1 -t 2
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: from-mysql-to-hdfs-import

From database configuration

Schema name: test
Table name: tb_student
Table SQL statement:
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:

ToJob configuration

Override null value:
Null value:
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: /user/hadoop/sqoop/tb_student

Throttling resources

Extractors: 1
Loaders: 1
New job was successfully created with validation status OK and persistent id 1
  • (2) List the job just created
sqoop:000> show job
+----+---------------------------+----------------+--------------+---------+
| Id | Name                      | From Connector | To Connector | Enabled |
+----+---------------------------+----------------+--------------+---------+
| 1  | from-mysql-to-hdfs-import | 1              | 3            | true    |
+----+---------------------------+----------------+--------------+---------+

Details:

sqoop:000> show job -jid 1
1 job(s) to show:
Job with id 1 and name from-mysql-to-hdfs-import (Enabled: true, Created by hadoop at 2017-09-04 2:21 PM, Updated by hadoop at 2017-09-04 2:21 PM)
Using link id 1 and Connector id 1
From database configuration
Schema name: test
Table name: tb_student
Table SQL statement:
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:
Throttling resources
Extractors: 1
Loaders: 1
ToJob configuration
Override null value:
Null value:
Output format: TEXT_FILE
Compression format: NONE
Custom compression format:
Output directory: /user/hadoop/sqoop/tb_student
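
If any answer above needs changing before the run, the job can be edited in place and the shell will walk through the same prompts again. A sketch, using the -jid flag style this version expects:

sqoop:000> update job -jid 1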
  • (3) Start the job (this submits a MapReduce job; the -s flag makes the shell wait and print progress synchronously)
sqoop:000> start job -j 1 -s 
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: hadoop
Creation date: 2017-09-04 14:24:23 CST
Lastly updated by: hadoop
External ID: job_1504505043464_0001
http://hdpcomprs:8088/proxy/application_1504505043464_0001/
Source Connector schema: Schema{name=test.tb_student,columns=[
FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
Text{name=name,nullable=true,type=TEXT,charSize=null},
Text{name=sex,nullable=true,type=TEXT,charSize=null},
FixedPoint{name=age,nullable=true,type=FIXED_POINT,byteSize=4,signed=true}]}
2017-09-04 14:24:23 CST: BOOTING - Progress is not available
2017-09-04 14:24:57 CST: BOOTING - 0.00 %
2017-09-04 14:25:08 CST: BOOTING - 0.00 %
2017-09-04 14:25:19 CST: BOOTING - 0.00 %
2017-09-04 14:25:31 CST: BOOTING - 0.00 %
2017-09-04 14:25:42 CST: BOOTING - 0.00 %
2017-09-04 14:25:55 CST: BOOTING - 0.00 %
2017-09-04 14:26:07 CST: BOOTING - 0.00 %
2017-09-04 14:26:21 CST: RUNNING - 0.00 %
2017-09-04 14:26:32 CST: RUNNING - 0.00 %
2017-09-04 14:26:44 CST: RUNNING - 0.00 %
2017-09-04 14:26:55 CST: RUNNING - 0.00 %
2017-09-04 14:27:20 CST: RUNNING - 50.00 %
2017-09-04 14:27:38 CST: RUNNING - 50.00 %
2017-09-04 14:27:48 CST: RUNNING - 50.00 %
2017-09-04 14:27:59 CST: RUNNING - 50.00 %
2017-09-04 14:28:10 CST: RUNNING - 100.00 %
2017-09-04 14:28:24 CST: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_LARGE_READ_OPS: 0
FILE_WRITE_OPS: 0
HDFS_READ_OPS: 1
HDFS_BYTES_READ: 110
HDFS_LARGE_READ_OPS: 0
FILE_READ_OPS: 0
FILE_BYTES_WRITTEN: 266283
FILE_BYTES_READ: 114
HDFS_WRITE_OPS: 1
HDFS_BYTES_WRITTEN: 93
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.JobCounter
TOTAL_LAUNCHED_MAPS: 1
VCORES_MILLIS_REDUCES: 33449
MB_MILLIS_MAPS: 65964032
TOTAL_LAUNCHED_REDUCES: 1
SLOTS_MILLIS_REDUCES: 33449
VCORES_MILLIS_MAPS: 64418
MB_MILLIS_REDUCES: 34251776
SLOTS_MILLIS_MAPS: 64418
MILLIS_REDUCES: 33449
OTHER_LOCAL_MAPS: 1
MILLIS_MAPS: 64418
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 5
ROWS_WRITTEN: 5
org.apache.hadoop.mapreduce.TaskCounter
MAP_OUTPUT_MATERIALIZED_BYTES: 114
REDUCE_INPUT_RECORDS: 5
SPILLED_RECORDS: 10
MERGED_MAP_OUTPUTS: 1
VIRTUAL_MEMORY_BYTES: 4164173824
MAP_INPUT_RECORDS: 0
SPLIT_RAW_BYTES: 110
FAILED_SHUFFLE: 0
MAP_OUTPUT_BYTES: 98
REDUCE_SHUFFLE_BYTES: 114
PHYSICAL_MEMORY_BYTES: 270962688
GC_TIME_MILLIS: 715
REDUCE_INPUT_GROUPS: 5
COMBINE_OUTPUT_RECORDS: 0
SHUFFLED_MAPS: 1
REDUCE_OUTPUT_RECORDS: 5
MAP_OUTPUT_RECORDS: 5
COMBINE_INPUT_RECORDS: 0
CPU_MILLISECONDS: 8460
COMMITTED_HEAP_BYTES: 138481664
Shuffle Errors
CONNECTION: 0
WRONG_LENGTH: 0
BAD_ID: 0
WRONG_MAP: 0
WRONG_REDUCE: 0
IO_ERROR: 0
Job executed successfully
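
Without -s, start job returns immediately and the job runs in the background; it can then be polled or stopped from the shell. A sketch in the same flag style (check the CommandLineClient reference linked above if your version differs):

sqoop:000> status job -jid 1
sqoop:000> stop job -jid 1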
  • (4) Check the result on HDFS
[hadoop@hdp ~]$ hadoop fs -ls /user/hadoop/sqoop/tb_student
Found 1 items
-rw-r--r-- 1 hadoop supergroup 93 2017-09-04 14:28 /user/hadoop/sqoop/tb_student/32048b64-ed4c-430f-824c-fe45bcff5455.txt
[hadoop@hdp ~]$ hadoop fs -text /user/hadoop/sqoop/tb_student/32048b64-ed4c-430f-824c-fe45bcff5455.txt
1,'张三','1',20
2,'李四','1',22
3,'李丽','0',18
4,'赵四','1',21
5,'王小丫','0',22

3. Export data from HDFS to MySQL

Goal: export the data under the HDFS path /user/hadoop/sqoop/tb_student into the MySQL table tb_student_from_hdfs.

  • (1) Create the table tb_student_from_hdfs in MySQL (its columns mirror the source table)
DROP TABLE IF EXISTS `tb_student_from_hdfs`;
CREATE TABLE `tb_student_from_hdfs` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) DEFAULT NULL,
  `sex` char(2) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
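
Before running the export, it's worth confirming that the target table exists and is empty. For the freshly created table, the check would return:

mysql> SELECT COUNT(*) FROM tb_student_from_hdfs;
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+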
  • (2) Create the job
sqoop:000> create job -f 2 -t 1
Creating job for links with from id 2 and to id 1
Please fill following values to create new job object
Name: from-hdfs-to-mysql-export

From Job configuration

Input directory: /user/hadoop/sqoop/tb_student
Override null value:
Null value:

To database configuration

Schema name: test
Table name: tb_student_from_hdfs
Table SQL statement:
Table column names:
Stage table name:
Should clear stage table:

Throttling resources

Extractors: 1
Loaders: 1
New job was successfully created with validation status OK and persistent id 2
  • (3) List the jobs
sqoop:000> show job
+----+---------------------------+----------------+--------------+---------+
| Id | Name                      | From Connector | To Connector | Enabled |
+----+---------------------------+----------------+--------------+---------+
| 1  | from-mysql-to-hdfs-import | 1              | 3            | true    |
| 2  | from-hdfs-to-mysql-export | 3              | 1            | true    |
+----+---------------------------+----------------+--------------+---------+

Details:

sqoop:000> show job -jid 2
1 job(s) to show:
Job with id 2 and name from-hdfs-to-mysql-export (Enabled: true, Created by hadoop at 2017-09-04 2:41 PM, Updated by hadoop at 2017-09-04 2:41 PM)
Using link id 2 and Connector id 3
From Job configuration
Input directory: /user/hadoop/sqoop/tb_student
Override null value:
Null value:
Throttling resources
Extractors: 1
Loaders: 1
To database configuration
Schema name: test
Table name: tb_student_from_hdfs
Table SQL statement:
Table column names:
Stage table name:
Should clear stage table:
  • (4) Start the job (again a MapReduce job; -s waits for completion)
sqoop:000> start job -j 2 -s 
Submission details
Job ID: 2
Server URL: http://localhost:12000/sqoop/
Created by: hadoop
Creation date: 2017-09-04 14:46:38 CST
Lastly updated by: hadoop
External ID: job_1504505043464_0002
http://hdpcomprs:8088/proxy/application_1504505043464_0002/
Target Connector schema: Schema{name=test.tb_student_from_hdfs,columns=[
FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
Text{name=name,nullable=true,type=TEXT,charSize=null},
Text{name=sex,nullable=true,type=TEXT,charSize=null},
FixedPoint{name=age,nullable=true,type=FIXED_POINT,byteSize=4,signed=true}]}
2017-09-04 14:46:38 CST: BOOTING - Progress is not available
2017-09-04 14:46:57 CST: BOOTING - 0.00 %
2017-09-04 14:47:08 CST: BOOTING - 0.00 %
2017-09-04 14:47:19 CST: BOOTING - 0.00 %
2017-09-04 14:47:30 CST: BOOTING - 0.00 %
2017-09-04 14:47:41 CST: BOOTING - 0.00 %
2017-09-04 14:47:53 CST: RUNNING - 0.00 %
2017-09-04 14:48:05 CST: RUNNING - 0.00 %
2017-09-04 14:48:16 CST: RUNNING - 0.00 %
2017-09-04 14:48:27 CST: RUNNING - 0.00 %
2017-09-04 14:48:41 CST: RUNNING - 0.00 %
2017-09-04 14:48:56 CST: RUNNING - 50.00 %
2017-09-04 14:49:13 CST: RUNNING - 50.00 %
2017-09-04 14:49:24 CST: RUNNING - 50.00 %
2017-09-04 14:49:36 CST: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_LARGE_READ_OPS: 0
FILE_WRITE_OPS: 0
HDFS_READ_OPS: 4
HDFS_BYTES_READ: 313
HDFS_LARGE_READ_OPS: 0
FILE_READ_OPS: 0
FILE_BYTES_WRITTEN: 264183
FILE_BYTES_READ: 114
HDFS_WRITE_OPS: 0
HDFS_BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.JobCounter
TOTAL_LAUNCHED_MAPS: 1
VCORES_MILLIS_REDUCES: 27258
MB_MILLIS_MAPS: 60837888
TOTAL_LAUNCHED_REDUCES: 1
SLOTS_MILLIS_REDUCES: 27258
VCORES_MILLIS_MAPS: 59412
MB_MILLIS_REDUCES: 27912192
SLOTS_MILLIS_MAPS: 59412
MILLIS_REDUCES: 27258
OTHER_LOCAL_MAPS: 1
MILLIS_MAPS: 59412
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 5
ROWS_WRITTEN: 5
org.apache.hadoop.mapreduce.TaskCounter
MAP_OUTPUT_MATERIALIZED_BYTES: 114
REDUCE_INPUT_RECORDS: 5
SPILLED_RECORDS: 10
MERGED_MAP_OUTPUTS: 1
VIRTUAL_MEMORY_BYTES: 4161024000
MAP_INPUT_RECORDS: 0
SPLIT_RAW_BYTES: 216
FAILED_SHUFFLE: 0
MAP_OUTPUT_BYTES: 98
REDUCE_SHUFFLE_BYTES: 114
PHYSICAL_MEMORY_BYTES: 278306816
GC_TIME_MILLIS: 678
REDUCE_INPUT_GROUPS: 5
COMBINE_OUTPUT_RECORDS: 0
SHUFFLED_MAPS: 1
REDUCE_OUTPUT_RECORDS: 5
MAP_OUTPUT_RECORDS: 5
COMBINE_INPUT_RECORDS: 0
CPU_MILLISECONDS: 6980
COMMITTED_HEAP_BYTES: 138534912
Shuffle Errors
CONNECTION: 0
WRONG_LENGTH: 0
BAD_ID: 0
WRONG_MAP: 0
WRONG_REDUCE: 0
IO_ERROR: 0
Job executed successfully
  • (5) Check the result in MySQL
mysql> select * from tb_student_from_hdfs;
+----+-----------+-----+-----+
| id | name      | sex | age |
+----+-----------+-----+-----+
|  1 | 张三      | 1   |  20 |
|  2 | 李四      | 1   |  22 |
|  3 | 李丽      | 0   |  18 |
|  4 | 赵四      | 1   |  21 |
|  5 | 王小丫    | 0   |  22 |
+----+-----------+-----+-----+

5 rows in set
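
When you are done experimenting, the jobs and links can be cleaned up from the shell. A sketch (delete the jobs first, since a link cannot be removed while a job still references it):

sqoop:000> delete job -jid 1
sqoop:000> delete job -jid 2
sqoop:000> delete link -lid 1
sqoop:000> delete link -lid 2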