Sqoop2: MySQL and HDFS Import/Export

This post shows how to import and export data with Sqoop2. The version used here is 1.99.5; for installation and configuration, see: http://blog.csdn.net/hg_harvey/article/details/77803520

Sqoop command-line client reference:
http://sqoop.apache.org/docs/1.99.5/CommandLineClient.html

Sqoop import/export quick-start guide:
https://sqoop.apache.org/docs/1.99.5/Sqoop5MinutesDemo.html
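
All of the commands below are run from the Sqoop2 shell. If your shell is not yet connected to the Sqoop server, point it at the server first. A minimal sketch, assuming a default install with the server listening on localhost:12000 (adjust the host and port to your environment):

sqoop2-shell                       # shell launcher in 1.99.x; path may vary by install
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
sqoop:000> show version --all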

Before importing or exporting data with Sqoop, check whether verbose mode is on (it defaults to false):

show option --name verbose

To make the client print more detail (useful for troubleshooting when something goes wrong), enable verbose mode:

set option --name verbose --value true

List the connectors that ship with Sqoop 1.99.5. In this install the generic JDBC connector has id 1 and the HDFS connector has id 3; both are used below:

show connector

List the existing links:

show link

List the existing jobs:

show job

At this point there are no links or jobs defined, so let's create them and start moving data with Sqoop2.

1. Create the link objects

  • (1) Create the MySQL link (the MySQL JDBC driver jar must already be on the Sqoop server's classpath)
sqoop:000> create link -c 1
Creating link for connector with id 1
Please fill following values to create new link object
Name: mysql-link

Link configuration

JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://192.168.242.1:3306/test
Username: root
Password: ****
JDBC Connection Properties:
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 1
  • (2) Create the HDFS link (the URI should match fs.defaultFS in your Hadoop configuration)
sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs-link

Link configuration

HDFS URI: hdfs://192.168.242.166:9000
New link was successfully created with validation status OK and persistent id 2
  • (3) List the links just created
sqoop:000> show link
+----+------------+-----------+---------+
| Id | Name       | Connector | Enabled |
+----+------------+-----------+---------+
| 1  | mysql-link | 1         | true    |
| 2  | hdfs-link  | 3         | true    |
+----+------------+-----------+---------+

Details:

sqoop:000> show link -a    
2 link(s) to show:
link with id 1 and name mysql-link (Enabled: true, Created by hadoop at 2017-09-04 2:16 PM, Updated by hadoop at 2017-09-04 2:16 PM)
Using Connector id 1
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://192.168.242.1:3306/test
Username: root
Password:
JDBC Connection Properties:
link with id 2 and name hdfs-link (Enabled: true, Created by hadoop at 2017-09-04 2:18 PM, Updated by hadoop at 2017-09-04 2:19 PM)
Using Connector id 3
Link configuration
HDFS URI: hdfs://192.168.242.166:9000
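
If any of these values were entered incorrectly, a link can be edited or deleted rather than recreated. A sketch, using the same id-flag style as the show job -jid command later in this post (update re-prompts for every field; a link cannot be deleted while a job still references it):

sqoop:000> update link -lid 1
sqoop:000> delete link -lid 1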

2. Import data from MySQL into HDFS

Goal: import the rows of the MySQL table tb_student into HDFS under /user/hadoop/sqoop/tb_student.
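
For reference, the source table looked roughly like this. This is a sketch reconstructed from the table schema and rows shown later in this post; the exact original DDL is an assumption:

mysql> -- reconstructed sketch, not the original DDL
mysql> CREATE TABLE `tb_student` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `name` varchar(50) DEFAULT NULL,
    ->   `sex` char(2) DEFAULT NULL,
    ->   `age` int(11) DEFAULT NULL,
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
mysql> INSERT INTO `tb_student` (`name`, `sex`, `age`) VALUES
    -> ('张三', '1', 20), ('李四', '1', 22), ('李丽', '0', 18),
    -> ('赵四', '1', 21), ('王小丫', '0', 22);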

  • (1) Create the job
sqoop:000> create job -f 1 -t 2
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: from-mysql-to-hdfs-import

From database configuration

Schema name: test
Table name: tb_student
Table SQL statement:
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:

ToJob configuration

Override null value:
Null value:
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: /user/hadoop/sqoop/tb_student

Throttling resources

Extractors: 1
Loaders: 1
New job was successfully created with validation status OK and persistent id 1
  • (2) List the job just created
sqoop:000> show job
+----+---------------------------+----------------+--------------+---------+
| Id | Name                      | From Connector | To Connector | Enabled |
+----+---------------------------+----------------+--------------+---------+
| 1  | from-mysql-to-hdfs-import | 1              | 3            | true    |
+----+---------------------------+----------------+--------------+---------+

Details:

sqoop:000> show job -jid 1
1 job(s) to show:
Job with id 1 and name from-mysql-to-hdfs-import (Enabled: true, Created by hadoop at 2017-09-04 2:21 PM, Updated by hadoop at 2017-09-04 2:21 PM)
Using link id 1 and Connector id 1
From database configuration
Schema name: test
Table name: tb_student
Table SQL statement:
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:
Throttling resources
Extractors: 1
Loaders: 1
ToJob configuration
Override null value:
Null value:
Output format: TEXT_FILE
Compression format: NONE
Custom compression format:
Output directory: /user/hadoop/sqoop/tb_student
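
If any answer above needs changing before the run, the job can be edited in place and the shell will walk through the same prompts again. A sketch, using the -jid flag style this version expects:

sqoop:000> update job -jid 1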
  • (3) Start the job (this submits a MapReduce job; the -s flag makes the shell wait and print progress synchronously)
sqoop:000> start job -j 1 -s 
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: hadoop
Creation date: 2017-09-04 14:24:23 CST
Lastly updated by: hadoop
External ID: job_1504505043464_0001
http://hdpcomprs:8088/proxy/application_1504505043464_0001/
Source Connector schema: Schema{name=test.tb_student,columns=[
FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
Text{name=name,nullable=true,type=TEXT,charSize=null},
Text{name=sex,nullable=true,type=TEXT,charSize=null},
FixedPoint{name=age,nullable=true,type=FIXED_POINT,byteSize=4,signed=true}]}
2017-09-04 14:24:23 CST: BOOTING - Progress is not available
2017-09-04 14:24:57 CST: BOOTING - 0.00 %
2017-09-04 14:25:08 CST: BOOTING - 0.00 %
2017-09-04 14:25:19 CST: BOOTING - 0.00 %
2017-09-04 14:25:31 CST: BOOTING - 0.00 %
2017-09-04 14:25:42 CST: BOOTING - 0.00 %
2017-09-04 14:25:55 CST: BOOTING - 0.00 %
2017-09-04 14:26:07 CST: BOOTING - 0.00 %
2017-09-04 14:26:21 CST: RUNNING - 0.00 %
2017-09-04 14:26:32 CST: RUNNING - 0.00 %
2017-09-04 14:26:44 CST: RUNNING - 0.00 %
2017-09-04 14:26:55 CST: RUNNING - 0.00 %
2017-09-04 14:27:20 CST: RUNNING - 50.00 %
2017-09-04 14:27:38 CST: RUNNING - 50.00 %
2017-09-04 14:27:48 CST: RUNNING - 50.00 %
2017-09-04 14:27:59 CST: RUNNING - 50.00 %
2017-09-04 14:28:10 CST: RUNNING - 100.00 %
2017-09-04 14:28:24 CST: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_LARGE_READ_OPS: 0
FILE_WRITE_OPS: 0
HDFS_READ_OPS: 1
HDFS_BYTES_READ: 110
HDFS_LARGE_READ_OPS: 0
FILE_READ_OPS: 0
FILE_BYTES_WRITTEN: 266283
FILE_BYTES_READ: 114
HDFS_WRITE_OPS: 1
HDFS_BYTES_WRITTEN: 93
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.JobCounter
TOTAL_LAUNCHED_MAPS: 1
VCORES_MILLIS_REDUCES: 33449
MB_MILLIS_MAPS: 65964032
TOTAL_LAUNCHED_REDUCES: 1
SLOTS_MILLIS_REDUCES: 33449
VCORES_MILLIS_MAPS: 64418
MB_MILLIS_REDUCES: 34251776
SLOTS_MILLIS_MAPS: 64418
MILLIS_REDUCES: 33449
OTHER_LOCAL_MAPS: 1
MILLIS_MAPS: 64418
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 5
ROWS_WRITTEN: 5
org.apache.hadoop.mapreduce.TaskCounter
MAP_OUTPUT_MATERIALIZED_BYTES: 114
REDUCE_INPUT_RECORDS: 5
SPILLED_RECORDS: 10
MERGED_MAP_OUTPUTS: 1
VIRTUAL_MEMORY_BYTES: 4164173824
MAP_INPUT_RECORDS: 0
SPLIT_RAW_BYTES: 110
FAILED_SHUFFLE: 0
MAP_OUTPUT_BYTES: 98
REDUCE_SHUFFLE_BYTES: 114
PHYSICAL_MEMORY_BYTES: 270962688
GC_TIME_MILLIS: 715
REDUCE_INPUT_GROUPS: 5
COMBINE_OUTPUT_RECORDS: 0
SHUFFLED_MAPS: 1
REDUCE_OUTPUT_RECORDS: 5
MAP_OUTPUT_RECORDS: 5
COMBINE_INPUT_RECORDS: 0
CPU_MILLISECONDS: 8460
COMMITTED_HEAP_BYTES: 138481664
Shuffle Errors
CONNECTION: 0
WRONG_LENGTH: 0
BAD_ID: 0
WRONG_MAP: 0
WRONG_REDUCE: 0
IO_ERROR: 0
Job executed successfully
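
Without -s, start job returns immediately and the job runs in the background; it can then be polled or stopped from the shell. A sketch in the same flag style (check the CommandLineClient reference linked above if your version differs):

sqoop:000> status job -jid 1
sqoop:000> stop job -jid 1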
  • (4) Check the result on HDFS
[hadoop@hdp ~]$ hadoop fs -ls /user/hadoop/sqoop/tb_student
Found 1 items
-rw-r--r-- 1 hadoop supergroup 93 2017-09-04 14:28 /user/hadoop/sqoop/tb_student/32048b64-ed4c-430f-824c-fe45bcff5455.txt
[hadoop@hdp ~]$ hadoop fs -text /user/hadoop/sqoop/tb_student/32048b64-ed4c-430f-824c-fe45bcff5455.txt
1,'张三','1',20
2,'李四','1',22
3,'李丽','0',18
4,'赵四','1',21
5,'王小丫','0',22

3. Export data from HDFS to MySQL

Goal: export the data under the HDFS path /user/hadoop/sqoop/tb_student into the MySQL table tb_student_from_hdfs.

  • (1) Create the table tb_student_from_hdfs in MySQL (its columns mirror the source table)
DROP TABLE IF EXISTS `tb_student_from_hdfs`;
CREATE TABLE `tb_student_from_hdfs` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) DEFAULT NULL,
  `sex` char(2) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
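
Before running the export, it's worth confirming that the target table exists and is empty. For the freshly created table, the check would return:

mysql> SELECT COUNT(*) FROM tb_student_from_hdfs;
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+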
  • (2) Create the job
sqoop:000> create job -f 2 -t 1
Creating job for links with from id 2 and to id 1
Please fill following values to create new job object
Name: from-hdfs-to-mysql-export

From Job configuration

Input directory: /user/hadoop/sqoop/tb_student
Override null value:
Null value:

To database configuration

Schema name: test
Table name: tb_student_from_hdfs
Table SQL statement:
Table column names:
Stage table name:
Should clear stage table:

Throttling resources

Extractors: 1
Loaders: 1
New job was successfully created with validation status OK and persistent id 2
  • (3) List the jobs
sqoop:000> show job
+----+---------------------------+----------------+--------------+---------+
| Id | Name                      | From Connector | To Connector | Enabled |
+----+---------------------------+----------------+--------------+---------+
| 1  | from-mysql-to-hdfs-import | 1              | 3            | true    |
| 2  | from-hdfs-to-mysql-export | 3              | 1            | true    |
+----+---------------------------+----------------+--------------+---------+

Details:

sqoop:000> show job -jid 2
1 job(s) to show:
Job with id 2 and name from-hdfs-to-mysql-export (Enabled: true, Created by hadoop at 2017-09-04 2:41 PM, Updated by hadoop at 2017-09-04 2:41 PM)
Using link id 2 and Connector id 3
From Job configuration
Input directory: /user/hadoop/sqoop/tb_student
Override null value:
Null value:
Throttling resources
Extractors: 1
Loaders: 1
To database configuration
Schema name: test
Table name: tb_student_from_hdfs
Table SQL statement:
Table column names:
Stage table name:
Should clear stage table:
  • (4) Start the job (again a MapReduce job; -s waits for completion)
sqoop:000> start job -j 2 -s 
Submission details
Job ID: 2
Server URL: http://localhost:12000/sqoop/
Created by: hadoop
Creation date: 2017-09-04 14:46:38 CST
Lastly updated by: hadoop
External ID: job_1504505043464_0002
http://hdpcomprs:8088/proxy/application_1504505043464_0002/
Target Connector schema: Schema{name=test.tb_student_from_hdfs,columns=[
FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
Text{name=name,nullable=true,type=TEXT,charSize=null},
Text{name=sex,nullable=true,type=TEXT,charSize=null},
FixedPoint{name=age,nullable=true,type=FIXED_POINT,byteSize=4,signed=true}]}
2017-09-04 14:46:38 CST: BOOTING - Progress is not available
2017-09-04 14:46:57 CST: BOOTING - 0.00 %
2017-09-04 14:47:08 CST: BOOTING - 0.00 %
2017-09-04 14:47:19 CST: BOOTING - 0.00 %
2017-09-04 14:47:30 CST: BOOTING - 0.00 %
2017-09-04 14:47:41 CST: BOOTING - 0.00 %
2017-09-04 14:47:53 CST: RUNNING - 0.00 %
2017-09-04 14:48:05 CST: RUNNING - 0.00 %
2017-09-04 14:48:16 CST: RUNNING - 0.00 %
2017-09-04 14:48:27 CST: RUNNING - 0.00 %
2017-09-04 14:48:41 CST: RUNNING - 0.00 %
2017-09-04 14:48:56 CST: RUNNING - 50.00 %
2017-09-04 14:49:13 CST: RUNNING - 50.00 %
2017-09-04 14:49:24 CST: RUNNING - 50.00 %
2017-09-04 14:49:36 CST: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_LARGE_READ_OPS: 0
FILE_WRITE_OPS: 0
HDFS_READ_OPS: 4
HDFS_BYTES_READ: 313
HDFS_LARGE_READ_OPS: 0
FILE_READ_OPS: 0
FILE_BYTES_WRITTEN: 264183
FILE_BYTES_READ: 114
HDFS_WRITE_OPS: 0
HDFS_BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.JobCounter
TOTAL_LAUNCHED_MAPS: 1
VCORES_MILLIS_REDUCES: 27258
MB_MILLIS_MAPS: 60837888
TOTAL_LAUNCHED_REDUCES: 1
SLOTS_MILLIS_REDUCES: 27258
VCORES_MILLIS_MAPS: 59412
MB_MILLIS_REDUCES: 27912192
SLOTS_MILLIS_MAPS: 59412
MILLIS_REDUCES: 27258
OTHER_LOCAL_MAPS: 1
MILLIS_MAPS: 59412
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 5
ROWS_WRITTEN: 5
org.apache.hadoop.mapreduce.TaskCounter
MAP_OUTPUT_MATERIALIZED_BYTES: 114
REDUCE_INPUT_RECORDS: 5
SPILLED_RECORDS: 10
MERGED_MAP_OUTPUTS: 1
VIRTUAL_MEMORY_BYTES: 4161024000
MAP_INPUT_RECORDS: 0
SPLIT_RAW_BYTES: 216
FAILED_SHUFFLE: 0
MAP_OUTPUT_BYTES: 98
REDUCE_SHUFFLE_BYTES: 114
PHYSICAL_MEMORY_BYTES: 278306816
GC_TIME_MILLIS: 678
REDUCE_INPUT_GROUPS: 5
COMBINE_OUTPUT_RECORDS: 0
SHUFFLED_MAPS: 1
REDUCE_OUTPUT_RECORDS: 5
MAP_OUTPUT_RECORDS: 5
COMBINE_INPUT_RECORDS: 0
CPU_MILLISECONDS: 6980
COMMITTED_HEAP_BYTES: 138534912
Shuffle Errors
CONNECTION: 0
WRONG_LENGTH: 0
BAD_ID: 0
WRONG_MAP: 0
WRONG_REDUCE: 0
IO_ERROR: 0
Job executed successfully
  • (5) Check the result in MySQL
mysql> select * from tb_student_from_hdfs;
+----+-----------+-----+-----+
| id | name      | sex | age |
+----+-----------+-----+-----+
|  1 | 张三      | 1   |  20 |
|  2 | 李四      | 1   |  22 |
|  3 | 李丽      | 0   |  18 |
|  4 | 赵四      | 1   |  21 |
|  5 | 王小丫    | 0   |  22 |
+----+-----------+-----+-----+

5 rows in set
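
When you are done experimenting, the jobs and links can be cleaned up from the shell. A sketch (delete the jobs first, since a link cannot be removed while a job still references it):

sqoop:000> delete job -jid 1
sqoop:000> delete job -jid 2
sqoop:000> delete link -lid 1
sqoop:000> delete link -lid 2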