Sqoop2 Study Notes (Part 2): Importing MySQL Data into HDFS

Posted: 2021-12-21 07:46:34

Guiding question:

        1. How does the Sqoop client create a connection to the Sqoop server?

        Installing Sqoop is not covered here; this post continues from my previous one: Sqoop2 Study Notes (Part 1): Introduction and Installation of Sqoop 1.99.3.

1. Configure the server address and port

        To connect to the server, configure the server's details on the client side, as follows:

set server --host secondmgt --port 12000 --webapp sqoop
        Here secondmgt is the server's hostname, and 12000 is the port the Sqoop server listens on.
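Under the hood, the shell is a thin client over the Sqoop server's REST API, so commands such as `show version` in the next step boil down to HTTP requests against the server. A minimal Python sketch of the URL involved (the helper name is mine; host, port, and webapp are the values configured above):

```python
# Hypothetical helper: build the REST URL corresponding to the
# `set server` values above. Not Sqoop code, just an illustration.
def build_version_url(host: str, port: int, webapp: str) -> str:
    return f"http://{host}:{port}/{webapp}/version"

print(build_version_url("secondmgt", 12000, "sqoop"))
# http://secondmgt:12000/sqoop/version
```

If the shell cannot reach the server, requesting such a URL with curl is a quick way to separate a network problem from a client misconfiguration.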

2. Verify that the connection works

sqoop:000> show version -all
client version:
Sqoop 1.99.4 source revision 2475a76ef70a0660f381c75c3d47d0d24f00b57f
Compiled by gshapira on Sun Nov 16 02:50:00 PST 2014
server version:
Sqoop 1.99.4 source revision 2475a76ef70a0660f381c75c3d47d0d24f00b57f
Compiled by gshapira on Sun Nov 16 02:50:00 PST 2014
API versions:
[v1]
        The output lists both client and server version information, which confirms that the client is connected to the server.

3. List the commands supported by the Sqoop shell

sqoop:000> help
For information about Sqoop, visit: http://sqoop.apache.org/

Available commands:
exit (\x ) Exit the shell
history (\H ) Display, manage and recall edit-line history
help (\h ) Display this help message
set (\st ) Configure various client options and settings
show (\sh ) Display various objects and configuration options
create (\cr ) Create new object in Sqoop repository
delete (\d ) Delete existing object in Sqoop repository
update (\up ) Update objects in Sqoop repository
clone (\cl ) Create new object based on existing one
start (\sta) Start job
stop (\stp) Stop job
status (\stu) Display status of a job
enable (\en ) Enable object in Sqoop repository
disable (\di ) Disable object in Sqoop repository

4. Show the connectors registered with the Sqoop server

sqoop:000> show connector
+----+------------------------+---------+------------------------------------------------------+
| Id | Name                   | Version | Class                                                |
+----+------------------------+---------+------------------------------------------------------+
| 1  | generic-jdbc-connector | 1.99.3  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector |
+----+------------------------+---------+------------------------------------------------------+

5. Create a connection
sqoop:000> create connection --cid 1
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: My First

Connection configuration

JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://secondmgt:3306/sqoopdb
Username: hive
Password: ****
JDBC Connection Properties:
There are currently 0 values in the map:
entry#          // just press Enter here
Security related configuration options

Max connections: 100
New connection was successfully created with validation status FINE and persistent id 1
sqoop:000>
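A malformed JDBC connection string is a common reason validation fails at this step. As an illustration (the regex and helper below are mine, not Sqoop code), the string entered above can be sanity-checked like this:

```python
import re

# Illustrative check for a MySQL JDBC URL of the shape used above:
# jdbc:mysql://<host>:<port>/<database>
JDBC_MYSQL = re.compile(r"^jdbc:mysql://(?P<host>[^:/]+):(?P<port>\d+)/(?P<db>\w+)$")

def parse_jdbc_url(url: str) -> dict:
    m = JDBC_MYSQL.match(url)
    if not m:
        raise ValueError(f"not a recognized MySQL JDBC URL: {url}")
    return m.groupdict()

print(parse_jdbc_url("jdbc:mysql://secondmgt:3306/sqoopdb"))
# {'host': 'secondmgt', 'port': '3306', 'db': 'sqoopdb'}
```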

6. Show the created connection

sqoop:000> show connection
+----+----------+-----------+---------+
| Id | Name     | Connector | Enabled |
+----+----------+-----------+---------+
| 1  | My First | 1         | true    |
+----+----------+-----------+---------+

7. Create a job

sqoop:000> create job --xid 1 --type import
Creating job for connection with id 1
Please fill following values to create new job object
Name: Second Job

Database configuration

Schema name: sqoopdb
Table name: stu
Table SQL statement:
Table column names: id,name
Partition column name: name
Nulls in partition column:
Boundary query:

Output configuration

Storage type:
0 : HDFS
Choose: 0
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 1
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
Choose: 0
Output directory: /output/sqoopsecond

Throttling resources

Extractors:
Loaders:
New job was successfully created with validation status FINE and persistent id 3
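The partition column chosen above is what the generic-jdbc-connector uses to divide the table into independent slices, one per extractor (map task). A simplified Python sketch of that idea for an integer partition column (the real connector derives the bounds from the boundary query and also handles text and date columns; this helper is mine):

```python
# Sketch of Sqoop-style partitioning: carve the partition column's
# [lo, hi] range into n contiguous, roughly equal slices.
def integer_splits(lo: int, hi: int, n: int) -> list[tuple[int, int]]:
    width = (hi - lo + 1) / n
    splits = []
    for i in range(n):
        start = lo + round(i * width)
        end = lo + round((i + 1) * width) - 1
        splits.append((start, end))
    splits[-1] = (splits[-1][0], hi)  # ensure the last slice reaches hi
    return splits

print(integer_splits(1, 100, 10))
```

Each slice becomes one extractor's WHERE clause, which is why the job below launches ten map tasks and leaves ten part files in HDFS.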

8. Submit the job

sqoop:000> start job --jid 3
Submission details
Job ID: 3
Server URL: http://localhost:12000/sqoop/
Created by: hadoopUser
Creation date: 2015-01-16 10:30:17 CST
Lastly updated by: hadoopUser
External ID: job_1421373857783_0001
http://secondmgt:8088/proxy/application_1421373857783_0001/
2015-01-16 10:30:17 CST: BOOTING - Progress is not available

9. Check the job's execution status

sqoop:000> status job --jid 3
Submission details
Job ID: 3
Server URL: http://localhost:12000/sqoop/
Created by: hadoopUser
Creation date: 2015-01-16 10:30:17 CST
Lastly updated by: hadoopUser
External ID: job_1421373857783_0001
http://secondmgt:8088/proxy/application_1421373857783_0001/
2015-01-16 10:31:18 CST: RUNNING - 45.00 %
sqoop:000> status job --jid 3
Submission details
Job ID: 3
Server URL: http://localhost:12000/sqoop/
Created by: hadoopUser
Creation date: 2015-01-16 10:30:17 CST
Lastly updated by: hadoopUser
External ID: job_1421373857783_0001
http://secondmgt:8088/proxy/application_1421373857783_0001/
2015-01-16 10:31:24 CST: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.JobCounter
SLOTS_MILLIS_MAPS: 448084
TOTAL_LAUNCHED_MAPS: 10
SLOTS_MILLIS_REDUCES: 0
OTHER_LOCAL_MAPS: 10
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.TaskCounter
MAP_INPUT_RECORDS: 0
MERGED_MAP_OUTPUTS: 0
PHYSICAL_MEMORY_BYTES: 1562419200
SPILLED_RECORDS: 0
FAILED_SHUFFLE: 0
CPU_MILLISECONDS: 28060
COMMITTED_HEAP_BYTES: 846725120
VIRTUAL_MEMORY_BYTES: 8838815744
MAP_OUTPUT_RECORDS: 4
SPLIT_RAW_BYTES: 1255
GC_TIME_MILLIS: 596
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_READ_OPS: 0
FILE_WRITE_OPS: 0
FILE_BYTES_READ: 0
FILE_LARGE_READ_OPS: 0
HDFS_BYTES_READ: 1255
FILE_BYTES_WRITTEN: 919590
HDFS_LARGE_READ_OPS: 0
HDFS_BYTES_WRITTEN: 930
HDFS_READ_OPS: 40
HDFS_WRITE_OPS: 20
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 4
Job executed successfully
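The status output is plain text, so it is easy to post-process. A small sketch (my own, not a Sqoop utility) that parses the counter block into a nested dict; note that ROWS_READ and MAP_OUTPUT_RECORDS both equal 4, i.e. the stu table contributed four rows in total, spread across the ten map tasks:

```python
# Parse counter output of the form printed by `status job`:
# group lines have no colon; counter lines are "NAME: value".
def parse_counters(text: str) -> dict:
    groups, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            name, value = line.split(":")
            groups[current][name.strip()] = int(value)
        else:
            current = line
            groups[current] = {}
    return groups

# Excerpt of the output above.
counters_text = """\
org.apache.hadoop.mapreduce.JobCounter
TOTAL_LAUNCHED_MAPS: 10
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 4
"""

c = parse_counters(counters_text)
print(c["org.apache.sqoop.submission.counter.SqoopCounters"]["ROWS_READ"])
# 4
```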

10. List the created jobs

sqoop:000> show job
+----+------------+--------+-----------+---------+
| Id | Name       | Type   | Connector | Enabled |
+----+------------+--------+-----------+---------+
| 2  | First Job  | IMPORT | 1         | true    |
| 3  | Second Job | IMPORT | 1         | true    |
+----+------------+--------+-----------+---------+

11. Delete a job

sqoop:000> delete job --jid 3
sqoop:000> show job
+----+-----------+--------+-----------+---------+
| Id | Name      | Type   | Connector | Enabled |
+----+-----------+--------+-----------+---------+
| 2  | First Job | IMPORT | 1         | true    |
+----+-----------+--------+-----------+---------+

12. Check HDFS to confirm the MySQL data was imported

[hadoopUser@secondmgt ~]$ hadoop fs -ls /output/sqoopsecond/*
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 0 2015-01-16 10:31 /output/sqoopsecond/_SUCCESS
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 104 2015-01-16 10:30 /output/sqoopsecond/part-m-00000.seq
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:30 /output/sqoopsecond/part-m-00001.seq
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:30 /output/sqoopsecond/part-m-00002.seq
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:30 /output/sqoopsecond/part-m-00003.seq
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00004.seq
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00005.seq
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00006.seq
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 105 2015-01-16 10:31 /output/sqoopsecond/part-m-00007.seq
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00008.seq
Found 1 items
-rw-r--r-- 2 hadoopUser supergroup 119 2015-01-16 10:31 /output/sqoopsecond/part-m-00009.seq
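As a final sanity check (my own, not part of the original walkthrough), the part-file sizes in this listing should add up to the job's HDFS_BYTES_WRITTEN counter from step 9:

```python
# Sum the size field (column 5) of the part files listed by
# `hadoop fs -ls` and compare against HDFS_BYTES_WRITTEN: 930.
listing = """\
-rw-r--r-- 2 hadoopUser supergroup 0 2015-01-16 10:31 /output/sqoopsecond/_SUCCESS
-rw-r--r-- 2 hadoopUser supergroup 104 2015-01-16 10:30 /output/sqoopsecond/part-m-00000.seq
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:30 /output/sqoopsecond/part-m-00001.seq
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:30 /output/sqoopsecond/part-m-00002.seq
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:30 /output/sqoopsecond/part-m-00003.seq
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00004.seq
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00005.seq
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00006.seq
-rw-r--r-- 2 hadoopUser supergroup 105 2015-01-16 10:31 /output/sqoopsecond/part-m-00007.seq
-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00008.seq
-rw-r--r-- 2 hadoopUser supergroup 119 2015-01-16 10:31 /output/sqoopsecond/part-m-00009.seq
""".strip().splitlines()

total = sum(int(line.split()[4]) for line in listing if "part-m-" in line)
print(total)  # 930
```

The total matches HDFS_BYTES_WRITTEN exactly, confirming that all of the imported data landed in the ten sequence files.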