Getting Started with Sqoop2: Importing Relational Database Data into HDFS (sqoop2-1.99.4)

Date: 2024-12-07 13:02:56

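sqoop2-1.99.4 and sqoop2-1.99.3 differ slightly in usage: the new version uses "link" where the old version used "connection". Everything else works much the same.

For setting up the sqoop2-1.99.4 environment, see: Sqoop2 Environment Setup

For the sqoop2-1.99.3 version of this walkthrough, see: Getting Started with Sqoop2: Importing Relational Database Data into HDFS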

Start the sqoop2-1.99.4 client and point it at the server (12000 is the Sqoop2 default port):

$SQOOP2_HOME/bin/sqoop.sh client
set server --host hadoop000 --port 12000 --webapp sqoop
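
The three `set server` options together identify the web application the client talks to. A minimal Python sketch of that mapping, assuming the server is reached over plain HTTP; `build_server_url` is a hypothetical helper name, and port 12000 is an assumption (the Sqoop2 default port):

```python
def build_server_url(host, port, webapp):
    """Compose the base URL of the Sqoop2 server from the `set server` options."""
    # Strip stray slashes so "sqoop" and "/sqoop/" give the same result.
    return "http://{}:{}/{}/".format(host, port, webapp.strip("/"))


# Values mirror the `set server` command; 12000 is assumed (Sqoop2 default port).
print(build_server_url("hadoop000", 12000, "sqoop"))
```

If the client cannot connect, comparing this URL with the server's actual host, port and webapp name is usually the first thing to check.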

List all connectors:

show connector --all
2 connector(s) to show:
Connector with id 1:
  Name: hdfs-connector
  Class: org.apache.sqoop.connector.hdfs.HdfsConnector
  Version: 1.99.4-cdh5.3.0
Connector with id 2:
  Name: generic-jdbc-connector
  Class: org.apache.sqoop.connector.jdbc.GenericJdbcConnector
  Version: 1.99.4-cdh5.3.0

List all links:

show link

Delete a specific link (x is the link id):

delete link --lid x

List all jobs:

show job

Delete a specific job:

delete job --jid 1

Create a link of type generic-jdbc-connector (connector id 2):

create link --cid 2
Name: First Link
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://hadoop000:3306/hive
Username: root
Password: ****
JDBC Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#
New link was successfully created with validation status OK and persistent id 3
show link
+----+-------------+-----------+---------+
| Id | Name        | Connector | Enabled |
+----+-------------+-----------+---------+
| 3  | First Link  | 2         | true    |
+----+-------------+-----------+---------+

Create a link of type hdfs-connector (connector id 1):

create link --cid 1
Name: Second Link
HDFS URI: hdfs://hadoop000:8020
New link was successfully created with validation status OK and persistent id 4
show link
+----+-------------+-----------+---------+
| Id | Name        | Connector | Enabled |
+----+-------------+-----------+---------+
| 3  | First Link  | 2         | true    |
| 4  | Second Link | 1         | true    |
+----+-------------+-----------+---------+
show link --all
2 link(s) to show:
link with id 3 and name First Link (Enabled: true, Created by null at ..., Updated by null at ...)
Using Connector id 2
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://hadoop000:3306/hive
    Username: root
    Password:
    JDBC Connection Properties:
      protocol = tcp
link with id 4 and name Second Link (Enabled: true, Created by null at ..., Updated by null at ...)
Using Connector id 1
  Link configuration
    HDFS URI: hdfs://hadoop000:8020
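
When scripting around the shell it can help to turn the `show link` table into structured data, for example to look up a link id by name before creating a job. A minimal Python sketch, assuming the table layout printed above; `parse_shell_table` is a hypothetical helper, not part of Sqoop2:

```python
def parse_shell_table(text):
    """Parse an ASCII table printed by the sqoop2 shell into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and any non-table output
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first table row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


SAMPLE = """
+----+-------------+-----------+---------+
| Id | Name        | Connector | Enabled |
+----+-------------+-----------+---------+
| 3  | First Link  | 2         | true    |
| 4  | Second Link | 1         | true    |
+----+-------------+-----------+---------+
"""

links = parse_shell_table(SAMPLE)
# Map link names to ids, e.g. to pick the -f and -t arguments of `create job`.
ids_by_name = {row["Name"]: int(row["Id"]) for row in links}
print(ids_by_name)
```

All values come back as strings, so ids are converted explicitly where a number is needed.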

Create a job from the link ids (from link 3, to link 4):

create job -f 3 -t 4
Creating job for links with from id 3 and to id 4
Please fill following values to create new job object
Name: Sqoopy

From database configuration

Schema name: hive
Table name: TBLS
Table SQL statement:
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:

ToJob configuration

Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose:
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose:
Custom compression format:
Output directory: hdfs://hadoop000:8020/sqoop2/tbls_import_demo_sqoop1.99.4

Throttling resources

Extractors:
Loaders:
New job was successfully created with validation status OK and persistent id 2

List all jobs:

show job
+----+--------+----------------+--------------+---------+
| Id | Name   | From Connector | To Connector | Enabled |
+----+--------+----------------+--------------+---------+
| 2  | Sqoopy | 2              | 1            | true    |
+----+--------+----------------+--------------+---------+

Start the specified job. Once the job has finished, check the files on HDFS (hdfs dfs -ls hdfs://hadoop000:8020/sqoop2/tbls_import_demo_sqoop1.99.4/):

start job --jid 2
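
The check of the output directory can also be scripted. A minimal Python sketch, assuming the `hdfs` command is on the PATH of the machine running it; `build_ls_command` and `list_output` are hypothetical helper names:

```python
import subprocess

# Output directory entered when the job was created.
OUTPUT_DIR = "hdfs://hadoop000:8020/sqoop2/tbls_import_demo_sqoop1.99.4/"


def build_ls_command(path):
    """Return the argument list for listing a directory on HDFS."""
    return ["hdfs", "dfs", "-ls", path]


def list_output(path=OUTPUT_DIR):
    """Run the listing and return its text; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_ls_command(path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

Calling `print(list_output())` after the job succeeds should show the files the import wrote; an error here usually means the job has not finished or the path differs from the one given to the job.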

Check the execution status of the specified job:

status job --jid 2
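
Because the import runs asynchronously, the status usually has to be checked more than once. A minimal Python sketch of such a polling loop, under two assumptions: the status names follow Sqoop2's submission states (BOOTING and RUNNING meaning "still in progress"), and `fetch_status` is a hypothetical callable you supply that returns the current status string, however you obtain it:

```python
import time

# Assumed in-progress states; any other status is treated as final.
RUNNING_STATES = {"BOOTING", "RUNNING"}


def wait_for_job(fetch_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_status() until the job leaves the running states or polls run out."""
    status = fetch_status()
    polls = 1
    while status in RUNNING_STATES and polls < max_polls:
        sleep(poll_seconds)
        status = fetch_status()
        polls += 1
    return status


# Usage with a canned sequence standing in for real status checks.
states = iter(["BOOTING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda: next(states), sleep=lambda seconds: None))
```

The `max_polls` limit keeps the loop from waiting forever on a stuck job; the function then returns the last status it saw.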

Stop the specified job:

stop job --jid 2

A common error when running start job (for example, start job --jid 2):

Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception
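
The message carries an error code (here CLIENT_0001) followed by a short description; the code only says that the server reported a problem, not what the problem was. When collecting such errors from logs, a small parser can separate the two parts. A minimal Python sketch, assuming the `Message: CODE:text` layout shown above; `parse_sqoop_error` is a hypothetical helper:

```python
import re

# Matches e.g. "Message: CLIENT_0001:Server has returned exception".
ERROR_PATTERN = re.compile(r"Message:\s*(?P<code>[A-Z_]+_\d+)\s*:\s*(?P<text>.*)")


def parse_sqoop_error(line):
    """Return (code, text) from a Sqoop2 exception line, or None if it does not match."""
    match = ERROR_PATTERN.search(line)
    if match is None:
        return None
    return match.group("code"), match.group("text").strip()


line = ("Exception: org.apache.sqoop.common.SqoopException "
        "Message: CLIENT_0001:Server has returned exception")
print(parse_sqoop_error(line))
```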

To see the details, enable verbose output in the sqoop client and then show the job:

set option --name verbose --value true
show job --jid 2