Importing MySQL Data into HDFS with Sqoop 1.99.6

Date: 2022-12-15 07:49:50

Environment: sqoop-1.99.6, hadoop-2.6.0

1. Install sqoop-1.99.6

1) Download sqoop-1.99.6 from http://archive.apache.org/dist/sqoop/

2) Extract the package: tar -zxvf sqoop*.tar.gz

3) Configure the environment variables: open vi /etc/profile and add

export SQOOP_HOME=<your Sqoop installation path>

export CATALINA_BASE=$SQOOP_HOME/server

export LOGDIR=$SQOOP_HOME/logs

Save and exit the editor, then run source /etc/profile to make the settings take effect.
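
For reference, a minimal sketch of this block with concrete paths, assuming Sqoop was extracted to /usr/soft/sqoop-1.99.6 (the path used later in this article):

export SQOOP_HOME=/usr/soft/sqoop-1.99.6
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs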

4) Edit ${SQOOP_HOME}/server/conf/catalina.properties: find the common.loader entry and change it so that it points at your Hadoop (and Hive) jar directories. The commented line below is the original default; the active line is the configuration used here:

#common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/lib/hadoop/*.jar,/usr/lib/hadoop/lib/*.jar,/usr/lib/hadoop-hdfs/*.jar,/usr/lib/hadoop-hdfs/lib/*.jar,/usr/lib/hadoop-mapreduce/*.jar,/usr/lib/hadoop-mapreduce/lib/*.jar,/usr/lib/hadoop-yarn/*.jar,/usr/lib/hadoop-yarn/lib/*.jar,/usr/lib/hive/lib/*.jar
#
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/common/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/common/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/hdfs/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/hdfs/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/mapreduce/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/mapreduce/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/tools/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/tools/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/yarn/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/yarn/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/httpfs/tomcat/lib/*.jar,/usr/soft/hive-1.2.1/lib/*.jar

5) Edit ${SQOOP_HOME}/server/conf/sqoop.properties: find the following entry and point it at your Hadoop configuration directory:

# Hadoop configuration directory

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/soft/hadoop-2.6.0/etc/hadoop/

6) Copy the MySQL JDBC driver mysql-connector-java-5.1.16-bin.jar into ${SQOOP_HOME}/server/lib/.
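
For example, assuming the driver jar has already been downloaded to the current directory:

cp mysql-connector-java-5.1.16-bin.jar ${SQOOP_HOME}/server/lib/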

7) Test whether the installation succeeded: start Hadoop first, then start Sqoop.
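
A rough startup sequence, assuming Hadoop is installed under /usr/soft/hadoop-2.6.0 and Sqoop under /usr/soft/sqoop-1.99.6 as above:

/usr/soft/hadoop-2.6.0/sbin/start-dfs.sh
/usr/soft/hadoop-2.6.0/sbin/start-yarn.sh
cd /usr/soft/sqoop-1.99.6
bin/sqoop.sh server start
jps   # the Sqoop 1.99.6 server runs inside the bundled Tomcat, which normally shows up as a Bootstrap process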

2. From the Sqoop installation directory, start the server with bin/sqoop.sh server start, then run bin/sqoop.sh client to start the client and create the required links.

[root@Master sqoop-1.99.6]# bin/sqoop.sh client
Sqoop home directory: /usr/soft/sqoop-1.99.6
Sqoop Shell: Type 'help' or '\h' for help.
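
If the client is started on a machine other than the one running the server, point it at the server first; a sketch where Master is the server host and 12000 is the default server port:

sqoop:000> set server --host Master --port 12000 --webapp sqoop
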
1) Run show connector to list the available connectors.

sqoop:000> show connector
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+----+------------------------+---------+------------------------------------------------------+----------------------+
| Id | Name                   | Version | Class                                                | Supported Directions |
+----+------------------------+---------+------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.6  | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 2  | kafka-connector        | 1.99.6  | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
| 3  | hdfs-connector         | 1.99.6  | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.6  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
+----+------------------------+---------+------------------------------------------------------+----------------------+
The output shows four connectors. Since we are importing data from MySQL into HDFS, we need the connectors with Id 3 (hdfs-connector) and Id 4 (generic-jdbc-connector).

2) First create a link using the generic-jdbc-connector; the // comments in the session below explain each prompt.

sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: jdbclink //a name that identifies this link
Link configuration

JDBC Driver Class: com.mysql.jdbc.Driver //the JDBC driver class to load
JDBC Connection String: jdbc:mysql://Master:3306/test //Master is the host name (use localhost or your own host), 3306 is MySQL's default port, and test is the database to read from
Username: root //user name for connecting to the database
Password: ****** //that user's password
JDBC Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#
New link was successfully created with validation status OK and persistent id 8
Check that the link was created:

sqoop:000> show link
+----+----------+--------------+------------------------+---------+
| Id | Name     | Connector Id | Connector Name         | Enabled |
+----+----------+--------------+------------------------+---------+
| 8  | jdbclink | 4            | generic-jdbc-connector | true    |
+----+----------+--------------+------------------------+---------+
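
If a value was mistyped, the link does not have to be recreated from scratch: the 1.99.6 shell also provides update link and delete link. A sketch, assuming they take the link id via --lid (mirroring the --jid flag used for jobs below):

sqoop:000> update link --lid 8
sqoop:000> delete link --lid 8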

3) Next, create the HDFS link.

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfslink

Link configuration

HDFS URI: hdfs://Master:8020 //the value of fs.defaultFS configured in Hadoop's core-site.xml
Hadoop conf directory:
New link was successfully created with validation status OK and persistent id 9
sqoop:000> show link
+----+----------+--------------+------------------------+---------+
| Id | Name     | Connector Id | Connector Name         | Enabled |
+----+----------+--------------+------------------------+---------+
| 8  | jdbclink | 4            | generic-jdbc-connector | true    |
| 9  | hdfslink | 3            | hdfs-connector         | true    |
+----+----------+--------------+------------------------+---------+

3. Create the transfer job with create job -f 8 -t 9, where -f (from) is the id of the source link and -t (to) is the id of the destination link.

sqoop:000> create job -f 8 -t 9
Creating job for links with from id 8 and to id 9
Please fill following values to create new job object
Name: job1 //a name that identifies this job

From database configuration

Schema name: test //the database name
Table name: my_user //the table name
Table SQL statement: //an optional SQL query used instead of a table (left blank here)
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:

Incremental read

Check column:
Last value:

To HDFS configuration

Override null value:
Null value:
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0 //storage format of the data on HDFS
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0 //compression algorithm
Custom compression format:
Output directory: hdfs://Master:8020/sqoop2/my_user_import //destination path on HDFS
Append mode:

Throttling resources

Extractors:
Loaders:
New job was successfully created with validation status OK and persistent id 3
Check the job:

sqoop:000> show job
+----+------+----------------+--------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+------+----------------+--------------+---------+
| 3  | job1 | 4              | 3            | true    |
+----+------+----------------+--------------+---------+
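
Similarly, if a job setting needs to change later, the job can be edited in place rather than recreated; a sketch, assuming update job takes the same --jid flag as start job and status job:

sqoop:000> update job --jid 3
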
Start the job:

sqoop:000> start job --jid 3
Submission details
Job ID: 3
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2017-08-28 14:27:26 CST
Lastly updated by: root
External ID: job_1502944428192_0003
http://10.226.118.24:8888/proxy/application_1502944428192_0003/
2017-08-28 14:27:26 CST: BOOTING - Progress is not available
Detailed error information (if anything goes wrong) can be displayed by enabling verbose output:

set option --name verbose --value true

Then check the job status:

sqoop:000> status job --jid 3
Submission details
Job ID: 3
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2017-08-28 14:27:26 CST
Lastly updated by: root
External ID: job_1502944428192_0003
http://10.226.118.24:8888/proxy/application_1502944428192_0003/
2017-08-28 14:27:56 CST: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.JobCounter
SLOTS_MILLIS_MAPS: 2241
TOTAL_LAUNCHED_UBERTASKS: 6
NUM_UBER_SUBMAPS: 6
MB_MILLIS_MAPS: 2294784
TOTAL_LAUNCHED_MAPS: 6
MILLIS_MAPS: 2241
VCORES_MILLIS_MAPS: 2241
OTHER_LOCAL_MAPS: 6
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.TaskCounter
MAP_INPUT_RECORDS: 0
MERGED_MAP_OUTPUTS: 0
PHYSICAL_MEMORY_BYTES: 868519936
SPILLED_RECORDS: 0
FAILED_SHUFFLE: 0
CPU_MILLISECONDS: 1640
COMMITTED_HEAP_BYTES: 159473664
VIRTUAL_MEMORY_BYTES: 10597224448
MAP_OUTPUT_RECORDS: 7
SPLIT_RAW_BYTES: 655
GC_TIME_MILLIS: 726
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_READ_OPS: 0
FILE_WRITE_OPS: 0
FILE_BYTES_READ: 0
FILE_LARGE_READ_OPS: 0
HDFS_BYTES_READ: 2500
FILE_BYTES_WRITTEN: 0
HDFS_LARGE_READ_OPS: 0
HDFS_BYTES_WRITTEN: 1629714
HDFS_READ_OPS: 267
HDFS_WRITE_OPS: 33
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_WRITTEN: 7
ROWS_READ: 7
Job executed successfully
This indicates the job completed successfully.

4. Check on HDFS that the data was imported successfully

Open Master:50070 (the HDFS NameNode web UI) in a browser.

Browse into /sqoop2/my_user_import; the files written by the job are listed there.

If the files are present, the data was imported successfully.

To view the imported data itself, run the following command under ${HADOOP_HOME}:

bin/hdfs dfs -text /sqoop2/my_user_import/01dde47f-e009-4cf1-9170-e0a3f76d0466.txt
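
Note that the job ran with several map tasks (see TOTAL_LAUNCHED_MAPS in the counters above), so the output is usually split across multiple files with generated names. Listing the directory first, or using a glob, avoids having to type the exact file name:

bin/hdfs dfs -ls /sqoop2/my_user_import
bin/hdfs dfs -text /sqoop2/my_user_import/*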