Sqoop: A Brief Introduction

Date: 2023-03-08 15:28:36
1. Basic Purpose

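Concept: Sqoop is usually described as a "collaboration" framework, that is, an auxiliary framework in the Hadoop 2.x ecosystem. Put simply, it is a data transfer tool. Similar auxiliary frameworks include the file/log collection framework Flume, the job coordination framework Oozie, and the big-data web tool Hue.
Flow: data source (RDBMS) <---> data cleansing / data analysis <---> HDFS/HBase
Role: The name stands for SQL-to-Hadoop. Sqoop is the bridge between relational databases and Hadoop. It is built on MapReduce: the command-line parameters are combined with a MapReduce template, packaged into a jar, and submitted to YARN. MapReduce parallelism speeds up the transfer, and data is moved in batch mode.
Versions: 1.4.x is Sqoop1; 1.99.x is Sqoop2.
Binary package download URL: http://archive.cloudera.com/cdh5/cdh/5/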
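
As a rough illustration of how parameters turn into a MapReduce job, the sketch below assembles a Sqoop1 import command. The table name my_table, the target directory, and the mapper count are hypothetical values chosen for this example; only the JDBC host and user come from this article. By default the script only prints the command (a dry run), so nothing is submitted to the cluster unless SQOOP_BIN is overridden.

```shell
#!/bin/sh
# Sketch only: my_table, the target directory, and the mapper count are hypothetical.
# SQOOP_BIN defaults to "echo bin/sqoop", so the command is printed, not executed.
# Set SQOOP_BIN=bin/sqoop to actually submit the import job.
SQOOP_BIN="${SQOOP_BIN:-echo bin/sqoop}"

# run_import TABLE TARGET_DIR [MAPPERS]
# Imports one table into an HDFS directory; each mapper copies one slice of the table.
run_import() {
  table="$1"
  target_dir="$2"
  mappers="${3:-4}"
  # $SQOOP_BIN is intentionally unquoted so "echo bin/sqoop" splits into two words.
  # -P prompts for the password instead of putting it on the command line.
  $SQOOP_BIN import \
    --connect jdbc:mysql://10.0.0.108:3306/mysql \
    --username root -P \
    --table "$table" \
    --target-dir "$target_dir" \
    --num-mappers "$mappers"
}

run_import my_table /user/sqoop/my_table 4
```

The --num-mappers value controls how many parallel map tasks perform the copy, which is where the speed-up from MapReduce comes from.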

2. Basic Configuration

Edit sqoop-env.sh under sqoop-1.4.5-cdh5.3.6/conf:
export HADOOP_COMMON_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
export HADOOP_MAPRED_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
export HIVE_HOME=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6
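
A wrong path in sqoop-env.sh typically only shows up when a job is launched, so it can help to check the three variables up front. The following is a minimal sketch, not part of Sqoop itself; the function name check_sqoop_env is made up for this example.

```shell
#!/bin/sh
# Sketch only: check_sqoop_env is a hypothetical helper, not a Sqoop command.
# It reports whether each variable from sqoop-env.sh is set and names an existing directory.
check_sqoop_env() {
  status=0
  for name in HADOOP_COMMON_HOME HADOOP_MAPRED_HOME HIVE_HOME; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "MISSING $name is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "MISSING $name=$value (no such directory)"
      status=1
    else
      echo "OK $name=$value"
    fi
  done
  return $status
}
```

One way to use it is to source the configuration first and then run the check: `. conf/sqoop-env.sh && check_sqoop_env`.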

3. Basic Usage

# Before connecting to a MySQL database, copy the MySQL JDBC driver jar into Sqoop's lib directory
$ bin/sqoop help
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
$ bin/sqoop list-databases --connect jdbc:mysql://10.0.0.108:3306 --username root --password root
$ bin/sqoop list-tables --connect jdbc:mysql://10.0.0.108:3306/mysql --username root --password root
$ bin/sqoop import --help
$ bin/sqoop export --help
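
The note at the start of this section about the MySQL driver jar can be verified before running any of the commands above. The sketch below assumes the jar follows the usual Connector/J naming (a file name starting with mysql-connector- and ending in .jar); the helper name has_mysql_driver is hypothetical.

```shell
#!/bin/sh
# Sketch only: has_mysql_driver is a hypothetical helper, and the jar name pattern is an assumption.
# Returns 0 if a MySQL Connector/J jar is present in the given directory (default: lib).
has_mysql_driver() {
  lib_dir="${1:-lib}"
  for jar in "$lib_dir"/mysql-connector-*.jar; do
    # If the glob matched nothing, the literal pattern is left in $jar and -e fails.
    [ -e "$jar" ] && return 0
  done
  return 1
}

if has_mysql_driver lib; then
  echo "MySQL driver found in lib/"
else
  echo "MySQL driver missing: copy the Connector/J jar into lib/"
fi
```

Run it from the Sqoop installation directory so that lib refers to Sqoop's own lib directory.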