All of Sqoop's operation scripts live under $SQOOP_HOME/bin.
The sqoop script
1、Run sqoop with no arguments
[hadoop@hadoop001 bin]$ sqoop
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
Try 'sqoop help' for usage.
These are only warnings about optional components (HBase, HCatalog, Accumulo, ZooKeeper) and can be ignored.
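If any of these optional components is actually installed, exporting its home variable before running sqoop should make the corresponding warning go away; a sketch (the paths below are placeholders, not from this installation):
export HBASE_HOME=/path/to/hbase
export HCAT_HOME=/path/to/hive-hcatalog
export ACCUMULO_HOME=/path/to/accumulo
export ZOOKEEPER_HOME=/path/to/zookeeper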
2、Use "sqoop help" to see the usage of the sqoop script
[hadoop@hadoop001 bin]$ sqoop help
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/01/06 13:44:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
As before, these warnings can be ignored.
3、Use "sqoop help COMMAND" to see the detailed usage of a specific command
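For example, to see the options of the import tool (the "Common arguments" section below is printed for most commands):
sqoop help import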
Common arguments:
--connect <jdbc-uri>                          Specify JDBC connect string
--connection-manager <class-name>             Specify connection manager class name
--connection-param-file <properties-file>     Specify connection parameters file
--driver <class-name>                         Manually specify JDBC driver class to use
--hadoop-home <hdir>                          Override $HADOOP_MAPRED_HOME_ARG
--hadoop-mapred-home <dir>                    Override $HADOOP_MAPRED_HOME_ARG
--help                                        Print usage instructions
-P                                            Read password from console
--password <password>                         Set authentication password
--password-alias <password-alias>             Credential provider password alias
--password-file <password-file>               Set authentication password file path
--relaxed-isolation                           Use read-uncommitted isolation for imports
--skip-dist-cache                             Skip copying jars to distributed cache
--username <username>                         Set authentication username
--verbose                                     Print more information while working
Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
4、List all databases on the MySQL server
sqoop list-databases \
--connect jdbc:mysql://localhost:3306 \
--username root \
--password vincent
Running the above command fails (cause: the MySQL JDBC driver jar cannot be found):
java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.CatalogQueryManager.listDatabases(CatalogQueryManager.java:57)
at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Solution:
Copy the MySQL JDBC driver jar into $SQOOP_HOME/lib.
If Hive's metastore database is MySQL and Hive already works, the following command is sufficient:
cp $HIVE_HOME/lib/mysql-connector-java-5.1.45-bin.jar $SQOOP_HOME/lib/
After copying, check (and if necessary fix) the owner and group of the jar file.
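For example, if Sqoop runs as the hadoop user (a sketch; adjust user, group and jar version to your environment):
chown hadoop:hadoop $SQOOP_HOME/lib/mysql-connector-java-5.1.45-bin.jar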
Run it again:
sqoop list-databases \
--connect jdbc:mysql://localhost:3306 \
--username root \
--password vincent
Output:
18/01/06 14:13:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/01/06 14:13:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/06 14:13:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
mysql
vincent_hive
For comparison, running show databases; directly in MySQL gives:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| vincent_hive |
+--------------------+
3 rows in set (0.01 sec)
5、List all tables of the vincent_hive database in MySQL:
sqoop list-tables \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root \
--password vincent
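As with the database list above, the result can be cross-checked directly in MySQL (a sketch, assuming the mysql client is available on the same host):
mysql -uroot -pvincent -e "show tables in vincent_hive;"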
6、Import a MySQL table into HDFS
sqoop import \
--connect jdbc:mysql://localhost:3306/mysql \
--username root \
--password vincent \
--table user
The imported data is written to HDFS under /user/${user running the job}/${imported table}.
Running the job also generates a "<table name>.java" file in the directory the command was run from (here $HADOOP_HOME); it contains the generated MapReduce code for the job.
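If you would rather keep the generated .java and .jar files out of the working directory, Sqoop's code-generation options can redirect them; a sketch (the directories below are arbitrary examples):
sqoop import \
--connect jdbc:mysql://localhost:3306/mysql \
--username root \
--password vincent \
--table user \
--outdir /tmp/sqoop-gen-src \
--bindir /tmp/sqoop-gen-bin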
Error (the java-json.jar package cannot be found):
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:42)
at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:742)
at org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:369)
at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:355)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:249)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 15 more
Solution:
Download a java-json.jar from the web and place it in $SQOOP_HOME/lib, so that it is on Sqoop's classpath.
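For example, if the jar was downloaded into the current user's home directory (the file name may differ depending on where it came from):
cp ~/java-json.jar $SQOOP_HOME/lib/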
Run again:
sqoop import \
--connect jdbc:mysql://localhost:3306/mysql \
--username root \
--password vincent \
--table user
Log output:
18/01/06 15:18:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/01/06 15:18:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/06 15:18:55 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/06 15:18:55 INFO tool.CodeGenTool: Beginning code generation
18/01/06 15:18:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
18/01/06 15:18:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
18/01/06 15:18:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop
Note: /tmp/sqoop-hadoop/compile/fdf0c2f391ba20be1190c670d7b71bb5/user.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/06 15:19:05 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/fdf0c2f391ba20be1190c670d7b71bb5/user.jar
18/01/06 15:19:05 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/06 15:19:05 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/06 15:19:05 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/06 15:19:05 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/06 15:19:05 WARN manager.CatalogQueryManager: The table user contains a multi-column primary key. Sqoop will default to the column Host only for this job.
18/01/06 15:19:05 WARN manager.CatalogQueryManager: The table user contains a multi-column primary key. Sqoop will default to the column Host only for this job.
18/01/06 15:19:05 INFO mapreduce.ImportJobBase: Beginning import of user
18/01/06 15:19:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/01/06 15:19:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/01/06 15:19:11 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/01/06 15:19:20 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/06 15:19:20 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`Host`), MAX(`Host`) FROM `user`
18/01/06 15:19:20 WARN db.TextSplitter: Generating splits for a textual index column.
18/01/06 15:19:20 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
18/01/06 15:19:20 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
18/01/06 15:19:20 INFO mapreduce.JobSubmitter: number of splits:5
18/01/06 15:19:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1515112781641_0004
18/01/06 15:19:24 INFO impl.YarnClientImpl: Submitted application application_1515112781641_0004
18/01/06 15:19:25 INFO mapreduce.Job: The url to track the job: http://hadoop001:8088/proxy/application_1515112781641_0004/
18/01/06 15:19:25 INFO mapreduce.Job: Running job: job_1515112781641_0004
18/01/06 15:20:07 INFO mapreduce.Job: Job job_1515112781641_0004 running in uber mode : false
18/01/06 15:20:07 INFO mapreduce.Job: map 0% reduce 0%
18/01/06 15:22:44 INFO mapreduce.Job: map 20% reduce 0%
18/01/06 15:22:50 INFO mapreduce.Job: map 40% reduce 0%
18/01/06 15:22:58 INFO mapreduce.Job: map 60% reduce 0%
18/01/06 15:22:59 INFO mapreduce.Job: map 80% reduce 0%
18/01/06 15:23:01 INFO mapreduce.Job: map 100% reduce 0%
18/01/06 15:23:02 INFO mapreduce.Job: Job job_1515112781641_0004 completed successfully
18/01/06 15:23:03 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=690395
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=655
HDFS: Number of bytes written=492
HDFS: Number of read operations=20
HDFS: Number of large read operations=0
HDFS: Number of write operations=10
Job Counters
Killed map tasks=1
Launched map tasks=6
Other local map tasks=6
Total time spent by all maps in occupied slots (ms)=824619
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=824619
Total vcore-seconds taken by all map tasks=824619
Total megabyte-seconds taken by all map tasks=844409856
Map-Reduce Framework
Map input records=4
Map output records=4
Input split bytes=655
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=34641
CPU time spent (ms)=18840
Physical memory (bytes) snapshot=551452672
Virtual memory (bytes) snapshot=7501139968
Total committed heap usage (bytes)=239206400
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=492
18/01/06 15:23:03 INFO mapreduce.ImportJobBase: Transferred 492 bytes in 232.3861 seconds (2.1172 bytes/sec)
18/01/06 15:23:03 INFO mapreduce.ImportJobBase: Retrieved 4 records.
Note: "number of splits:5" means five map tasks were used, so the target directory /user/${user running the job}/${imported table} ends up with five map output files (part-m-*) plus a _SUCCESS marker file.
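The result can be checked by listing the target directory (a sketch, assuming the job was run as the hadoop user and imported the user table):
hdfs dfs -ls /user/hadoop/user
hdfs dfs -cat /user/hadoop/user/part-m-*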
7、Import a table from the vincent_hive database in MySQL into HDFS
sqoop import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--table emp \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 --null-non-string 0 \
--columns "EMPNO,ENAME,JOB,SAL,COMM" \
--target-dir EMP_COLUMN_WHERE \
--where 'SAL>2000'
Notes:
--mapreduce-job-name FromMySQL2HDFS: name the MapReduce job FromMySQL2HDFS
--delete-target-dir: if the job has been run before, delete the existing target directory (the previous output) before importing
--fields-terminated-by '\t': use '\t' as the separator between fields
--lines-terminated-by '\n': use '\n' as the separator between records (lines)
-m 1: use a single map task; with only one map, a table that has no primary key (and no --split-by column) can still be imported
--split-by "name": split the import on the name column when running with multiple map tasks (see the sketch after these notes)
--null-non-string 0: replace NULL values in non-string columns with 0
--columns "EMPNO,ENAME,JOB,SAL,COMM": import only the listed columns of the emp table
--where 'SAL>2000': import only the rows where SAL>2000
--target-dir EMP_COLUMN_WHERE: HDFS directory where the output is written
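A sketch of the --split-by variant mentioned in the notes: running with two map tasks and splitting on EMPNO instead of relying on a primary key (the target directory name is arbitrary; whether EMPNO is a sensible split column depends on your data):
sqoop import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--table emp \
--delete-target-dir \
--fields-terminated-by '\t' \
--null-non-string 0 \
-m 2 \
--split-by EMPNO \
--target-dir EMP_SPLIT_BY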
8、Import the result of a query into HDFS
sqoop import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 --null-non-string 0 \
--target-dir EMP_COLUMN_QUERY \
--query 'select * from emp where empno>=7900 and $CONDITIONS'
Note on shell quoting: in double quotes, "$HADOOP_HOME" is expanded by the shell into the Hadoop home directory, whereas '$HADOOP_HOME' (single quotes) and "\$HADOOP_HOME" (escaped inside double quotes) are passed through as the literal string $HADOOP_HOME.
This is why the --query above is single-quoted: Sqoop must receive the literal placeholder $CONDITIONS; if you wrap the query in double quotes, write \$CONDITIONS instead.
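For reference, a sketch of the same import with the query in double quotes; the only difference from the command above is the quoting and the escaped placeholder:
sqoop import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 --null-non-string 0 \
--target-dir EMP_COLUMN_QUERY \
--query "select * from emp where empno>=7900 and \$CONDITIONS"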
9、Put the command in a file and run it from there
vi emp.txt; the format is one token per line, with each argument value on the line after its argument name:
import
--connect
jdbc:mysql://localhost:3306/vincent
--username
root
--password
vincent
--mapreduce-job-name
FromMySQL2HDFS
-m
1
--null-non-string
0
--target-dir
EMP_COLUMN_QUERY
--query
"select * from emp where empno>=7900 and \$CONDITIONS"
Run the command stored in the file:
sqoop --options-file emp.txt
10、Run a SQL statement and print the result to the screen
sqoop eval \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--query "select * from emp"
11、Export data from HDFS to MySQL (a table with a matching schema must already exist in MySQL; see the sketch after the option notes)
sqoop export \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root \
--password vincent \
--table emp_demo \
--export-dir /user/hadoop/emp \
--fields-terminated-by '\t'
--table emp_demo: the target table in MySQL for the export
--export-dir /user/hadoop/emp: the HDFS directory containing the data to export
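A minimal sketch of creating the target table beforehand, assuming the exported files only contain the columns used in the earlier examples; adjust column names and types to your actual data:
mysql -uroot -pvincent -e "create table vincent_hive.emp_demo (
  EMPNO int,
  ENAME varchar(20),
  JOB varchar(20),
  SAL double,
  COMM double
);"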
12、Import MySQL data into Hive
sqoop import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--table emp \
--create-hive-table \
--hive-database ruoze_hive \
--hive-import \
--hive-overwrite \
--hive-table emp_import \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 --null-non-string 0
--create-hive-table: create the Hive table automatically and import the data into it.
However, letting Sqoop create the table automatically is not recommended, because the column data types it chooses can be wrong.
It is better to create the target table in Hive manually before importing (see the sketch after these notes).
--hive-database ruoze_hive: import into the ruoze_hive database in Hive
--hive-import: required when importing into Hive; without it the data is only imported into HDFS
--hive-overwrite: overwrite any existing data in the Hive table
--hive-table emp_import: name of the Hive table to import into
--mapreduce-job-name: name of the MapReduce job
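A minimal sketch of creating the Hive table by hand instead of using --create-hive-table, assuming the same columns as in the earlier emp examples and the '\t' field separator used by the import; adjust the columns and types to your table:
hive -e "create table ruoze_hive.emp_import (
  empno int,
  ename string,
  job string,
  sal double,
  comm double
) row format delimited fields terminated by '\t';"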
Error:
18/01/06 21:13:51 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/01/06 21:13:51 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:195)
at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
... 12 more
Solution:
Run the following commands from the $HIVE_HOME/lib directory:
cp hive-common-1.1.0-cdh5.7.0.jar $SQOOP_HOME/lib/
cp hive-shims* $SQOOP_HOME/lib/
13、Export Hive data to MySQL
sqoop export \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root \
--password vincent \
--table emp_like \
--export-dir /user/hive/warehouse/ruoze.db/emp_import \
--fields-terminated-by '\t'
The emp_like table must be created in MySQL beforehand.
--export-dir: the directory containing the data files to export (here the Hive table's warehouse directory)
--fields-terminated-by '\t': the field separator of the exported files; omitting it when the data is tab-delimited can cause parsing problems
14、sqoop job
sqoop job --create ruozejob -- import --connect jdbc:mysql://localhost:3306/vincent_hive \
--username root \
--password vincent \
--table emp \
--delete-target-dir
Note:
there is a space between "--" and "import"
Commands:
sqoop job --list               list the existing sqoop jobs
sqoop job --show ruozejob      show the definition of ruozejob (the password asked for is the MySQL login password)
sqoop job --exec ruozejob      run ruozejob (the password asked for is the MySQL login password)
sqoop job --delete ruozejob    delete ruozejob
How can we avoid being prompted for the password every time?
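One possible approach (a sketch, not verified on this setup) is to keep the password in a protected file and point Sqoop at it with --password-file instead of --password:
echo -n "vincent" > /home/hadoop/.sqoop.pwd        # -n: avoid a trailing newline in the file
hdfs dfs -put /home/hadoop/.sqoop.pwd /user/hadoop/.sqoop.pwd
hdfs dfs -chmod 400 /user/hadoop/.sqoop.pwd
# then replace "--password vincent" with:
# --password-file /user/hadoop/.sqoop.pwd
For saved jobs, the metastore can also be allowed to remember passwords by setting sqoop.metastore.client.record.password to true in $SQOOP_HOME/conf/sqoop-site.xml.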
[From @若泽大数据]