Importing and exporting data between HDFS and MySQL with Sqoop

Posted: 2021-08-15 07:50:33

This walkthrough assumes a working Hadoop environment plus a MySQL server (either a standalone host or any node of the Hadoop cluster).

First, be clear about how Sqoop, MySQL, and HDFS relate to each other.

MySQL has no direct connection to the Hadoop cluster; data moves between them through a third-party tool, Sqoop:

mysql --> sqoop --> hdfs

mysql <-- sqoop <-- hdfs


1 Download and install Sqoop; run this on any one node of the Hadoop cluster:

tar zxvf sqoop-1.3.0-cdh3u5.tar.gz

mv sqoop-1.3.0-cdh3u5/  sqoop

Copy the required jar files:

[kyo@hadoop1 ~]$ cp hadoop/hadoop-core-0.20.2-cdh3u5.jar /home/kyo/sqoop/lib/

[kyo@hadoop1 ~]$ cp mysql-connector-java-5.1.22-bin.jar /home/kyo/sqoop/lib/

These jars let Sqoop talk to Hadoop and to MySQL.

Set the relevant environment variables. If a later sqoop command fails, you can still set whatever the error message asks for at that point.

export HADOOP_HOME=/home/kyo/hadoop/
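
If you want the variable to survive new shell sessions, you can also put it in your shell profile. A minimal sketch, assuming a bash login shell and the directory layout used in this article (SQOOP_HOME and the PATH entry are optional conveniences):

# Append to ~/.bash_profile
export HADOOP_HOME=/home/kyo/hadoop/
export SQOOP_HOME=/home/kyo/sqoop/
export PATH=$PATH:$SQOOP_HOME/bin

# Apply it to the current shell
source ~/.bash_profile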


2 On the MySQL server, create an account that Sqoop can use to connect: username sqoop, password sqoop.

mysql> grant all privileges on *.* to 'sqoop'@'%' identified by 'sqoop';

Query OK, 0 rows affected (0.00 sec)


mysql> flush privileges;

Query OK, 0 rows affected (0.00 sec)


In the test database, create a test table named test and insert a few rows:

mysql> use test

Database changed

mysql> create table test (a int);

Query OK, 0 rows affected (0.02 sec)


mysql> insert into test values (1);

Query OK, 1 row affected (0.02 sec)


mysql> insert into test values (2);

Query OK, 1 row affected (0.00 sec)


mysql> insert into test values (3);

Query OK, 1 row affected (0.00 sec)
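
Before importing, you can double-check the rows; assuming only the three inserts above, the expected result is:

mysql> select * from test;
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)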



3 Test Sqoop's connection to MySQL

In this example MySQL runs on the same host as the Hadoop cluster (both at 192.168.0.110; substitute your own IP; the username and password are the ones just created).

hadoop1$ /home/kyo/sqoop/bin/sqoop list-databases --connect jdbc:mysql://192.168.0.110:3306/ --username sqoop --password sqoop
12/12/16 15:13:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/12/16 15:13:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
mysql
test


Seeing the mysql and test databases listed confirms the connection works.
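
As the warning in the output says, putting the password on the command line is insecure; -P makes Sqoop prompt for the password interactively instead. For example:

hadoop1$ /home/kyo/sqoop/bin/sqoop list-databases --connect jdbc:mysql://192.168.0.110:3306/ --username sqoop -P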


4 Import data from MySQL into HDFS

In the connect string jdbc:mysql://192.168.0.110:3306/test, the trailing test is the database name; --table test specifies the table to import.

hadoop1$ /home/kyo/sqoop/bin/sqoop import --connect jdbc:mysql://192.168.0.110:3306/test --username sqoop --password sqoop --table test -m 1

The output:

12/12/16 15:15:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/12/16 15:15:53 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/12/16 15:15:53 INFO tool.CodeGenTool: Beginning code generation
12/12/16 15:15:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 15:15:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 15:15:53 INFO orm.CompilationManager: HADOOP_HOME is /home/kyo/hadoop
12/12/16 15:15:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-kyo/compile/b1c01b3bbbd600cbbb81fbbf3db1a7e6/test.jar
12/12/16 15:15:55 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/12/16 15:15:55 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/12/16 15:15:55 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/12/16 15:15:55 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/12/16 15:15:55 INFO mapreduce.ImportJobBase: Beginning import of test
12/12/16 15:16:05 INFO mapred.JobClient: Running job: job_201211252330_0007
12/12/16 15:16:06 INFO mapred.JobClient:  map 0% reduce 0%
12/12/16 15:17:09 INFO mapred.JobClient:  map 100% reduce 0%
12/12/16 15:17:13 INFO mapred.JobClient: Job complete: job_201211252330_0007
12/12/16 15:17:13 INFO mapred.JobClient: Counters: 16
12/12/16 15:17:13 INFO mapred.JobClient:   Job Counters
12/12/16 15:17:13 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=42305
12/12/16 15:17:13 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/16 15:17:13 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/12/16 15:17:13 INFO mapred.JobClient:     Launched map tasks=1
12/12/16 15:17:13 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/12/16 15:17:13 INFO mapred.JobClient:   FileSystemCounters
12/12/16 15:17:13 INFO mapred.JobClient:     HDFS_BYTES_READ=87
12/12/16 15:17:13 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=62784
12/12/16 15:17:13 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=6
12/12/16 15:17:13 INFO mapred.JobClient:   Map-Reduce Framework
12/12/16 15:17:13 INFO mapred.JobClient:     Map input records=3
12/12/16 15:17:13 INFO mapred.JobClient:     Physical memory (bytes) snapshot=38944768
12/12/16 15:17:13 INFO mapred.JobClient:     Spilled Records=0
12/12/16 15:17:13 INFO mapred.JobClient:     CPU time spent (ms)=700
12/12/16 15:17:13 INFO mapred.JobClient:     Total committed heap usage (bytes)=5177344
12/12/16 15:17:13 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=372609024
12/12/16 15:17:13 INFO mapred.JobClient:     Map output records=3
12/12/16 15:17:13 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
12/12/16 15:17:13 INFO mapreduce.ImportJobBase: Transferred 6 bytes in 78.0204 seconds (0.0769 bytes/sec)
12/12/16 15:17:13 INFO mapreduce.ImportJobBase: Retrieved 3 records.




Check in HDFS whether the data has been imported:

hadoop1$ hadoop/bin/hadoop fs -ls
Found 4 items
drwxr-xr-x   - kyo    supergroup          0 2012-11-26 01:08 /user/kyo/input
drwxrwxrwx   - oracle supergroup          0 2012-12-16 02:50 /user/kyo/olhcache
drwxr-xr-x   - oracle oinstall            0 2012-12-16 01:19 /user/kyo/oracle
drwxrwxrwx   - kyo    supergroup          0 2012-12-16 15:17 /user/kyo/test

hadoop1$ hadoop/bin/hadoop fs -ls /user/kyo/test
Found 3 items
-rw-r--r--   2 kyo supergroup          0 2012-12-16 15:17 /user/kyo/test/_SUCCESS
drwxrwxrwx   - kyo supergroup          0 2012-12-16 15:16 /user/kyo/test/_logs
-rw-r--r--   2 kyo supergroup          6 2012-12-16 15:17 /user/kyo/test/part-m-00000
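
To see the imported contents, you can cat the part file; with the three rows loaded above it should print one value per line (which is also why HDFS_BYTES_WRITTEN above is 6):

hadoop1$ hadoop/bin/hadoop fs -cat /user/kyo/test/part-m-00000
1
2
3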




5 Export data from HDFS to MySQL

First, delete the rows we just loaded in MySQL:

mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> truncate test;
Query OK, 0 rows affected (0.02 sec)

Then run the command below to push the data from HDFS back into MySQL (hdfs://192.168.0.110:9000/user/kyo/test/ is the HDFS directory holding the table's data):

hadoop1$ /home/kyo/sqoop/bin/sqoop export --connect jdbc:mysql://192.168.0.110:3306/test --username sqoop --password sqoop --table test --export-dir hdfs://192.168.0.110:9000/user/kyo/test/

The output:

12/12/16 15:20:44 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/12/16 15:20:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/12/16 15:20:44 INFO tool.CodeGenTool: Beginning code generation
12/12/16 15:20:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 15:20:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 15:20:44 INFO orm.CompilationManager: HADOOP_HOME is /home/kyo/hadoop
12/12/16 15:20:45 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-kyo/compile/76904e70e5111b613381de7834461ce2/test.jar
12/12/16 15:20:45 INFO mapreduce.ExportJobBase: Beginning export of test
12/12/16 15:20:46 INFO input.FileInputFormat: Total input paths to process : 1
12/12/16 15:20:46 INFO input.FileInputFormat: Total input paths to process : 1
12/12/16 15:20:48 INFO mapred.JobClient: Running job: job_201211252330_0008
12/12/16 15:20:49 INFO mapred.JobClient:  map 0% reduce 0%
12/12/16 15:20:56 INFO mapred.JobClient:  map 100% reduce 0%
12/12/16 15:20:56 INFO mapred.JobClient: Job complete: job_201211252330_0008
12/12/16 15:20:56 INFO mapred.JobClient: Counters: 16
12/12/16 15:20:56 INFO mapred.JobClient:   Job Counters
12/12/16 15:20:56 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=4658
12/12/16 15:20:56 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/16 15:20:56 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/12/16 15:20:56 INFO mapred.JobClient:     Launched map tasks=1
12/12/16 15:20:56 INFO mapred.JobClient:     Data-local map tasks=1
12/12/16 15:20:56 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/12/16 15:20:56 INFO mapred.JobClient:   FileSystemCounters
12/12/16 15:20:56 INFO mapred.JobClient:     HDFS_BYTES_READ=131
12/12/16 15:20:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=62603
12/12/16 15:20:56 INFO mapred.JobClient:   Map-Reduce Framework
12/12/16 15:20:56 INFO mapred.JobClient:     Map input records=3
12/12/16 15:20:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=38912000
12/12/16 15:20:56 INFO mapred.JobClient:     Spilled Records=0
12/12/16 15:20:56 INFO mapred.JobClient:     CPU time spent (ms)=410
12/12/16 15:20:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=5177344
12/12/16 15:20:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=371920896
12/12/16 15:20:56 INFO mapred.JobClient:     Map output records=3
12/12/16 15:20:56 INFO mapred.JobClient:     SPLIT_RAW_BYTES=119
12/12/16 15:20:56 INFO mapreduce.ExportJobBase: Transferred 131 bytes in 10.5227 seconds (12.4493 bytes/sec)
12/12/16 15:20:56 INFO mapreduce.ExportJobBase: Exported 3 records.



Back in MySQL, the data is there again:

mysql> use test

Database changed

mysql> select * from test;

+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)
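
One caveat if you want to repeat the import in step 4: the import job will typically fail if the target directory already exists in HDFS, so remove it first. A sketch using the old-style HDFS shell that matches this Hadoop version:

hadoop1$ hadoop/bin/hadoop fs -rmr /user/kyo/test
hadoop1$ /home/kyo/sqoop/bin/sqoop import --connect jdbc:mysql://192.168.0.110:3306/test --username sqoop --password sqoop --table test -m 1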