This experiment assumes a working Hadoop cluster plus a MySQL server (either a standalone host or any node of the cluster).
First, be clear about how Sqoop, MySQL, and HDFS relate to each other:
MySQL and the Hadoop cluster have no direct link; data moves between them through a third-party tool, Sqoop:
mysql --> sqoop --> hdfs
mysql <-- sqoop <-- hdfs
1 Download and install Sqoop; run this on any node of the Hadoop cluster
tar xvzf sqoop-1.3.0-cdh3u5.tar.gz
mv sqoop-1.3.0-cdh3u5/ sqoop
Copy the related jar files
[kyo@hadoop1 ~]$ cp hadoop/hadoop-core-0.20.2-cdh3u5.jar /home/kyo/sqoop/lib/
[kyo@hadoop1 ~]$ cp mysql-connector-java-5.1.22-bin.jar /home/kyo/sqoop/lib/
so that Sqoop can communicate with Hadoop and MySQL.
Set the relevant environment variables. If a sqoop command fails later, you can still set them at that point by following the error message:
export HADOOP_HOME=/home/kyo/hadoop/
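Besides HADOOP_HOME, it is convenient to put the sqoop binary on the PATH so you can type sqoop instead of the full path. A minimal sketch, assuming the install locations used above (adjust to your own layout):

```shell
# Assumed install locations from the steps above; change to match yours.
export HADOOP_HOME=/home/kyo/hadoop/
export SQOOP_HOME=/home/kyo/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
```

Adding these lines to ~/.bashrc makes them survive new shell sessions.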
2 On the MySQL server, create an account sqoop (password sqoop) that Sqoop can use to connect
mysql> grant all privileges on *.* to 'sqoop'@'%' identified by 'sqoop';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
Create a test table named test in the test database and load a few rows:
mysql> use test
Database changed
mysql> create table test (a int);
Query OK, 0 rows affected (0.02 sec)
mysql> insert into test values (1);
Query OK, 1 row affected (0.02 sec)
mysql> insert into test values (2);
Query OK, 1 row affected (0.00 sec)
mysql> insert into test values (3);
Query OK, 1 row affected (0.00 sec)
3 Test the Sqoop connection to MySQL
(In this example MySQL runs on one of the Hadoop nodes, so both IPs are 192.168.0.110; substitute your actual address. The username and password are the ones created above.)
hadoop1$ /home/kyo/sqoop/bin/sqoop list-databases --connect jdbc:mysql://192.168.0.110:3306/ --username sqoop --password sqoop
12/12/16 15:13:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/12/16 15:13:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
mysql
test
Seeing the mysql and test databases listed means the connection works.
4 Import data from MySQL into HDFS
(In jdbc:mysql://192.168.0.110:3306/test, test is the database name; --table test names the table.)
hadoop1$ /home/kyo/sqoop/bin/sqoop import --connect jdbc:mysql://192.168.0.110:3306/test --username sqoop --password sqoop --table test -m 1
The output:
12/12/16 15:15:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/12/16 15:15:53 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/12/16 15:15:53 INFO tool.CodeGenTool: Beginning code generation
12/12/16 15:15:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 15:15:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 15:15:53 INFO orm.CompilationManager: HADOOP_HOME is /home/kyo/hadoop
12/12/16 15:15:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-kyo/compile/b1c01b3bbbd600cbbb81fbbf3db1a7e6/test.jar
12/12/16 15:15:55 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/12/16 15:15:55 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/12/16 15:15:55 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/12/16 15:15:55 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/12/16 15:15:55 INFO mapreduce.ImportJobBase: Beginning import of test
12/12/16 15:16:05 INFO mapred.JobClient: Running job: job_201211252330_0007
12/12/16 15:16:06 INFO mapred.JobClient:  map 0% reduce 0%
12/12/16 15:17:09 INFO mapred.JobClient:  map 100% reduce 0%
12/12/16 15:17:13 INFO mapred.JobClient: Job complete: job_201211252330_0007
12/12/16 15:17:13 INFO mapred.JobClient: Counters: 16
12/12/16 15:17:13 INFO mapred.JobClient:   Job Counters
12/12/16 15:17:13 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=42305
12/12/16 15:17:13 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/16 15:17:13 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/12/16 15:17:13 INFO mapred.JobClient:     Launched map tasks=1
12/12/16 15:17:13 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/12/16 15:17:13 INFO mapred.JobClient:   FileSystemCounters
12/12/16 15:17:13 INFO mapred.JobClient:     HDFS_BYTES_READ=87
12/12/16 15:17:13 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=62784
12/12/16 15:17:13 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=6
12/12/16 15:17:13 INFO mapred.JobClient:   Map-Reduce Framework
12/12/16 15:17:13 INFO mapred.JobClient:     Map input records=3
12/12/16 15:17:13 INFO mapred.JobClient:     Physical memory (bytes) snapshot=38944768
12/12/16 15:17:13 INFO mapred.JobClient:     Spilled Records=0
12/12/16 15:17:13 INFO mapred.JobClient:     CPU time spent (ms)=700
12/12/16 15:17:13 INFO mapred.JobClient:     Total committed heap usage (bytes)=5177344
12/12/16 15:17:13 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=372609024
12/12/16 15:17:13 INFO mapred.JobClient:     Map output records=3
12/12/16 15:17:13 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
12/12/16 15:17:13 INFO mapreduce.ImportJobBase: Transferred 6 bytes in 78.0204 seconds (0.0769 bytes/sec)
12/12/16 15:17:13 INFO mapreduce.ImportJobBase: Retrieved 3 records.
Check in HDFS that the data has actually arrived:
hadoop1$ hadoop/bin/hadoop fs -ls
Found 4 items
drwxr-xr-x   - kyo    supergroup          0 2012-11-26 01:08 /user/kyo/input
drwxrwxrwx   - oracle supergroup          0 2012-12-16 02:50 /user/kyo/olhcache
drwxr-xr-x   - oracle oinstall            0 2012-12-16 01:19 /user/kyo/oracle
drwxrwxrwx   - kyo    supergroup          0 2012-12-16 15:17 /user/kyo/test
hadoop1$ hadoop/bin/hadoop fs -ls /user/kyo/test
Found 3 items
-rw-r--r--   2 kyo supergroup          0 2012-12-16 15:17 /user/kyo/test/_SUCCESS
drwxrwxrwx   - kyo supergroup          0 2012-12-16 15:16 /user/kyo/test/_logs
-rw-r--r--   2 kyo supergroup          6 2012-12-16 15:17 /user/kyo/test/part-m-00000
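The 6-byte part-m-00000 file is simply the three values, one per line: by default Sqoop imports tables as plain comma-separated text, and a one-column table has no commas at all. A local sketch of what the file should contain (the /tmp path is made up for illustration; to see the real file, use hadoop fs -cat):

```shell
# Recreate locally what the import wrote for rows 1, 2, 3 of a
# one-column table: plain text, one row per line.
printf '1\n2\n3\n' > /tmp/part-m-00000.sample
wc -c < /tmp/part-m-00000.sample   # 6 bytes, matching HDFS_BYTES_WRITTEN=6 in the log
cat /tmp/part-m-00000.sample
```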
5 Export data from HDFS back into MySQL
First delete the rows we just loaded, so the MySQL table is empty before the export:
mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> truncate test;
Query OK, 0 rows affected (0.02 sec)
Run the command below to export the data from HDFS into MySQL (hdfs://192.168.0.110:9000/user/kyo/test/ is the HDFS directory holding the table data):
hadoop1$ /home/kyo/sqoop/bin/sqoop export --connect jdbc:mysql://192.168.0.110:3306/test --username sqoop --password sqoop --table test --export-dir hdfs://192.168.0.110:9000/user/kyo/test/
The output:
12/12/16 15:20:44 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/12/16 15:20:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/12/16 15:20:44 INFO tool.CodeGenTool: Beginning code generation
12/12/16 15:20:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 15:20:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 15:20:44 INFO orm.CompilationManager: HADOOP_HOME is /home/kyo/hadoop
12/12/16 15:20:45 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-kyo/compile/76904e70e5111b613381de7834461ce2/test.jar
12/12/16 15:20:45 INFO mapreduce.ExportJobBase: Beginning export of test
12/12/16 15:20:46 INFO input.FileInputFormat: Total input paths to process : 1
12/12/16 15:20:46 INFO input.FileInputFormat: Total input paths to process : 1
12/12/16 15:20:48 INFO mapred.JobClient: Running job: job_201211252330_0008
12/12/16 15:20:49 INFO mapred.JobClient:  map 0% reduce 0%
12/12/16 15:20:56 INFO mapred.JobClient:  map 100% reduce 0%
12/12/16 15:20:56 INFO mapred.JobClient: Job complete: job_201211252330_0008
12/12/16 15:20:56 INFO mapred.JobClient: Counters: 16
12/12/16 15:20:56 INFO mapred.JobClient:   Job Counters
12/12/16 15:20:56 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=4658
12/12/16 15:20:56 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/16 15:20:56 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/12/16 15:20:56 INFO mapred.JobClient:     Launched map tasks=1
12/12/16 15:20:56 INFO mapred.JobClient:     Data-local map tasks=1
12/12/16 15:20:56 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/12/16 15:20:56 INFO mapred.JobClient:   FileSystemCounters
12/12/16 15:20:56 INFO mapred.JobClient:     HDFS_BYTES_READ=131
12/12/16 15:20:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=62603
12/12/16 15:20:56 INFO mapred.JobClient:   Map-Reduce Framework
12/12/16 15:20:56 INFO mapred.JobClient:     Map input records=3
12/12/16 15:20:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=38912000
12/12/16 15:20:56 INFO mapred.JobClient:     Spilled Records=0
12/12/16 15:20:56 INFO mapred.JobClient:     CPU time spent (ms)=410
12/12/16 15:20:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=5177344
12/12/16 15:20:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=371920896
12/12/16 15:20:56 INFO mapred.JobClient:     Map output records=3
12/12/16 15:20:56 INFO mapred.JobClient:     SPLIT_RAW_BYTES=119
12/12/16 15:20:56 INFO mapreduce.ExportJobBase: Transferred 131 bytes in 10.5227 seconds (12.4493 bytes/sec)
12/12/16 15:20:56 INFO mapreduce.ExportJobBase: Exported 3 records.
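Conceptually, the export job reads each line of the files under --export-dir, splits it into columns, and issues INSERTs against the target table. A rough local sketch of that translation for our one-column table (the /tmp path is illustrative; real Sqoop batches the inserts over JDBC rather than emitting SQL text):

```shell
# Simulate the export step: turn each line of a part file into an
# INSERT statement for the target table `test` (column `a`).
printf '1\n2\n3\n' > /tmp/part-m-00000.sample
while read a; do
  echo "INSERT INTO test (a) VALUES ($a);"
done < /tmp/part-m-00000.sample
```

This is why the row delimiters and field delimiters of the HDFS files must match what the export expects; with non-default delimiters you would tell Sqoop about them explicitly.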
Back in MySQL, the data is there again:
mysql> use test
Database changed
mysql> select * from test;
+------+
| a |
+------+
| 1 |
| 2 |
| 3 |
+------+
3 rows in set (0.00 sec)