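The steps below use an hdfsreader-to-hdfswriter job as the example:
1. The DataX job configuration file must point to the Hadoop configuration file to use. With DataX + Hadoop 1.x, hadoop1.X/conf/core-site.xml can be used directly;
with DataX + Hadoop 2.x, however, hadoop2.X/etc/core-site.xml and hadoop2.X/etc/hdfs-site.xml must be merged into a single file, which can be named hadoop-site.xml. (In a stock Hadoop 2.x distribution these two files sit under etc/hadoop/.)
2. The following properties must be added to the merged hadoop-site.xml file: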
<property>
<name>fs.hdfs.impl</name> <!-- required when the hdfsreader/hdfswriter dir starts with hdfs://, i.e. it is an HDFS path -->
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
<property>
<name>fs.file.impl</name> <!-- required when the hdfsreader/hdfswriter dir starts with file://, i.e. it is a local path -->
<value>org.apache.hadoop.fs.LocalFileSystem</value>
</property>
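
The merge of the two site files plus the two extra properties can be sketched in Python as follows. This is only a minimal illustration, not part of DataX: the helper names read_properties and merge_site_configs are hypothetical, the sample host name and values in the demo are made up, and the sketch assumes that a later file should win when both files define the same property name.

```python
import xml.etree.ElementTree as ET

# The two properties from step 2, with the values given in the text.
EXTRA_PROPERTIES = {
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem",
}


def read_properties(root):
    """Return {name: value} for every <property> under a <configuration> element."""
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def merge_site_configs(roots, extra=EXTRA_PROPERTIES):
    """Build one <configuration>; later inputs override earlier ones on duplicate names."""
    merged = {}
    for root in roots:
        merged.update(read_properties(root))
    merged.update(extra)
    out = ET.Element("configuration")
    for name, value in merged.items():
        prop = ET.SubElement(out, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return out


if __name__ == "__main__":
    # Inline stand-ins for core-site.xml and hdfs-site.xml (hypothetical values).
    # For real files, ET.parse(path).getroot() yields the same kind of element.
    core = ET.fromstring(
        "<configuration><property><name>fs.defaultFS</name>"
        "<value>hdfs://example-namenode:8020</value></property></configuration>"
    )
    hdfs = ET.fromstring(
        "<configuration><property><name>dfs.replication</name>"
        "<value>3</value></property></configuration>"
    )
    print(ET.tostring(merge_site_configs([core, hdfs]), encoding="unicode"))
```

The printed document can be saved as hadoop-site.xml and referenced from the DataX job configuration.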
3. hdfsreader needs some additional dependency jars. The directory listing below includes them:
-rw-r--r-- 1 hadoop hadoop 575389 Dec 18 16:24 commons-collections-3.2.1.jar
-rw-r--r-- 1 hadoop hadoop 62050 Dec 18 16:23 commons-logging-1.1.3.jar
-rw-r--r-- 1 hadoop hadoop 1648200 Dec 18 16:25 guava-11.0.2.jar
-rw-r--r-- 1 hadoop hadoop 3318401 Dec 18 16:26 hadoop-common-2.6.2.jar
-rw-r--r-- 1 hadoop hadoop 178199 Dec 18 16:26 hadoop-lzo-0.4.20-SNAPSHOT.jar
-rw-r--r-- 1 hadoop hadoop 16380 Dec 18 15:29 hdfsreader-1.0.0.jar
-rw-r--r-- 1 hadoop hadoop 18490 Dec 18 15:29 java-xmlbuilder-0.4.jar
-rw-r--r-- 1 hadoop hadoop 2019 Dec 18 15:29 ParamKey.java
-rwxr-xr-x 1 hadoop hadoop 18837 Dec 18 15:29 plugins-common-1.0.0.jar
Delete hdfsreader/hadoop-0.19.2-core.jar (that is, any jar matching hadoop*-core*.jar).
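
The cleanup of the old core jar can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, the plugin directory path is whatever your DataX installation uses, and the glob pattern is the hadoop*-core*.jar pattern quoted above. The default is a dry run, so nothing is deleted until dry_run=False is passed.

```python
import fnmatch
from pathlib import Path

STALE_PATTERN = "hadoop*-core*.jar"  # pattern quoted in the note above


def find_stale_core_jars(plugin_dir):
    """Return the sorted paths of files in plugin_dir that match STALE_PATTERN."""
    return sorted(
        p for p in Path(plugin_dir).iterdir()
        if p.is_file() and fnmatch.fnmatchcase(p.name, STALE_PATTERN)
    )


def remove_stale_core_jars(plugin_dir, dry_run=True):
    """Delete matching jars and return their names; with dry_run=True only report them."""
    stale = find_stale_core_jars(plugin_dir)
    if not dry_run:
        for jar in stale:
            jar.unlink()
    return [jar.name for jar in stale]
```

With the listing above, the pattern would match hadoop-0.19.2-core.jar but not hadoop-common-2.6.2.jar, because the latter contains no "-core" segment.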
4. hdfswriter also needs some additional dependency jars. The directory listing below includes them:
-rwxr-xr-x 1 hadoop hadoop 41123 Dec 18 16:40 commons-cli-1.2.jar
-rw-r--r-- 1 hadoop hadoop 575389 Dec 18 16:34 commons-collections-3.2.1.jar
-rw-r--r-- 1 hadoop hadoop 62050 Dec 18 16:34 commons-logging-1.1.3.jar
-rw-r--r-- 1 hadoop hadoop 1648200 Dec 18 16:34 guava-11.0.2.jar
-rwxr-xr-x 1 hadoop hadoop 67190 Dec 18 16:40 hadoop-auth-2.6.2.jar
-rw-r--r-- 1 hadoop hadoop 3318401 Dec 18 16:34 hadoop-common-2.6.2.jar
-rwxr-xr-x 1 hadoop hadoop 7915385 Dec 18 16:36 hadoop-hdfs-2.6.2.jar
-rw-r--r-- 1 hadoop hadoop 178199 Dec 18 16:34 hadoop-lzo-0.4.20-SNAPSHOT.jar
-rw-r--r-- 1 hadoop hadoop 14652 Dec 18 16:35 hdfswriter-1.0.0.jar
-rwxr-xr-x 1 hadoop hadoop 31212 Dec 18 16:43 htrace-core-3.0.4.jar
-rw-r--r-- 1 hadoop hadoop 18490 Dec 18 16:34 java-xmlbuilder-0.4.jar
-rw-r--r-- 1 hadoop hadoop 657766 Dec 18 15:28 libhadoop.so
-rw-r--r-- 1 hadoop hadoop 4374 Dec 18 15:28 ParamKey.java
-rwxr-xr-x 1 hadoop hadoop 18837 Dec 18 16:34 plugins-common-1.0.0.jar
-rwxr-xr-x 1 hadoop hadoop 533455 Dec 18 16:43 protobuf-java-2.5.0.jar
Here too, delete hadoop-0.19.2-core.jar (that is, any jar matching hadoop*-core*.jar), this time from the hdfswriter directory.
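
A quick way to confirm that the hdfswriter directory matches the listing in step 4 is to check for each jar by name. The sketch below is an assumption-laden helper, not a DataX tool: missing_jars is a hypothetical name, and the jar list is simply copied from the listing above (versions included), so it only applies to that exact set of versions.

```python
from pathlib import Path

# Jar names copied from the hdfswriter listing in step 4.
HDFSWRITER_JARS = [
    "commons-cli-1.2.jar",
    "commons-collections-3.2.1.jar",
    "commons-logging-1.1.3.jar",
    "guava-11.0.2.jar",
    "hadoop-auth-2.6.2.jar",
    "hadoop-common-2.6.2.jar",
    "hadoop-hdfs-2.6.2.jar",
    "hadoop-lzo-0.4.20-SNAPSHOT.jar",
    "hdfswriter-1.0.0.jar",
    "htrace-core-3.0.4.jar",
    "java-xmlbuilder-0.4.jar",
    "plugins-common-1.0.0.jar",
    "protobuf-java-2.5.0.jar",
]


def missing_jars(plugin_dir, required=HDFSWRITER_JARS):
    """Return the required jar names that are not present in plugin_dir, in list order."""
    present = {p.name for p in Path(plugin_dir).iterdir()}
    return [name for name in required if name not in present]
```

An empty result means every jar from the listing is in place; any names returned still need to be copied in.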
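5. Make sure the environment variables are set correctly. For example:
PATH=$PATH:$HOME/app/bin. // Wrong: the trailing . is glued onto the directory name (bin.), so neither app/bin nor the current directory ends up on the PATH. This mistake is hard to spot and easily causes problems
PATH=$PATH:$HOME/app/bin:.: // Correct: the current directory must be its own entry, separated by :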
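
The difference between the two PATH values becomes visible once the string is split on ":" the way the shell does. The sketch below is only an illustration: path_entries and has_current_dir are hypothetical helper names, and /home/hadoop stands in for $HOME. It also relies on the POSIX rule that an empty PATH entry means the current directory.

```python
def path_entries(path_value):
    """Split a PATH-style string on ':' into its individual entries."""
    return path_value.split(":")


def has_current_dir(path_value):
    """True if the PATH contains the current directory, written as '.' or as an empty entry."""
    return any(entry in (".", "") for entry in path_entries(path_value))


if __name__ == "__main__":
    wrong = "/usr/bin:/home/hadoop/app/bin."    # '.' is part of the directory name
    right = "/usr/bin:/home/hadoop/app/bin:.:"  # '.' is a separate entry
    print(path_entries(wrong), has_current_dir(wrong))
    print(path_entries(right), has_current_dir(right))
```

In the wrong form the last entry is the non-existent directory app/bin., so commands in app/bin are not found either.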