在hadoop 里安装 sqoop 并把mysql数据导入hdfs

时间:2021-12-21 07:46:22

hadoop 2.6.0

sqoop:sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz


然后把sqoop 解压。这里解压发生的问题是,

tar: /home/luis:归档中找不到
tar: 由于前次错误,将以上次的错误状态退出


tar -xzvf XXXXX -C ~/                  大写-C 不要忘记加上。


配置环境变量:sudo nano /etc/bash.bashrc

 加上SQOOP_HOME=/home/hadoop/sqoop-1.4.6
PATH=$PATH:$SQOOP_HOME

source/etc/profile
修改配置文件


在$SQOOP_HOME/conf     目录下拷贝sqoop-env-template.sh改名为sqoop-env.sh.
cp sqoop-env-template.sh sqoop-env.sh
#Setpath to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/luis/hadoop-2.6.0


#Setpath to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=/home/luis/hadoop-2.6.0


#setthe path to where bin/hbase is available
export HBASE_HOME=/home/luis/hbase-1.0.1.1


#Setthe path to where bin/hive is available
#export HIVE_HOME=


#Setthe path for where zookeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper-3.4.6


修改$SQOOP_HOME/bin/configure-sqoop
注释掉HCatalog,Accumulo检查(除非你准备使用HCatalog,Accumulo等HADOOP上的组件)     
##Moved to be a runtime check in sqoop.
#if[ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does notexist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root ofyour HCatalog installation.'
#fi


#if[ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does notexist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the rootof your Accumulo installation.'
#fi


#Add HCatalog to dependency list
#if[ -e "${HCAT_HOME}/bin/hcat" ]; then
# TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat-classpath`
#  if [ -z "${HIVE_CONF_DIR}" ]; then
#   TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
#  fi
#  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
#fi


#Add Accumulo to dependency list
#if[ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
#  for jn in `$ACCUMULO_HOME/bin/accumuloclasspath | grep file:.*accumulo.*jar |cut -d':' -f2`; do
#    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#  done
#  for jn in `$ACCUMULO_HOME/bin/accumuloclasspath | grep file:.*zookeeper.*jar |cut -d':' -f2`; do
#    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#  done
#fi

同时我还把zookeeper 的内容给注释掉了。因为老是卡在那里报错。

#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
#fi


就配置完了,然后运行命令,还是会有常见异常:

Streaming result set com.mysql.jdbc.RowDataDynamic@12e9d40f is still active
执行sqoop脚本时,出现这个异常是因为mysql的驱动的原因,使用最新的mysql驱动包。

但是即使我更新到最新的java链接mysql的包,放入lib 目录里还是会出错。

原因是,不止一个地方我放入了 java连接mysql 的包,要把所有共享基础包的地方的连接方式的包都换成最新的。比如黄底位置

$ sudo find -name mysql-connector-java*
./home/luis/下载/mysql-connector-java-5.1.32
./home/luis/下载/mysql-connector-java-5.1.32/mysql-connector-java-5.1.32-bin.jar
./home/luis/weka/weka-3-6-13/lib/mysql-connector-java-5.1.6-bin.jar
./home/luis/sqoop-1.4.6/lib/mysql-connector-java-5.1.32-bin.jar
./home/luis/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev.jython_4.2.0.201507041133/cachedir/packages/mysql-connector-java-5.1.6-bin.pkc
./usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/mysql-connector-java-5.1.32-bin.jar
./usr/share/java/mysql-connector-java-5.1.32-bin.jar


然后可以使用 命令把mysql 导入 hdfs了

导入全部 表 sqoop import-all-tables  --connect  jdbc:mysql://192.168.1.113:3306/weibocatch

1、每个表必须都只有一个列作为主键;
2、必须将每个表中所有的数据导入,而不是部分;
3、必须使用默认分隔列,且WHERE子句无任何强加的条件

导入单张表:

./sqoop import --connect jdbc:mysql://192.168.1.113:3306/weibocatch --username root --password xxxxx --table w_transfer -m1  这里的m 1 表示的是启动map 的个数

测试连接

sqoop list-databases --connect jdbc:mysql://172.16.247.140:3306/ --username xxx --password xxx

列出表

sqoop list-tables --connect jdbc:mysql://172.16.247.140:3306/sqoop --username hive --password 123456


然后就可以进到hadoop的hdfs里头看看内容了。

hdfs 的命令 ls  和 ls -al  ,rm , rmr