Hive/HBase/Sqoop Installation Tutorial
HIVE INSTALL
1. Download the installation package: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/
2. Upload it to the target directory on the Linux server and extract it:
mkdir hive
mv apache-hive-2.3.3-bin.tar.gz hive
tar -zxvf apache-hive-2.3.3-bin.tar.gz
mv apache-hive-2.3.3-bin apache-hive-2.3.3
### The installation directory is /app/hive/apache-hive-2.3.3
3. Configure environment variables:
sudo vi /etc/profile
Add the following:
export HIVE_HOME=/app/hive/apache-hive-2.3.3
export PATH=$PATH:$HIVE_HOME/bin
:wq #save and exit
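After saving, reload the profile so the new variables take effect in the current shell (otherwise they only apply to new logins):
source /etc/profile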
4. Modify the Hive configuration files:
Configuration file hive-env.sh (edit the existing entries; add any that are missing):
cd /app/hive/apache-hive-2.3.3/conf
cp hive-env.sh.template hive-env.sh
### Add the following to the file -- uncomment the lines and change the paths to your own directories
export HADOOP_HEAPSIZE=1024
export HADOOP_HOME=/app/hadoop/hadoop-2.7.7 #Hadoop installation directory
export HIVE_CONF_DIR=/app/hive/apache-hive-2.3.3/conf
export HIVE_HOME=/app/hive/apache-hive-2.3.3
export HIVE_AUX_JARS_PATH=/app/hive/apache-hive-2.3.3/lib
export JAVA_HOME=/app/lib/jdk
Create the HDFS directories:
cd /app/hive/apache-hive-2.3.3
mkdir hive_site_dir
cd hive_site_dir
hdfs dfs -mkdir -p warehouse #this assumes Hadoop is already installed and running
hdfs dfs -mkdir -p tmp
hdfs dfs -mkdir -p log
hdfs dfs -chmod -R 777 warehouse
hdfs dfs -chmod -R 777 tmp
hdfs dfs -chmod -R 777 log
Create a local temporary directory:
cd /app/hive/apache-hive-2.3.3
mkdir tmp
Configuration file hive-site.xml (edit the existing entries):
cp hive-default.xml.template hive-site.xml
vi hive-site.xml
>> Configure the metastore database settings: ConnectionURL / ConnectionUserName / ConnectionPassword / ConnectionDriverName
<!--mysql database connection setting -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://10.28.85.149:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>szprd</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>szprd</value>
</property>
>> Configure the HDFS and local scratch directories
<property>
<name>hive.exec.scratchdir</name>
<!--<value>/tmp/hive</value>-->
<value>/app/hive/apache-hive-2.3.3/hive_site_dir/tmp</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/app/hive/apache-hive-2.3.3/hive_site_dir/warehouse</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<!--<value>${system:java.io.tmpdir}/${system:user.name}</value> -->
<value>/app/hive/apache-hive-2.3.3/tmp/${system:user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<!--<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>-->
<value>/app/hive/apache-hive-2.3.3/tmp/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<!--<value>${system:java.io.tmpdir}/${system:user.name}</value>-->
<value>/app/hive/apache-hive-2.3.3/hive_site_dir/log/${system:user.name}</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in metastore is compatible with one from Hive jars. Also disable automatic
schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
proper metastore schema migration. (Default)
False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
</description>
</property>
After editing the configuration file, save and exit with :wq
5. Download a suitable version of the MySQL JDBC driver and copy it into the lib directory of the Hive installation:
https://dev.mysql.com/downloads/connector/j/
6. Initialize the metastore database (run this before starting Hive for the first time; if it fails, double-check the database settings above):
cd /app/hive/apache-hive-2.3.3/bin
./schematool -initSchema -dbType mysql
7. Start Hive:
hive #with the environment variables configured in /etc/profile, this can be run from any directory
8. To start Hive with real-time log output (run from the bin directory of the Hive installation):
./hive -hiveconf hive.root.logger=DEBUG,console
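As a quick sanity check after startup, a few simple statements can be run non-interactively with hive -e (a sketch; the table name demo_src is just an example):
hive -e "CREATE TABLE IF NOT EXISTS demo_src (id INT, name STRING); SHOW TABLES; DROP TABLE demo_src;"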
HBASE INSTALL
1. Download the HBase package: http://hbase.apache.org/downloads.html
2. Extract it: tar -zxvf hbase-1.2.6.1-bin.tar.gz
3. Configure environment variables (append at the end):
vi /etc/profile
#HBase Setting
export HBASE_HOME=/app/hbase/hbase-1.2.6.1
export PATH=$PATH:$HBASE_HOME/bin
4. Edit the configuration file hbase-env.sh:
export HBASE_MANAGES_ZK=false
export HBASE_PID_DIR=/app/hadoop/hadoop-2.7.7/pids #create this directory first if it does not exist
export JAVA_HOME=/app/lib/jdk #the JDK installation directory
Edit the configuration file hbase-site.xml.
Add the following inside the configuration node:
<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.1.202:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/vc/dev/MQ/ZK/zookeeper-3.4.12</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
<description>
Controls whether HBase will check for stream capabilities (hflush/hsync). Disable this if you intend to run on LocalFileSystem, denoted by a rootdir with the 'file://' scheme, but be mindful of the NOTE below.
WARNING: Setting this to false blinds you to potential data loss and inconsistent system state in the event of process and/or node failures. If HBase is complaining of an inability to use hsync or hflush it's most likely not a false positive.
</description>
</property>
5. Start ZooKeeper
Go to the bin directory of the ZooKeeper installation and run: ./zkServer.sh start
Then start the client: ./zkCli.sh
Once connected, run: create /hbase hbase
6. Start HBase
Go to the HBase bin directory and run: ./start-hbase.sh
./hbase shell #once the shell starts successfully, HBase commands can be executed here
list #if no error is reported, the installation works
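A quick way to exercise the installation is to pipe a few commands into the shell (a sketch; the table name 'demo' and column family 'cf' are just examples):
hbase shell <<'EOF'
create 'demo', 'cf'
put 'demo', 'row1', 'cf:greeting', 'hello'
scan 'demo'
disable 'demo'
drop 'demo'
EOF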
7. Access the HBase web UI: http://10.28.85.149:16010/master-status #use the IP of the current server; the port is 16010
SQOOP INSTALL
1. Download the package: https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/
2. Extract it: tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Rename the directory: mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7_hadoop-2.6.0
3. Configure environment variables in /etc/profile:
#Sqoop Setting
export SQOOP_HOME=/app/sqoop/sqoop-1.4.7_hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin
4. Copy the MySQL JDBC driver into the lib directory of the Sqoop installation:
https://dev.mysql.com/downloads/connector/j/
5. Edit the configuration file in the conf directory of the Sqoop installation:
vi sqoop-env.sh
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.7.7
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/app/hadoop/hadoop-2.7.7
#set the path to where bin/hbase is available
export HBASE_HOME=/app/hbase/hbase-1.2.6.1
#Set the path to where bin/hive is available
export HIVE_HOME=/app/hive/apache-hive-2.3.3
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/app/zookeeper/zookeeper-3.4.12
6. Run the following commands:
sqoop help #list the available sqoop commands
sqoop version #show the sqoop version
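To verify JDBC connectivity to MySQL, the databases on the server can be listed (a sketch; the host, user and password below are the ones used in hive-site.xml above, replace them with your own):
sqoop list-databases --connect jdbc:mysql://10.28.85.149:3306/ --username szprd --password szprd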
PS:
To stop HBase, use stop-hbase.sh; if you get errors about the pid, see this post: https://blog.csdn.net/xiao_jun_0820/article/details/35222699
Hadoop installation tutorial: http://note.youdao.com/noteshare?id=0cae2da671de0f7175376abb8e705406
ZooKeeper installation tutorial: http://note.youdao.com/noteshare?id=33e37b0967da40660920f755ba2c03f0
# Hadoop pseudo-distributed installation

# Prerequisite: the JDK is installed

# Download hadoop 2.7.7
```
cd /home/vc/dev/hadoop
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
```
# Extract
```
tar -zxvf hadoop-2.7.7.tar.gz
```
## Configure the Hadoop environment variables by appending the following to /etc/profile
```
# hadoop home setting
export HADOOP_HOME=/app/hadoop/hadoop-2.7.7
export HADOOP_INSTALL=${HADOOP_HOME}
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
```
## Modify the file etc/hadoop/hadoop-env.sh under the Hadoop installation directory
```
# The java implementation to use.
export JAVA_HOME=/home/vc/dev/jdk/jdk1.8.0_161
```
### etc/hadoop/core-site.xml under the Hadoop installation directory
```
<configuration>
<!-- Directory used by Hadoop at runtime to store its data files. -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/vc/dev/hadoop/hadoop-2.7.7/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<!-- Address of the NameNode; this sets the default file system. -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.1.202:9000</value>
</property>
</configuration>
```
### Configure HDFS: etc/hadoop/hdfs-site.xml
```
<configuration>
<!-- Path where the NameNode stores its data -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/vc/dev/hadoop/hadoop-2.7.7/hdfs/name</value>
</property>
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Path where the DataNode stores its data -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/vc/dev/hadoop/hadoop-2.7.7/hdfs/data</value>
</property>
</configuration>
```
### Set up passwordless SSH for pseudo-distributed mode. Passwordless login between Hadoop cluster nodes must work, otherwise all kinds of problems appear. On a single node, test with `ssh localhost`; if that fails, run the following:
```
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```
### In pseudo-distributed mode /etc/hosts does not need to be configured; in a real distributed setup, map each hostname to its IP there.

# Starting Hadoop in pseudo-distributed mode
## Configure HDFS
```
# The first time HDFS is started it must be formatted; answer Y to every Y-or-N prompt
bin/hdfs namenode -format
# Start the NameNode daemon and DataNode daemon: this starts HDFS on the single-node cluster
sbin/start-dfs.sh
```
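If the daemons started correctly, `jps` should list a NameNode, a DataNode and a SecondaryNameNode process (output below is illustrative; the pids will differ):
```
jps
# 12001 NameNode
# 12145 DataNode
# 12360 SecondaryNameNode
```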
Once started, the NameNode information can be browsed in the web UI:
![](http://one17356s.bkt.clouddn.com/18-8-24/97813052.jpg)
```
# Create a directory on HDFS with the hadoop command
hadoop fs -mkdir /test
# or with this command
hdfs dfs -mkdir /user
```
Upload a file:
![](http://one17356s.bkt.clouddn.com/18-8-24/33727958.jpg)
## Stop HDFS
```
./sbin/stop-dfs.sh
```
## Configure YARN
### etc/hadoop/mapred-site.xml
```
<configuration>
<!-- Tell the MapReduce framework to use YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
```
### etc/hadoop/yarn-site.xml
```
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Reducers fetch data via mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
```
![](http://one17356s.bkt.clouddn.com/18-8-24/53993777.jpg)
![](http://one17356s.bkt.clouddn.com/18-8-24/28989509.jpg)
## Start and stop YARN
```
./sbin/start-yarn.sh
./sbin/stop-yarn.sh
```
## Check the cluster status
```
./bin/hadoop dfsadmin -report
```
# Testing in pseudo-distributed mode
```
//create a directory on the server
mkdir ~/input
//enter the directory and copy the hadoop configuration files into it as input data
cd ~/input
cp /app/hadoop/hadoop-2.7.7/etc/hadoop/*.xml ./
//upload the files under input to the /one directory on HDFS
hdfs dfs -put ./* /one
//check the files uploaded to HDFS
hdfs dfs -ls /one
//run the example jar; make sure the result directory /output does not already exist on HDFS, otherwise it fails
hadoop jar /app/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep /one /output 'dfs[a-z.]+'
//download the result directory from HDFS to the local ~/input directory
hdfs dfs -get /output
//view the results
cat output/*
```
---
# ZK installation
# Download and extract ZooKeeper (zookeeper-3.4.9.tar.gz)
# Set environment variables
![](http://one17356s.bkt.clouddn.com/17-11-2/30838835.jpg)
# Edit the configuration file (it lives under $ZOOKEEPER_HOME/conf/; rename zoo_sample.cfg to zoo.cfg)
Configuration notes:
- tickTime: the heartbeat interval used between ZooKeeper servers, and between clients and servers; one heartbeat is sent every tickTime.
- dataDir: the directory where ZooKeeper stores its data; by default the write-ahead log files are kept here as well.
- clientPort: the port clients use to connect to the ZooKeeper server; ZooKeeper listens on it and accepts client requests.
![](http://one17356s.bkt.clouddn.com/18-7-8/79348236.jpg)
Standalone mode
- After downloading the ZooKeeper package, extract it to a suitable directory. Go into the conf subdirectory, create the configuration file from the template with `cp zoo_sample.cfg zoo.cfg`, and set the following parameters (a sample zoo.cfg is sketched after the parameter notes below):
- tickTime=2000
- dataDir=/home/vc/dev/MQ/ZK/data
- dataLogDir=/home/vc/dev/MQ/ZK/log
- clientPort=2181
## Meaning of each parameter
- tickTime: the basic tick time unit used inside ZooKeeper, in milliseconds.
- dataDir: the data directory; can be any directory.
- dataLogDir: the log directory; can also be any directory. If not set, the same value as dataDir is used.
- clientPort: the port to listen on for client connections.
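Written out with the values above, a minimal zoo.cfg could be generated like this (a sketch; adjust the paths to your own layout):
```
cat > $ZOOKEEPER_HOME/conf/zoo.cfg <<'EOF'
tickTime=2000
dataDir=/home/vc/dev/MQ/ZK/data
dataLogDir=/home/vc/dev/MQ/ZK/log
clientPort=2181
EOF
```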
# Start ZK
`/dev/Zk/zookeeper-3.4.9/bin$ ./zkServer.sh start`
![](http://one17356s.bkt.clouddn.com/17-11-2/76638495.jpg)
# Check that it is running
Use the command: `netstat -antp | grep 2181`
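Alternatively, ZooKeeper's built-in four-letter commands give a quick health check (a sketch; requires the nc utility to be installed):
```
echo ruok | nc localhost 2181   # replies "imok" if the server is running
./zkServer.sh status            # prints the server mode (standalone/leader/follower)
```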
![](http://one17356s.bkt.clouddn.com/17-11-2/15616237.jpg)
# Connect to the ZK server with zkCli.sh
```
./zkCli.sh -server localhost:2181    connect to the local ZK server
history                              list the commands executed in this session
quit                                 disconnect the client from the ZK server
```
![](http://one17356s.bkt.clouddn.com/18-8-27/4122129.jpg)
# Stop the ZK server
`./zkServer.sh stop`
---
# [HIVE SQOOP HBASE installation blog link](https://www.cnblogs.com/DFX339/p/9550213.html)
# HIVE-INSTALL
- Download the package: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/
- Upload it to the target directory on Linux and extract it:
```
mkdir hive
mv apache-hive-2.3.3-bin.tar.gz hive
tar -zxvf apache-hive-2.3.3-bin.tar.gz
mv apache-hive-2.3.3-bin apache-hive-2.3.3
### The installation directory is /app/hive/apache-hive-2.3.3
```
- Configure environment variables:
```
sudo vi /etc/profile
Add: export HIVE_HOME=/app/hive/apache-hive-2.3.3
export PATH=$PATH:$HIVE_HOME/bin
:wq #save and exit
```
- Modify the Hive configuration files:
- Configuration file hive-env.sh (edit the existing entries; add any that are missing):
```
cd /app/hive/apache-hive-2.3.3/conf
cp hive-env.sh.template hive-env.sh
Add the following to the file (uncomment the lines and change the paths to your own directories)
export HADOOP_HEAPSIZE=1024
export HADOOP_HOME=/app/hadoop/hadoop-2.7.7 #Hadoop installation directory
export HIVE_CONF_DIR=/app/hive/apache-hive-2.3.3/conf
export HIVE_HOME=/app/hive/apache-hive-2.3.3
export HIVE_AUX_JARS_PATH=/app/hive/apache-hive-2.3.3/lib
export JAVA_HOME=/app/lib/jdk
```
- Create the HDFS directories:
```
cd /app/hive/apache-hive-2.3.3
mkdir hive_site_dir
cd hive_site_dir
hdfs dfs -mkdir -p warehouse #this assumes Hadoop is already installed and running
hdfs dfs -mkdir -p tmp
hdfs dfs -mkdir -p log
hdfs dfs -chmod -R 777 warehouse
hdfs dfs -chmod -R 777 tmp
hdfs dfs -chmod -R 777 log
Create a local temporary directory:
cd /app/hive/apache-hive-2.3.3
mkdir tmp
```
- Configuration file hive-site.xml (edit the existing entries):
```
cp hive-default.xml.template hive-site.xml
vi hive-site.xml
```
- Configure the metastore database settings: ConnectionURL / ConnectionUserName / ConnectionPassword / ConnectionDriverName
```
<!--mysql database connection setting -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://10.28.85.149:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>szprd</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>szprd</value>
</property>
```
- Configure the HDFS and local scratch directories
```
<property>
<name>hive.exec.scratchdir</name>
<!--<value>/tmp/hive</value>-->
<value>/app/hive/apache-hive-2.3.3/hive_site_dir/tmp</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/app/hive/apache-hive-2.3.3/hive_site_dir/warehouse</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<!--<value>${system:java.io.tmpdir}/${system:user.name}</value> -->
<value>/app/hive/apache-hive-2.3.3/tmp/${system:user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<!--<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>-->
<value>/app/hive/apache-hive-2.3.3/tmp/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<!--<value>${system:java.io.tmpdir}/${system:user.name}</value>-->
<value>/app/hive/apache-hive-2.3.3/hive_site_dir/log/${system:user.name}</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in metastore is compatible with one from Hive jars. Also disable automatic
schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
proper metastore schema migration. (Default)
False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
</description>
</property>
```
**After editing hive-site.xml, save and exit with :wq**
- Download a suitable version of the MySQL JDBC driver and put it in the lib directory of the Hive installation:
https://dev.mysql.com/downloads/connector/j/
- Initialize the metastore database (run this before starting Hive for the first time; if it fails, double-check the database settings above):
```
cd /app/hive/apache-hive-2.3.3/bin
./schematool -initSchema -dbType mysql
```
- Start Hive: `hive` (with the environment variables configured in /etc/profile, this can be run from any directory)
- To start Hive with real-time log output (run from the bin directory of the Hive installation): `./hive -hiveconf hive.root.logger=DEBUG,console`

---

# HBASE INSTALL
- [Download the HBase package](http://hbase.apache.org/downloads.html)
- Extract it: `tar -zxvf hbase-1.2.6.1-bin.tar.gz`
- Configure environment variables (append at the end):
```
vi /etc/profile
#HBase Setting
export HBASE_HOME=/app/hbase/hbase-1.2.6.1
export PATH=$PATH:$HBASE_HOME/bin
```
- Edit the configuration file `hbase-env.sh`:
```
# Defaults to true, meaning HBase manages its own built-in ZK; set to false to use an external ZK ensemble
export HBASE_MANAGES_ZK=false
export HBASE_PID_DIR=/app/hadoop/hadoop-2.7.7/pids #create this directory first if it does not exist
export JAVA_HOME=/app/lib/jdk #the JDK installation directory
```
- Edit the configuration file `hbase-site.xml`.
Add the following inside the configuration node:
```
<configuration>
<!-- Number of data replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- HBase root directory on HDFS -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://10.28.85.149:9000/hbase</value>
</property>
<!-- ZK client port; must match the port the ZK ensemble listens on -->
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<!-- Must match the dataDir value set in the ZK configuration file -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/app/zookeeper/data</value>
</property>
<!-- Root znode for HBase in ZK -->
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>
<!-- Whether HBase runs in distributed (cluster) mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- If you use the local file system (LocalFileSystem), set this property to false -->
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>true</value>
<description>
Controls whether HBase will check for stream capabilities (hflush/hsync). Disable this if you intend to run on LocalFileSystem, denoted by a rootdir with the 'file://' scheme, but be mindful of the NOTE below.
WARNING: Setting this to false blinds you to potential data loss and inconsistent system state in the event of process and/or node failures. If HBase is complaining of an inability to use hsync or hflush it's most likely not a false positive.
</description>
</property>
</configuration>
```
- Start ZooKeeper
Go to the bin directory of the ZooKeeper installation and run `./zkServer.sh start`, then start the client: `./zkCli.sh`. Once connected, run: `create /hbase hbase`
- Start HBase
Go to the HBase bin directory: `./start-hbase.sh`
```
./hbase shell #once the shell starts successfully, HBase commands can be executed here
list #list all tables in the current HBase instance; no error means the installation works
```
- Access the HBase web UI: http://10.28.85.149:16010/master-status #use the IP of the current server; the port is 16010

---

# SQOOP INSTALL
- [Download the package](https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/)
- Extract it: `tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz`
Rename the directory: `mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7_hadoop-2.6.0`
- Configure environment variables:
```
#Sqoop Setting
export SQOOP_HOME=/app/sqoop/sqoop-1.4.7_hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin
```
- Copy the MySQL JDBC driver into the lib directory of the Sqoop installation
Download: https://dev.mysql.com/downloads/connector/j/
- Edit the configuration file in the conf directory of the Sqoop installation:
```
vi sqoop-env.sh
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.7.7
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/app/hadoop/hadoop-2.7.7
#set the path to where bin/hbase is available
export HBASE_HOME=/app/hbase/hbase-1.2.6.1
#Set the path to where bin/hive is available
export HIVE_HOME=/app/hive/apache-hive-2.3.3
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/app/zookeeper/zookeeper-3.4.12
```
- Test the Sqoop installation
- sqoop help #lists the available sqoop commands
- Test the Sqoop connection by listing all databases for this connection:
```
sqoop list-databases \
--connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
--username root \
--password Abcd1234
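# A hypothetical follow-up import of a single table into Hive (a sketch, not from the
# original tutorial; demo_table and demo_hive_table are placeholder names):
sqoop import \
--connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
--username root \
--password Abcd1234 \
--table demo_table \
--hive-import \
--hive-table demo_hive_table \
-m 1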
```

---

# Oozie installation
# Installation based on the oozie-4.0.0-cdh5.3.6.tar.gz release
Prerequisites before installing:
- a usable MySQL database
- an already installed Hadoop cluster
- the `oozie-server` directory inside the pre-built Oozie package is itself a Tomcat environment, so no separate Tomcat installation is needed.
## Installation
- Download the pre-built archive: `wget http://archive.cloudera.com/cdh5/cdh/5/oozie-4.0.0-cdh5.3.6.tar.gz`
- Extract it into the chosen directory: `tar -zxvf oozie-4.0.0-cdh5.3.6.tar.gz`; the directory used here is `/app/oozie`
- Set global environment variables: `sudo vim /etc/profile`
```
#oozie setting
export OOZIE_HOME=/app/oozie/oozie-4.0.0-cdh5.3.6
export PATH=$PATH:$OOZIE_HOME/bin
```
- Set environment variables in `conf/oozie-env.sh` under the Oozie installation directory.
The port of the Oozie web console is also set here:
`OOZIE_HTTP_PORT` sets the listening port of the Oozie web service; the default is 11000
```
export OOZIE_CONF=${OOZIE_HOME}/conf
export OOZIE_DATA=${OOZIE_HOME}/data
export OOZIE_LOG=${OOZIE_HOME}/logs
export CATALINA_BASE=${OOZIE_HOME}/oozie-server
export CATALINA_HOME=${OOZIE_HOME}/oozie-server
```
- Create a libext folder under the Oozie root directory and move the third-party jars Oozie depends on into it: `mkdir libext`
- Add the downloaded ext-2.2 to the libext directory: `cp ext-2.2.zip /app/oozie/oozie-4.0.0-cdh5.3.6/libext/`
- Add the jars from the Hadoop lib directories to libext; from inside libext run `cp /app/hadoop/hadoop-2.7.7/share/hadoop/*/*.jar ./` and `cp /app/hadoop/hadoop-2.7.7/share/hadoop/*/lib/*.jar ./`
- Add the driver for the MySQL database that stores the metadata (`mysql-connector-java-5.1.41.jar`)
- Configure the Oozie proxy user in Hadoop:
Just replace xxx with the user name that submits Oozie jobs.
- hadoop.proxyuser.**xxx**.hosts
- hadoop.proxyuser.**xxx**.groups
```
<!-- oozie -->
<property>
<name>hadoop.proxyuser.imodule.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.imodule.groups</name>
<value>*</value>
</property>
```
- Set up Oozie's shared jar folder (sharelib) on HDFS. Hadoop's default port is 8020; I changed it to 9000, so pay attention here. One problem encountered: the NameNode was in safe mode, which had to be turned off with `hdfs dfsadmin -safemode leave`
```
oozie-setup.sh sharelib create -fs hdfs://10.28.85.149:9000 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
```
- Create the Oozie war file: first put the Hadoop jars, the MySQL driver and the ext archive into the libext folder, then run `oozie-setup.sh prepare-war` to build the war.
- In conf/oozie-site.xml under the Oozie installation directory, set the oozie.service.HadoopAccessorService.hadoop.configurations property to the local Hadoop configuration directory:
```
<configuration>
<property>
<name>oozie.services</name>
<value>
org.apache.oozie.service.JobsConcurrencyService,
org.apache.oozie.service.SchedulerService,
org.apache.oozie.service.InstrumentationService,
org.apache.oozie.service.MemoryLocksService,
org.apache.oozie.service.CallableQueueService,
org.apache.oozie.service.UUIDService,
org.apache.oozie.service.ELService,
org.apache.oozie.service.AuthorizationService,
org.apache.oozie.service.UserGroupInformationService,
org.apache.oozie.service.HadoopAccessorService,
org.apache.oozie.service.URIHandlerService,
org.apache.oozie.service.DagXLogInfoService,
org.apache.oozie.service.SchemaService,
org.apache.oozie.service.LiteWorkflowAppService,
org.apache.oozie.service.JPAService,
org.apache.oozie.service.StoreService,
org.apache.oozie.service.CoordinatorStoreService,
org.apache.oozie.service.SLAStoreService,
org.apache.oozie.service.DBLiteWorkflowStoreService,
org.apache.oozie.service.CallbackService,
org.apache.oozie.service.ActionService,
org.apache.oozie.service.ShareLibService,
org.apache.oozie.service.ActionCheckerService,
org.apache.oozie.service.RecoveryService,
org.apache.oozie.service.PurgeService,
org.apache.oozie.service.CoordinatorEngineService,
org.apache.oozie.service.BundleEngineService,
org.apache.oozie.service.DagEngineService,
org.apache.oozie.service.CoordMaterializeTriggerService,
org.apache.oozie.service.StatusTransitService,
org.apache.oozie.service.PauseTransitService,
org.apache.oozie.service.GroupsService,
org.apache.oozie.service.ProxyUserService,
org.apache.oozie.service.XLogStreamingService,
org.apache.oozie.service.JvmPauseMonitorService
</value>
</property>
<!-- Path to the hadoop etc/hadoop configuration directory -->
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/app/hadoop/hadoop-2.7.7/etc/hadoop</value>
</property>
<property>
<name>oozie.service.JPAService.create.db.schema</name>
<value>true</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://10.28.85.148:3306/ooize?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>root</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>Abcd1234</value>
</property>
</configuration>
```
- Run the Oozie service and check that the installation is complete:
`oozied.sh run` or `oozied.sh start` (the former runs in the foreground, the latter in the background)
- Stop the Oozie service: `oozied.sh stop`
- Check the Oozie web status from the command line (`oozie admin -oozie http://10.28.85.149:11000/oozie -status`); it should return `System mode: NORMAL`
- Then check the sharelib contents with `oozie admin -shareliblist -oozie http://localhost:11000/oozie`
- Open the web UI: `http://10.28.85.149:11000/oozie/`

**A problem encountered**
```
Sep 03, 2018 4:36:47 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NullPointerException
at org.apache.jsp.index_jsp._jspInit(index_jsp.java:25)
at org.apache.jasper.runtime.HttpJspBase.init(HttpJspBase.java:52)
at org.apache.jasper.servlet.JspServletWrapper.getServlet(JspServletWrapper.java:164)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:340)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:154)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:594)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:553)
at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:159)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:84)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:620)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:745)
```
The cause is that both the webapp's `WEB-INF/lib` directory and Tomcat's lib directory contain servlet-api.jar and jsp-api.jar.
The directories `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib` and `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/lib` hold the same jars, which causes the conflict. `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server` is the Tomcat environment of oozie-server, and its lib directory holds the Tomcat runtime jars. Solution: delete the three files servlet-api-2.5-6.1.14.jar, servlet-api-2.5.jar and jsp-api-2.1.jar from `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib`. After that the server starts up fine.
![](http://one17356s.bkt.clouddn.com/18-9-3/48205608.jpg)
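With the server running, a minimal job submission makes a useful end-to-end check. The sketch below assumes a hypothetical workflow application already uploaded to HDFS at /user/imodule/apps/demo-wf (containing a workflow.xml), and reuses the NameNode address from this setup with the default ResourceManager port; adjust both to your environment:
```
# job.properties (local file) pointing at the hypothetical workflow on HDFS
cat > job.properties <<'EOF'
nameNode=hdfs://10.28.85.149:9000
jobTracker=10.28.85.149:8032
oozie.wf.application.path=${nameNode}/user/imodule/apps/demo-wf
EOF

# submit and start the workflow, then list recent jobs to check its status
oozie job -oozie http://10.28.85.149:11000/oozie -config job.properties -run
oozie jobs -oozie http://10.28.85.149:11000/oozie -len 5
```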
---

# Pig installation

# Prerequisites
### hadoop 2.7.7 is installed
### jdk 1.7+

# Installation
```
tar -xzvf pig-0.17.0.tar.gz
# Pig setting (append to /etc/profile)
export PIG_HOME=/app/pig/pig-0.17.0
export PATH=$PATH:$PIG_HOME/bin
```
# Test
```
-- local mode
pig -x local
-- mapreduce mode
pig -x mapreduce
```
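As a quick smoke test in local mode, a tiny script can be run non-interactively (a sketch; /etc/passwd is just a convenient local input file and /tmp/demo.pig a placeholder path):
```
cat > /tmp/demo.pig <<'EOF'
A = LOAD '/etc/passwd' USING PigStorage(':');
B = FOREACH A GENERATE $0 AS user;
DUMP B;
EOF
pig -x local /tmp/demo.pig
```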
![](http://one17356s.bkt.clouddn.com/18-8-28/13040171.jpg)