Cluster components: CentOS 7 + JDK 1.8 + Hadoop 2.6.5 + ZooKeeper 3.4.12 + HBase 1.2.1 + Hive 2.1.1
The cluster is built on virtual machines.
Host machine IP: 192.168.174.1
Gateway IP provided by the virtual network: 192.168.174.2
Three nodes:
192.168.174.101
192.168.174.102
192.168.174.103
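The configs further down refer to the nodes by hostname (node1.zzy.com, node2, node3, etc.), so each machine needs matching /etc/hosts entries; a minimal sketch, assuming node1/node2/node3 map to .101/.102/.103 in that order (the .101 = node1 mapping follows from fs.defaultFS and the ResourceManager addresses below):
192.168.174.101 node1 node1.zzy.com
192.168.174.102 node2 node2.zzy.com
192.168.174.103 node3 node3.zzy.com
Append these lines to /etc/hosts on every node, and set each node's own name with hostnamectl set-hostname.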
Install the OS
Configure a static IP
Edit /etc/sysconfig/network-scripts/ifcfg-ens33 (the file name may differ):
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.174.101
NETMASK=255.255.255.0
GATEWAY=192.168.174.2
DNS1=8.8.8.8
DNS2=8.8.4.4
Restart the network:
/etc/init.d/network restart    (on CentOS 7, systemctl restart network also works)
ping www.baidu.com
If the ping gets replies, the network is working.
Install lrzsz
yum install lrzsz
Check whether the Linux system is 32-bit or 64-bit
Method 1: getconf LONG_BIT
Method 2: uname -a
A 64-bit machine prints x86_64; otherwise it is 32-bit.
Install JDK 1.8
Download the JDK from the official site: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Extract it into the installation directory
Install command (run as the hadoop account): tar -zxvf jdk-8u171-linux-x64.tar.gz -C /usr/java/
After installation, append the environment variables to the end of /etc/profile
[root@bogon software]# vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Reload /etc/profile so the changes take effect (as the hadoop account):
[root@bogon jdk1.8.0_101]# source /etc/profile
Verify the installation:
java -version
Install Hadoop 2.6.5
(1) Download the Hadoop tarball and put it under /home/hadoop
(2) Extract it: tar -xzvf hadoop-2.6.5.tar.gz
(3) Under /home/hadoop, create the data directories: tmp, hdfs, hdfs/data, hdfs/name (see the sketch below)
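A quick sketch of creating those directories; note that the hdfs-site.xml further down actually points dfs.namenode.name.dir / dfs.datanode.data.dir at /home/hadoop/dfs/name and /home/hadoop/dfs/data, so those paths are the ones HDFS will use (create them too, or keep the config and the directories consistent):
mkdir -p /home/hadoop/tmp /home/hadoop/hdfs/name /home/hadoop/hdfs/data
mkdir -p /home/hadoop/dfs/name /home/hadoop/dfs/data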
Overall approach: prepare the master and slave servers, configure passwordless SSH from the master to the slaves, extract and install the JDK, extract and install Hadoop, then configure the master/slave relationships for HDFS, MapReduce, and so on.
Passwordless SSH login: Hadoop operates the nodes over SSH. I use the root user; generate a key pair on every server, then merge the public keys into authorized_keys.
(1) CentOS does not enable key-based passwordless SSH login by default; uncomment the following line in /etc/ssh/sshd_config on every server:
#PubkeyAuthentication yes
(2) Run ssh-keygen -t rsa to generate a key; press Enter through every prompt (no passphrase). A .ssh directory is created under /root. Do this on every server.
(3) Merge the public keys into the authorized_keys file. On the Master server, go to /root/.ssh and merge them via SSH:
cat id_rsa.pub>> authorized_keys
ssh root@192.168.174.102 cat ~/.ssh/id_rsa.pub>> authorized_keys
ssh root@192.168.174.103 cat ~/.ssh/id_rsa.pub>> authorized_keys
(4) Copy the Master server's authorized_keys and known_hosts to /root/.ssh on each Slave server
(5) Done. ssh root@192.168.174.102 and ssh root@192.168.174.103 no longer ask for a password.
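One step these notes skip: sshd (with its default StrictModes) ignores keys whose files are too permissive, so tighten the permissions on every node:
chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys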
Configure core-site.xml under /home/hadoop/hadoop-2.6.5/etc/hadoop
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1.zzy.com:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131702</value>
</property>
</configuration>
Configure hdfs-site.xml under /home/hadoop/hadoop-2.6.5/etc/hadoop
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node1.zzy.com:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Configure mapred-site.xml under /home/hadoop/hadoop-2.6.5/etc/hadoop (if it does not exist, copy mapred-site.xml.template to mapred-site.xml)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.174.101:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>192.168.174.101:19888</value>
</property>
</configuration>
The yarn.* properties below belong in yarn-site.xml in the same directory (the ResourceManager and NodeManagers read that file, not mapred-site.xml):
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>192.168.174.101:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.174.101:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.174.101:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>192.168.174.101:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.174.101:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>768</value>
</property>
</configuration>
Set JAVA_HOME in hadoop-env.sh and yarn-env.sh under /home/hadoop/hadoop-2.6.5/etc/hadoop; the daemons will not start without it:
export JAVA_HOME=/usr/java/jdk1.8.0_171
Edit the slaves file under /home/hadoop/hadoop-2.6.5/etc/hadoop: delete the default localhost and add the two slave nodes:
192.168.174.102
192.168.174.103
Copy the configured Hadoop directory to the corresponding location on each node with scp:
scp -r /home/hadoop 192.168.174.102:/home/
scp -r /home/hadoop 192.168.174.103:/home/
Start Hadoop on the Master server (the slave daemons are started automatically). From the /home/hadoop/hadoop-2.6.5 directory:
(1) Format the NameNode: bin/hdfs namenode -format
(2) Start everything with sbin/start-all.sh, or separately with sbin/start-dfs.sh and sbin/start-yarn.sh
(3) To stop, run sbin/stop-all.sh, or separately stop-dfs.sh and stop-yarn.sh
(4) Run jps to see the running daemons
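For reference, roughly what jps should list once everything is up (standard Hadoop 2.x daemon names; which node runs which follows the config above):
on the master (192.168.174.101): NameNode, SecondaryNameNode, ResourceManager
on the slaves (192.168.174.102/103): DataNode, NodeManager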
Web access requires opening the ports first, or simply stopping the firewall
(1) Run systemctl stop firewalld.service
(2) Open http://192.168.174.101:8088 in a browser (YARN ResourceManager UI)
(3) Open http://192.168.174.101:50070 in a browser (HDFS NameNode UI)
That completes the Hadoop installation. It is only the starting point for big-data work; the next step is to write programs against the Hadoop APIs, depending on your own needs, and put HDFS and MapReduce to use.
# stop the firewall
systemctl stop firewalld
# disable it at boot
systemctl disable firewalld
# check its status
systemctl status firewalld
Reference: https://blog.csdn.net/sa14023053/article/details/51953836
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
On CentOS 7, running service iptables start/stop fails with: Failed to start iptables.service: Unit iptables.service failed to load: No such file or directory.
On CentOS 7, RHEL 7, and Fedora the firewall is managed by firewalld.
To open a range of ports as an exception, e.g. 1000-2000:
The syntax for enabling a zone's port/protocol combination is:
firewall-cmd [--zone=<zone>] --add-port=<port>[-<port>]/<protocol> [--timeout=<seconds>]
This enables the port/protocol combination. The port can be a single port <port> or a range <port>-<port>; the protocol can be tcp or udp.
The actual commands:
Add:
firewall-cmd --zone=public --add-port=80/tcp --permanent    (--permanent makes the rule persistent; without it, the rule is lost after a restart)
firewall-cmd --zone=public --add-port=1000-2000/tcp --permanent
Reload:
firewall-cmd --reload
Query:
firewall-cmd --zone=public --query-port=80/tcp
Remove:
firewall-cmd --zone=public --remove-port=80/tcp --permanent
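Also handy, to list everything currently opened in the zone:
firewall-cmd --zone=public --list-ports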
Of course, you can also switch back to the traditional iptables management.
Run the following commands:
- systemctl stop firewalld
- systemctl mask firewalld
And install iptables-services:
- yum install iptables-services
Enable it at boot with systemctl enable iptables, then manage the service:
systemctl start iptables
systemctl restart iptables
systemctl reload iptables
Save the rules:
- service iptables save
OK, try again and it should work now.
Call From localhost/127.0.0.1 to 192.168.174.101:9000 failed on connection exception: java.net.ConnectException: Connection refused
Cause: the service (the NameNode) was not started.
Reference:
https://www.linuxidc.com/Linux/2015-11/124800.htm
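A quick check for the connection-refused error above, run on the NameNode host (netstat comes from the net-tools package; ss -tlnp works as well):
jps                          # is the NameNode process listed?
netstat -tlnp | grep 9000    # is anything listening on port 9000?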
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ZooKeeper installation
--- zoo.cfg is created by renaming the sample config (zoo_sample.cfg); environment variables cannot be used inside it
--- myid has to be created by hand (inside the dataDir); its content is the id assigned to this machine in zoo.cfg
--- ./bin/zkServer.sh status only produces a result once a quorum of the ensemble is running (for this 3-node setup, at least 2 nodes); before that it shows the error: zookeeper Error contacting service. It is probably not running
--- if a node will not start, check the zookeeper.out log file
scp -r /home/hadoop/zookeeper-3.4.12 192.168.174.102:/home/hadoop/
scp -r /home/hadoop/zookeeper-3.4.12 192.168.174.103:/home/hadoop/
systemctl start firewalld
systemctl stop firewalld
systemctl status firewalld
firewall-cmd --zone=public --add-port=2181/tcp --permanent
firewall-cmd --zone=public --add-port=3181/tcp --permanent
firewall-cmd --zone=public --add-port=4181/tcp --permanent
(the quorum and election ports to open must match zoo.cfg; with the config below that is 3181 and 4181 rather than the ZooKeeper defaults 2888 and 3888)
firewall-cmd --reload
tickTime=2000
clientPort=2181
initLimit=5
syncLimit=2
dataDir=/home/hadoop/zookeeper-3.4.12/data
dataLogDir=/home/hadoop/zookeeper-3.4.12/logs
server.1=node1.zzy.com:3181:4181
server.2=node2.zzy.com:3181:4181
server.3=node3.zzy.com:3181:4181
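The data/log directories and the myid files mentioned above, created on each node (the ids follow the server.N lines in zoo.cfg; the node-to-id mapping here is the obvious assumption):
mkdir -p /home/hadoop/zookeeper-3.4.12/data /home/hadoop/zookeeper-3.4.12/logs
echo 1 > /home/hadoop/zookeeper-3.4.12/data/myid    # on node1
echo 2 > /home/hadoop/zookeeper-3.4.12/data/myid    # on node2
echo 3 > /home/hadoop/zookeeper-3.4.12/data/myid    # on node3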
ssh -v -p 2888 hadoop@192.168.174.102
./bin/zkServer.sh start
./bin/zkServer.sh stop
./bin/zkServer.sh restart
./bin/zkServer.sh status
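Once a quorum is running, status should report one leader and the rest followers, roughly:
./bin/zkServer.sh status
# Mode: leader      (on exactly one node)
# Mode: follower    (on the others)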
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.12/
export PATH=$ZOOKEEPER_HOME/bin:$PATH
export PATH
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HBase installation
ZK_HOME=/home/hadoop/zookeeper-3.4.12
HBASE_HOME=/home/hadoop/hbase-1.2.1
hbase-env.sh----
export JAVA_HOME=/usr/java/jdk1.8.0_171
export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
export HBASE_HOME=/home/hadoop/hbase-1.2.1
export HBASE_CLASSPATH=/home/hadoop/hadoop-2.6.5/etc/hadoop
export HBASE_PID_DIR=/home/hadoop/hbase/pids
export HBASE_MANAGES_ZK=false
hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://node1:9000/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.
</description>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>node1,node2,node3</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/hadoop/hbase/tmp</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
scp -r /home/hadoop/hbase-1.2.1 192.168.174.102:/home/hadoop/
scp -r /home/hadoop/hbase-1.2.1 192.168.174.103:/home/hadoop/
scp -r /home/hadoop/hbase 192.168.174.102:/home/hadoop/
scp -r /home/hadoop/hbase 192.168.174.103:/home/hadoop/
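The notes stop before the regionservers file and the start command; a minimal sketch of those steps, assuming the region servers are node2 and node3 (the hostnames used in hbase.zookeeper.quorum):
# on the master, list the region server hosts, one per line
echo node2 >  /home/hadoop/hbase-1.2.1/conf/regionservers
echo node3 >> /home/hadoop/hbase-1.2.1/conf/regionservers
# start HBase after HDFS and ZooKeeper are up
/home/hadoop/hbase-1.2.1/bin/start-hbase.sh
jps    # expect HMaster on the master, HRegionServer on the slaves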
The final environment variables:
export JAVA_HOME=/usr/java/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.12
export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
export ZK_HOME=/home/hadoop/zookeeper-3.4.12
export HBASE_HOME=/home/hadoop/hbase-1.2.1
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/sbin:${HADOOP_HOME}/bin:${ZOOKEEPER_HOME}/bin:${HBASE_HOME}/bin
Web UI: http://192.168.174.101:16030/ (RegionServer UI; the HMaster web UI is on port 16010)
Reference: https://blog.csdn.net/pucao_cug/article/details/72229223
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Install Hive 2.1.1
export HIVE_HOME=/home/hadoop/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin
hive-site.xml
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.174.1/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>*******</value>
</property>
hive-env.sh
export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
export HIVE_CONF_DIR=/home/hadoop/hive/conf
export HIVE_AUX_JARS_PATH=/home/hadoop/hive/lib
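Not recorded above, but needed before schematool can reach MySQL: the MySQL JDBC driver jar has to be in Hive's lib directory (the exact jar name/version below is an assumption):
cp mysql-connector-java-5.1.46.jar /home/hadoop/hive/lib/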
Initialize the metastore schema in the MySQL database:
cd $HIVE_HOME/bin
schematool -initSchema -dbType mysql
Start the Hive CLI: ./hive
Reference: https://blog.csdn.net/jssg_tzw/article/details/72354470
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Scala installation ---------
export SCALA_HOME=/home/hadoop/scala-2.10.4
export PATH=$PATH:${SCALA_HOME}/bin
Spark installation ---------
export SPARK_HOME=/home/hadoop/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:${SPARK_HOME}/bin
spark-env.sh----
export SCALA_HOME=/home/hadoop/scala-2.10.4
export JAVA_HOME=/usr/java/jdk1.8.0_171
export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_IP=192.168.174.101
export SPARK_LOCAL_DIRS=/home/hadoop/spark-1.6.0-bin-hadoop2.6
export SPARK_WORKER_MEMORY=1g
scp -r /home/hadoop/scala-2.10.4 192.168.174.102:/home/hadoop/
scp -r /home/hadoop/scala-2.10.4 192.168.174.103:/home/hadoop/
scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 192.168.174.102:/home/hadoop/
scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 192.168.174.103:/home/hadoop/
Problem encountered: after submitting a job to YARN, it hangs and keeps printing Application report for application_1530263181961_0002 (state: ACCEPTED)
Approach:
The cluster thinks there are not enough resources; either they really are insufficient, or a DataNode/NodeManager is abnormal and not being detected.
1. Check that the NameNode and ResourceManager services are started; check that the DataNodes and NodeManagers are started and in a normal (RUNNING) state.
2. In the submit parameters, keep driver-memory at no more than about 500 MB (that seems to be roughly the workable range here) and set executor-memory to a few tens of MB, just enough for the job to run.
My final submit script:
spark-submit \
--master yarn-cluster \
--num-executors 1 \
--executor-memory 20m \
--executor-cores 1 \
--driver-memory 512m \
--class local.test201806.YarnTest \
sparkdemo-1.0-SNAPSHOT.jar
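Standard YARN CLI commands for watching or debugging the submission (the application id is the one from the note above; yarn logs needs log aggregation to be enabled):
yarn application -list
yarn logs -applicationId application_1530263181961_0002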
---------------------------------------------------------------------------------------------
Telnet installation
yum install telnet-server    # install the telnet server
yum install telnet.*         # install the telnet client