Big Data Platform Environment Setup

Date: 2022-04-27 14:18:35

Cluster components: CentOS 7 + JDK 1.8 + Hadoop 2.6.5 + ZooKeeper 3.4.12 + HBase 1.2.1 + Hive 2.1.1

The cluster is built on virtual machines.

Host machine IP: 192.168.174.1
Gateway IP (provided by the virtual network): 192.168.174.2
Three cluster nodes:
192.168.174.101
192.168.174.102
192.168.174.103

 

Install the operating system.

Configure a static IP.

Edit /etc/sysconfig/network-scripts/ifcfg-ens33 (the file name may differ on your system):

BOOTPROTO=static

ONBOOT=yes

IPADDR=192.168.174.101
NETMASK=255.255.255.0
GATEWAY=192.168.174.2
DNS1=8.8.8.8
DNS2=8.8.4.4

Restart the network service:

/etc/init.d/network restart

ping www.baidu.com

If the ping succeeds, the network is working.

 

 

Install lrzsz (for uploading/downloading files through the terminal)

yum install lrzsz

 

Check whether Linux is 32-bit or 64-bit

Method 1: getconf LONG_BIT

Method 2: uname -a

If the machine is 64-bit, the output contains x86_64; otherwise it is 32-bit.
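For example, on a 64-bit node both checks confirm it:

getconf LONG_BIT          # prints 64
uname -a | grep x86_64    # the kernel string contains x86_64 on a 64-bit system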

 

Install JDK 1.8

Download the JDK from the official site: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html 
Extract it into the installation directory.

Install command (run as the hadoop user): tar -zxvf jdk-8u171-linux-x64.tar.gz -C /usr/java/

 

After the installation, append the environment variables to the end of /etc/profile:

[root@bogon software]# vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Make /etc/profile take effect (as the hadoop user):

[root@bogon jdk1.8.0_101]# source /etc/profile

Verify the installation:

  java -version

 

Install Hadoop 2.6.5

(1) Download the Hadoop tarball and put it under /home/hadoop.
(2) Extract it: tar -xzvf hadoop-2.6.5.tar.gz
(3) Under /home/hadoop, create the data directories tmp, hdfs, hdfs/data and hdfs/name (note that hdfs-site.xml below actually points at /home/hadoop/dfs/name and /home/hadoop/dfs/data, so create whichever paths you configure; a command sketch follows).
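A minimal sketch of step (3), including the paths used by hdfs-site.xml below (adjust to the directories you actually configure):

mkdir -p /home/hadoop/tmp /home/hadoop/hdfs/data /home/hadoop/hdfs/name
mkdir -p /home/hadoop/dfs/name /home/hadoop/dfs/data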

 

Overall approach: prepare the master and slave servers, configure passwordless SSH from the master to the slaves, extract and install the JDK, extract and install Hadoop, and configure the master/slave relationships for HDFS, MapReduce and so on.

 

Passwordless SSH login: Hadoop logs in to each node over SSH to run operations. I use the root user; a key pair is generated on every server and the public keys are merged into authorized_keys.
(1) CentOS does not enable passwordless SSH login out of the box; on every server, remove the comment from the following line in /etc/ssh/sshd_config:
#PubkeyAuthentication yes

(2) Run ssh-keygen -t rsa to generate the key, pressing Enter at every prompt (no passphrase); a .ssh directory is created under /root. Do this on every server.
(3) Merge the public keys into authorized_keys: on the master server, go to /root/.ssh and merge them over SSH:
cat id_rsa.pub>> authorized_keys
ssh root@192.168.174.102 cat ~/.ssh/id_rsa.pub>> authorized_keys
ssh root@192.168.174.103 cat ~/.ssh/id_rsa.pub>> authorized_keys
(4) Copy the master's authorized_keys and known_hosts to /root/.ssh on each slave (see the scp sketch below).
(5) Done: ssh root@192.168.174.102 and ssh root@192.168.174.103 no longer ask for a password.
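A sketch of step (4) as commands run on the master (assuming the root user's home directory, as above):

scp ~/.ssh/authorized_keys ~/.ssh/known_hosts root@192.168.174.102:~/.ssh/
scp ~/.ssh/authorized_keys ~/.ssh/known_hosts root@192.168.174.103:~/.ssh/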

 

Configure core-site.xml under /home/hadoop/hadoop-2.6.5/etc/hadoop (the node1.zzy.com / node1 / node2 / node3 hostnames used in these configs must resolve on every machine, e.g. via /etc/hosts entries for 192.168.174.101-103):
 

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1.zzy.com:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop-${user.name}</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131702</value>
    </property>
</configuration>

 

Configure hdfs-site.xml under /home/hadoop/hadoop-2.6.5/etc/hadoop:
  

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1.zzy.com:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

 

Configure mapred-site.xml under /home/hadoop/hadoop-2.6.5/etc/hadoop (if the file does not exist, copy it from mapred-site.xml.template):
  

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.174.101:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.174.101:19888</value>
    </property>
</configuration>

Configure yarn-site.xml under /home/hadoop/hadoop-2.6.5/etc/hadoop (the yarn.* properties below belong here rather than in mapred-site.xml):

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.174.101:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.174.101:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.174.101:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.174.101:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.174.101:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>768</value>
    </property>
</configuration>

 

In hadoop-env.sh and yarn-env.sh under /home/hadoop/hadoop-2.6.5/etc/hadoop, set JAVA_HOME explicitly; without it the daemons will not start:
export JAVA_HOME=/usr/java/jdk1.8.0_171

 

Configure the slaves file under /home/hadoop/hadoop-2.6.5/etc/hadoop: remove the default localhost entry and add the two slave nodes:

192.168.174.102
192.168.174.103

 

Copy the configured Hadoop to the corresponding location on each node via scp:
scp -r /home/hadoop 192.168.174.102:/home/
scp -r /home/hadoop 192.168.174.103:/home/

 

Start Hadoop on the master server; the slave daemons are started automatically. From the /home/hadoop/hadoop-2.6.5 directory:
(1) Format the NameNode: bin/hdfs namenode -format
(2) Start everything with sbin/start-all.sh, or separately with sbin/start-dfs.sh and sbin/start-yarn.sh
(3) To stop, run sbin/stop-all.sh, or separately stop-dfs.sh and stop-yarn.sh
(4) Run jps to see the running daemons (expected output sketched below)
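If everything started, jps should list roughly the following (process IDs omitted; the exact set depends on the configuration above):

On the master (192.168.174.101): NameNode, SecondaryNameNode, ResourceManager
On each slave (192.168.174.102/103): DataNode, NodeManager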

 

For web access, open the ports first or simply stop the firewall:
(1) Run systemctl stop firewalld.service
(2) Open http://192.168.174.101:8088 in a browser (YARN ResourceManager UI)
(3) Open http://192.168.174.101:50070 in a browser (HDFS NameNode UI)

At this point the Hadoop installation is complete. This is only the beginning of big data work: the next step, depending on your own needs, is to write programs against Hadoop's APIs and put HDFS and MapReduce to use.
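As a quick end-to-end check that HDFS and MapReduce work, the example jar bundled with Hadoop 2.6.5 can be run from the install directory (the /input and /output paths are only for illustration):

bin/hdfs dfs -mkdir -p /input
bin/hdfs dfs -put etc/hadoop/*.xml /input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /input /output
bin/hdfs dfs -cat /output/part-r-00000 | head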

 

# stop the firewall
systemctl stop firewalld
# disable it at boot
systemctl disable firewalld
# check its status
systemctl status firewalld

 

Reference: https://blog.csdn.net/sa14023053/article/details/51953836

 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

On CentOS 7, running service iptables start/stop fails with: Failed to start iptables.service: Unit iptables.service failed to load: No such file or directory.

On CentOS 7, RHEL 7 and Fedora the firewall is managed by firewalld.

To open a range of ports, e.g. 1000-2000, the syntax is as follows (enable a port/protocol combination for a zone):
firewall-cmd [--zone=<zone>] --add-port=<port>[-<port>]/<protocol> [--timeout=<seconds>]
This enables the given port/protocol combination. The port can be a single port <port> or a range <port>-<port>; the protocol can be tcp or udp.
The actual commands are:

Add:

firewall-cmd --zone=public --add-port=80/tcp --permanent   (--permanent makes the rule persist across reboots; without it the rule is lost after a restart)

firewall-cmd --zone=public --add-port=1000-2000/tcp --permanent 

Reload:
firewall-cmd --reload
Query:
firewall-cmd --zone=public --query-port=80/tcp
Remove:
firewall-cmd --zone=public --remove-port=80/tcp --permanent

 

 

Of course, you can also revert to the traditional way of managing the firewall.

Run the following commands:

 
  1. systemctl stop firewalld  
  2. systemctl mask firewalld  


And install iptables-services:

  1. yum install iptables-services  


Set it to start at boot: 

systemctl enable iptables 
systemctl stop iptables  

systemctl start iptables  

systemctl restart iptables  

systemctl reload iptables  


Save the rules:

 
  1. service iptables save  

 

OK, try again and it should work now.

 

 

Call From localhost/127.0.0.1 to 192.168.174.101:9000 failed on connection exception: java.net.ConnectException: Connection refused

Cause: the service was not started (the NameNode is not listening on 192.168.174.101:9000).
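A quick way to confirm this from the NameNode host (jps ships with the JDK, ss with CentOS 7's iproute package):

jps                     # NameNode should appear in the list
ss -tlnp | grep 9000    # port 9000 should be in LISTEN state, bound to the address in fs.defaultFS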

 

Reference:

https://www.linuxidc.com/Linux/2015-11/124800.htm

 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

ZooKeeper installation

---zoo.cfg is created by copying/renaming the shipped sample file (zoo_sample.cfg); environment variables cannot be used inside it.

---The myid file is created by hand; its content is the id assigned to this machine by the server.N entries in zoo.cfg (see the sketch after these notes).

---./bin/zkServer.sh status only returns a result once a quorum of nodes is running (at least two of the three here); until then it prints: zookeeper Error contacting service. It is probably not running.

---If a node fails to start, check the zookeeper.out log file.
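A minimal sketch of creating zoo.cfg and the myid file on node1, matching the dataDir/dataLogDir used below (repeat on node2/node3 with ids 2 and 3):

cd /home/hadoop/zookeeper-3.4.12
cp conf/zoo_sample.cfg conf/zoo.cfg      # then edit it as shown below
mkdir -p data logs
echo 1 > data/myid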

scp -r /home/hadoop/zookeeper-3.4.12 192.168.174.102:/home/hadoop/
scp -r /home/hadoop/zookeeper-3.4.12 192.168.174.103:/home/hadoop/

systemctl start firewalld
systemctl stop firewalld
systemctl status firewalld

firewall-cmd --zone=public --add-port=2888/tcp --permanent

firewall-cmd --zone=public --add-port=3888/tcp --permanent

firewall-cmd --zone=public --add-port=2181/tcp --permanent

firewall-cmd --reload

(Note: 2888/3888 are ZooKeeper's default peer and election ports; the zoo.cfg below assigns 3181/4181 in its server.N entries, so either open 3181 and 4181 instead or change the config to match.)

zoo.cfg----

tickTime=2000
clientPort=2181
initLimit=5
syncLimit=2

dataDir=/home/hadoop/zookeeper-3.4.12/data
dataLogDir=/home/hadoop/zookeeper-3.4.12/logs


server.1=node1.zzy.com:3181:4181
server.2=node2.zzy.com:3181:4181
server.3=node3.zzy.com:3181:4181

ssh -v -p 2888 hadoop@192.168.174.102    (a rough reachability check against the peer port; it will not complete an SSH handshake, but connection refused vs. timeout shows whether the port is blocked)


./bin/zkServer.sh start
./bin/zkServer.sh stop
./bin/zkServer.sh restart
./bin/zkServer.sh status
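Once a quorum is running, status reports each node's role, e.g. (output abridged):

./bin/zkServer.sh status
# Using config: /home/hadoop/zookeeper-3.4.12/bin/../conf/zoo.cfg
# Mode: follower        (exactly one node reports Mode: leader)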


export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.12/
export PATH=$ZOOKEEPER_HOME/bin:$PATH
export PATH

 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

HBase installation

ZK_HOME=/home/hadoop/zookeeper-3.4.12
HBASE_HOME=/home/hadoop/hbase-1.2.1


hbase-env.sh----

export JAVA_HOME=/usr/java/jdk1.8.0_171
export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
export HBASE_HOME=/home/hadoop/hbase-1.2.1
export HBASE_CLASSPATH=/home/hadoop/hadoop-2.6.5/etc/hadoop
export HBASE_PID_DIR=/home/hadoop/hbase/pids
export HBASE_MANAGES_ZK=false


hbase-site.xml (hbase.rootdir should point at the same NameNode address as fs.defaultFS in core-site.xml, and the ZooKeeper client port must match the clientPort in zoo.cfg)

<configuration>

<property>
<name>hbase.rootdir</name>
<value>hdfs://node1:9000/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect (must match clientPort in zoo.cfg, 2181 here).
</description>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>node1,node2,node3</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/hadoop/hbase/tmp</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>

</configuration>


scp -r /home/hadoop/hbase-1.2.1 192.168.174.102:/home/hadoop/
scp -r /home/hadoop/hbase-1.2.1 192.168.174.103:/home/hadoop/


scp -r /home/hadoop/hbase 192.168.174.102:/home/hadoop/
scp -r /home/hadoop/hbase 192.168.174.103:/home/hadoop/

 

The final environment variables (in /etc/profile) are:

export JAVA_HOME=/usr/java/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.12
export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
export ZK_HOME=/home/hadoop/zookeeper-3.4.12
export HBASE_HOME=/home/hadoop/hbase-1.2.1
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/sbin:${HADOOP_HOME}/bin:${ZOOKEEPER_HOME}/bin:${HBASE_HOME}/bin
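A sketch of starting and checking HBase from the master node (HDFS and the ZooKeeper ensemble must already be running; with HBASE_MANAGES_ZK=false, HBase will not start ZooKeeper itself):

cd /home/hadoop/hbase-1.2.1
bin/start-hbase.sh
jps                  # HMaster on the master, HRegionServer on the region servers
bin/hbase shell      # then run: status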

 


Web UI: http://192.168.174.101:16030/ (this is the RegionServer info port; the HMaster web UI is at http://192.168.174.101:16010/)

Reference: https://blog.csdn.net/pucao_cug/article/details/72229223

 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Install Hive 2.1.1

export HIVE_HOME=/home/hadoop/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin

 

hive-site.xml (the MySQL JDBC driver jar, mysql-connector-java, also needs to be placed in $HIVE_HOME/lib)

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.174.1/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>*******</value>
</property>


hive-env.sh:

export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
export HIVE_CONF_DIR=/home/hadoop/hive/conf
export HIVE_AUX_JARS_PATH=/home/hadoop/hive/lib

Initialize the metastore schema in MySQL:
cd $HIVE_HOME/bin
schematool -initSchema -dbType mysql

Start the command line: ./hive
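A quick sanity check inside the CLI (the table name is only for illustration):

hive> show databases;
hive> create table test_tbl (id int);
hive> show tables;
hive> drop table test_tbl;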

Reference: https://blog.csdn.net/jssg_tzw/article/details/72354470

 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


Scala installation---------

export SCALA_HOME=/home/hadoop/scala-2.10.4
export PATH=$PATH:${SCALA_HOME}/bin

 

Spark installation---------

export SPARK_HOME=/home/hadoop/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:${SPARK_HOME}/bin


spark-env.sh----

export SCALA_HOME=/home/hadoop/scala-2.10.4
export JAVA_HOME=/usr/java/jdk1.8.0_171
export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_IP=192.168.174.101
export SPARK_LOCAL_DIRS=/home/hadoop/spark-1.6.0-bin-hadoop2.6
export SPARK_WORKER_MEMORY=1g

scp -r /home/hadoop/scala-2.10.4 192.168.174.102:/home/hadoop/
scp -r /home/hadoop/scala-2.10.4 192.168.174.103:/home/hadoop/

 

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 192.168.174.102:/home/hadoop/
scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 192.168.174.103:/home/hadoop/

 

Problem encountered: after submitting a job to YARN, it kept printing Application report for application_1530263181961_0002 (state: ACCEPTED)

Troubleshooting:

The cluster believes it does not have enough resources; they may genuinely be insufficient, or a DataNode/NodeManager may be down and therefore not counted.

1. Check that the NameNode and ResourceManager services are started, and that the DataNode and NodeManager services are started and in a normal (running) state (see the check commands after this list).

2. In the job submission, keep driver-memory at no more than about 500 MB (that seems to be roughly the limit here) and set executor-memory to a few tens of MB, just enough for the job to run.
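The check in point 1 can be done with the standard Hadoop/YARN CLI, run on the master:

jps                        # NameNode / ResourceManager on the master, DataNode / NodeManager on the slaves
yarn node -list            # NodeManagers should show up as RUNNING
yarn application -list     # shows accepted/running applications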

My final submit script:

spark-submit \
--master yarn-cluster \
--num-executors 1 \
--executor-memory 20m \
--executor-cores 1 \
--driver-memory 512m \
--class local.test201806.YarnTest \
sparkdemo-1.0-SNAPSHOT.jar

 

 

---------------------------------------------------------------------------------------------

 

 

Install Telnet

  1. yum install telnet-server          # install the telnet server

  2. yum install telnet.*               # install the telnet client