CentOS 7: Building a Fully Distributed hadoop-2.6.0 and spark-1.6.0 Cluster (Minimal Configuration)

Posted: 2022-09-24 20:54:27


1. Environment

OS: CentOS 7

Hadoop version: hadoop-2.6.0

SSH passwordless login

JDK 1.8.0_65

 

Cluster setup steps

Preparation: download the required software from the official websites.

Enable sudo for the hadoop user: run visudo and grant the hadoop user root-level privileges in the sudoers file.

Under /opt, create a software directory and change its ownership to the hadoop user.

Create the software directory: sudo mkdir software

Change the owner of the software directory: sudo chown -R hadoop:hadoop software
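
A minimal sketch of this preparation, run as root (the sudoers entry shown is one common way to grant full sudo; adjust it to your own policy):

# in visudo, add a line granting the hadoop user sudo rights, for example:
#   hadoop  ALL=(ALL)  ALL
visudo

# create /opt/software and hand it over to the hadoop user
mkdir -p /opt/software
chown -R hadoop:hadoop /opt/software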

 

1. Remove the OpenJDK that ships with the system

rpm -qa | grep java

rpm -e --nodeps packagename
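
For example, a quick way to remove every pre-installed OpenJDK package in one pass (a sketch; check the rpm -qa output first so you only remove Java packages):

for pkg in $(rpm -qa | grep -E 'openjdk|java-1\.'); do
    sudo rpm -e --nodeps "$pkg"    # remove the package, ignoring dependencies
done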

 

 

2. Install the JDK

cd /opt/software

Extract the JDK: tar -zxvf ~/Downloads/jdk1.8.0_65.tar.gz

Configure the JDK environment variables:

vim ~/.bashrc

Add the JDK paths at the bottom of the file:

export JAVA_HOME=/opt/software/jdk1.8.0_65
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib

Save and exit.

Apply the environment variables: source ~/.bashrc
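
A quick sanity check that the JDK is picked up:

java -version      # should report java version "1.8.0_65"
echo $JAVA_HOME    # should print /opt/software/jdk1.8.0_65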

 

3. Configure passwordless SSH login

Generate the key pair: ssh-keygen -t rsa -P '' (press Enter to accept the default file location)

Install the master's public key on every worker:

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@worker1

Test it: ssh worker1. If you are logged in without a password prompt, it worked.
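
To cover all nodes in one go, a minimal sketch (assuming the workers are named worker1, worker2, and worker3, the same names used in the slaves files below; copying the key to master itself is also useful because the start scripts ssh to the local node as well):

for host in master worker1 worker2 worker3; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$host
done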

 

 

 

4. Install hadoop-2.6.0

4.1. Extract Hadoop into /opt/software

tar -zxvf ~/Downloads/hadoop-2.6.0.tar.gz -C /opt/software

4.2. Hadoop environment variables

vim ~/.bashrc

4.3. Add the Hadoop installation path at the bottom of the file:

export HADOOP_HOME=/opt/software/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Apply it: source ~/.bashrc
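
To check that the Hadoop binaries are on the PATH:

hadoop version    # should report Hadoop 2.6.0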

4.4. In the hadoop-2.6.0/etc/hadoop directory, edit the following files:

1、hadoop-env.sh

2、mapred-env.sh

3、yarn-env.sh

4、core-site.xml

5、hdfs-site.xml

6、mapred-site.xml

7、yarn-site.xml

8、slaves

4.4.1. In hadoop-env.sh, mapred-env.sh, and yarn-env.sh, set JAVA_HOME to the JDK installation path:

export JAVA_HOME=/opt/software/jdk1.8.0_65
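
One quick way to apply this to all three files at once (a sketch, run from the hadoop-2.6.0/etc/hadoop directory; the line appended at the end overrides any earlier JAVA_HOME setting in each script):

for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
    echo 'export JAVA_HOME=/opt/software/jdk1.8.0_65' >> "$f"
done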

 

4.4.2. Configure core-site.xml (minimal configuration)

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/software/hadoop-2.6.0/tmp</value>
    </property>
</configuration>
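
Once the configuration files are in place (and copied to every node), you can confirm they are being read with a quick check:

hdfs getconf -confKey fs.defaultFS    # should print hdfs://master:9000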

4.4.3. Configure hdfs-site.xml (minimal configuration)

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/software/hadoop-2.6.0/dfs/name</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/software/hadoop-2.6.0/dfs/data</value>
    </property>
</configuration>

 

 

4.4.4. Configure mapred-site.xml (minimal configuration)

This file does not exist by default; copy it from the template first: cp mapred-site.xml.template mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

 

4.4.5. Configure yarn-site.xml (minimal configuration)

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

 

4.4.6. Configure slaves

Add the hostname of every worker to this file, one per line.
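
For example, assuming the three workers used throughout this guide:

worker1
worker2
worker3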

 

5. Format and start the cluster

To start a Hadoop cluster you will need to start both the HDFS and YARN cluster.

Format a new distributed filesystem:

$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>

Start the HDFS with the following command, run on the designated NameNode:

$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode

Run a script to start DataNodes on all slaves:

$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

Start the YARN with the following command, run on the designated ResourceManager:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager

Run a script to start NodeManagers on all slaves:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager

Start a standalone WebAppProxy server. If multiple servers are used with load balancing it should be run on each of them:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start proxyserver --config $HADOOP_CONF_DIR

Start the MapReduce JobHistory Server with the following command, run on the designated server:

$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR

Hadoop Shutdown

Stop the NameNode with the following command, run on the designated NameNode:

$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode

Run a script to stop DataNodes on all slaves:

$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode

Stop the ResourceManager with the following command, run on the designated ResourceManager:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager

Run a script to stop NodeManagers on all slaves:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager

Stop the WebAppProxy server. If multiple servers are used with load balancing it should be run on each of them:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_DIR

Stop the MapReduce JobHistory Server with the following command, run on the designated server:

$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
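
In practice, on a small cluster like this one the bundled convenience scripts do the same thing from the master in a few commands. A minimal sketch, assuming the same Hadoop directory and configuration exist on every node (copy /opt/software to the workers with scp or rsync first if needed); "mycluster" is just an example name:

$HADOOP_HOME/bin/hdfs namenode -format mycluster   # format once, on the master only
$HADOOP_HOME/sbin/start-dfs.sh                     # NameNode on master, DataNodes on the hosts listed in slaves
$HADOOP_HOME/sbin/start-yarn.sh                    # ResourceManager on master, NodeManagers on the workers
jps                                                # list the running daemons on each node

$HADOOP_HOME/sbin/stop-yarn.sh && $HADOOP_HOME/sbin/stop-dfs.sh    # shut everything down again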

 

 

Setting up the spark-1.6.0 cluster

 

1. Have the Hadoop cluster configured as above.

 

2. Extract spark-1.6.0-bin-hadoop2.6.tgz into /opt/software

 

3. Configure Spark's environment variables in ~/.bashrc and spark-env.sh

 

export JAVA_HOME=/opt/software/jdk1.8.0_65
export SCALA_HOME=/opt/software/scala-2.11.8
export HADOOP_HOME=/opt/software/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=3g
export SPARK_EXECUTOR_MEMORY=3g
export SPARK_DRIVER_MEMORY=3g
export SPARK_WORKER_CORES=8

Append the lines above to the end of conf/spark-env.sh (copy it from spark-env.sh.template first if it does not exist).

 

# Spark dir
export SPARK_HOME=/opt/software/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Add the lines above to the bottom of ~/.bashrc, then run source ~/.bashrc.

 

4. Configure spark-defaults.conf

First copy the template: cp spark-defaults.conf.template spark-defaults.conf

Then add the configuration:

vim spark-defaults.conf

 

spark.executor.extraJavaOptions     -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled              true
spark.eventLog.dir                  hdfs://master:9000/historyserverforSpark
spark.yarn.historyServer.address    master:18080
spark.history.fs.logDirectory       hdfs://master:9000/historyserverforSpark
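
Note that the spark.eventLog.dir directory must already exist in HDFS or applications will fail at startup. With the Hadoop cluster running, create it once (the path simply mirrors the one configured above):

hdfs dfs -mkdir -p /historyserverforSpark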

 

5. Edit conf/slaves

cp slaves.template slaves

vim slaves and add the workers:

worker1

worker2

worker3

6. Test the installation

cd $SPARK_HOME/sbin

./start-all.sh

jps

If jps shows a Master process on the master node and Worker processes on the worker nodes, the installation succeeded.
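
As a further check, you can submit the bundled SparkPi example against the standalone master and browse the master web UI at http://master:8080. A minimal sketch, assuming the master listens on the default spark://master:7077 and that the examples jar name matches your download:

$SPARK_HOME/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://master:7077 \
    $SPARK_HOME/lib/spark-examples-1.6.0-hadoop2.6.0.jar 100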