1、 Environment
OS: CentOS 7
Hadoop version: hadoop-2.6.0
Passwordless SSH login
JDK 1.8.0_65
Cluster setup steps
Preparation: download the required software from the official websites.
Give the hadoop user sudo privileges: run visudo and add the hadoop user alongside root in the sudoers configuration.
Create a software directory under /opt and change its ownership to the hadoop user:
Create the software directory: sudo mkdir software
Change the owner of the software directory: sudo chown -R hadoop:hadoop software
1、 Uninstall the OpenJDK that ships with the system
rpm -qa | grep java
rpm -e --nodeps packagename
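If several Java packages are installed, a short loop removes them in one pass. This is only a sketch: check the output of rpm -qa | grep java first, since the matching package names vary by system.
# remove every installed OpenJDK package (names are system-dependent)
for pkg in $(rpm -qa | grep -i openjdk); do
    sudo rpm -e --nodeps "$pkg"
done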
2、 Install the JDK
cd /opt/software
Extract the JDK: tar -zxvf ~/Downloads/jdk1.8.0_65.tar.gz
Configure the JDK environment variables:
vim ~/.bashrc
Add the JDK paths at the bottom of the file:
export JAVA_HOME=/opt/software/jdk1.8.0_65
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib
Save and exit.
Apply the environment variables: source ~/.bashrc
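A quick check that the JDK is picked up correctly:
java -version        # should report version 1.8.0_65
echo $JAVA_HOME      # should print /opt/software/jdk1.8.0_65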
3、 Configure passwordless SSH login
Generate the public/private key pair: ssh-keygen -t rsa (press Enter at both prompts to accept the defaults and an empty passphrase)
Install the master's public key on all workers:
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@worker1
Test it: ssh worker1. If you are logged in without a password prompt, it works.
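With several workers, the copy step can be run in a loop; worker1, worker2 and worker3 are the hostnames used later in the slaves file:
for host in worker1 worker2 worker3; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$host
done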
4、 Install hadoop-2.6.0
4.1、Extract Hadoop into /opt/software
tar -zxvf ~/Downloads/hadoop-2.6.0.tar.gz
4.2、Configure Hadoop's environment variables
vim ~/.bashrc
4.3、Add the Hadoop installation paths at the bottom of the file:
export HADOOP_HOME=/opt/software/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the changes: source ~/.bashrc
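A quick sanity check that the variables took effect:
hadoop version     # should print Hadoop 2.6.0
which hadoop       # should point into /opt/software/hadoop-2.6.0/bin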
4.4、Edit the following files under hadoop-2.6.0/etc/hadoop:
1、hadoop-env.sh
2、mapred-env.sh
3、yarn-env.sh
4、core-site.xml
5、hdfs-site.xml
6、mapred-site.xml
7、yarn-site.xml
8、slaves
4.4.1、In hadoop-env.sh, mapred-env.sh and yarn-env.sh, set JAVA_HOME to the JDK installation path:
export JAVA_HOME=/opt/software/jdk1.8.0_65
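If you prefer not to edit the three files by hand, appending the line also works, since these scripts are plain shell files that are sourced before the daemons start. A sketch, run from hadoop-2.6.0/etc/hadoop:
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
    echo 'export JAVA_HOME=/opt/software/jdk1.8.0_65' >> $f
done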
4.4.2、Configure core-site.xml (minimal configuration)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/software/hadoop-2.6.0/tmp</value>
  </property>
</configuration>
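The fs.defaultFS value above assumes that the hostname master (and the worker hostnames used later) resolves on every node. If DNS is not available, /etc/hosts entries on each machine will do; the IP addresses below are placeholders for your own:
192.168.1.100  master
192.168.1.101  worker1
192.168.1.102  worker2
192.168.1.103  worker3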
4.4.3、Configure hdfs-site.xml (minimal configuration)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/software/hadoop-2.6.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/software/hadoop-2.6.0/dfs/data</value>
  </property>
</configuration>
4.4.4、Configure mapred-site.xml (minimal configuration; copy mapred-site.xml.template to mapred-site.xml first)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
4.4.5、Configure yarn-site.xml (minimal configuration)
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
4.4.6、Configure slaves
Add the hostname of every worker to this file, as shown below.
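For the three workers used in this guide the slaves file would contain:
worker1
worker2
worker3
The configured hadoop-2.6.0 directory also has to exist on every worker. One way to push it out, assuming the same /opt/software layout and the hadoop user on each worker:
for host in worker1 worker2 worker3; do
    scp -r /opt/software/hadoop-2.6.0 hadoop@$host:/opt/software/
done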
5、 Format and start the cluster
To start a Hadoop cluster you will need to start both the HDFS and YARN cluster.
Format a new distributed filesystem:
$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
Start the HDFS with the following command, run on the designated NameNode:
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
Run a script to start DataNodes on all slaves:
$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
Start the YARN with the following command, run on the designated ResourceManager:
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
Run a script to start NodeManagers on all slaves:
$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager
Start a standalone WebAppProxy server. If multiple servers are used with load balancing it should be run on each of them:
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start proxyserver --config $HADOOP_CONF_DIR
Start the MapReduce JobHistory Server with the following command, run on the designated server:
$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
Stop the NameNode with the following command, run on the designated NameNode:
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
Run a script to stop DataNodes on all slaves:
$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
Stop the ResourceManager with the following command, run on the designated ResourceManager:
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
Run a script to stop NodeManagers on all slaves:
$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR stop nodemanager
Stop the WebAppProxy server. If multiple servers are used with load balancing it should be run on each of them:
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_DIR
Stop the MapReduce JobHistory Server with the following command, run on the designated server:
$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
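Instead of starting each daemon by hand, the convenience scripts in sbin start every daemon listed in the slaves file from the master; they rely on the passwordless SSH configured in step 3:
$ $HADOOP_HOME/sbin/start-dfs.sh
$ $HADOOP_HOME/sbin/start-yarn.sh
Afterwards jps on the master should show NameNode, SecondaryNameNode and ResourceManager, and each worker should show DataNode and NodeManager; the NameNode and ResourceManager web UIs are at http://master:50070 and http://master:8088.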
Setting up a Spark 1.6.0 cluster
1、Set up the Hadoop cluster as described above.
2、Extract spark-1.6.0-bin-hadoop2.6.tgz into /opt/software.
3、Configure Spark's environment variables in ~/.bashrc and spark-env.sh.
Append the following to the end of conf/spark-env.sh (create it from spark-env.sh.template if it does not exist yet):
export JAVA_HOME=/opt/software/jdk1.8.0_65
export SCALA_HOME=/opt/software/scala-2.11.8
export HADOOP_HOME=/opt/software/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=3g
export SPARK_EXECUTOR_MEMORY=3g
export SPARK_DRIVER_MEMORY=3g
export SPARK_WORKER_CORES=8
Add the following to the end of ~/.bashrc, then run source ~/.bashrc:
# Spark DIR
export SPARK_HOME=/opt/software/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
4、Configure spark-defaults.conf
First copy the template: cp spark-defaults.conf.template spark-defaults.conf
Add the configuration:
vim spark-defaults.conf
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/historyserverforSpark
spark.yarn.historyServer.address master:18080
spark.history.fs.logDirectory hdfs://master:9000/historyserverforSpark
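spark.eventLog.dir points at an HDFS path, so that directory should exist before any application tries to write its event log there. With HDFS already running it can be created once (assuming the hadoop user may write to the HDFS root):
hdfs dfs -mkdir -p /historyserverforSpark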
5、Edit conf/slaves
cp slaves.template slaves
vim slaves and add the workers:
worker1
worker2
worker3
6、Verify the installation
cd $SPARK_HOME/sbin
./start-all.sh
jps
If jps shows the Master process on the master node and a Worker process on each worker, the installation is working.
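A fuller end-to-end test is to submit the bundled SparkPi example to the standalone master. The examples jar path below is the one shipped in the Spark 1.6.0 / Hadoop 2.6 binary package; adjust it if your layout differs:
$SPARK_HOME/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://master:7077 \
    $SPARK_HOME/lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
The standalone master's web UI at http://master:8080 should also list all three workers as ALIVE.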