Cluster Planning
Taking three servers, hadoop02, hadoop03, and hadoop04, as an example, the cluster is planned as shown in the table below:
Host        Roles
hadoop02    Master (active), Worker, ZooKeeper
hadoop03    Master (standby), Worker, ZooKeeper
hadoop04    Worker, ZooKeeper
Setup Procedure
1. Required environment
The servers already have JDK 1.8, Scala 2.11.8, Hadoop, and ZooKeeper installed.
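A quick sanity check (a minimal sketch; it assumes all four are already on the PATH):
java -version
scala -version
hadoop version
zkServer.sh status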
2. Download Spark
http://spark.apache.org/downloads.html
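For example, the package can be pulled straight onto the server; the mirror below is just one option, and any mirror listed on the download page works:
wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.6.tgz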
3. Upload the package, extract it, and rename the directory
tar -zxvf spark-2.1.0-bin-hadoop2.6.tgz -C /home/hadoop/apps
cd /home/hadoop/apps
mv spark-2.1.0-bin-hadoop2.6 spark
4. Modify the configuration files
a. Go to the conf directory, then rename and edit the spark-env.sh.template file
cd /home/hadoop/apps/spark/conf/
mv spark-env.sh.template spark-env.sh
vi spark-env.sh
Add the following settings to the file:
export JAVA_HOME=/usr/local/java/jdk1.8.0_73
export SPARK_MASTER_PORT=7077
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop02:2181,hadoop03:2181,hadoop04:2181 -Dspark.deploy.zookeeper.dir=/spark"
# Spark on YARN configuration
export HADOOP_CONF_DIR=/home/hadoop/apps/hadoop-2.6.5/etc/hadoop
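Optionally, the Master web UI port and Worker resources can also be pinned here; the values below are only an illustrative sketch and are not required for the HA setup:
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g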
b. Rename and edit the slaves.template file
mv slaves.template slaves
vi slaves
Add the hostnames of the child (Worker) nodes to this file:
hadoop02
hadoop03
hadoop04
5. Rename the start/stop scripts in the sbin directory (start-all.sh would otherwise clash with Hadoop's script of the same name)
cd /home/hadoop/apps/spark/sbin
mv start-all.sh start-spark.sh
mv stop-all.sh stop-spark.sh
6. Distribute to the other nodes
cd /home/hadoop/apps
scp -r spark hadoop@hadoop03:$PWD
scp -r spark hadoop@hadoop04:$PWD
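To confirm the copies landed, the directory can be listed on another node (assuming passwordless ssh between the nodes):
ssh hadoop@hadoop03 'ls /home/hadoop/apps/spark'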
7. Add environment variables (on each node where Spark commands will be run)
vi ~/.bashrc
export SPARK_HOME=/home/hadoop/apps/spark
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin
source ~/.bashrc
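To confirm the PATH change took effect:
spark-submit --version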
8. Start the cluster
First run zkServer.sh start on each node to bring up ZooKeeper. Then run sbin/start-spark.sh on hadoop02 to start the first Master and all Workers, and run sbin/start-master.sh on hadoop03 to start the second (standby) Master.
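To verify the result (a sketch based on the configuration above), run jps on each node and check that the expected Master and Worker processes are present, then open http://hadoop02:8080 (it should report Status: ALIVE) and http://hadoop03:8080 (Status: STANDBY). As an optional failover test, stop the active Master on hadoop02 and watch the standby take over:
jps
sbin/stop-master.sh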
History Server
Hadoop's JobHistoryServer only records MapReduce jobs, so Spark should run its own HistoryServer. To use it, add the following configuration:
1. Before starting, create the /sparklog directory on HDFS
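For example (assuming HDFS is reachable at hadoop02:9000, as configured below):
hdfs dfs -mkdir -p /sparklog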
2. Rename and edit spark-defaults.conf
mv spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop02:9000/sparklog
3. vi spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=25 -Dspark.history.fs.logDirectory=hdfs://hadoop02:9000/sparklog"
4. Start the HistoryServer
start-history-server.sh
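As an end-to-end check (a sketch; the example jar name matches the default Spark 2.1.0 build and may differ in other distributions), submit the bundled SparkPi example against the HA master URL, then open http://hadoop02:18080 to confirm the finished application shows up in the History Server UI:
spark-submit --class org.apache.spark.examples.SparkPi --master spark://hadoop02:7077,hadoop03:7077 /home/hadoop/apps/spark/examples/jars/spark-examples_2.11-2.1.0.jar 100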