Version requirements
java
Version: 1.8.* (1.8.0_60)
Download: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
scala
Version: 2.11.* (2.11.8)
Download: http://www.scala-lang.org/download/2.11.8.html
zookeeper
Version: 3.4.* (zookeeper-3.4.8)
Download: http://mirrors.hust.edu.cn/apache/zookeeper/zookeeper-3.4.8/
spark
Version: 2.1.* (spark-2.1.0-bin-hadoop2.7)
Download: http://spark.apache.org/downloads.html
Spark installation
Prerequisites
Java installation
See http://www.cnblogs.com/molyeo/p/7007917.html
Scala installation
See http://www.cnblogs.com/molyeo/p/7007917.html
ZooKeeper installation
See http://www.cnblogs.com/molyeo/p/7048867.html
Extracting the archive
tar -zxvf spark-2.1.0-bin-hadoop2.7.tgz
mv spark-2.1.0-bin-hadoop2.7 spark
Environment variables
vi ~/.bash_profile
export JAVA_HOME=/wls/oracle/jdk
export SCALA_HOME=/wls/oracle/scala
export ZOOKEEPER_HOME=/wls/oracle/zookeeper
export HADOOP_HOME=/wls/oracle/hadoop
export HBASE_HOME=/wls/oracle/hbase
export SPARK_HOME=/wls/oracle/spark
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME SCALA_HOME ZOOKEEPER_HOME HADOOP_HOME SPARK_HOME
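After reloading the profile with source ~/.bash_profile, it can help to confirm that each *_HOME variable points at a directory that actually exists. The following is a minimal sketch; check_homes is a hypothetical helper name, not something shipped with Spark.

```shell
# check_homes is a hypothetical helper, not part of Spark.
# For each variable NAME given as an argument, report whether it is set
# and whether its value is an existing directory.
check_homes() {
  status=0
  for name in "$@"; do
    eval "dir=\${$name:-}"   # look up the variable whose name is in $name
    if [ -z "$dir" ]; then
      echo "$name is not set"
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$name points at a missing directory: $dir"
      status=1
    else
      echo "$name ok: $dir"
    fi
  done
  return "$status"
}

# Usage, after `source ~/.bash_profile`:
#   check_homes JAVA_HOME SCALA_HOME ZOOKEEPER_HOME HADOOP_HOME HBASE_HOME SPARK_HOME
```

The function returns a non-zero status if any variable is unset or wrong, so it can be used as a guard before starting the cluster.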
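Configuration changes
Setting up the Spark cluster mainly involves changes to the following configuration files, all under $SPARK_HOME/conf:
spark-defaults.conf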
spark-env.sh
slaves
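A fresh Spark 2.1 download ships these three files only as templates (spark-defaults.conf.template, spark-env.sh.template, slaves.template) in the conf directory. As a rough sketch, the working copies could be created like this; init_spark_conf is a hypothetical helper name, and it deliberately leaves any file that already exists untouched.

```shell
# init_spark_conf is a hypothetical helper, not part of Spark.
# Create each config file from its bundled template unless it already exists.
init_spark_conf() {
  conf_dir="$1"
  for f in spark-defaults.conf spark-env.sh slaves; do
    if [ -e "$conf_dir/$f" ]; then
      echo "keep   $f"
    elif [ -e "$conf_dir/$f.template" ]; then
      cp "$conf_dir/$f.template" "$conf_dir/$f"
      echo "create $f"
    else
      echo "no template for $f" >&2
    fi
  done
}

# Usage:
#   init_spark_conf "$SPARK_HOME/conf"
```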
spark-defaults.conf
spark.master spark://SZB-L0045546:7077
spark.executor.memory 4g
spark.eventLog.enabled true
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.cores.max 10
spark.scheduler.mode FIFO
spark.shuffle.compress true
spark.ui.port 4040
spark.eventLog.dir /wls/oracle/bigdata/spark/sparkeventlog
spark.kryoserializer.buffer 512m
spark.rpc.numRetries 5
spark.port.maxRetries 16
spark.rpc.askTimeout 120s
spark.network.timeout 120s
spark.rpc.lookupTimeout 120s
spark.executor.extraJavaOptions -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=40 -XX:G1ReservePercent=10 -XX:G1HeapRegionSize=8M -XX:MaxPermSize=300M -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
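A properties file of this length can easily end up with the same key on two lines, which makes it unclear which value is meant. The sketch below lists keys that occur more than once; find_duplicate_keys is a hypothetical helper name, and it assumes the whitespace-separated "key value" layout used above (it does not handle the key=value form).

```shell
# find_duplicate_keys is a hypothetical helper, not part of Spark.
# Print every key that occurs more than once in a "key value" properties file.
# Blank lines and lines starting with '#' are ignored.
find_duplicate_keys() {
  awk '!/^[ \t]*(#|$)/ { count[$1]++ }
       END { for (k in count) if (count[k] > 1) print k }' "$1" | sort
}

# Usage:
#   find_duplicate_keys "$SPARK_HOME/conf/spark-defaults.conf"
```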
spark-env.sh
export JAVA_HOME=/wls/oracle/jdk
export SCALA_HOME=/wls/oracle/scala
export SPARK_HOME=/wls/oracle/spark
export HADOOP_HOME=/wls/oracle/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
export SPARK_WORKER_MEMORY=28g
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=8
export SPARK_PID_DIR=/wls/oracle/bigdata/spark/sparkpids
export SPARK_LOCAL_DIRS=/wls/oracle/bigdata/spark/sparkdata
export SPARK_WORKER_DIR=/wls/oracle/bigdata/spark/sparkwork
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=300"
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=SZB-L0045546,SZB-L0045551,SZB-L0045552 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_HOME/lib/*:$SCALA_HOME/lib/*:$SPARK_HOME/jars/*"
Here $SPARK_HOME/jars/ holds the jars that ship with Spark, while $SPARK_HOME/lib/ holds our own external jars, such as application dependencies for Kafka and MongoDB.
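The settings above refer to several directories under /wls/oracle/bigdata/spark (sparkpids, sparkdata, sparkwork, plus sparkeventlog from spark.eventLog.dir). Creating them up front on every node avoids relying on each daemon to create its own; the event log directory in particular has to exist before an application starts with event logging enabled. A minimal sketch, with make_spark_dirs as a hypothetical helper name:

```shell
# make_spark_dirs is a hypothetical helper, not part of Spark.
# Create the runtime directories under the given base directory.
# The sub-directory names follow the paths used in this post.
make_spark_dirs() {
  base="$1"
  for d in sparkpids sparkdata sparkwork sparkeventlog; do
    mkdir -p "$base/$d" || return 1
  done
}

# Usage, on every node:
#   make_spark_dirs /wls/oracle/bigdata/spark
```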
slaves
List the host names of all worker nodes, one per line:
SZB-L0045551
SZB-L0045552
SZB-L0047815
SZB-L0047816
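Every node needs the same configuration, so the conf directory is usually copied from the master to each host in the slaves file. The sketch below assumes passwordless SSH to the workers and the same install path on every node; list_workers and push_conf are hypothetical helper names. By default push_conf only prints the scp commands, and it executes them when the third argument is run.

```shell
# list_workers and push_conf are hypothetical helpers, not part of Spark.

# Print worker host names from a slaves file, skipping blanks and comments.
list_workers() {
  sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Copy the conf directory to the same parent path on every worker.
# Arguments: slaves file, conf directory, optional mode ("run" executes).
push_conf() {
  slaves_file="$1"
  conf_dir="$2"
  mode="${3:-dry-run}"
  target="$(dirname "$conf_dir")/"
  list_workers "$slaves_file" | while read -r host; do
    if [ "$mode" = "run" ]; then
      scp -r "$conf_dir" "$host:$target" </dev/null
    else
      echo "scp -r $conf_dir $host:$target"
    fi
  done
}

# Usage:
#   push_conf /wls/oracle/spark/conf/slaves /wls/oracle/spark/conf        # dry run
#   push_conf /wls/oracle/spark/conf/slaves /wls/oracle/spark/conf run    # copy
```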
Operations commands
Start the cluster
/wls/oracle/spark/sbin/start-all.sh
Stop the cluster
/wls/oracle/spark/sbin/stop-all.sh
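Because spark-env.sh enables ZooKeeper recovery mode, a standby master is normally part of the picture. start-all.sh only starts a master on the machine it runs on, so a standby would be started separately with sbin/start-master.sh on another host, and applications would then be given every master in a single URL of the form spark://host1:7077,host2:7077. The sketch below builds such a URL; master_url is a hypothetical helper name, and which host runs the standby master is an assumption, since the post does not name one.

```shell
# master_url is a hypothetical helper, not part of Spark.
# Build a spark:// URL that lists several masters.
# Arguments: port, then one or more master host names.
master_url() {
  port="$1"
  shift
  url=""
  for host in "$@"; do
    url="${url:+$url,}$host:$port"   # append with a comma after the first entry
  done
  echo "spark://$url"
}

# Usage (the second host name is only an illustration):
#   master_url 7077 SZB-L0045546 SZB-L0045551
```

The resulting string can be used as the value of spark.master in spark-defaults.conf or passed to spark-submit with --master.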