I. Software
- CentOS 6.5
- JDK 1.7
- hadoop-2.6.1.tar.gz (rebuilt from source on a 64-bit platform)
- scala2.11.7.tgz
- spark-1.5.0-bin-hadoop2.6.tgz
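II. Pre-installation setup
1. Install the JDK system-wide
a. Unpack the archive
b. Set the environment variables (a file under /etc/profile.d/ works)
export JAVA_HOME=/usr/java/jdk1.7.0_21
export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
source /etc/profile
c. Check the Java installation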
java -version
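Step b can be sketched as a small helper that generates the profile file. This is only a sketch, assuming the JDK was unpacked to /usr/java/jdk1.7.0_21 as shown above; the function name write_java_profile and the file name java.sh are illustrative and not part of the original notes.

```shell
# write_java_profile JDK_DIR OUT_FILE
# Writes the three export lines from step b into OUT_FILE.
# (The function name and the output file name are illustrative assumptions.)
write_java_profile() {
    jdk_dir=$1
    out_file=$2
    # \$ keeps JAVA_HOME, CLASSPATH and PATH literal in the generated file,
    # so they are expanded at login time rather than now.
    cat > "$out_file" <<EOF
export JAVA_HOME=$jdk_dir
export CLASSPATH=.:\$JAVA_HOME/lib:\$CLASSPATH
export PATH=\$JAVA_HOME/bin:\$PATH
EOF
}

# Usage (as root):
#   write_java_profile /usr/java/jdk1.7.0_21 /etc/profile.d/java.sh
#   source /etc/profile
#   java -version
```

Keeping the exports in their own file under /etc/profile.d/ leaves /etc/profile untouched, which makes the change easy to revert.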
2. Create the hadoop user and group, and grant the user root privileges in /etc/sudoers
# groupadd hadoop
# useradd -g hadoop hadoop
# passwd hadoop
# visudo
Add the hadoop line below the existing root entry:
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
3. Set the hostname
vim /etc/hosts
vim /etc/sysconfig/network
hostname
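The /etc/hosts edit can be sketched as an idempotent helper, so that rerunning it on a node does not duplicate entries. This is a rough sketch: the function name add_host_entry is illustrative, nameNode and 192.168.1.104 are taken from the configuration later in these notes, and dataNode1 with 192.168.1.105 is a purely hypothetical worker.

```shell
# add_host_entry IP NAME HOSTS_FILE
# Appends "IP NAME" to HOSTS_FILE unless NAME already appears there.
# (Function name is an illustrative assumption.)
add_host_entry() {
    ip=$1
    name=$2
    hosts_file=$3
    # -F: fixed string, -w: whole word, so dataNode1 does not match dataNode10.
    if ! grep -qwF -- "$name" "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Usage (as root, on every node; the dataNode1 line is hypothetical):
#   add_host_entry 192.168.1.104 nameNode  /etc/hosts
#   add_host_entry 192.168.1.105 dataNode1 /etc/hosts
```

Every node should carry the same name-to-address mappings, since the Hadoop configuration files below refer to the master by the name nameNode.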
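4. Install the SSH service and set up passwordless SSH trust between the nodes
a. Install the openssh service
rpm -qa | grep ssh
yum install openssh
b. Generate the public/private key pair
Log in as the hadoop user
ssh-keygen -t rsa
The randomart image in the output indicates that the key pair was generated; two new files appear in the ~/.ssh directory
Private key file: id_rsa
Public key file: id_rsa.pub
c. Append the contents of the public key file id_rsa.pub to the authorized_keys file:
cat id_rsa.pub >> authorized_keys
d. Distribute the authorized_keys file to every dataNode host
e. Verify passwordless SSH login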
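Steps c through e give no commands for distribution or verification, so one possible way to carry them out is sketched here. The helper install_pubkey is an illustrative name, and dataNode1 is a placeholder hostname; ssh-copy-id ships with the OpenSSH client tools and is used here as one common option, not necessarily what the original setup used.

```shell
# install_pubkey PUBKEY_FILE SSH_DIR
# Appends the public key to SSH_DIR/authorized_keys only once and applies
# restrictive permissions (700 on the directory, 600 on the file), which
# sshd's StrictModes check accepts.
# (Function name is an illustrative assumption.)
install_pubkey() {
    pubkey_file=$1
    ssh_dir=$2
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # -x: whole-line match, -F: fixed strings, -f: read the pattern from the key file.
    if ! grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}

# Step c, on the master as the hadoop user:
#   install_pubkey ~/.ssh/id_rsa.pub ~/.ssh
# Step d, once per worker (dataNode1 is a placeholder hostname):
#   ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@dataNode1
# Step e, should print the remote hostname without asking for a password:
#   ssh hadoop@dataNode1 hostname
```

If login still prompts for a password, overly permissive modes on ~/.ssh or authorized_keys on the remote side are a frequent cause.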
5. Stop the firewall
# service iptables stop
III. Hadoop configuration and deployment
1. Download Hadoop
http://mirrors.hust.edu.cn/apache/hadoop/common/
2. Configuration files
Unpack the archive: tar zxvf hadoop-2.6.1.tar.gz
Change into the configuration directory: cd hadoop-2.6.1/etc/hadoop
a. core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://nameNode:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
</configuration>
b. hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>nameNode:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
c. mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>nameNode:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>nameNode:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>nameNode:19888</value>
</property>
</configuration>
d. yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>nameNode:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>nameNode:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>nameNode:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>nameNode:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>nameNode:8088</value>
</property>
</configuration>
e. slaves
List the hostname of every dataNode machine in this file, one per line
f. Set JAVA_HOME
Add the JAVA_HOME setting to both hadoop-env.sh and yarn-env.sh
vim hadoop-env.sh
vim yarn-env.sh
g. Set the system environment variables
export HADOOP_HOME
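Steps e through g can be sketched with two small helpers. This is a sketch under assumptions: the function names are illustrative, dataNode1 and dataNode2 are placeholder worker names, and the HADOOP_HOME value is inferred from the HADOOP_CONF_DIR path used in the Spark section of these notes rather than stated here.

```shell
# write_slaves SLAVES_FILE HOST...
# Writes one worker hostname per line, replacing the file contents.
write_slaves() {
    slaves_file=$1
    shift
    printf '%s\n' "$@" > "$slaves_file"
}

# set_java_home ENV_FILE JDK_DIR
# Appends an explicit JAVA_HOME export to hadoop-env.sh or yarn-env.sh.
set_java_home() {
    printf 'export JAVA_HOME=%s\n' "$2" >> "$1"
}

# Usage, from the etc/hadoop directory (worker names are placeholders):
#   write_slaves slaves dataNode1 dataNode2
#   set_java_home hadoop-env.sh /usr/java/jdk1.7.0_21
#   set_java_home yarn-env.sh   /usr/java/jdk1.7.0_21
# Step g, in /etc/profile (install path is an assumption, see above):
#   export HADOOP_HOME=/usr/local/hadoop26
#   export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Setting JAVA_HOME explicitly in the env scripts matters because the daemons are launched over SSH, where the login profile may not have been sourced.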
3. Format the file system
bin/hdfs namenode -format
4. Start and stop the services
Start (run from the sbin directory)
./start-dfs.sh
./start-yarn.sh
Stop
./stop-yarn.sh
./stop-dfs.sh
5. Verify
- Run the jps command to list the Java processes
- http://ip:50070/dfshealth.jsp opens the NameNode web UI
- http://ip:19888/jobhistory opens the JobHistory web UI
- http://ip:8088/cluster opens the cluster web UI
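The jps check can be made less error-prone with a helper that reports which expected daemons are absent. This is a minimal sketch; the function name missing_daemons is illustrative, and the daemon lists in the usage comments assume the layout configured above (NameNode, SecondaryNameNode and ResourceManager on the master, DataNode and NodeManager on the workers).

```shell
# missing_daemons NAME...
# Reads jps output ("<pid> <ClassName>" per line) on stdin and prints each
# expected daemon NAME that does not appear in it.
# (Function name is an illustrative assumption.)
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # Compare the second column exactly, so NameNode is not satisfied
        # by a SecondaryNameNode line.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Usage:
#   On the master:   jps | missing_daemons NameNode SecondaryNameNode ResourceManager
#   On each worker:  jps | missing_daemons DataNode NodeManager
# The JobHistory UI on port 19888 needs its server started separately:
#   sbin/mr-jobhistory-daemon.sh start historyserver
```

Empty output means every listed daemon is running on that node.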
IV. Install Scala
1. Download scala2.11.7 from http://www.scala-lang.org/
2. Copy the downloaded scala2.11.7.tgz to /usr/local/ and unpack it: tar zxvf scala2.11.7.tgz
3. Set the environment variables:
vim /etc/profile
export SCALA_HOME=/usr/local/scala-2.11.7
export PATH=$PATH:$SCALA_HOME/bin
source /etc/profile
4. Check the Scala installation
scala -version
V. Spark deployment and installation
1. Download spark1.5 from http://mirrors.cnnic.cn/apache/
2. Unpack spark-1.5.0-bin-hadoop2.6.tgz
3. Set the environment variables:
vim /etc/profile
export SPARK_HOME=/app/spark-1.5.0-bin-hadoop2.6
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin
source /etc/profile
4. Change into Spark's conf directory:
cp spark-env.sh.template spark-env.sh
Then append the following to spark-env.sh:
### JDK install directory (use the path of the JDK actually installed)
export JAVA_HOME=/usr/local/jdk1.7.0_79
### Scala install directory
export SCALA_HOME=/usr/local/scala-2.11.7
### IP address of the Spark cluster's master node
export SPARK_MASTER_IP=192.168.1.104
#export SPARK_MASTER_IP=spark1
#export SPARK_MASTER_PORT=30111
#export SPARK_MASTER_WEBUI_PORT=30118
#export SPARK_WORKER_CORES=2
#export SPARK_WORKER_MEMORY=4g
#export SPARK_WORKER_PORT=30333
#export SPARK_WORKER_WEBUI_PORT=30119
#export SPARK_WORKER_INSTANCES=1
### Maximum memory a worker node may allocate to executors
export SPARK_WORKER_MEMORY=1g
### Configuration directory of the Hadoop cluster
export HADOOP_CONF_DIR=/usr/local/hadoop26/etc/hadoop
### Configuration directory of the Spark cluster
export SPARK_CONF_DIR=/app/spark-1.5.0-bin-hadoop2.6/conf
### Spark performance tuning
export SPARK_DAEMON_JAVA_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
5. Edit the slaves file under the conf directory and add every worker node to it
6. Start Spark:
bin/spark-shell
7. View the Spark settings: http://ip:4040
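Run without arguments, bin/spark-shell starts in local mode, so connecting it to the standalone cluster configured above may look like the sketch below. The helper spark_master_url is an illustrative name; 7077 is Spark's default standalone master port, and 192.168.1.104 is the SPARK_MASTER_IP set in spark-env.sh.

```shell
# spark_master_url HOST [PORT]
# Prints the standalone master URL; PORT defaults to 7077, Spark's default
# master port. (Function name is an illustrative assumption.)
spark_master_url() {
    printf 'spark://%s:%s\n' "$1" "${2:-7077}"
}

# Usage, from the Spark install directory on the master:
#   sbin/start-all.sh    # starts the master and the workers listed in conf/slaves
#   bin/spark-shell --master "$(spark_master_url 192.168.1.104)"
# If SPARK_MASTER_PORT is set in spark-env.sh, pass it as the second argument:
#   bin/spark-shell --master "$(spark_master_url spark1 30111)"
```

Port 4040 serves the UI of a running application such as spark-shell; the standalone master's own web UI defaults to port 8080 unless SPARK_MASTER_WEBUI_PORT is set.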
For more information, see https://spark.apache.org/docs