Setting up Spark on Ubuntu; jotting down the steps here.
Environment: Ubuntu 16.04
Spark package: spark-2.3.1-bin-hadoop2.7.tgz
Quick start guide: http://spark.apache.org/docs/latest/quick-start.html
Log in as the existing hadoop user.
1. Install the JDK and configure the Java environment.
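A minimal sketch of this step, assuming OpenJDK 8 from the Ubuntu repositories (any JDK works; the install path below is the usual one for this package on amd64, adjust it if yours differs):
sudo apt install openjdk-8-jdk
Then add to ~/.bashrc and reload it:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:${JAVA_HOME}/bin
source ~/.bashrc
java -version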
2. Install Scala
Pay attention to the version compatibility between Scala and Spark; Spark 2.3.1 (prebuilt for Hadoop 2.7) is built against Scala 2.11.
sudo apt install scala
Add the following to ~/.bashrc:
export SCALA_HOME=/usr/share/scala-2.11
export PATH=$PATH:${SCALA_HOME}/bin
Apply the changes:
source ~/.bashrc
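To confirm the installation (the apt package on Ubuntu 16.04 should provide Scala 2.11.x):
scala -version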
3. Download Spark
wget http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
tar zxvf spark-2.3.1-bin-hadoop2.7.tgz
sudo mkdir /usr/local/spark
sudo mv spark-2.3.1-bin-hadoop2.7/* /usr/local/spark
Grant the hadoop user ownership of the Spark directory:
sudo chown -hR hadoop /usr/local/spark
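Optionally, put Spark itself on the PATH as well (a sketch, assuming the /usr/local/spark location above); the tests below do not require it, but it lets you run the spark-* commands from any directory. Add to ~/.bashrc and reload:
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:${SPARK_HOME}/bin:${SPARK_HOME}/sbin
source ~/.bashrc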
4. Run a test
Test with the bundled Python shell (pyspark):
cd /usr/local/spark/bin
./pyspark
lines = sc.textFile("/usr/local/spark/README.md")
lines.count()    # number of lines in the file
lines.first()    # first line of the file
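For an actual word count, a small sketch typed into the same pyspark session (standard RDD operations, nothing specific to this setup):
words = lines.flatMap(lambda line: line.split())
words.count()    # total number of words in README.md
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
counts.take(5)   # a few (word, count) pairs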
Test with the bundled Spark shell (Scala):
cd /usr/local/spark
./bin/spark-shell
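Alternatively, run one of the bundled examples as a sanity check; SparkPi estimates the value of pi, and the trailing argument is the number of partitions to use:
./bin/run-example SparkPi 10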
Start the standalone master and worker, then check the node status:
cd sbin
./start-all.sh
Open http://localhost:8080 in a browser to view the Spark master web UI and the node status.
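To confirm the processes from the command line, or to stop the standalone cluster when finished (run from the same sbin directory):
jps    # should list a Master and a Worker process
./stop-all.sh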