It is very easy to install a Spark cluster (Standalone mode). In my example, I used three machines.
All machines run a OS of ubuntu 12.04 32bit. One machine is named "master", the other two are
named "node01" and "node02" respectively. The name of a machine can be set in: /etc/hostname.
Further more, every nodes (machines) should the same user name.
1. On every node: Install Java and set Java environment in ~/.bashrc as:
#set java environment
export JAVA_HOME=/usr/local/jdk1.7.0_67
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
Note that in my example, I used Java jdk1.7.0_67 and put it under /usr/local.
2. On every node: Install Scala and set corresponding environment variables in ~/.bashrc as:
export SCALA_HOME=/usr/local/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
Note that in my example, I used Scala scala-2.10.4 and put it under /usr/local.
3. On every node: Install Spark.
Download any version of Spark from http://spark.apache.org/downloads.html , in my example, I
chose spark-1.1.0-bin-hadoop2.4.tgz and extract it to /usr/local.
Set in ~/.bashrc:
export SPARK_HOME=/usr/local/spark-1.1.0-bin-hadoop2.4
4. Set up ssh such that every two nodes in the cluster can ssh each other without password. This step
is also needed when you set up a hadoop cluster, there are abundant tutorials on the Internet, so
the details is omitted here.
5. On every node:
$ sudo vim /etc/hosts
and set the IP address of the nodes in the network. For example, I set the hosts file on every node to:
127.0.0.1 localhost
223.3.86.xxx master
223.3.81.xxx node01
223.3.70.xxx node02
6. On master node: Enter the root folder of Spark, and edit con/slaves. In my example:
$ cd /usr/local/spark-1.1.0-bin-hadoop2.4
$ sudo vim conf/slaves
Edit slaves file to:
master
node01
node02
7. On master node: Enter the root folder of Spark and start spark cluster.
$ cd /usr/local/spark-1.1.0-bin-hadoop2.4
$ sbin/start-all.sh
8. Open http://master:8080/ using your web browser to monitoring the cluster.
9. Run Spark examples:
Locally:
$ MASTER=local[4] $SPARK_HOME/bin/run-example SparkLR
On cluster:
$ MASTER=spark://master:7077 $SPARK_HOME/bin/run-example SparkLR
For any questions, feel free to contact me. Email: wuzimian2006@163.com QQ: 726590906