Spark Standalone Mode

It is very easy to install a Spark cluster (Standalone mode). In my example, I used three machines.

All machines run a OS of ubuntu 12.04 32bit. One machine is named "master", the other two are

named "node01" and "node02" respectively. The name of a machine can be set in: /etc/hostname.

Further more, every nodes (machines) should the same user name.

1. On every node: Install Java and set Java environment in ~/.bashrc as:

　　#set java environment

　　export JAVA_HOME=/usr/local/jdk1.7.0_67

　　export JRE_HOME=$JAVA_HOME/jre

　　export PATH=$JAVA_HOME/bin:$PATH

　　export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib

Note that in my example, I used Java jdk1.7.0_67 and put it under /usr/local.

2. On every node: Install Scala and set corresponding environment variables in ~/.bashrc as:

export SCALA_HOME=/usr/local/scala-2.10.4

export PATH=$SCALA_HOME/bin:$PATH

Note that in my example, I used Scala scala-2.10.4 and put it under /usr/local.

3. On every node: Install Spark.

Download any version of Spark from http://spark.apache.org/downloads.html , in my example, I

chose spark-1.1.0-bin-hadoop2.4.tgz and extract it to /usr/local.

Set in ~/.bashrc:

export SPARK_HOME=/usr/local/spark-1.1.0-bin-hadoop2.4

4. Set up ssh such that every two nodes in the cluster can ssh each other without password. This step

is also needed when you set up a hadoop cluster, there are abundant tutorials on the Internet, so

the details is omitted here.

5. On every node:

　　$ sudo vim /etc/hosts

and set the IP address of the nodes in the network. For example, I set the hosts file on every node to:

　　127.0.0.1 localhost

　　223.3.86.xxx master

　　223.3.81.xxx node01

　　223.3.70.xxx node02

6. On master node: Enter the root folder of Spark, and edit con/slaves. In my example:

　　$ cd /usr/local/spark-1.1.0-bin-hadoop2.4

　　$ sudo vim conf/slaves

Edit slaves file to:

　　master

　　node01

　　node02

7. On master node: Enter the root folder of Spark and start spark cluster.

　　$ cd /usr/local/spark-1.1.0-bin-hadoop2.4

　　$ sbin/start-all.sh

8. Open http://master:8080/ using your web browser to monitoring the cluster.

9. Run Spark examples:

Locally:

$ MASTER=local[4] $SPARK_HOME/bin/run-example SparkLR

On cluster:

$ MASTER=spark://master:7077 $SPARK_HOME/bin/run-example SparkLR

For any questions, feel free to contact me. Email: wuzimian2006@163.com QQ: 726590906

秒客网

Spark Standalone Mode

相关文章