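This article describes how to set up a client machine for a Spark cluster.

Installing the software

The Spark cluster is a standalone cluster.

As root, copy the Spark 1.5.2 installation directory from the cluster's master into the target directory on the client machine. The configuration files come along with it, the important part being the ZooKeeper settings.

The owner is root; group and other users have read and execute permission: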

drwxr-xr-x  14 root  root  4.0K Nov 16 11:48 spark-1.5.2-bin-hadoop2.6

Write permission on the metastore_db directory inside it has to be opened up to all users:

drwxrwxrwx  5 root root  126 Nov 16 14:15 metastore_db
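The permissions shown in the two listings could be applied with something like the following. This is only a minimal sketch: `set_client_permissions` is a hypothetical helper name, and the directory passed to it is an assumption that should be replaced with wherever the Spark copy actually lives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the permissions described above to a Spark copy.
# $1 is the Spark installation directory (an assumption; pass your own path).
set_client_permissions() {
    local spark_home="$1"
    # Owner keeps full access; group and others may read files and
    # traverse directories, but not write.
    chmod -R u+rwX,go+rX,go-w "$spark_home"
    # The embedded metastore must be writable by every user who starts spark-shell.
    if [ -d "$spark_home/metastore_db" ]; then
        chmod -R a+rwX "$spark_home/metastore_db"
    fi
}

# Example (run as root):
#   set_client_permissions /letv/spark-1.5.2-bin-hadoop2.6
```

A world-writable metastore_db is convenient on a shared client machine but gives every local user write access to it, so it is worth considering whether that is acceptable in your environment.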

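Configuring environment variables

User accounts can now be created on this machine, for example an account named dean. Then set the global environment variables by adding the following to /etc/profile: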

export JAVA_HOME=/letv/java
export MASTER=spark://10-149-*-*:7077,10-149-*-*:7077,10-149-*-*:7077
export PATH=/letv/spark-1.5.2-bin-hadoop2.6/bin:$PATH
export PATH=$JAVA_HOME/bin:$PATH

The MASTER environment variable is the most useful of these: it saves users from typing the --master parameter every time.
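Because the master URL lists several masters separated by commas, it can be handy to build it from a list of hosts. The sketch below assumes a hypothetical helper named `build_master_url` and placeholder host names; only port 7077 is taken from the MASTER value above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join master hosts into a single standalone master URL.
# Port 7077 comes from the MASTER value above; host names are placeholders.
build_master_url() {
    local port=7077 url="" host
    for host in "$@"; do
        if [ -z "$url" ]; then
            url="spark://${host}:${port}"
        else
            url="${url},${host}:${port}"
        fi
    done
    printf '%s\n' "$url"
}

# Placeholder host names, not the real cluster addresses.
export MASTER="$(build_master_url master-a master-b master-c)"

# With MASTER exported, running `spark-shell` is equivalent to
# running `spark-shell --master "$MASTER"`.
```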

After logging in as dean, run spark-shell and then look at the Spark master's web UI:

ID: app-20151119144454-0036
Name: Spark shell
User: dean
Cores: Unlimited (168 granted)
Executor Memory: 1024.0 MB
Submit Date: Thu Nov 19 14:44:54 CST 2015
State: RUNNING

The total resources of the cluster are:

Alive Workers: 7
Cores in use: 168 Total, 168 Used
Memory in use: 874.5 GB Total, 7.0 GB Used
Applications: 1 Running, 36 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE

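As you can see, this single shell has taken all 168 CPU cores, while each executor uses only 1 GB of memory (7.0 GB in total across the 7 workers).

Resource management

spark-shell provides several parameters for controlling resources. Here is an example: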

spark-shell --executor-memory 4G --total-executor-cores 10 --executor-cores 1

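--executor-memory: the memory used by each executor; the default is 1 GB

--total-executor-cores: the total number of CPU cores used by all executors

--executor-cores: the number of CPU cores used by each executor

This limits the total number of CPU cores to 10, which at 1 core per executor means 10 executors.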
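The arithmetic behind these numbers can be written out as a short sketch. The variable names are illustrative, and the result is an upper bound that assumes the workers have enough free cores and memory to grant everything requested.

```shell
#!/usr/bin/env bash
# Values taken from the example command above.
executor_memory_gb=4
total_executor_cores=10
executor_cores=1

# Executors = total cores divided by cores per executor (integer division).
executors=$(( total_executor_cores / executor_cores ))
# Aggregate executor memory requested across the cluster.
total_memory_gb=$(( executors * executor_memory_gb ))

echo "executors: $executors, total executor memory: ${total_memory_gb} GB"
```

With the values above this gives 10 executors and 40 GB of executor memory in total.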

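It would be nice to simplify the launch command with environment variables here too, but unfortunately that is not supported: these options have to be passed as arguments. So another approach is needed.

The approach is to rewrite the launch script so that the resource-control parameters are prepended to the value of $@, as in the following modification to spark-shell.sh: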

  else
    export SPARK_SUBMIT_OPTS
    RESOURCE_OPTIONS="--executor-memory 1G --total-executor-cores 10 --executor-cores 1 "
    CMD_OPTIONS=$RESOURCE_OPTIONS$@
    echo "CMD_OPTIONS: " $CMD_OPTIONS
    "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" $CMD_OPTIONS
  fi

Two variables were added, RESOURCE_OPTIONS and CMD_OPTIONS. So that users can see what is happening, echo prints the parameters.

For example, when spark-shell is started:

$ spark-shell
CMD_OPTIONS:  --executor-memory 1G --total-executor-cores 10 --executor-cores 1

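And what if different users need different resource limits? Making the script a little more elaborate takes care of that, so I won't go into it further.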
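As one possible shape for such a script, the sketch below picks the resource options by user name. The function name `resource_options_for_user`, the user names, and the per-user numbers are all hypothetical; only the default values match the ones used in the modified script above.

```shell
#!/usr/bin/env bash
# Hypothetical per-user limits; user names and numbers are illustrative only.
resource_options_for_user() {
    case "$1" in
        dean)
            echo "--executor-memory 4G --total-executor-cores 20 --executor-cores 1"
            ;;
        root)
            echo "--executor-memory 8G --total-executor-cores 40 --executor-cores 2"
            ;;
        *)
            # Default for everyone else, matching the values used above.
            echo "--executor-memory 1G --total-executor-cores 10 --executor-cores 1"
            ;;
    esac
}

# Look up the options for whoever is running the script.
RESOURCE_OPTIONS="$(resource_options_for_user "$(id -un)")"
echo "RESOURCE_OPTIONS: $RESOURCE_OPTIONS"
```

In the modified launch script, the fixed RESOURCE_OPTIONS assignment could be replaced by a lookup like this, with a trailing space added before it is concatenated with $@.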

：）
