spark-08-spark: local mode for debugging

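Local mode runs Spark on your own machine, with the Spark environment configured locally. It lets you debug from inside your program. The steps are as follows: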

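First, set up Spark locally; it can be downloaded from the official website. I recommend a pre-built package; I downloaded spark-1.3.1-bin-hadoop2.4.tar.gz. Then download Scala, install it, and set the Scala environment variables. After unpacking the Spark archive, open cmd, change to Spark's bin directory, and run spark-shell to enter Spark local mode.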

Submitting a job from a program

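Open Eclipse, create a new project, and import spark-assembly-1.3.1-hadoop2.4.0.jar from the spark/lib directory. Then write a test program; you can start from one of the examples under spark/examples. For local submission, the key setting in the program is: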

SparkContext sparkContext = new SparkContext(master, appName, sparkHome, jars)

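master: the run mode. We are running locally, so use local; to run with multiple threads, use local[2], which means 2 threads
appName: the application name
sparkHome: the local Spark directory; mine, for example, is d:/spark-1.3.1-bin-hadoop2.4
jars: third-party JARs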
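
To make the master values concrete, here is a minimal sketch in plain Java, with no Spark dependency, of how the local master strings map to thread counts. The class `LocalMaster` and the method `threadsFor` are hypothetical names made up for this illustration and are not part of Spark's API. The `local[*]` form (one thread per available core) is not mentioned above; Spark accepts it as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; this is not part of Spark's API.
final class LocalMaster {
    // Matches "local[N]" or "local[*]".
    private static final Pattern BRACKETED =
            Pattern.compile("local\\[(\\d+|\\*)\\]");

    // Returns the number of worker threads a local master string asks for.
    static int threadsFor(String master) {
        if ("local".equals(master)) {
            return 1; // plain "local" means a single thread
        }
        Matcher m = BRACKETED.matcher(master);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a local master: " + master);
        }
        String n = m.group(1);
        if ("*".equals(n)) {
            // "local[*]" means one thread per available core
            return Runtime.getRuntime().availableProcessors();
        }
        int threads = Integer.parseInt(n);
        if (threads < 1) {
            throw new IllegalArgumentException("Thread count must be positive: " + master);
        }
        return threads;
    }

    public static void main(String[] args) {
        System.out.println("local    -> " + threadsFor("local"));
        System.out.println("local[2] -> " + threadsFor("local[2]"));
    }
}
```

When debugging, a single thread (local) tends to make breakpoints and log output easier to follow, while local[2] or more is useful for surfacing problems that only appear when tasks run concurrently.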

An example:

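import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

public class SparkPi {
    public static void main(String[] args) {
        // master = "local": run in this JVM, so IDE breakpoints work
        SparkContext sparkContext = new SparkContext("local", "JavaSparkPi",
                "D:\\spark-1.3.1-bin-hadoop2.4");
        JavaSparkContext jsc = new JavaSparkContext(sparkContext);

        // Number of partitions; defaults to 2 when no argument is given
        int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
        int n = 100000 * slices;
        List<Integer> l = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            l.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

        // map: 1 if a random point in the square falls inside the unit circle
        int count = dataSet.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer integer) {
                double x = Math.random() * 2 - 1;
                double y = Math.random() * 2 - 1;
                return (x * x + y * y < 1) ? 1 : 0;
            }
        // reduce: add up the hits
        }).reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer integer, Integer integer2) {
                return integer + integer2;
            }
        });

        System.out.println("Pi is roughly " + 4.0 * count / n);

        jsc.stop();
    }
}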
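
To check the arithmetic separately from Spark, the same Monte Carlo estimate can be sketched in plain Java. This is an illustration under stated assumptions, not part of the original program: the class name `LocalPiCheck`, the method `estimatePi`, and the fixed seed are all made up here, and `java.util.Random` replaces `Math.random()` only so that runs are repeatable.

```java
import java.util.Random;

// Plain-Java sketch of the Monte Carlo estimate; no Spark involved.
final class LocalPiCheck {
    // Draws n random points in the square [-1, 1) x [-1, 1) and returns
    // 4 * (fraction of points that fall inside the unit circle).
    static double estimatePi(int n, long seed) {
        Random random = new Random(seed); // fixed seed makes runs repeatable
        int inside = 0;
        for (int i = 0; i < n; i++) {
            double x = random.nextDouble() * 2 - 1;
            double y = random.nextDouble() * 2 - 1;
            if (x * x + y * y < 1) {
                inside++; // same test as the map step; the running sum plays the role of reduce
            }
        }
        return 4.0 * inside / n;
    }

    public static void main(String[] args) {
        System.out.println("Pi is roughly " + estimatePi(200000, 42L));
    }
}
```

The circle covers pi/4 of the square's area, so the fraction of hits times 4 approximates pi. If the Spark version prints a value far from this, the problem is more likely in the job setup than in the formula.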
