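For more code, see: https://github.com/xubo245/SparkLearning
Setting up a Scala and Spark build environment in Eclipse and uploading the program to a cluster to run
Local environment: Windows + Eclipse 4.3.2 + Scala 2.10.5 + JDK 1.7
1. Installing Scala and the JDK is straightforward; please look up the steps yourself.
2. Install Eclipse: http://www.eclipse.org/downloads/packages/release/Kepler/SR2
If you install Eclipse 4.5, the plugin will not install.
3. Install the Eclipse plugin
Help -> Install New Software
On http://scala-ide.org/download/prev-stable.html, find the update site http://download.scala-ide.org/sdk/helium/e37/scala210/stable/site, enter it, and wait for the installation to finish. The details are not covered here.
Note: because of network problems, the install may fail the first few times. Keep retrying; it sometimes takes five or six attempts.
4. Once the plugin is installed, you can create a new Scala project.
5. Then import Spark's spark-assembly-1.5.2-hadoop2.6.0.jar into the project.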
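Before writing any Spark code, it can help to confirm that the assembly jar really ended up on the project's classpath. Below is a minimal sketch of such a check. The names `ClasspathCheck` and `isOnClasspath` are made up for this illustration and are not part of Spark. The check only looks the class up by name, so it compiles and runs even when the jar is missing.

```scala
import scala.util.Try

object ClasspathCheck {
  /** Returns true if the named class can be loaded from the current classpath. */
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className)).isSuccess

  def main(args: Array[String]): Unit = {
    // SparkContext is the class the program in step 6 instantiates.
    val sparkClass = "org.apache.spark.SparkContext"
    if (isOnClasspath(sparkClass)) {
      println(sparkClass + " found on the classpath")
    } else {
      println(sparkClass + " not found; check that the assembly jar was imported")
    }
  }
}
```

Run it as a Scala application from Eclipse. If it reports that the class was not found, re-check step 5 before moving on.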
6. Run locally:
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package test1

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi ").setMaster("local")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    println("slices:\n"+slices)
    println("args.length:\n"+args.length)
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
// scalastyle:on println
Output:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/03 17:55:12 INFO SparkContext: Running Spark version 1.5.2
16/03/03 17:55:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/03 17:55:13 INFO SecurityManager: Changing view acls to: xubo
16/03/03 17:55:13 INFO SecurityManager: Changing modify acls to: xubo
16/03/03 17:55:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xubo); users with modify permissions: Set(xubo)
16/03/03 17:55:15 INFO Slf4jLogger: Slf4jLogger started
16/03/03 17:55:15 INFO Remoting: Starting remoting
16/03/03 17:55:15 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@202.38.84.241:50812]
16/03/03 17:55:15 INFO Utils: Successfully started service 'sparkDriver' on port 50812.
16/03/03 17:55:15 INFO SparkEnv: Registering MapOutputTracker
16/03/03 17:55:15 INFO SparkEnv: Registering BlockManagerMaster
16/03/03 17:55:15 INFO DiskBlockManager: Created local directory at C:\Users\xubo\AppData\Local\Temp\blockmgr-caa750e6-8702-4649-a5e8-2ba73598a383
16/03/03 17:55:15 INFO MemoryStore: MemoryStore started with capacity 730.6 MB
16/03/03 17:55:15 INFO HttpFileServer: HTTP File server directory is C:\Users\xubo\AppData\Local\Temp\spark-77137efd-98f7-465d-a2a1-da56af107107\httpd-40ccad09-750c-4574-b019-47a2a77b003c
16/03/03 17:55:15 INFO HttpServer: Starting HTTP Server
16/03/03 17:55:15 INFO Utils: Successfully started service 'HTTP file server' on port 50813.
16/03/03 17:55:15 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/03 17:55:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/03 17:55:16 INFO SparkUI: Started SparkUI at http://202.38.84.241:4040
16/03/03 17:55:16 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/03/03 17:55:16 INFO Executor: Starting executor ID driver on host localhost
16/03/03 17:55:16 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50820.
16/03/03 17:55:16 INFO NettyBlockTransferService: Server created on 50820
16/03/03 17:55:16 INFO BlockManagerMaster: Trying to register BlockManager
16/03/03 17:55:16 INFO BlockManagerMasterEndpoint: Registering block manager localhost:50820 with 730.6 MB RAM, BlockManagerId(driver, localhost, 50820)
16/03/03 17:55:16 INFO BlockManagerMaster: Registered BlockManager
slices:
2
args.length:
0
16/03/03 17:55:17 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
16/03/03 17:55:17 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
16/03/03 17:55:17 INFO DAGScheduler: Final stage: ResultStage 0(reduce at SparkPi.scala:38)
16/03/03 17:55:17 INFO DAGScheduler: Parents of final stage: List()
16/03/03 17:55:17 INFO DAGScheduler: Missing parents: List()
16/03/03 17:55:17 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
16/03/03 17:55:17 INFO MemoryStore: ensureFreeSpace(1848) called with curMem=0, maxMem=766075207
16/03/03 17:55:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1848.0 B, free 730.6 MB)
16/03/03 17:55:17 INFO MemoryStore: ensureFreeSpace(1195) called with curMem=1848, maxMem=766075207
16/03/03 17:55:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1195.0 B, free 730.6 MB)
16/03/03 17:55:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:50820 (size: 1195.0 B, free: 730.6 MB)
16/03/03 17:55:17 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
16/03/03 17:55:17 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)
16/03/03 17:55:17 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/03/03 17:55:17 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 2085 bytes)
16/03/03 17:55:17 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/03/03 17:55:17 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1031 bytes result sent to driver
16/03/03 17:55:17 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 2085 bytes)
16/03/03 17:55:17 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/03/03 17:55:17 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 154 ms on localhost (1/2)
16/03/03 17:55:17 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1031 bytes result sent to driver
16/03/03 17:55:17 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 46 ms on localhost (2/2)
16/03/03 17:55:17 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.203 s
16/03/03 17:55:17 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/03/03 17:55:17 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.517522 s
Pi is roughly 3.14172
16/03/03 17:55:17 INFO SparkUI: Stopped Spark web UI at http://202.38.84.241:4040
16/03/03 17:55:17 INFO DAGScheduler: Stopping DAGScheduler
16/03/03 17:55:17 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/03/03 17:55:17 INFO MemoryStore: MemoryStore cleared
16/03/03 17:55:17 INFO BlockManager: BlockManager stopped
16/03/03 17:55:17 INFO BlockManagerMaster: BlockManagerMaster stopped
16/03/03 17:55:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/03/03 17:55:17 INFO SparkContext: Successfully stopped SparkContext
16/03/03 17:55:17 INFO ShutdownHookManager: Shutdown hook called
16/03/03 17:55:17 INFO ShutdownHookManager: Deleting directory C:\Users\xubo\AppData\Local\Temp\spark-77137efd-98f7-465d-a2a1-da56af107107
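The `Pi is roughly 3.14172` line in the log is a Monte Carlo estimate: points are drawn uniformly from the square [-1, 1] x [-1, 1], and the fraction that lands inside the unit circle approaches pi/4. The same calculation can be sketched without Spark, which makes it easy to check the logic on its own. `LocalPi`, `estimatePi` and the fixed seed are illustrative choices and not part of the original program.

```scala
import scala.util.Random

object LocalPi {
  /** Estimates pi from `samples` random points; the seed makes runs repeatable. */
  def estimatePi(samples: Int, seed: Long): Double = {
    val rng = new Random(seed)
    // Count the points that fall inside the unit circle.
    val inside = (1 to samples).count { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y < 1
    }
    // The circle covers pi/4 of the square, so scale the fraction by 4.
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit = {
    // 200000 matches the sample count of the Spark run with 2 slices.
    println("Pi is roughly " + estimatePi(200000, 42L))
  }
}
```

The Spark version does the same work, except that `parallelize` splits the range into `slices` partitions and `reduce` adds up the per-partition counts.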
7. To upload the program to the cluster, see: http://blog.csdn.net/xubo245/article/details/50590065
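One thing to watch when moving from step 6 to the cluster: the program in step 6 hard-codes `setMaster("local")`, and a master set directly on `SparkConf` takes precedence over the `--master` flag of `spark-submit`. A possible way around this is to fall back to `local` only when no master was supplied from outside. This is a sketch under the assumption that the master arrives as the `spark.master` JVM system property; `MasterConfig` and `resolveMaster` are hypothetical names, and the linked post may handle this differently.

```scala
object MasterConfig {
  /** Returns the externally supplied master URL, or `fallback` if none was given. */
  def resolveMaster(props: scala.collection.Map[String, String], fallback: String): String =
    props.getOrElse("spark.master", fallback)

  def main(args: Array[String]): Unit = {
    // sys.props exposes the JVM system properties as a Map.
    val master = resolveMaster(sys.props, "local")
    println("master: " + master)
    // In SparkPi this value would replace the hard-coded one (assumed usage):
    //   new SparkConf().setAppName("Spark Pi").setMaster(master)
  }
}
```

Started from Eclipse with no extra settings, this prints `master: local`, so the local workflow from step 6 stays unchanged.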