Part 1: Running WordCount in IntelliJ IDEA
1. Download the IDEA Community Edition (it's free)
http://www.jetbrains.com/idea/download/
2. Install the Scala plugin
File --> Settings... --> Plugins
Search for "scala" and click the Install button on the right to install the Scala plugin.
3. Create a Scala project
File --> New Project... --> Scala
Select the matching SDK and finish creating the project.
4. Create the package and class structure; the code is as follows:
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Counts how many times each word occurs.
 */
object WordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: WordCount <input> <output>")
      System.exit(1)
    }
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    val line = sc.textFile(args(0))
    // Write the counts to a file
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).saveAsTextFile(args(1))
    // Print the counts to the console
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}
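As a quick sanity check before packaging, the same pipeline can be tried interactively in spark-shell on the cluster (a minimal sketch; the HDFS input path matches the one used in step 7 below):

scala> val line = sc.textFile("hdfs://master:9000/user/hadoop/input/README.txt")
scala> line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).take(5).foreach(println)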
Add the Spark jar to the project:
File --> Project Structure... --> Libraries
Click "+", choose Java, and select the Spark assembly jar.
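If you prefer a build tool over adding the jar by hand, a minimal build.sbt sketch would look like the following (the version numbers are assumptions; match them to your cluster, e.g. Spark 1.1.0 as referenced later in this article):

name := "FirstSparkApp"
version := "1.0"
scalaVersion := "2.10.4"
// "provided" keeps spark-core out of the packaged jar, since the cluster supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"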
5. Package the project
File --> Project Structure... --> Artifacts
6. Build the artifact
Build --> Build Artifacts... --> Build
The first time choose Build; for subsequent builds choose Rebuild.
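Before uploading, you can confirm the main class actually landed in the jar (assuming the artifact is named FirstSparkApp2.jar as in step 7; you should see an entry like main/scala/com/spark/firstapp/WordCount.class):

$ jar tf FirstSparkApp2.jar | grep WordCount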
7. Run
Upload the jar to the Linux machine.
With the Spark cluster running, submit the job as follows:
[hadoop@master bin]$ ./spark-submit --master spark://192.168.189.136:7077 --class main.scala.com.spark.firstapp.WordCount --executor-memory 1g /opt/testspark/FirstSparkApp2.jar hdfs://master:9000/user/hadoop/input/README.txt hdfs://master:9000/user/hadoop/output
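Before the submit can succeed, the input file must already be on HDFS; a sketch of staging it (the local source path is an assumption):

[hadoop@master bin]$ hdfs dfs -mkdir -p /user/hadoop/input
[hadoop@master bin]$ hdfs dfs -put /opt/testspark/README.txt /user/hadoop/input/

Also note that the output directory hdfs://master:9000/user/hadoop/output must not exist yet, otherwise saveAsTextFile will fail.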
8. Results
(if,1)
(Commerce,,1)
(or,2)
(another,1)
(software.,2)
(laws,,1)
(BEFORE,1)
(source,1)
(Hadoop,,1)
(to,2)
(written,1)
(code,1)
(software,,2)
(Regulations,,1)
(more,2)
(regulations,1)
(see,1)
(of,5)
(libraries,1)
(by,1)
(exception,1)
(Control,1)
(Government,1)
(code.,1)
(eligible,1)
(both,1)
(License,1)
(Foundation,1)
(functions,1)
(and,6)
(software:,1)
(5D002.C.1,,1)
((TSU),1)
(Hadoop,1)
15/03/13 14:34:17 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/03/13 14:34:17 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
The console output is shown above; the result files can also be inspected on HDFS.
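For example (part file names vary with the number of partitions):

[hadoop@master bin]$ hdfs dfs -cat hdfs://master:9000/user/hadoop/output/part-*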
Part 2: Running WordCount with Scala in Eclipse
1. Download Scala IDE for Eclipse
http://scala-ide.org/download/sdk.html
2. Create a Scala project
File --> New --> Scala Project
The project structure is as follows; the code is the same as above.
Add the Spark assembly jar to the build path.
3. Package
File --> Export --> Java --> JAR file
4. Run
Upload the jar to the Linux machine.
The command and results are the same as in Part 1:
[hadoop@master bin]$ ./spark-submit --master spark://192.168.189.136:7077 --class main.scala.com.spark.firstapp.WordCount --executor-memory 1g /opt/testspark/FirstSparkApp2.jar hdfs://master:9000/user/hadoop/input/README.txt hdfs://master:9000/user/hadoop/output
Part 3: Running SparkPi locally (works in both IDEA and Eclipse)
The code is as follows:
import org.apache.spark.{SparkConf, SparkContext}
import scala.math.random

object SparkPi {
  def main(args: Array[String]) {
    // Alternative configurations for running against a remote cluster:
    //val conf = new SparkConf().setAppName("Spark Pi").setMaster("spark://192.168.189.136:7077").setJars(List("D:\\scala\\sparkjar\\sparktest.jar"))
    //val spark = new SparkContext("spark://master:7070", "Spark Pi", "F:\\soft\\spark\\spark-1.1.0-bin-hadoop2.4", List("out\\artifacts\\sparkTest_jar\\sparkTest.jar"))
    val conf = new SparkConf().setAppName("Spark Pi").setMaster("local") // setMaster("local") is the key line for a local run
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    // Monte Carlo estimate: sample points uniformly in the square [-1, 1] x [-1, 1]
    // and count how many fall inside the unit circle.
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    // The circle-to-square area ratio is pi/4, so pi is roughly 4 * count / n.
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
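With the master set to local, you can run main() straight from the IDE with no cluster or jar upload; the console should print a line like "Pi is roughly 3.14..." (the exact value varies from run to run since the sample points are random). Passing a larger slice count as the first program argument increases the sample size and typically tightens the estimate.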