Packaging Scala code with IntelliJ IDEA and running it on Spark

I. After creating the Maven project (remember to add Scala framework support to the project), edit pom.xml and add the following:

    <properties>
        <!-- The last digit of this version was lost in the original post; 2.1.0 is assumed. -->
        <spark.version>2.1.0</spark.version>
        <!-- Scala binary version, used as the suffix of the Spark artifact IDs. -->
        <scala.version>2.11</scala.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <!-- Compiles the Scala sources and Scala tests. -->
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <!-- Kept from the original; a Maven build normally needs a full Scala version here, not just 2.11. -->
                    <scalaVersion>${scala.version}</scalaVersion>
                    <args>
                        <arg>-target:jvm-1.5</arg>
                    </args>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <!-- The last digit of this version was lost in the original post; 3.6.0 is assumed. -->
                <version>3.6.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <!-- Skips the unit tests during the build. -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.19</version>
                <configuration>
                    <skip>true</skip>
                </configuration>
            </plugin>
        </plugins>
    </build>

After saving the file, click the "Import Changes" prompt that IDEA shows at the bottom; this makes IDEA download the dependency JARs.

II. Write a Scala program that counts the words in a file

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    // Two arguments are required: the input path and the output path.
    if (args.length < 2) {
      System.err.println("Usage: spark.example.WordCount <input> <output>")
      System.exit(1)
    }
    val input_path = args(0)
    val output_path = args(1)

    val conf = new SparkConf().setAppName("WordCount")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    val inputFile = sc.textFile(input_path)
    // Split each line into words, count every word, and write "word<TAB>count" lines.
    inputFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .map(x => x._1 + "\t" + x._2)
      .saveAsTextFile(output_path)

    // Release the cluster resources once the job is done.
    sc.stop()
  }
}
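To see what the flatMap, map and reduceByKey chain computes without a cluster, the same steps can be sketched on an ordinary Scala collection. This is only an illustration: `LocalWordCount`, `countWords` and the sample lines are made up for this sketch, and `groupBy` followed by a sum stands in for Spark's `reduceByKey`.

```scala
object LocalWordCount {
  // Mirrors the RDD pipeline on an in-memory Seq (hypothetical helper, not part of the job).
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))  // one element per word
      .map(word => (word, 1))            // pair every word with a count of 1
      .groupBy(_._1)                     // stand-in for reduceByKey: group the pairs by word...
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }  // ...and add up the 1s

  def main(args: Array[String]): Unit = {
    // Made-up sample input, not data from the article.
    val counts = countWords(Seq("a b a", "b c"))
    // Print in the same "word<TAB>count" layout that the Spark job writes.
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(word + "\t" + n) }
  }
}
```

Like the Spark version, this splits on a single space only, so consecutive spaces produce empty "words" that are counted too.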

III. Package the JAR

File -> Project Structure -> Artifacts -> the green plus sign -> JAR -> From modules...

Then enter the name of the main class you defined and select the "copy to.." option, so that only this one class is packaged.

After clicking OK, choose Build -> Build Artifacts -> Build and wait for the build to finish. The newly packaged JAR can then be found in the project directory.

IV. Run on the Spark cluster

1. Copy the JAR to a machine that can reach the Spark cluster.

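2. Run it. The spark-submit options must come before the application JAR; everything after the JAR is passed to WordCount as program arguments.

/usr/local/spark/bin/spark-submit --class WordCount --master spark://master:7077 \
    --executor-memory 1G \
    --executor-cores 1 \
    --num-executors 10 \
    /data/wangzai/package/WordCount.jar \
    hdfs://master:9000/spark/test.data hdfs://master:9000/spark_output/spark_wordcount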

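3. Result: the word counts are written under hdfs://master:9000/spark_output/spark_wordcount, one tab-separated word and count per line.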
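Once the output lines have been fetched, they can be inspected with a few lines of plain Scala. The following is a minimal sketch, assuming each line has the "word<TAB>count" layout that the job writes; `WordCountOutput`, `parseLine`, `topN` and the sample lines are hypothetical and only serve the illustration.

```scala
import scala.util.Try

object WordCountOutput {
  // Parses one "word<TAB>count" line; anything that does not fit yields None.
  def parseLine(line: String): Option[(String, Int)] =
    line.split("\t") match {
      case Array(word, count) => Try(count.toInt).toOption.map(n => (word, n))
      case _                  => None
    }

  // The n most frequent words, highest count first.
  def topN(lines: Seq[String], n: Int): Seq[(String, Int)] =
    lines.flatMap(line => parseLine(line)).sortBy { case (_, count) => -count }.take(n)

  def main(args: Array[String]): Unit = {
    // Made-up sample lines, not real job output.
    val sample = Seq("spark\t3", "scala\t5", "not a record", "idea\t1")
    topN(sample, 2).foreach { case (word, count) => println(word + "\t" + count) }
  }
}
```

Malformed lines are skipped rather than reported, which keeps the sketch short; a real check of the output might prefer to count or log them.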
