When a Spark Streaming job needs data from a configuration file to compute some statistics, the configuration file has to be packaged into the project. The Maven configuration looks like this:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <skip>true</skip>
      </configuration>
    </plugin>
  </plugins>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <includes>
        <include>**/*.txt</include>
        <include>*.txt</include>
      </includes>
      <filtering>true</filtering>
    </resource>
  </resources>
</build>
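With the files bundled into the jar this way, the program can read them off the classpath when running in local mode. Below is a minimal sketch of that read, assuming the job loads idc_ip.txt as plain text lines; the object name ConfigLoader and the helper loadIdcIpFromClasspath are hypothetical and not part of the original code:

import scala.io.Source

object ConfigLoader {
  // Hypothetical helper: read idc_ip.txt that Maven packaged under src/main/resources.
  // In local mode the file sits inside the application jar, so it is visible on the classpath.
  def loadIdcIpFromClasspath(): Seq[String] = {
    val stream = getClass.getResourceAsStream("/idc_ip.txt")
    require(stream != null, "idc_ip.txt not found on the classpath")
    Source.fromInputStream(stream, "UTF-8").getLines().toList
  }
}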
This works fine when running in local mode, but it breaks once the job is deployed to a YARN cluster; there the job has to be submitted as follows:
spark-submit \
  --class com.kingsoft.server.KssNodeStreaming \
  --master yarn-cluster \
  --driver-memory 2G \
  --executor-memory 5G \
  --num-executors 10 \
  --jars /home/hadoop/spark-streaming-flume_2.10-1.0.1.jar,/home/hadoop/avro-ipc-1.7.5-cdh5.1.0.jar,/home/hadoop/flume-ng-sdk-1.5.0.1.jar,/home/hadoop/fastjson-1.1.41.jar \
  --files /home/hadoop/idc_ip.txt,/home/hadoop/ipdata.txt \
  /home/hadoop/SparkStreaming-0.0.1-SNAPSHOT.jar 0.0.0.0 58006
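With --files, YARN localizes idc_ip.txt and ipdata.txt into the working directory of each container (including the driver in yarn-cluster mode), so the job can open them by bare file name instead of from the classpath. A minimal sketch under that assumption; YarnConfigLoader and loadDistributedFile are hypothetical names, not the original code:

import java.io.File
import scala.io.Source

object YarnConfigLoader {
  // Hypothetical helper: read a file shipped with --files. On YARN the file is
  // localized into the container's working directory, so the bare name resolves.
  def loadDistributedFile(name: String): Seq[String] = {
    val f = new File(name)
    require(f.exists(), s"$name was not shipped with --files")
    Source.fromFile(f, "UTF-8").getLines().toList
  }
}

// Example use inside the streaming job, e.g. in com.kingsoft.server.KssNodeStreaming:
// val idcIpLines = YarnConfigLoader.loadDistributedFile("idc_ip.txt")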
So going forward, just use this submission style.