Table of Contents
- refs
- 1. Error Log
- 2. Full Log
- 3. Run Script and Code
- 3.1 The Run Script
- 3.2 Code Snippet
- 4. Troubleshooting and Root Cause
- 5. Notes
refs
- https://stackoverflow.com/questions/47157793/spark-runs-in-local-but-cant-find-file-when-running-in-yarn
- https://stackoverflow.com/questions/44231261/spark-yarn-file-does-not-exist-on-hdfs
1. Error Log
The symptoms:
1. The job runs fine in local mode.
2. In cluster mode it fails with the error below:
java.io.FileNotFoundException: File does not exist: hdfs://192.168.10.178:9000/user/root/.sparkStaging/application_1569084228812_0100/__spark_libs__6623696109201875604.zip
2. Full Log
... ...
19/10/22 12:19:17 INFO Client:
client token: N/A
diagnostics: Application application_1569084228812_0100 failed 2 times due to AM Container for appattempt_1569084228812_0100_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://hadoop-node-master:8088/proxy/application_1569084228812_0100/Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://192.168.10.178:9000/user/root/.sparkStaging/application_1569084228812_0100/__spark_libs__6623696109201875604.zip
java.io.FileNotFoundException: File does not exist: hdfs://192.168.10.178:9000/user/root/.sparkStaging/application_1569084228812_0100/__spark_libs__6623696109201875604.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1571717942046
final status: FAILED
tracking URL: http://hadoop-node-master:8088/cluster/app/application_1569084228812_0100
user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1569084228812_0100 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1178)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/10/22 12:19:17 INFO ShutdownHookManager: Shutdown hook called
19/10/22 12:19:17 INFO ShutdownHookManager: Deleting directory /data/server/spark-2.0.2-bin-hadoop2.6/spark-d85fba2d-cb14-4c09-9edc-ca1dc3b4106d
... ...
3. Run Script and Code
3.1 The Run Script
#!/bin/sh
hadoop fs -rm -r hdfs://hadoop-node-master:9000/output_cf
/data/server/spark/bin/spark-submit \
--master yarn-cluster \
--num-executors 2 \
--executor-memory 1g \
--executor-cores 2 \
--class cf ./scalatest1008-1.jar \
hdfs://hadoop-node-master:9000/music_uis.data \
hdfs://hadoop-node-master:9000/output_cf
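Note: --master yarn-cluster still works on Spark 2.0.2 but has been deprecated since Spark 2.0; the equivalent form is --master yarn --deploy-mode cluster.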
3.2 Code Snippet
import org.apache.spark.{SparkConf, SparkContext}

object cf {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      // .setMaster("local[2]") // this line must be commented out when running in cluster mode
      .setAppName("CF Spark")
    val sc = new SparkContext(conf)
    val lines = sc.textFile(args(0))
    val output_path = args(1).toString
  }
}
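Here args(0) and args(1) receive the two HDFS paths passed by the script above: the input data file and the output directory (which the script first removes with hadoop fs -rm -r).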
4. Troubleshooting and Root Cause
The master mode hard-coded in the code did not match the mode the job was actually submitted with: the code set local mode, so Spark treated the run as local and never distributed the staging files to the cluster, which is why YARN could not find them.
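A minimal sketch of the safer pattern, based on the snippet above: leave the master unset in code and let spark-submit's --master flag decide, so the same jar runs unchanged both locally and on YARN.

import org.apache.spark.{SparkConf, SparkContext}

object cf {
  def main(args: Array[String]): Unit = {
    // No setMaster() here: the master comes from spark-submit
    // (--master yarn-cluster on the cluster, --master "local[2]" for local tests).
    val conf = new SparkConf().setAppName("CF Spark")
    val sc = new SparkContext(conf)

    val lines = sc.textFile(args(0)) // input path from the submit script
    val output_path = args(1)        // output path from the submit script

    sc.stop()
  }
}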
5. Notes
Small details like this are easy to overlook.
A recap of the workflow for running Scala code on Spark:
- 1. Write the code in IDEA.
- 2. Package it with mvn clean, mvn install.
- 3. Copy the packaged jar to the remote cluster, or run it directly in an environment that already has hadoop and spark-submit.
- 4. Make sure the submit mode matches the mode set in the code (a defensive sketch follows this list).
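One way to enforce step 4, sketched under the assumption of a made-up JVM property named cf.localDev (not part of Spark): make local mode opt-in, so a hard-coded setMaster can never leak into a cluster submission.

import org.apache.spark.SparkConf

object ConfBuilder {
  // Local mode is opt-in via -Dcf.localDev=true (a hypothetical
  // development-only switch used just for this example).
  def build(): SparkConf = {
    val conf = new SparkConf().setAppName("CF Spark")
    if (sys.props.get("cf.localDev").contains("true")) {
      conf.setMaster("local[2]") // only ever set during local development
    }
    conf
  }
}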