Preface
Parts of this article are translated from:
http://spark.apache.org/docs/latest/submitting-applications.html
Submitting Applications
The spark-submit script in Spark's bin directory is used to launch applications on a cluster. It can use all of Spark's supported cluster managers through a uniform interface, so you don't have to configure your application specially for each one.
Bundling Your Application's Dependencies
If your code depends on other projects, you will need to package them alongside your application in order to distribute the code to a Spark cluster. To do this, create an assembly jar (or "uber" jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating the assembly jar, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. Once you have an assembled jar, you can call the bin/spark-submit script while passing your jar. For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files, we recommend packaging them into a .zip or .egg.
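As a minimal sketch of the Python case (the module name mylib, the archive deps.zip, and my_app.py are hypothetical, used only for illustration):

```bash
# Package local helper modules into an archive, then ship it with the application.
zip -r deps.zip mylib/

# Distribute deps.zip to the executors alongside the application script.
./bin/spark-submit \
  --py-files deps.zip \
  my_app.py
```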
Launching Applications with spark-submit
Once a user application is bundled, it can be launched using the bin/spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and supports the different cluster managers and deploy modes that Spark supports:
```bash
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
```
Some of the commonly used options are:
- --class: the entry point for your application (e.g. org.apache.spark.examples.SparkPi).
- --master: the master URL for the cluster (e.g. spark://23.195.26.187:7077).
- --deploy-mode: whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client).
- --conf: an arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes (as shown).
- application-jar: path to a bundled jar including your application and all dependencies. The URL must be globally visible inside your cluster, for instance an hdfs:// path or a file:// path that is present on all nodes.
- application-arguments: arguments passed to the main method of your main class, if any.
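Putting these options together, a submission might look like the following (the SparkPi class ships with Spark; the jar path and core count are illustrative):

```bash
# Run the bundled SparkPi example locally on 8 cores, passing 100 as an application argument.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100
```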
A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (for example, the master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process, which acts as a client to the cluster. The input and output of the application are attached to the console, so this mode is especially suitable for applications that involve a REPL (e.g. the Spark shell).
Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the driver and the executors. Currently, standalone mode does not support cluster mode for Python applications.
For Python applications, simply pass a .py file in place of <application-jar>, and add Python .zip, .egg or .py files to the search path with --py-files.
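For example, a Python submission might look like this (the master URL is a placeholder, and helpers.zip is a hypothetical bundle of extra modules):

```bash
# Submit the bundled pi.py example to a standalone cluster with extra Python modules attached.
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  --py-files helpers.zip \
  examples/src/main/python/pi.py \
  1000
```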
There are a few options specific to the cluster manager being used. For example, with a Spark standalone cluster in cluster deploy mode, you can also specify --supervise to make sure that the driver is automatically restarted if it fails with a non-zero exit code. To enumerate all options available to spark-submit, run it with --help.
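As a sketch, a standalone cluster-mode submission with supervision might look like this (the master URL, resource sizes, and jar path are illustrative):

```bash
# Cluster deploy mode on a standalone master; --supervise restarts the driver
# automatically if it exits with a non-zero code.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 4G \
  --total-executor-cores 8 \
  /path/to/examples.jar \
  1000
```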
Running Spark Jobs in Various Modes
local mode
```
[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.4.0.jar
WARN NativeCodeLoader: - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO SparkContext: - Submitted application: Spark Pi
INFO SparkUI: - Bound SparkUI to 192.168.217.201, and started at http://hadoop1.org.cn:4040
INFO SparkContext: - Added JAR file:/usr/hdp/spark-2.4.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.0.jar at spark://hadoop1.org.cn:48468/jars/spark-examples_2.11-2.4.0.jar with timestamp 1550758442663
INFO Executor: - Starting executor ID driver on host localhost
... (Jetty handler registration and task scheduling output omitted)
INFO DAGScheduler: - ResultStage (reduce at SparkPi.scala) finished in 3.577 s
INFO DAGScheduler: - Job finished: reduce at SparkPi.scala, took 4.506013 s
Pi is roughly 3.142475712378562
INFO SparkUI: - Stopped Spark web UI at http://hadoop1.org.cn:4040
INFO SparkContext: - Successfully stopped SparkContext
INFO ShutdownHookManager: - Shutdown hook called
```
standalone mode
```
[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.217.201:7077 examples/jars/spark-examples_2.11-2.4.0.jar
WARN NativeCodeLoader: - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO SparkContext: - Submitted application: Spark Pi
INFO SparkUI: - Bound SparkUI to 192.168.217.201, and started at http://hadoop1.org.cn:4040
INFO StandaloneAppClient$ClientEndpoint: - Connecting to master spark://192.168.217.201:7077...
INFO StandaloneSchedulerBackend: - Connected to Spark cluster with app ID app-...
... (executor registration and task scheduling output omitted)
INFO DAGScheduler: - ResultStage (reduce at SparkPi.scala) finished in 16.998 s
INFO DAGScheduler: - Job finished: reduce at SparkPi.scala, took 20.610491 s
Pi is roughly 3.1427357136785683
INFO SparkUI: - Stopped Spark web UI at http://hadoop1.org.cn:4040
INFO SparkContext: - Successfully stopped SparkContext
INFO ShutdownHookManager: - Shutdown hook called
```
yarn-cluster mode
In yarn-cluster mode, the Spark job is handed over to YARN, and YARN executes it. You therefore need to add export HADOOP_CONF_DIR=/usr/hdp/hadoop-2.8.3/etc/hadoop to the spark-env.sh file before submitting the job:
```
[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.11-2.4.0.jar
WARN NativeCodeLoader: - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO RMProxy: - Connecting to ResourceManager at hadoop1
WARN Client: - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
INFO Client: - Uploading resource file:/tmp/spark-.../__spark_libs__7092311691544510332.zip -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0001/__spark_libs__7092311691544510332.zip
INFO Client: - Uploading resource file:/usr/hdp/spark-2.4.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.0.jar -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0001/spark-examples_2.11-2.4.0.jar
INFO Client: - Submitting application application_1550757972410_0001 to ResourceManager
INFO YarnClientImpl: - Submitted application application_1550757972410_0001
INFO Client: - Application report for application_1550757972410_0001 (state: ACCEPTED)
... (the ACCEPTED report repeats while YARN schedules and launches the ApplicationMaster)
INFO Client: - Application report for application_1550757972410_0001 (state: RUNNING)
         ApplicationMaster host: hadoop2.org.cn
         queue: default
         final status: UNDEFINED
         tracking URL: http://hadoop1:8088/proxy/application_1550757972410_0001/
         user: root
... (the RUNNING report repeats while the job executes)
INFO Client: - Application report for application_1550757972410_0001 (state: FINISHED)
         final status: SUCCEEDED
INFO ShutdownHookManager: - Shutdown hook called
```
The corresponding runs for Python and on Kubernetes are not elaborated here.
yarn-client mode
```
[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client examples/jars/spark-examples_2.11-2.4.0.jar
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
WARN NativeCodeLoader: - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO SparkContext: - Submitted application: Spark Pi
INFO SparkUI: - Bound SparkUI to 192.168.217.201, and started at http://hadoop1.org.cn:4040
INFO RMProxy: - Connecting to ResourceManager at hadoop1
WARN Client: - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
INFO Client: - Submitting application application_1550757972410_0006 to ResourceManager
INFO Client: - Application report for application_1550757972410_0006 (state: ACCEPTED)
... (the ACCEPTED report repeats until the ApplicationMaster registers with the ResourceManager)
INFO Client: - Application report for application_1550757972410_0006 (state: RUNNING)
INFO YarnClientSchedulerBackend: - Application application_1550757972410_0006 has started running.
WARN YarnScheduler: - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
... (executor registration and task scheduling output omitted)
INFO DAGScheduler: - ResultStage (reduce at SparkPi.scala) finished in 34.651 s
INFO DAGScheduler: - Job finished: reduce at SparkPi.scala, took 37.594449 s
Pi is roughly 3.14281571407857
INFO SparkUI: - Stopped Spark web UI at http://hadoop1.org.cn:4040
INFO SparkContext: - Successfully stopped SparkContext
INFO ShutdownHookManager: - Shutdown hook called
```
Master URLs
The master URL passed to Spark can be in one of the following formats:
- local: Run Spark locally with one worker thread (i.e. no parallelism at all).
- local[K]: Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
- local[K,F]: Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).
- local[*]: Run Spark locally with as many worker threads as logical cores on your machine.
- local[*,F]: Run Spark locally with as many worker threads as logical cores on your machine, and F maxFailures.
- spark://HOST:PORT: Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
- spark://HOST1:PORT1,HOST2:PORT2: Connect to the given Spark standalone cluster with standby masters with Zookeeper. The list must contain all the master hosts in the high-availability cluster set up with Zookeeper. The port must be whichever each master is configured to use, which is 7077 by default.
- mesos://HOST:PORT: Connect to the given Mesos cluster. The port must be whichever one you have configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... To submit with --deploy-mode cluster, the HOST:PORT should be configured to connect to the MesosClusterDispatcher.
- yarn: Connect to a YARN cluster in client or cluster mode depending on the value of --deploy-mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
- k8s://HOST:PORT: Connect to a Kubernetes cluster in cluster mode. Client mode is currently unsupported and will be supported in future releases. HOST and PORT refer to the [Kubernetes API server](https://kubernetes.io/docs/reference/generated/kube-apiserver/). It connects using TLS by default. In order to force it to use an unsecured connection, you can use k8s://http://HOST:PORT.
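For reference, a Kubernetes submission might look like the following sketch (the API server address and container image are placeholders; spark.kubernetes.container.image must point to a Spark image you have published):

```bash
# Cluster-mode submission to Kubernetes; the local:// scheme means the jar
# already exists inside the container image.
./bin/spark-submit \
  --master k8s://https://k8s-apiserver:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=my-registry/spark:2.4.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
```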
Loading Configuration from a File
The spark-submit script can load default Spark configuration values from a properties file and pass them on to your application. By default, it reads options from conf/spark-defaults.conf in the Spark directory. For more detail, see the section on loading default configurations.
Loading default Spark configurations this way can obviate the need for certain flags to spark-submit. For instance, if the spark.master property is set, you can safely omit the --master flag from spark-submit. In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.
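As an illustrative sketch (the values below are examples, not taken from any particular setup), a conf/spark-defaults.conf might contain:

```
spark.master             spark://207.184.161.138:7077
spark.executor.memory    4g
spark.serializer         org.apache.spark.serializer.KryoSerializer
```

With spark.master set this way, the --master flag can be omitted from later spark-submit invocations.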
If you are ever unclear where configuration options are coming from, you can print out fine-grained debugging information by running spark-submit with the --verbose option.
Advanced Dependency Management
When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included on the driver and executor classpaths. Directory expansion does not work with --jars.
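For example (the main class, jar names, and paths below are hypothetical):

```bash
# Ship two extra jars to the driver and executor classpaths; note the comma-separated list.
./bin/spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --jars /opt/libs/dep1.jar,/opt/libs/dep2.jar \
  my-app.jar
```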
Spark uses the following URL schemes to allow different strategies for disseminating jars:
- file: - absolute paths and file:// URIs are served by the driver's HTTP file server, and every executor pulls the file from the driver's HTTP server.
- hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected.
- local: - a URI starting with local:/ is expected to exist as a local file on each worker node. This means that no network IO will be incurred, and it works well for large files/JARs that are pushed to each worker or shared via NFS, GlusterFS, etc. A sketch mixing these schemes follows this list.
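Here is that sketch (all names and paths hypothetical):

```bash
# bigdep.jar is assumed to be pre-installed at the same path on every worker
# (no network IO), while util.jar is pulled down from HDFS.
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://207.184.161.138:7077 \
  --jars local:/opt/shared/bigdep.jar,hdfs:///libs/util.jar \
  my-app.jar
```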
Note that JARs and files are copied to the working directory of each SparkContext on the executor nodes. This can use up a significant amount of space over time and will need to be cleaned up. With YARN, cleanup is handled automatically; with Spark standalone, automatic cleanup can be configured with the spark.worker.cleanup.appDataTtl property.
Users may also include any other dependencies by supplying a comma-delimited list of Maven coordinates with --packages. All transitive dependencies will be handled when using this option. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the --repositories flag. (Note that credentials for password-protected repositories can in some cases be supplied in the repository URI, such as in https://user:password@host/.... Be careful when supplying credentials this way.) These options can be used with pyspark, spark-shell, and spark-submit to include Spark Packages.
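For example, the following starts the shell with a published package and its transitive dependencies resolved from Maven Central (spark-avro is the Avro data source matching Spark 2.4):

```bash
# Coordinates follow groupId:artifactId:version.
./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
```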
For Python, the equivalent --py-files option can be used to distribute .egg, .zip and .py libraries to executors.
More Information
Once you have deployed your application, the cluster mode overview describes the components involved in distributed execution, and how to monitor and debug applications.