An Introduction to Submitting Spark Jobs in the Various Modes

Date: 2024-09-20 20:33:35

Preface

Parts of this article are translated from:

http://spark.apache.org/docs/latest/submitting-applications.html

Submitting Applications

The spark-submit script in Spark's bin directory is used to launch applications on a cluster. It can use all of Spark's supported cluster managers through a uniform interface, so you don't have to configure your application specially for each one.

Bundling Your Application's Dependencies

If your code depends on other projects, you will need to package them alongside your application in order to distribute the code to a Spark cluster. To do this, create an assembly jar (or "uber" jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating the assembly jar, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. Once you have the assembled jar, you can call the bin/spark-submit script while passing your jar. For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files, we recommend packaging them into a .zip or .egg.

Launching Applications with spark-submit

Once a user application is bundled, it can be launched using the bin/spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and can support the different cluster managers and deploy modes that Spark supports:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

Some of the commonly used options above are:

--class: the entry point for your application (e.g. org.apache.spark.examples.SparkPi).

--master: the master URL for the cluster (e.g. spark://23.195.26.187:7077).

--deploy-mode: whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client).

--conf: an arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes (as shown below).

application-jar: path to a bundled jar including your application and all dependencies. The URL must be globally visible inside your cluster, for instance an hdfs:// path or a file:// path that is present on all nodes.

application-arguments: arguments passed to the main method of your main class, if any.
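
As a concrete illustration, here is the template above filled in for a client-mode run of SparkPi. The master URL, memory size, and the trailing 100 (the number of slices SparkPi computes with) are illustrative values; note that the quotes around the --conf value are required because it contains a space:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://192.168.217.201:7077 \
  --deploy-mode client \
  --executor-memory 1g \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  examples/jars/spark-examples_2.11-2.4.0.jar \
  100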

A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. the master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process, which acts as a client to the cluster. The input and output of the application are attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (e.g. the Spark shell).
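
The Spark shell is the canonical example: the shell process itself is the driver, so it always runs in client mode (the master URL below is illustrative):

# the shell's stdin/stdout stay attached to your terminal while jobs run on the cluster
./bin/spark-shell --master spark://192.168.217.201:7077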

Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the driver and the executors. Currently, standalone mode does not support cluster mode for Python applications.

For Python applications, simply pass a .py file in the place of <application-jar> instead of a JAR, and add Python .zip, .egg or .py files to the search path with --py-files.
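
A minimal sketch of such a submission, assuming a hypothetical my_app.py that imports helper modules from a deps/ directory:

# package the helper modules, then ship the archive alongside the application
zip -r deps.zip deps/
./bin/spark-submit --master local[2] --py-files deps.zip my_app.py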

There are a few options available that are specific to the cluster manager being used. For example, with a Spark standalone cluster in cluster deploy mode, you can also specify --supervise to make sure the driver is automatically restarted if it fails with a non-zero exit code. To enumerate all such options available to spark-submit, run it with --help.
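
A sketch of both, with an illustrative master URL:

# standalone cluster mode with driver supervision: the driver is restarted on non-zero exit
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://192.168.217.201:7077 \
  --deploy-mode cluster \
  --supervise \
  examples/jars/spark-examples_2.11-2.4.0.jar

# list every option spark-submit accepts
./bin/spark-submit --help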

Running Spark Jobs in the Various Modes

local mode

[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.4.0.jar
WARN  NativeCodeLoader: - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO  SparkContext: - Running Spark version 2.4.0
INFO  SparkContext: - Submitted application: Spark Pi
INFO  SecurityManager: - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
INFO  SparkEnv: - Registering MapOutputTracker
INFO  SparkEnv: - Registering BlockManagerMaster
INFO  MemoryStore: - MemoryStore started with capacity 413.9 MB
INFO  SparkEnv: - Registering OutputCommitCoordinator
INFO  SparkUI: - Bound SparkUI to 192.168.217.201, and started at http://hadoop1.org.cn:4040
INFO  SparkContext: - Added JAR file:/usr/hdp/spark-2.4.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.0.jar at spark://hadoop1.org.cn:48468/jars/spark-examples_2.11-2.4.0.jar with timestamp 1550758442663
INFO  Executor: - Starting executor ID driver on host localhost
INFO  SparkContext: - Starting job: reduce at SparkPi.scala
INFO  DAGScheduler: - Final stage: ResultStage (reduce at SparkPi.scala)
INFO  TaskSetManager: - Starting task on localhost, executor driver, PROCESS_LOCAL
INFO  Executor: - Fetching spark://hadoop1.org.cn:48468/jars/spark-examples_2.11-2.4.0.jar
...
INFO  DAGScheduler: - ResultStage (reduce at SparkPi.scala) finished in 3.577 s
INFO  DAGScheduler: - Job finished: reduce at SparkPi.scala, took 4.506013 s
Pi is roughly 3.142475712378562
INFO  SparkUI: - Stopped Spark web UI at http://hadoop1.org.cn:4040
INFO  SparkContext: - Successfully stopped SparkContext
INFO  ShutdownHookManager: - Shutdown hook called
INFO  ShutdownHookManager: - Deleting directory /tmp/spark-e9e2c8b5-0a08-4dda-9d3d-4a4f9671b0d9

standalone mode

[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.217.201:7077 examples/jars/spark-examples_2.11-2.4.0.jar
WARN  NativeCodeLoader: - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO  SparkContext: - Running Spark version 2.4.0
INFO  SparkContext: - Submitted application: Spark Pi
INFO  SparkUI: - Bound SparkUI to 192.168.217.201, and started at http://hadoop1.org.cn:4040
INFO  SparkContext: - Added JAR file:/usr/hdp/spark-2.4.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.0.jar at spark://hadoop1.org.cn:40178/jars/spark-examples_2.11-2.4.0.jar with timestamp 1550758781088
INFO  StandaloneAppClient$ClientEndpoint: - Connecting to master spark://192.168.217.201:7077...
INFO  StandaloneSchedulerBackend: - Connected to Spark cluster with app ID app-...
INFO  StandaloneAppClient$ClientEndpoint: - Executor added: ... on worker 192.168.217.202
INFO  StandaloneAppClient$ClientEndpoint: - Executor added: ... on worker 192.168.217.203
INFO  StandaloneAppClient$ClientEndpoint: - Executor added: ... on worker 192.168.217.201
INFO  StandaloneSchedulerBackend: - Granted executor ID ... with 1024.0 MB RAM
INFO  StandaloneAppClient$ClientEndpoint: - Executor updated: ... is now RUNNING
INFO  StandaloneSchedulerBackend: - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
INFO  CoarseGrainedSchedulerBackend$DriverEndpoint: - Registered executor NettyRpcEndpointRef(spark-client://...) (192.168.217.202:38810) with ID 0
INFO  CoarseGrainedSchedulerBackend$DriverEndpoint: - Registered executor NettyRpcEndpointRef(spark-client://...) (192.168.217.203:47346) with ID 1
INFO  SparkContext: - Starting job: reduce at SparkPi.scala
...
INFO  DAGScheduler: - ResultStage (reduce at SparkPi.scala) finished in 16.998 s
INFO  DAGScheduler: - Job finished: reduce at SparkPi.scala, took 20.610491 s
Pi is roughly 3.1427357136785683
INFO  SparkUI: - Stopped Spark web UI at http://hadoop1.org.cn:4040
INFO  StandaloneSchedulerBackend: - Shutting down all executors
INFO  CoarseGrainedSchedulerBackend$DriverEndpoint: - Asking each executor to shut down
INFO  SparkContext: - Successfully stopped SparkContext
INFO  ShutdownHookManager: - Shutdown hook called
[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# 

yarn-cluster mode

In the so-called yarn-cluster mode, the Spark job is handed over to YARN, which executes it. You therefore need to set HADOOP_CONF_DIR in the spark-env.sh file before submitting the job, as shown below.
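
The line to add, using this cluster's Hadoop path (adjust it to your own installation):

# in conf/spark-env.sh: point Spark at the YARN/Hadoop client configuration
export HADOOP_CONF_DIR=/usr/hdp/hadoop-2.8.3/etc/hadoop

With that in place, the job can be submitted: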

[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.11-2.4.0.jar
WARN  NativeCodeLoader: - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO  RMProxy: - Connecting to ResourceManager at hadoop1
INFO  Client: - Verifying our application has not requested more than the maximum memory capability of the cluster
INFO  Client: - Will allocate AM container
INFO  Client: - Setting up container launch context for our AM
INFO  Client: - Setting up the launch environment for our AM container
INFO  Client: - Preparing resources for our AM container
WARN  Client: - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
INFO  Client: - Uploading resource file:/tmp/spark-2b75f51c-ce24-...-aa38-6d3262b1c7cb/__spark_libs__7092311691544510332.zip -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0001/__spark_libs__7092311691544510332.zip
INFO  Client: - Uploading resource file:/usr/hdp/spark-2.4.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.0.jar -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0001/spark-examples_2.11-2.4.0.jar
INFO  Client: - Uploading resource file:/tmp/spark-2b75f51c-ce24-...-aa38-6d3262b1c7cb/__spark_conf__4473735302996115715.zip -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0001/__spark_conf__.zip
INFO  Client: - Submitting application application_1550757972410_0001 to ResourceManager
INFO  YarnClientImpl: - Submitted application application_1550757972410_0001
INFO  Client: - Application report for application_1550757972410_0001 (state: ACCEPTED)
INFO  Client: -
     client token: N/A
     diagnostics: Scheduler has assigned a container for AM, waiting for AM container to be launched
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     final status: UNDEFINED
     tracking URL: http://hadoop1:8088/proxy/application_1550757972410_0001/
     user: root
...
INFO  Client: - Application report for application_1550757972410_0001 (state: RUNNING)
...
INFO  Client: - Application report for application_1550757972410_0001 (state: FINISHED)
INFO  Client: -
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: hadoop2.org.cn
     queue: default
     final status: SUCCEEDED
     tracking URL: http://hadoop1:8088/proxy/application_1550757972410_0001/
     user: root
INFO  ShutdownHookManager: - Shutdown hook called
INFO  ShutdownHookManager: - Deleting directory /tmp/spark-afb05931-e273-4e9f-b38a-d3ca234dfb34
[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# 

I will not go over the corresponding Python and Kubernetes submissions here.

yarn-client mode

[root@hadoop1 spark-2.4.0-bin-hadoop2.7]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client examples/jars/spark-examples_2.11-2.4.0.jar
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
WARN  NativeCodeLoader: - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO  SparkContext: - Running Spark version 2.4.0
INFO  SparkContext: - Submitted application: Spark Pi
INFO  SparkUI: - Bound SparkUI to 192.168.217.201, and started at http://hadoop1.org.cn:4040
INFO  SparkContext: - Added JAR file:/usr/hdp/spark-2.4.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.0.jar at spark://hadoop1.org.cn:34169/jars/spark-examples_2.11-2.4.0.jar with timestamp 1550766679506
INFO  RMProxy: - Connecting to ResourceManager at hadoop1
INFO  Client: - Preparing resources for our AM container
WARN  Client: - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
INFO  Client: - Uploading resource file:/tmp/spark-bc395b60-843f-4e24-841c-1fb09330b89f/__spark_libs__4384247224971462772.zip -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0006/__spark_libs__4384247224971462772.zip
INFO  Client: - Uploading resource file:/tmp/spark-bc395b60-843f-4e24-841c-1fb09330b89f/__spark_conf__7312670304741942310.zip -> hdfs://hadoop1:9000/user/root/.sparkStaging/application_1550757972410_0006/__spark_conf__.zip
INFO  Client: - Submitting application application_1550757972410_0006 to ResourceManager
INFO  YarnClientImpl: - Submitted application application_1550757972410_0006
INFO  SchedulerExtensionServices: - Starting Yarn extension services with app application_1550757972410_0006 and attemptId None
INFO  Client: - Application report for application_1550757972410_0006 (state: ACCEPTED)
INFO  Client: -
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     final status: UNDEFINED
     tracking URL: http://hadoop1:8088/proxy/application_1550757972410_0006/
     user: root
...
INFO  Client: - Application report for application_1550757972410_0006 (state: RUNNING)
INFO  Client: -
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.217.203
     ApplicationMaster RPC port: -1
     queue: default
     final status: UNDEFINED
     tracking URL: http://hadoop1:8088/proxy/application_1550757972410_0006/
     user: root
INFO  YarnClientSchedulerBackend: - Application application_1550757972410_0006 has started running.
INFO  YarnSchedulerBackend$YarnSchedulerEndpoint: - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
INFO  SparkContext: - Starting job: reduce at SparkPi.scala
WARN  YarnScheduler: - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
INFO  YarnSchedulerBackend$YarnDriverEndpoint: - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.217.202:43729) with ID 1
INFO  TaskSetManager: - Starting task on hadoop2.org.cn, PROCESS_LOCAL
...
INFO  DAGScheduler: - ResultStage (reduce at SparkPi.scala) finished in 34.651 s
INFO  DAGScheduler: - Job finished: reduce at SparkPi.scala, took 37.594449 s
Pi is roughly 3.14281571407857
INFO  SparkUI: - Stopped Spark web UI at http://hadoop1.org.cn:4040
INFO  YarnClientSchedulerBackend: - Shutting down all executors
INFO  SchedulerExtensionServices: - Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
INFO  YarnClientSchedulerBackend: - Stopped
INFO  SparkContext: - Successfully stopped SparkContext
INFO  ShutdownHookManager: - Shutdown hook called

Master URLs

The master URL passed to Spark can take one of the following formats (a few example invocations follow the list):

  • local: Run Spark locally with one worker thread (i.e. no parallelism at all).
  • local[K]: Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
  • local[K,F]: Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).
  • local[*]: Run Spark locally with as many worker threads as logical cores on your machine.
  • local[*,F]: Run Spark locally with as many worker threads as logical cores on your machine, and F maxFailures.
  • spark://HOST:PORT: Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, 7077 by default.
  • spark://HOST1:PORT1,HOST2:PORT2: Connect to the given Spark standalone cluster with standby masters with Zookeeper. The list must contain all the master hosts in the high-availability cluster set up with Zookeeper. The port must be whichever each master is configured to use, 7077 by default.
  • mesos://HOST:PORT: Connect to the given Mesos cluster. The port must be whichever one you have configured to use, 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... To submit with --deploy-mode cluster, the HOST:PORT should be configured to connect to the MesosClusterDispatcher.
  • yarn: Connect to a YARN cluster in client or cluster mode depending on the value of --deploy-mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
  • k8s://HOST:PORT: Connect to a Kubernetes cluster in cluster mode. Client mode is currently unsupported and will be supported in future releases. HOST and PORT refer to the [Kubernetes API server](https://kubernetes.io/docs/reference/generated/kube-apiserver/). It connects using TLS by default. To force it to use an unsecured connection, you can use k8s://http://HOST:PORT.
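
A few illustrative invocations (hosts and ports are placeholders, and the ... stands for the usual --class/jar arguments):

./bin/spark-submit --master local[4] ...                       # 4 worker threads on this machine
./bin/spark-submit --master spark://192.168.217.201:7077 ...   # standalone master
./bin/spark-submit --master yarn --deploy-mode cluster ...     # YARN, driver runs inside the cluster
./bin/spark-submit --master k8s://https://apiserver:6443 ...   # Kubernetes API server, cluster mode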

Loading Configuration from a File

The spark-submit script can load default Spark configuration values from a properties file and pass them on to your application. By default, it reads options from conf/spark-defaults.conf in the Spark directory. For more detail, see the section on loading default configurations.

Loading default Spark configurations this way can obviate the need for certain flags to spark-submit. For instance, if the spark.master property is set, you can safely omit the --master flag from spark-submit. In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.
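
A minimal sketch, assuming the standalone master from the earlier examples; appending these lines lets you drop --master from subsequent submissions:

# whitespace-separated key/value pairs; run from $SPARK_HOME
cat >> conf/spark-defaults.conf <<'EOF'
spark.master            spark://192.168.217.201:7077
spark.executor.memory   1g
EOF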

If you are ever unclear where configuration options are coming from, you can print out fine-grained debugging information by running spark-submit with the --verbose option.
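
For example:

# prints the parsed arguments and the effective Spark properties before launching
./bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.4.0.jar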

Advanced Dependency Management

When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. The list is included in both the driver and executor classpaths. Directory expansion does not work with --jars.
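
A sketch with illustrative application and jar names:

# both jars are shipped to the cluster and placed on the driver and executor classpaths
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://192.168.217.201:7077 \
  --jars /opt/libs/mysql-connector-java-5.1.47.jar,/opt/libs/guava-14.0.1.jar \
  my-app.jar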

Spark uses the following URL schemes to allow different strategies for disseminating jars:

file: - absolute paths and file:// URIs are served by the driver's HTTP file server, and every executor pulls the file from the driver HTTP server.

hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected.

local: - a URI starting with local:/ is expected to exist as a local file on each worker node. This means that no network IO will be incurred, and it works well for large files/JARs that are pushed to each worker or shared via NFS, GlusterFS, etc.
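
The schemes can be mixed in a single --jars list; a hedged illustration with hypothetical paths:

# common.jar is fetched from HDFS; native-wrappers.jar must already exist at that path on every worker
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://192.168.217.201:7077 \
  --jars hdfs://hadoop1:9000/libs/common.jar,local:/opt/spark/extra/native-wrappers.jar \
  my-app.jar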

Note that JARs and files are copied to the working directory for each SparkContext on the executor nodes. This can use up a significant amount of space over time and will need to be cleaned up. With YARN, cleanup is handled automatically, and with Spark standalone, automatic cleanup can be configured with the spark.worker.cleanup.appDataTtl property.
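
On a standalone cluster the cleanup switches live on the workers; a sketch for conf/spark-env.sh on each worker node (the 7-day TTL is an illustrative value):

# purge application work directories older than 604800 seconds (7 days)
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=604800"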

Users may also include any other dependencies by supplying a comma-delimited list of Maven coordinates with --packages. All transitive dependencies will be handled when using this command. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag --repositories. (Note that credentials for password-protected repositories can be supplied in some cases in the repository URI, such as in https://user:password@host/.... Be careful when supplying credentials this way.) These commands can be used with pyspark, spark-shell, and spark-submit to include Spark Packages.
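
For example, pulling a published package from Maven Central plus an extra repository (the application names and repository URL are illustrative):

# groupId:artifactId:version coordinates; transitive dependencies are resolved automatically
./bin/spark-submit \
  --packages com.databricks:spark-csv_2.11:1.5.0 \
  --repositories https://repo.example.com/maven2 \
  --class com.example.MyApp \
  my-app.jar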

For Python, the equivalent --py-files option can be used to distribute .egg, .zip and .py libraries to executors.

More Information

Once you have deployed your application, the cluster mode overview describes the components involved in distributed execution, and how to monitor and debug applications.

坚壁清野