How to count the number of jobs when starting a Spark application

Time: 2021-07-31 02:29:48

I was trying to get the job count up front, and I tried to get it from JobProgressListener. But it only has stage and task information, not jobs. We know a Spark application generates its jobs "as it goes".

But is there a component or a class that records the job information up front?

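As far as I know there is no component that records the job count up front: each action builds its job only when it runs, which is why a listener can only show what has already been submitted. What a listener can do is count the jobs as the scheduler starts them. Here is a minimal Scala sketch using the public SparkListener.onJobStart callback; the JobCountListener class and the demo action are illustrative, not part of any Spark API:

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import org.apache.spark.sql.SparkSession

// Counts jobs as the DAG scheduler submits them; the total cannot be
// known before the actions actually run.
class JobCountListener extends SparkListener {
  private val started = new AtomicInteger(0)

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    val soFar = started.incrementAndGet()
    println(s"Job ${jobStart.jobId} started ($soFar jobs so far)")
  }
}

object CountJobsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("count-jobs").getOrCreate()
    spark.sparkContext.addSparkListener(new JobCountListener)

    // Each action triggers at least one job, which the listener observes.
    spark.range(1000).count()

    spark.stop()
  }
}
```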

1 solution

#1


It's possible, but I would recommend the Spark RESTful API.

Steps:

  1. Get the applicationId from the SparkContext.applicationId property.

  2. Query http://context-url:port/api/v1/applications/[app-id]/jobs, where context-url is the address of your Spark driver and port is the Web UI port, normally 4040. See the Spark monitoring documentation.

  3. Count the jobs returned in the RESTful API's response (a sketch follows this list).
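
A minimal Scala sketch of those three steps, assuming the driver is reachable at localhost with the Web UI on the default port 4040; the crude "jobId"-counting parse is a stand-in for a real JSON library:

```scala
import org.apache.spark.sql.SparkSession

import scala.io.Source

object JobCountViaRest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("job-count-via-rest").getOrCreate()

    // Step 1: the application id comes from the SparkContext property.
    val appId = spark.sparkContext.applicationId

    // Step 2: query the monitoring REST API on the driver's Web UI port.
    // localhost and 4040 are assumptions; substitute your driver's address.
    val url = s"http://localhost:4040/api/v1/applications/$appId/jobs"
    val json = Source.fromURL(url).mkString

    // Step 3: count the jobs in the response. Each job object carries one
    // "jobId" field, so counting occurrences gives the job count without
    // pulling in a JSON parser.
    val jobCount = "\"jobId\"".r.findAllIn(json).length
    println(s"$jobCount jobs so far")

    spark.stop()
  }
}
```

Note that this only counts jobs submitted so far; queried at startup it returns 0, which is consistent with jobs being generated "as it goes".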
