Cloud Dataflow console dashboard not updating

Time: 2021-09-17 15:35:31

We are using apache-beam Python 2.3 with Google Cloud Dataflow. For about two weeks, the Cloud Dataflow dashboard at https://console.cloud.google.com/dataflow has been heavily delayed for us (by about 30 minutes to 1 hour).

This shows up in two ways:

  • Newly started jobs do not show up in the Overview, and the status link that Beam provides for the detailed job status page fails with the error "Job not found".

  • When jobs are finally shown, they often have a status of "running" although in reality they have already finished.

The same happens when accessing the status via the gcloud CLI tool (for example, "gcloud dataflow jobs list").

Eventually (after up to 2h) all jobs are updated and displayed correctly.


Now, my question is: what is the reason for this, and how can I get an up-to-date dashboard? Am I doing something wrong when running the job? Do I need to pass another parameter?

We run all jobs in region europe-west1, with all workers in zone=europe-west3-a (Frankfurt, Germany), because of data privacy regulations on the data we are working with.

2 answers

#1


We are seeing this as well (also in europe-west1-c).

While Google figures this out, one workaround we use is to open an old job that is already in the list and replace the Job ID in the URL directly. The new job and all its related information will then display in the web page. Not a perfect solution, but it works for now.

When you start your code, it should print something like 'Job 2018-03-06_09_31_00-13061856958687011068 submitted'. That is the ID you need to substitute into the URL.
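The URL swap described in this workaround can be scripted. The following is only a rough sketch: the job ID pattern is inferred from the single sample ID quoted in this answer, the helper names `extract_job_id` and `swap_job_id` are made up for illustration, and the old-job URL in the usage part is a placeholder (copy the real one from your browser's address bar).

```python
import re

# Assumed shape of a Dataflow job ID, inferred from the one sample in this
# thread: 2018-03-06_09_31_00-13061856958687011068
JOB_ID_RE = re.compile(r"\d{4}-\d{2}-\d{2}_\d{2}_\d{2}_\d{2}-\d+")


def extract_job_id(log_line):
    """Return the first job ID found in a log line, or None if there is none."""
    match = JOB_ID_RE.search(log_line)
    return match.group(0) if match else None


def swap_job_id(old_job_url, new_job_id):
    """Replace the job ID in the URL of an already-listed job with a new one."""
    new_url, count = JOB_ID_RE.subn(lambda _m: new_job_id, old_job_url, count=1)
    if count == 0:
        raise ValueError("no job ID found in URL: %r" % old_job_url)
    return new_url


if __name__ == "__main__":
    # The log line comes from this answer; the URL below is a placeholder,
    # not the real console URL layout.
    new_id = extract_job_id(
        "Job 2018-03-06_09_31_00-13061856958687011068 submitted")
    old_url = ("https://example.invalid/old-job/"
               "2018-03-01_00_00_00-1111111111111111111?project=my-project")
    print(swap_job_id(old_url, new_id))
```

Because the sketch only substitutes the ID inside whatever URL you paste in, it does not depend on knowing how the console lays out its job detail links.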

By the way, it doesn't seem related to the 2.2.3 upgrade: we started seeing this issue a couple of weeks ago while still running 2.2.0.

#2


There were some listjobs server OOM crashes that delayed dashboard updates, but the issue has now been resolved.
