DataflowRunner - Refreshing due to a 401

Time: 2022-06-30 14:52:23

Running a pipeline on DataflowRunner (Google Cloud Dataflow SDK for Python 0.5.5).


The pipeline:


    | 'Read trip from BigQuery' >> beam.io.Read(beam.io.BigQuerySource(query=known_args.input))
    | 'Convert' >> beam.Map(lambda row: (row['HardwareId'], row))
    | 'Group devices' >> beam.GroupByKey()
    | 'Pull way info from mapserver' >> beam.FlatMap(get_osm_way)
    | 'Map way info to dictionary' >> beam.FlatMap(convert_to_dict)
    | 'Save to BQ' >> beam.io.Write(beam.io.BigQuerySink(
            known_args.output,
            schema=schema_string,

The job is set to autoscale, and the runner spun up 15 workers.


More detailed code is in my other question.


After around 2 hours of running, it reported:


Attempting refresh to obtain initial access_token
 insertId: "jf9yr4g1sv0qku"
 jsonPayload: {
  message: "Attempting refresh to obtain initial access_token"
  worker: "beamapp-root-0216221014-5-02161410-29cb-harness-xqx2"
  logger: "oauth2client.client:client.py:new_request"
  thread: "110:140052132222720"
  job: "2017-02-16_14_10_18-17481182243152998182"
 }
 resource: {…}
 timestamp: "2017-02-17T00:41:19.908143997Z"
 severity: "INFO"
 labels: {…}
 logName: "projects/fiona-zhao/logs/dataflow.googleapis.com%2Fworker"

It then started continuously reporting "Refreshing due to a 401". One of those entries is:


Refreshing due to a 401 (attempt 1/2)
 insertId: "zsorfgg1urhvty"
 jsonPayload: {
  worker: "beamapp-root-0216221014-5-02161410-29cb-harness-xqx2"
  logger: "oauth2client.client:client.py:new_request"
  thread: "110:140052273633024"
  job: "2017-02-16_14_10_18-17481182243152998182"
  message: "Refreshing due to a 401 (attempt 1/2)"
 }
 resource: {…}
 timestamp: "2017-02-17T02:45:12.886137962Z"
 severity: "INFO"
 labels: {
  compute.googleapis.com/resource_name: "dataflow-beamapp-root-0216221014-5-02161410-29cb-harness-xqx2"
  dataflow.googleapis.com/job_id: "2017-02-16_14_10_18-17481182243152998182"
  dataflow.googleapis.com/job_name: "beamapp-root-0216221014-530646"
  dataflow.googleapis.com/region: "global"
  compute.googleapis.com/resource_type: "instance"
  compute.googleapis.com/resource_id: "2301951363070532306"
 }
 logName: "projects/fiona-zhao/logs/dataflow.googleapis.com%2Fworker"

What can I do?


1 solution



These log messages are a normal part of execution and in themselves do not reflect errors. My suggestion is to add additional logging to debug hanging external API calls or execution steps.
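As a rough illustration of that kind of logging, here is a minimal sketch of a decorator that records when a step starts, how long it takes, and whether it fails. The decorator name `log_calls`, the `slow_after_secs` threshold, and the placeholder body of `get_osm_way` are all assumptions made for this example; the real `get_osm_way` from the question is not shown in the post. It uses only the standard `logging` module, whose output from worker code should appear alongside the worker log entries quoted above.

```python
import functools
import logging
import time


def log_calls(slow_after_secs=10.0):
    """Hypothetical helper: log entry, duration, and failures of a function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            logging.info('Entering %s', fn.__name__)
            try:
                result = fn(*args, **kwargs)
            except Exception:
                # Log the traceback, then re-raise so the runner still sees the error.
                logging.exception('%s failed after %.1fs',
                                  fn.__name__, time.time() - start)
                raise
            elapsed = time.time() - start
            # Escalate to WARNING when the call exceeded the chosen threshold.
            level = logging.WARNING if elapsed > slow_after_secs else logging.INFO
            logging.log(level, '%s finished in %.1fs', fn.__name__, elapsed)
            return result
        return wrapper
    return decorator


@log_calls(slow_after_secs=10.0)
def get_osm_way(element):
    # Placeholder body (assumption): the real function calls the map server.
    # Note: if the real function is a generator, the timing above only covers
    # creating the generator, not iterating it, so return a list instead.
    hardware_id, rows = element
    return [(hardware_id, len(list(rows)))]
```

If a call to the map server never returns, the log will show an "Entering" line with no matching "finished" line for that worker, which narrows down where the job is stuck. Setting an explicit timeout on the HTTP client used inside the function is a natural companion to this, though which client the original code uses is not stated.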


Though we cannot comment on specific execution details of particular jobs on this open forum, the Cloud Dataflow team can provide more support on the dataflow-feedback@google.com mailing list.
