Accessing PipelineOptions from a CombineFn in Google Cloud Dataflow

Date: 2022-04-15 15:26:26

I need to instantiate and use a GcsUtil from within a CombineFn subclass, and it looks like I need to hand a PipelineOptions instance to the GcsUtilFactory. However, I cannot find a way to retrieve an instance of the PipelineOptions class from a CombineFn (unlike in DoFns, where one is available).

Is there an API to retrieve the current pipeline's options at runtime? Keeping the options in a field of the CombineFn doesn't seem to work: it blocks the pipeline upload to the Dataflow service.

Thanks! G

1 answer

#1


Reading from GCS within the CombineFn is likely to be problematic. For instance, you wouldn't get any of the caching that side-inputs give you.

Depending on what kind of configuration you're trying to do, your best bet is probably to use a ParDo/DoFn before running the Combine.

Separately, it probably does make sense for PipelineOptions to be made accessible from within the CombineFn. I've made a note of this, and we'll take a look.
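On the field approach mentioned in the question: a CombineFn is Java-serialized before it is shipped to the workers, so a plausible reason the upload fails is that the options object held in the field is not itself serializable. The sketch below reproduces that failure mode with plain Java only, no Dataflow classes. `FakeOptions`, `CombinerHoldingOptions`, `CombinerHoldingValues`, and the `gs://example-bucket/config.json` path are all hypothetical stand-ins, and the assumption that PipelineOptions behaves like `FakeOptions` is mine, not something confirmed in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class OptionsFieldDemo {

    // Hypothetical stand-in for PipelineOptions: deliberately NOT Serializable.
    static class FakeOptions {
        final String configPath;

        FakeOptions(String configPath) {
            this.configPath = configPath;
        }
    }

    // Mirrors the approach from the question: the whole options object sits in a field.
    static class CombinerHoldingOptions implements Serializable {
        private static final long serialVersionUID = 1L;
        final FakeOptions options;

        CombinerHoldingOptions(FakeOptions options) {
            this.options = options;
        }
    }

    // Alternative: copy only the plain, serializable values that are needed
    // while the pipeline is being constructed.
    static class CombinerHoldingValues implements Serializable {
        private static final long serialVersionUID = 1L;
        final String configPath;

        CombinerHoldingValues(FakeOptions options) {
            this.configPath = options.configPath;
        }
    }

    // Returns true if Java serialization of the object succeeds.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        FakeOptions options = new FakeOptions("gs://example-bucket/config.json");
        System.out.println(canSerialize(new CombinerHoldingOptions(options))); // false
        System.out.println(canSerialize(new CombinerHoldingValues(options)));  // true
    }
}
```

If the function only needs a few option values (a bucket name, a path), copying them into fields like this sidesteps the serialization problem, but it does not give a CombineFn what it needs to build a GcsUtil, so the ParDo/DoFn route above is still the safer choice for the GCS read itself.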
