Determining whether a PCollection is empty

Time: 2021-09-30 15:32:09

How can I check whether a PCollection is empty before writing it out to a text file in Apache Beam (2.1.0)?

What I'm trying to do here is split a file into a number of PCollections, where the number is passed to the pipeline as a parameter via a ValueProvider. Since the ValueProvider's value is not available at pipeline construction time, I declare a fixed upper bound of 26 (the number of letters in the alphabet, which is also the maximum a user can input) so that I can pass it to .withOutputTags(). This gives me 26 tuple tags, from which I have to retrieve the PCollections before writing them to text files. Only as many tags as the user asked for get populated; the rest are empty. So I want to skip the empty PCollections returned by some of the tags before I apply TextIO.write().

1 solution

#1


0  

It seems that what you actually want is to write a collection into multiple sets of files, some of which may be empty. The proper way to do this is the DynamicDestinations API: see TextIO.write().to(DynamicDestinations), which will be available in Beam 2.2.0; that release should be cut within the next couple of weeks. In the meantime, if you'd like to use it, you can build a snapshot of Beam at HEAD yourself.
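
To make the idea concrete, here is a minimal plain-Java sketch of the routing decision that a dynamic-destination write relies on: each line is mapped to a destination at run time, so groups that receive no lines never show up and there is nothing to check for emptiness. It deliberately does not use Beam. The class DestinationSketch, the methods destinationFor and groupLines, and the "route by first letter" rule are all hypothetical and chosen only for illustration. In a real pipeline, logic like destinationFor would presumably live in the getDestination method of a DynamicDestinations implementation; the exact wiring should be checked against the TextIO documentation for your Beam version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Beam: routes each line to one of N output groups. */
final class DestinationSketch {

    /** Upper bound taken from the question: one group per letter of the alphabet. */
    static final int MAX_GROUPS = 26;

    /**
     * Returns a group index in [0, numGroups) for a line.
     * Assumption: lines are routed by their first letter; any other rule would do.
     */
    static int destinationFor(String line, int numGroups) {
        if (numGroups < 1 || numGroups > MAX_GROUPS) {
            throw new IllegalArgumentException(
                "numGroups must be between 1 and " + MAX_GROUPS);
        }
        if (line.isEmpty()) {
            return 0; // empty lines fall into the first group
        }
        char c = Character.toLowerCase(line.charAt(0));
        if (c < 'a' || c > 'z') {
            return 0; // lines not starting with a-z also fall into the first group
        }
        return (c - 'a') % numGroups;
    }

    /**
     * Groups lines by destination. Only destinations that actually receive a
     * line get an entry, so empty groups never need to be filtered out.
     */
    static Map<Integer, List<String>> groupLines(List<String> lines, int numGroups) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            int destination = destinationFor(line, numGroups);
            groups.computeIfAbsent(destination, k -> new ArrayList<>()).add(line);
        }
        return groups;
    }
}
```

Because the number of groups is only read when each line is routed, it can come from a run-time parameter rather than being fixed at 26 when the pipeline is constructed.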
