Beam / Dataflow 2.2.0 - extract the first n elements from a PCollection

Date: 2022-06-25 15:39:05

Is there any way to extract the first n elements of a Beam PCollection? The documentation doesn't seem to mention any such function. I think such an operation would first require assigning a global element number and then filtering on it. It would be nice to have this functionality.

I use the Google Dataflow Java SDK 2.2.0.

1 answer

#1


2  

PCollections are unordered per se, so the notion of "first N elements" does not exist. However:

  • In case you need the top N elements by some criterion, you can use the Top transform.

  • In case you need any N elements, you can use Sample.
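
The two options above could look roughly like this in the Beam Java SDK. This is only a sketch: the class name `TopAndSample`, the helper methods `topN` and `anyN`, and the sample input are made up for illustration, and it assumes the Beam core SDK plus a runner (for example the direct runner) are on the classpath. `Top.largest(n)` relies on the natural ordering of the elements; for a custom criterion, `Top.of(n, comparator)` takes a `Comparator` that is also `Serializable`.

```java
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Sample;
import org.apache.beam.sdk.transforms.Top;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical class name, used only for this illustration.
public class TopAndSample {

  // "Top N by some criterion": here the criterion is natural ordering.
  // The result is a PCollection holding a single List, largest element first.
  static PCollection<List<Integer>> topN(PCollection<Integer> input, int n) {
    return input.apply("TopN", Top.<Integer>largest(n));
  }

  // "Any N elements": which elements are returned is not specified.
  // If the input has fewer than n elements, all of them are returned.
  static PCollection<Integer> anyN(PCollection<Integer> input, int n) {
    return input.apply("AnyN", Sample.<Integer>any(n));
  }

  public static void main(String[] args) {
    // Uses the default runner; the input values are arbitrary sample data.
    Pipeline p = Pipeline.create();
    PCollection<Integer> input = p.apply(Create.of(3, 1, 4, 1, 5, 9, 2, 6));

    topN(input, 3);
    anyN(input, 3);

    p.run().waitUntilFinish();
  }
}
```

Neither transform depends on element order, which is why they work on an unordered PCollection. `Sample.fixedSizeGlobally(n)` is a related option when a uniformly random sample of N elements is wanted rather than an arbitrary one.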
