Writing to a BigQuery table with a date in the table name from a streaming Dataflow pipeline

Time: 2022-07-13 15:21:00

My table name format: tableName_YYYYMMDD. I am trying to write to this table from a streaming Dataflow pipeline. The reason I want to write to a new table every day is that I want tables to expire after 30 days, keeping only a rolling window of 30 tables at a time.


Current code:


// Writes every element to one fixed table spec ("project:dataset.table"),
// resolved once at pipeline construction time.
tableRow.apply(BigQueryIO.Write
                .named("WriteBQTable")
                .to(String.format("%1$s:%2$s.%3$s", projectId, bqDataSet, bqTable))
                .withSchema(schema)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

I do realize the above code will not roll over to a new table each day and start writing there.


As this answer suggests, I could partition the table and expire partitions, but writing to a partitioned table does not seem to be supported from a streaming pipeline.


Any ideas how I can work around this?


2 Answers

#1

In the Dataflow 2.0 SDK there is a way to specify DynamicDestinations.


See to(DynamicDestinations<T,?> dynamicDestinations) in BigQuery Dynamic Destinations.

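For per-day tables, a rough sketch of such a DynamicDestinations might look like the following (assuming Beam 2.x; the single schema field is made up for illustration, and the schema is rebuilt inside getSchema because a TableSchema field would not serialize with the class):

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Collections;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;

// Routes each element to tablePrefix_YYYYMMDD based on its event timestamp.
class DailyTableDestinations extends DynamicDestinations<TableRow, String> {
  private final String projectId;
  private final String dataset;
  private final String tablePrefix;

  DailyTableDestinations(String projectId, String dataset, String tablePrefix) {
    this.projectId = projectId;
    this.dataset = dataset;
    this.tablePrefix = tablePrefix;
  }

  // The "destination" key: the day this element's timestamp falls on.
  @Override
  public String getDestination(ValueInSingleWindow<TableRow> element) {
    return DateTimeFormat.forPattern("yyyyMMdd")
        .withZone(DateTimeZone.UTC)
        .print(element.getTimestamp());
  }

  // Map the day key to a concrete table spec.
  @Override
  public TableDestination getTable(String day) {
    return new TableDestination(
        String.format("%s:%s.%s_%s", projectId, dataset, tablePrefix, day),
        "Daily table for " + day);
  }

  // Rebuilt per call; the single STRING field here is purely illustrative.
  @Override
  public TableSchema getSchema(String day) {
    return new TableSchema().setFields(Collections.singletonList(
        new TableFieldSchema().setName("payload").setType("STRING")));
  }
}

This would be wired in with BigQueryIO.writeTableRows().to(new DailyTableDestinations(projectId, bqDataSet, bqTable)) plus the create/write dispositions from the question; the schema now comes from getSchema rather than withSchema.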

Also see the TableDestination version, which should be simpler and require less code. Unfortunately, though, there is no example in the javadoc.


to(SerializableFunction<ValueInSingleWindow<T>,TableDestination> tableFunction)
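
A minimal sketch of that overload (assuming Beam 2.x, and reusing projectId, bqDataSet, bqTable, and schema from the question):

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;

tableRow.apply("WriteBQTable",
    BigQueryIO.writeTableRows()
        .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
          @Override
          public TableDestination apply(ValueInSingleWindow<TableRow> value) {
            // Pick the daily table from the element's event timestamp,
            // so late data still lands in the table for its own day.
            String day = DateTimeFormat.forPattern("yyyyMMdd")
                .withZone(DateTimeZone.UTC)
                .print(value.getTimestamp());
            return new TableDestination(
                String.format("%s:%s.%s_%s", projectId, bqDataSet, bqTable, day),
                null /* table description */);
          }
        })
        .withSchema(schema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));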

https://beam.apache.org/documentation/sdks/javadoc/2.0.0/


#2

This is an open-source pipeline you can use to connect Pub/Sub to BigQuery. I think Google has also added support for streaming pipelines writing to date-partitioned tables. Details here.

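If partitioned tables do become an option, a sketch of that variant (assuming a Beam release where BigQueryIO.Write supports withTimePartitioning, 2.2.0 or later; variables reused from the question):

import com.google.api.services.bigquery.model.TimePartitioning;

// One day-partitioned table instead of 30 dated tables; BigQuery
// drops partitions older than 30 days on its own.
tableRow.apply("WriteBQTable",
    BigQueryIO.writeTableRows()
        .to(String.format("%s:%s.%s", projectId, bqDataSet, bqTable))
        .withSchema(schema)
        .withTimePartitioning(new TimePartitioning()
            .setType("DAY")
            .setExpirationMs(30L * 24 * 60 * 60 * 1000))  // 30 days
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

With a single day-partitioned table and a 30-day partition expiration, nothing in the pipeline has to roll over at midnight.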
