Deleting data from BigQuery while streaming from Dataflow

Time: 2021-08-17 15:26:14

Is it possible to delete data from a BigQuery table while an Apache Beam pipeline is loading data into it?

In our use case, we need to delete data older than 3 days from the table, based on a timestamp field (the time at which Dataflow pulls the message from the Pub/Sub topic).

Is it recommended to do something like this? If yes, is there any way to achieve this?

Thank You.

1 Answer

#1


2  

I think the best way to do this is to set up your table as a partitioned table (partitioned by ingestion time): https://cloud.google.com/bigquery/docs/partitioned-tables. You can then drop old partitions manually:

bq rm 'mydataset.mytable$20160301'
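
For the 3-day retention described in the question, the partition name could be computed rather than typed by hand. The following is only a rough sketch: it assumes GNU `date` (the `-d` option behaves differently on macOS/BSD), reuses the placeholder table `mydataset.mytable` from the command above, and introduces `RETENTION_DAYS`, `CUTOFF` and `PARTITION` purely for illustration. It prints the command instead of running it, so the result can be checked first.

```shell
#!/bin/sh
# Assumption: the partition from exactly RETENTION_DAYS ago (UTC) should be dropped.
RETENTION_DAYS=3

# Partition decorators use the YYYYMMDD form, e.g. 20160301.
# Note: "-d '3 days ago'" is GNU date syntax.
CUTOFF=$(date -u -d "${RETENTION_DAYS} days ago" +%Y%m%d)

# Hypothetical table name; \$ keeps the dollar sign literal inside double quotes.
PARTITION="mydataset.mytable\$${CUTOFF}"

# Print the command for review; run the printed line yourself to actually delete.
echo "bq rm '${PARTITION}'"
```

This drops a single day's partition, so it would need to run daily (for example from a scheduler) to keep the table trimmed.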

You can also set a partition expiration time, so that old partitions are removed automatically:

bq update --time_partitioning_expiration [INTEGER] [PROJECT_ID]:[DATASET].[TABLE]
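
The `[INTEGER]` value is given in seconds. As a minimal sketch for the 3-day case from the question, the value works out as below; the table identifier `myproject:mydataset.mytable` is a hypothetical placeholder, and the command is only printed so it can be reviewed before being run.

```shell
#!/bin/sh
# Assumption: partitions should expire 3 days after their partition date.
RETENTION_DAYS=3

# The expiration is expressed in seconds: 3 * 24 * 60 * 60 = 259200.
EXPIRATION_SECONDS=$((RETENTION_DAYS * 24 * 60 * 60))

# Hypothetical table identifier in [PROJECT_ID]:[DATASET].[TABLE] form.
TABLE="myproject:mydataset.mytable"

# Print the command for review; run the printed line yourself to apply it.
echo "bq update --time_partitioning_expiration ${EXPIRATION_SECONDS} ${TABLE}"
```

With the expiration set once on the table, no separate delete job is needed alongside the pipeline.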

If ingestion time does not work for you, you can look into column-based partitioning: https://cloud.google.com/bigquery/docs/creating-column-partitions. It was in beta at the time of writing; it works reliably, but it is your call.
