How to read BigQuery nested tables using the Dataflow Python SDK

How can I read nested structures using the Apache Beam Python SDK?

lines = p | io.Read(io.BigQuerySource('project:test.beam_in'))

results in

"reason": "invalidQuery",
"message": "Cannot output multiple independently repeated fields at the same time. Found classification_item_distribution and category_cat_name"

Is it possible to read nested structures?

2 Answers

#1

This is a property of BigQuery. The two ways to execute such a query are to disable result flattening (by BigQuery) or to explicitly flatten fields in your query.

With the current Python SDK, only the latter is available. See "Flattening Google Analytics data (with repeated fields) not working anymore" for a guide on where and how to invoke the FLATTEN function.
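
As a rough sketch of the explicit-flatten approach: the helper names below (build_flatten_query, read_flattened) are made up for illustration, and the field name is copied from the error message, so it may need adjusting to the actual schema path. It assumes the legacy SQL FLATTEN(table, field) form and that the query is passed through the query argument of BigQuerySource. Flattening one of the two repeated fields may be enough to avoid the error, but this has not been run against the table in the question.

```python
def build_flatten_query(table, repeated_field, columns="*"):
    """Return a legacy SQL query that flattens one repeated field."""
    return "SELECT {cols} FROM FLATTEN([{table}], {field})".format(
        cols=columns, table=table, field=repeated_field)


def read_flattened(pipeline, table, repeated_field):
    """Read the flattened rows into a PCollection (needs apache_beam installed)."""
    # Imported lazily so the query builder can be used without Beam.
    from apache_beam import io
    query = build_flatten_query(table, repeated_field)
    return pipeline | io.Read(io.BigQuerySource(query=query))


# Field name taken from the error message above (an assumption).
query = build_flatten_query("project:test.beam_in", "category_cat_name")
print(query)
# SELECT * FROM FLATTEN([project:test.beam_in], category_cat_name)
```

In a pipeline this would be used as lines = read_flattened(p, 'project:test.beam_in', 'category_cat_name').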

The feature request to disable flattening is tracked as BEAM-877, if you want to subscribe to updates or join the discussion.

#2

You can now read nested results directly in Beam Python by adding flatten_results=False when creating your source:

lines = p | io.Read(io.BigQuerySource('project:test.beam_in', flatten_results=False))
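
With flattening disabled, each row should arrive as a Python dict in which repeated fields are lists. If you still need one row per repeated value downstream, something like the following sketch could do it in the pipeline itself. The helper explode_repeated and the sample row are hypothetical; the real row shape depends on your schema.

```python
def explode_repeated(row, field):
    """Yield one copy of `row` per element of the repeated `field`."""
    for value in row.get(field) or []:
        flat = dict(row)      # shallow copy so the input row is not mutated
        flat[field] = value   # replace the list with a single element
        yield flat


# Hypothetical row; the values are made up for illustration.
sample_row = {"id": 1, "category_cat_name": ["books", "music"]}
flat_rows = list(explode_repeated(sample_row, "category_cat_name"))
print(flat_rows)
# [{'id': 1, 'category_cat_name': 'books'}, {'id': 1, 'category_cat_name': 'music'}]
```

In a pipeline this could be applied as lines | beam.FlatMap(explode_repeated, 'category_cat_name'). Note that rows whose repeated field is empty or missing produce no output in this sketch.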

See source here.
