Google BQ: Run a parameterized query where a parameter variable is the BQ table destination

Date: 2021-02-28 15:22:07

I am trying to run a SQL query from the Linux command line against a BQ table destination. This SQL script will be used for multiple dates, clients, and BQ table destinations, so my bq command-line calls need to use parameters (the --parameter flag). I have followed this link to learn about parameterized queries: https://cloud.google.com/bigquery/docs/parameterized-queries , but it doesn't help me with declaring a table name.

My SQL script, called Advertiser_Date_Check.sql, is the following:

#standardSQL
SELECT *
FROM (SELECT *
      FROM @variable_table
      WHERE CAST(_PARTITIONTIME AS DATE) = @variable_date) as final
WHERE final.Advertiser IN UNNEST(@variable_clients)

The parameter variables represent the following:

  • variable_table: The BQ table destination that I want to query
  • variable_date: The date that I want to pull from the BQ table
  • variable_clients: An array of specific clients that I want to pull from the data (for the date I referenced)

Now, my Linux command line for the BQ data is the following:

TABLE_NAME=table_name_example
BQ_TABLE=$(echo '`project_id.dataset_id.'$TABLE_NAME'`')
TODAY=$(date +%F)

/bin/bq query --use_legacy_sql=false    \
       --parameter='variable_table::'$BQ_TABLE''  \
       --parameter=variable_date::"$TODAY"    \
       --parameter='variable_clients:ARRAY<STRING>:["Client_1","Client_2","Client_3"]'  \
       "`cat /path/to/script/Advertiser_Date_Check.sql`" 

The parameters @variable_date and @variable_clients worked fine in the past when they were the only ones. However, since I want to run this exact SQL command on various tables in a loop, I created a parameter called variable_table. Parameterized queries have to use Standard SQL, so the table name needs to be in this format:

`project_id.dataset_id.table_name`

Whenever I try to run this on the command line, I usually get the following error:

Error in query string: Error processing job ... : Syntax error: Unexpected "@" at [4:12]

This points at the @variable_table parameter (line 4, column 12 of the query), so BigQuery evidently does not accept it as a table name. In past attempts, there has even been this error:

project_id.dataset_id.table_name: command not found

But that was mostly due to referencing the destination table name incorrectly. The first error is the most common one.
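A `command not found` error like this is what the shell produces when backticks are evaluated as command substitution (unquoted, or inside double quotes): the shell tries to run the table name as a program. A minimal sketch, assuming bash (the syntax is plain POSIX), of quoting that keeps the backticks literal, next to the failing form; the `WRONG` variable is purely illustrative:

```bash
#!/usr/bin/env bash
# Sketch: quoting decides whether backticks reach bq or get executed.
TABLE_NAME=table_name_example

# Single quotes keep the backticks literal; only the double-quoted middle
# part is expanded. Result: `project_id.dataset_id.table_name_example`
BQ_TABLE='`project_id.dataset_id.'"$TABLE_NAME"'`'
printf '%s\n' "$BQ_TABLE"

# Illustrative wrong version: inside double quotes the backticks are command
# substitution, so the shell tries to execute the table name and bash reports
# "project_id.dataset_id.table_name_example: command not found".
if ( WRONG="`project_id.dataset_id.$TABLE_NAME`" ) 2>/dev/null; then
  echo "unexpected: command substitution succeeded"
else
  echo "command substitution failed, as expected"
fi
```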

Overall, my questions regarding this matter are:

  1. How do I reference a BQ table as a parameter on the command line for parameterized queries, in the FROM clause (as I try to do with @variable_table)? Is it even possible?
  2. Do you know of other methods to run a query on multiple BQ tables from the command line besides the way I am currently doing it?

Hope this all makes sense and thank you for your assistance!

1 answer

#1


From the documentation that you linked:

Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.

I think what might work for you in this case, though, is injecting the table name into the query text with a regular shell variable (instead of a query parameter). Make sure that you trust its contents, or that you build the string yourself, to avoid SQL injection. One approach is to hardcode constants for the table names and then choose which one to insert into the query text based on the user input.
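A minimal sketch of that approach, assuming bash and the bq CLI: the table name is looked up in a hardcoded whitelist and spliced into the query text by the shell, while the date and client list stay real query parameters. The keys `example` and `other`, the table `project_id.dataset_id.other_table`, and the function names `resolve_table`, `build_query`, `run_for_table`, and `main` are hypothetical placeholders, not part of any tool. Looping over the keys in `main` also covers running the same query on several tables.

```bash
#!/usr/bin/env bash
# Sketch: splice a whitelisted table name into the SQL text and keep the
# date and client list as real BigQuery query parameters.
set -eu

# Hypothetical whitelist: only these fully qualified names can ever be
# inserted into the query text, so user input cannot inject SQL.
resolve_table() {
  case "$1" in
    example) echo 'project_id.dataset_id.table_name_example' ;;
    other)   echo 'project_id.dataset_id.other_table' ;;
    *)       echo "unknown table key: $1" >&2; return 1 ;;
  esac
}

# Print the query for one table. The escaped backticks end up literally in
# the SQL; @variable_date and @variable_clients are left for bq to bind.
build_query() {
  local table
  table=$(resolve_table "$1") || return 1
  cat <<EOF
#standardSQL
SELECT *
FROM (SELECT *
      FROM \`${table}\`
      WHERE CAST(_PARTITIONTIME AS DATE) = @variable_date) AS final
WHERE final.Advertiser IN UNNEST(@variable_clients)
EOF
}

# Run the query against one table via the bq CLI, with the same parameter
# flags as in the question.
run_for_table() {
  local query
  query=$(build_query "$1") || return 1
  bq query --use_legacy_sql=false \
    --parameter="variable_date::$(date +%F)" \
    --parameter='variable_clients:ARRAY<STRING>:["Client_1","Client_2","Client_3"]' \
    "$query"
}

# Loop over every whitelisted table key.
main() {
  for key in example other; do
    run_for_table "$key"
  done
}

# Only call bq when explicitly asked.
if [ "${1:-}" = "--run" ]; then
  main
fi
```

Invoked as `./advertiser_date_check.sh --run` (a hypothetical file name); without `--run` the script only defines the functions, which makes `build_query` easy to inspect on its own. Because an unknown key makes `resolve_table` fail, arbitrary input never reaches the SQL text.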
