I want to automate the creation of "directory variables" from a set of URIs, up to the maximum number of directories found.
For example, given a URI with 4 path segments, "/A/B/C/17628.html", I would want to create the following variables:
path_1 = "A"
path_2 = "B"
path_3 = "C"
path_4 = "17628.html"
But for "/A/D/E/F/178.html":
path_1 = "A"
path_2 = "D"
path_3 = "E"
path_4 = "F"
path_5 = "178.html"
A URI may contain many directories (up to 20). To avoid creating all these variables by hand, I want to define them using a for loop (or another option). Is it possible to use such a loop in BigQuery?
2 solutions
#1
1
Consider the version below:
#standardSQL
WITH yourTable AS (
SELECT '/A/B/C/17628.html' AS uri UNION ALL
SELECT '/A/D/E/F/178.html' AS uri
)
SELECT uri, CONCAT('path_', CAST(1 + OFFSET AS STRING)) AS pos, path
FROM yourTable, UNNEST(SPLIT(REGEXP_EXTRACT(uri, r'/(.*)/'), '/')) path WITH OFFSET
ORDER BY uri, OFFSET
The result is (the regular expression captures only the directory segments, so the file name is not returned):
uri pos path
/A/B/C/17628.html path_1 A
/A/B/C/17628.html path_2 B
/A/B/C/17628.html path_3 C
/A/D/E/F/178.html path_1 A
/A/D/E/F/178.html path_2 D
/A/D/E/F/178.html path_3 E
/A/D/E/F/178.html path_4 F
In most practical cases, a flattened schema like this is much easier to query than a pivoted one.
If you still want to pivot the result above, see one of my answers on that topic: Transpose rows into columns in BigQuery (Pivot implementation).
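For reference, one common way to pivot rows like these is conditional aggregation: one MAX(IF(...)) expression per output column, grouped by uri. Since each output column has to be spelled out, the sketch below uses a small shell function to print such a query. The function name gen_pivot_query and the table name flattened (standing in for the flattened result above) are assumptions made for illustration; this is not necessarily the approach taken in the linked answer.

```shell
#!/bin/sh
# Hypothetical helper: print a pivot query over flattened (uri, pos, path) rows.
# "flattened" is an assumed name for a table or CTE holding those rows.
gen_pivot_query() {
  max="${1:-4}"          # number of path_N columns to emit
  echo "SELECT uri,"
  i=1
  while [ "$i" -le "$max" ]; do
    # Every column except the last gets a trailing comma.
    if [ "$i" -lt "$max" ]; then sep=","; else sep=""; fi
    printf "  MAX(IF(pos = 'path_%d', path, NULL)) AS path_%d%s\n" "$i" "$i" "$sep"
    i=$((i + 1))
  done
  echo "FROM flattened"
  echo "GROUP BY uri;"
}

# Example: print a pivot query with four columns.
gen_pivot_query 4
```

URIs with fewer segments than the chosen column count simply get NULL in the remaining columns.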
#2
0
You need to specify columns in the select list explicitly; it isn't possible for columns themselves to be dynamic. If you are okay with getting the results back as an array, you could do something like this:
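#standardSQL
WITH T AS (
  SELECT '/A/B/C/17628.html' AS path UNION ALL
  SELECT '/A/D/E/F/178.html' AS path
)
SELECT
  -- Offsets start at 1 because the leading '/' makes element 0 an empty string.
  -- Missing segments are padded with '' so every array has 20 elements.
  ARRAY(SELECT IFNULL(subpaths[SAFE_OFFSET(x)], '')
        FROM UNNEST(GENERATE_ARRAY(1, 20)) AS x) AS paths
FROM (
  SELECT SPLIT(path, '/') AS subpaths
  FROM T
);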
If you wanted explicit path_1, path_2, etc. columns, you could do:
#standardSQL
WITH T AS (
SELECT '/A/B/C/17628.html' AS path UNION ALL
SELECT '/A/D/E/F/178.html' AS path
)
SELECT
subpaths[SAFE_OFFSET(1)] AS path_1,
subpaths[SAFE_OFFSET(2)] AS path_2,
subpaths[SAFE_OFFSET(3)] AS path_3,
subpaths[SAFE_OFFSET(4)] AS path_4,
subpaths[SAFE_OFFSET(5)] AS path_5,
subpaths[SAFE_OFFSET(6)] AS path_6,
subpaths[SAFE_OFFSET(7)] AS path_7,
subpaths[SAFE_OFFSET(8)] AS path_8,
subpaths[SAFE_OFFSET(9)] AS path_9,
subpaths[SAFE_OFFSET(10)] AS path_10,
subpaths[SAFE_OFFSET(11)] AS path_11,
subpaths[SAFE_OFFSET(12)] AS path_12,
subpaths[SAFE_OFFSET(13)] AS path_13,
subpaths[SAFE_OFFSET(14)] AS path_14,
subpaths[SAFE_OFFSET(15)] AS path_15,
subpaths[SAFE_OFFSET(16)] AS path_16,
subpaths[SAFE_OFFSET(17)] AS path_17,
subpaths[SAFE_OFFSET(18)] AS path_18,
subpaths[SAFE_OFFSET(19)] AS path_19,
subpaths[SAFE_OFFSET(20)] AS path_20
FROM (
SELECT SPLIT(path, '/') AS subpaths
FROM T
);
Since I didn't want to write that list by hand, I ran a simple one-liner in my terminal:
for i in `seq 1 20`; do echo "subpaths[SAFE_OFFSET($i)] AS path_$i,"; done
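The same idea can be extended to print the whole query for any maximum depth. The sketch below is only an illustration: the function name gen_path_query is hypothetical, and it assumes the source table is called T with a STRING column named path, as in the query above. It also omits the trailing comma after the last column.

```shell
#!/bin/sh
# Hypothetical helper: print a complete SELECT with path_1..path_N columns.
# Assumes a table T with a STRING column `path`, as in the query above.
gen_path_query() {
  max="${1:-20}"         # maximum number of path segments to expose
  echo "SELECT"
  i=1
  while [ "$i" -le "$max" ]; do
    # Every column except the last gets a trailing comma.
    if [ "$i" -lt "$max" ]; then sep=","; else sep=""; fi
    printf '  subpaths[SAFE_OFFSET(%d)] AS path_%d%s\n' "$i" "$i" "$sep"
    i=$((i + 1))
  done
  echo "FROM ("
  echo "  SELECT SPLIT(path, '/') AS subpaths"
  echo "  FROM T"
  echo ");"
}

# Example: print the query for up to 20 segments.
gen_path_query 20
```

The printed text can be pasted below the WITH T AS (...) clause. Raising the limit then only means changing the argument.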