I have a string like this:
Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z
The result I need is:
Column1  Column2        Column3
Test 1   new york       X
Test 2   chicago        Y
Test 3   harrisburg,pa  Z
But running this query:
SELECT
split_part(stat.st, '|', 1) Column1,
split_part(stat.st, '|', 2) Column2,
split_part(stat.st, '|', 3) Column3
FROM
(
SELECT
UNNEST (
string_to_array('Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z',',')
)
AS st
) stat;
the result is:
Column1  Column2     Column3
Test 1   new york    X
Test 2   chicago     Y
Test 3   harrisburg
pa       Z
Column3 can be anything (except |). Possible pattern to match: . This can be repeated N times. STRING can be anything except the | character.
How can I use regexp_split_to_array() to get my desired result set?
1 solution
There is barely enough information to make this work. But this does the job:
SELECT * FROM crosstab3(
$$
SELECT (rn/3)::text AS x, (rn%3)::text, item
FROM  (
   SELECT row_number() OVER () - 1 AS rn, trim(item) AS item
   FROM  (
      -- split only the odd-numbered pieces at ','
      -- (PostgreSQL 10 and later reject a set-returning function inside CASE)
      SELECT CASE WHEN rn%2 = 1 THEN regexp_split_to_table(item, ',')
                  ELSE item END AS item
      FROM  (
         SELECT row_number() OVER () AS rn, *
         FROM   regexp_split_to_table('Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z', '\|') AS item
         ) x
      ) y
   ) z
$$);
Returns:
row_name | category_1 | category_2 | category_3
----------+------------+----------------+------------
0 | Test 1 | new york | X
1 | Test 2 | chicago | Y
2 | Test 3 | harrisburg, pa | Z
After splitting the string at |, I build on the criterion that only rows with an odd row number shall be split at the comma (,). I trim() the results and add derivatives of another row_number() to arrive at this intermediary state before doing the cross tabulation:
x | text | item
---+------+----------------
0 | 0 | Test 1
0 | 1 | new york
0 | 2 | X
1 | 0 | Test 2
1 | 1 | chicago
1 | 2 | Y
2 | 0 | Test 3
2 | 1 | harrisburg, pa
2 | 2 | Z
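The nested query above leans on a set-returning function inside CASE and on row_number() OVER () without an ORDER BY. As a rough sketch of the same idea for newer servers, the intermediary state could also be produced with WITH ORDINALITY and a LATERAL join. This assumes PostgreSQL 9.4 or later; the aliases p, s, ord, part and category are made up for illustration and are not part of the original answer:

```sql
-- Sketch only: same "split odd-numbered pieces at ','" criterion,
-- but with an explicit ordering instead of row_number() OVER ().
SELECT (rn / 3)::text AS x, (rn % 3)::text AS category, item
FROM  (
   SELECT row_number() OVER (ORDER BY p.ord, s.ord) - 1 AS rn
        , trim(COALESCE(s.part, p.item)) AS item   -- even pieces stay whole
   FROM   regexp_split_to_table('Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z', '\|')
             WITH ORDINALITY AS p(item, ord)
   LEFT   JOIN LATERAL regexp_split_to_table(p.item, ',')
             WITH ORDINALITY AS s(part, ord)
          ON p.ord % 2 = 1                          -- only odd pieces are split
   ) sub
ORDER  BY rn;
```

The LEFT JOIN keeps each even-numbered piece as a single row with s.part set to NULL, which is why COALESCE falls back to p.item there.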
Finally, I apply the crosstab3() function from the tablefunc module. To install it, if you haven't already:
CREATE EXTENSION tablefunc;
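If it is unclear whether the module is already installed in the current database, a slightly more defensive variant may help. This is only a sketch; it assumes a role with the privilege to create extensions:

```sql
-- Does nothing (apart from a notice) if tablefunc is already installed.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Check what is installed in the current database.
SELECT extname, extversion
FROM   pg_extension
WHERE  extname = 'tablefunc';
```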
Pre-process with regexp_replace()
Here is an alternative that may be easier to comprehend. Not sure which is faster. Complex regular expressions tend to be expensive:
SELECT trim(split_part(a,'|', 1)) AS column1
      ,trim(split_part(a,'|', 2)) AS column2
      ,trim(split_part(a,'|', 3)) AS column3
FROM  (
   SELECT unnest(
             string_to_array(
                regexp_replace('Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z'
                              ,'([^|]*\|[^|]*\|[^,]*),', '\1~^~', 'g'), '~^~')) AS a
   ) sub;
This one replaces commas (,) only after two pipes (|), before proceeding. Now using * instead of + to allow for empty strings between the pipes.
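Since the question asks about the regexp functions, here is one more sketch along the same lines: instead of marking the record separators first, regexp_matches() with the 'g' flag can pull the three fields out of every record directly. It assumes that column1 and column3 never contain a comma or a pipe (the same kind of assumption the pattern above makes about column3); the alias t(m) is made up for illustration:

```sql
-- Sketch only: one text[] per record, one array element per field.
SELECT trim(m[1]) AS column1
      ,trim(m[2]) AS column2
      ,trim(m[3]) AS column3
FROM   regexp_matches(
          'Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z'
         ,'([^|,]*)\|([^|]*)\|([^|,]*)'   -- field1 | field2 (commas allowed) | field3
         ,'g') AS t(m);
```

Only the middle group allows commas, so "harrisburg, pa" stays in one piece while the commas between records are skipped.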