I'm attempting to query a table that contains a character varying[] column of years and return those years as a string of comma-delimited year ranges. The year ranges would be determined by sequential years present within the array, and years or year ranges that are not sequential should be separated by commas.
The data type is character varying[] rather than integer[] because a few of the values contain ALL instead of a list of years. We can omit these results.
So far I've had little luck approaching the problem, as I'm not really even sure where to start.
Would someone be able to give me some guidance or provide a useful example of how one might solve such a challenge?
years_table example:
+=========+============================+
| id | years |
| integer | character varying[] |
+=========+============================+
| 1 | {ALL} |
| 2 | {1999,2000,2010,2011,2012} |
| 3 | {1990,1991,2007} |
+---------+----------------------------+
Output goal:
Example SQL query:
SELECT id, [year concat logic] AS year_ranges
FROM years_table
WHERE 'ALL' <> ALL (years)
Result:
+====+======================+
| id | year_ranges |
+====+======================+
| 2 | 1999-2000, 2010-2012 |
| 3 | 1990-1991, 2007 |
+----+----------------------+
2 solutions
#1
4
SELECT id, string_agg(year_range, ', ') AS year_ranges
FROM  (
   SELECT id, CASE WHEN count(*) > 1
                 THEN min(year)::text || '-' || max(year)::text
                 ELSE min(year)::text
              END AS year_range
   FROM  (
      SELECT *, row_number() OVER (ORDER BY id, year) - year AS grp
      FROM  (
         SELECT id, unnest(years) AS year
         FROM  (VALUES (2::int, '{1999,2000,2010,2011,2012}'::int[])
                      ,(3,      '{1990,1991,2007}')
               ) AS tbl(id, years)
         ) sub1
      ) sub2
   GROUP  BY id, grp
   ORDER  BY id, min(year)
   ) sub3
GROUP  BY id
ORDER  BY id
This produces exactly the desired result.
If you are dealing with an array of varchar (varchar[]), just cast it to int[] before you proceed. Your data seems to be in a perfectly legal form for that:
years::int[]
Replace the inner sub-select with the name of your source table in production code.
FROM (VALUES (2::int, '{1999,2000,2010,2011,2012}'::int[])
            ,(3,      '{1990,1991,2007}')
     ) AS tbl(id, years)
->
FROM tbl
Since we are dealing with a naturally ascending number (the year), we can use a shortcut to form groups of consecutive years (each group forming a range): I subtract the year itself from the row number (ordered by year). For consecutive years, both row number and year increment by one, so they produce the same grp number. Otherwise, a new range starts.
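To make the grp shortcut easier to follow outside the database, here is a minimal Python sketch of the same idea. It is only an illustration, not part of the query: the function names `group_keys` and `year_ranges` are made up for this example, and it assumes the years within one array are distinct.

```python
from itertools import groupby


def group_keys(years):
    """Pair each year with (row number - year), mirroring the grp column."""
    ordered = sorted(years)
    # Row numbers start at 1, like row_number() in SQL.
    return [(year, row - year) for row, year in enumerate(ordered, start=1)]


def year_ranges(years):
    """Collapse runs that share a key into 'first-last' or a single year."""
    parts = []
    # Assumes distinct years, so equal keys are always adjacent.
    for _, run in groupby(group_keys(years), key=lambda pair: pair[1]):
        run_years = [year for year, _ in run]
        if len(run_years) > 1:
            parts.append(f"{run_years[0]}-{run_years[-1]}")
        else:
            parts.append(str(run_years[0]))
    return ", ".join(parts)


print(group_keys([1999, 2000, 2010, 2011, 2012]))
print(year_ranges([1999, 2000, 2010, 2011, 2012]))
```

The first two years share one key and the last three share another, which is exactly what GROUP BY id, grp relies on.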
More on window functions can be found in the PostgreSQL manual.
A plpgsql function might be even faster in this case. You'd have to test. Examples in these related answers:
Ordered count of consecutive repeats / duplicates
ROW_NUMBER() shows unexpected values
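For a rough idea of what the procedural alternative would do, here is a single-pass loop sketched in Python rather than plpgsql. Treat it as an assumption about how such a function could be structured, not as tested plpgsql: the name `format_year_ranges` is hypothetical, the input is a list of strings to mimic the varchar[] column, and years are assumed to be distinct.

```python
def format_year_ranges(years):
    """Walk the sorted years once, closing a range whenever a gap appears.

    Returns None for arrays that contain 'ALL', since those rows are skipped.
    """
    if "ALL" in years:
        return None
    ordered = sorted(int(y) for y in years)  # same effect as years::int[]
    if not ordered:
        return ""

    def label(start, end):
        # A single year is printed alone, a run as 'start-end'.
        return str(start) if start == end else f"{start}-{end}"

    parts = []
    start = prev = ordered[0]
    for year in ordered[1:]:
        if year != prev + 1:  # gap found: close the current range
            parts.append(label(start, prev))
            start = year
        prev = year
    parts.append(label(start, prev))  # close the final range
    return ", ".join(parts)


print(format_year_ranges(["1999", "2000", "2010", "2011", "2012"]))
print(format_year_ranges(["ALL"]))
```

Whether a loop like this actually beats the window-function query in plpgsql is something you would still have to measure.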
#2
2
SQL Fiddle. This is not the output format you asked for, but I think it can be more useful:
select id, g, min(year), max(year)
from (
    select id, year,
        count(not g or null) over(partition by id order by year) as g
    from (
        select id, year,
            lag(year, 1, 0) over(partition by id order by year) = year - 1 as g
        from (
            select id, unnest(years)::integer as year
            from years
            where years != '{ALL}'
        ) s
    ) s
) s
group by 1, 2
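The query above flags each year that does not directly follow the previous one and then keeps a running count of those flags as the group number. A small Python sketch of that logic for a single id, with the hypothetical name `ranges_by_break_count` and assuming distinct years, might look like this:

```python
def ranges_by_break_count(years):
    """Return (group number, first year, last year) tuples for one id."""
    groups = {}
    group_no = 0
    previous = 0  # same default as lag(year, 1, 0)
    for year in sorted(years):
        continues_run = previous == year - 1
        if not continues_run:
            # Running count of breaks, like count(not g or null) over (...).
            group_no += 1
        first, _ = groups.get(group_no, (year, year))
        groups[group_no] = (first, year)  # min(year), max(year) so far
        previous = year
    return [(g, first, last) for g, (first, last) in groups.items()]


print(ranges_by_break_count([1999, 2000, 2010, 2011, 2012]))
```

Each tuple corresponds to one output row of the query: the group number, then min(year) and max(year) for that run.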