Update from a regexp match on the same table without using a subquery

Time: 2021-07-11 11:11:24

I want to fill two columns from the results of a regular expression matching on a column of the same table.

Extracting the matches in an array is easy enough:

select regexp_matches(description, '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$') matches from room;

(note that only some of the rows match, not all of them)
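To make the "only some rows match" point concrete, here is a minimal sketch against a throwaway table. The temporary table room_demo and its two sample rows are made up for illustration; only the regex is taken from the question.

```sql
-- Hypothetical sample data, not from the original question.
create temporary table room_demo (id int primary key, description text);
insert into room_demo values
  (1, 'http://example.com/cat.png A cat picture'),
  (2, 'No link in this description');

-- regexp_matches() is set-returning: a description that does not match
-- produces no output row at all, so the non-matching row drops out here.
select id, m[1] as link, m[2] as rest
from   room_demo,
       regexp_matches(description, '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$') m;
```

With these rows, only id 1 should come back, with the URL in m[1] and the remaining text in m[2].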

But to do the update, I found nothing simpler than:

1) repeating the regex, which would be ridiculous:

update room r set
    link=(regexp_matches(description, '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$'))[1],
    description=(regexp_matches(description, '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$'))[2]
where description ~ '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$';

2) a query with a subquery and an id join, which looks overcomplicated and is probably not the most efficient:

update room r set link=matches[1], description=matches[2] from (
    select id, regexp_matches(description, '(?i)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$') matches from room
) s where matches is not null and r.id=s.id;

What's the proper solution here? I suspect one of the magical array functions of PostgreSQL would do the trick, or another regexp-related function, or maybe something even simpler.

1 solution

#1


2  

From PostgreSQL 9.5 on, you can use the following syntax:

with p(pattern) as (
  select '(?in)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$'
)
update room
set    (link, description) = (select m[1], m[2]
                              from   regexp_matches(description, pattern) m)
from   p
where  description ~ pattern;

This way regexp_matches() is called only once per updated row, but the regex itself is still evaluated twice: once in the WHERE clause and once inside regexp_matches(). If you want to avoid that, you'll need a join anyway. Or, you could do:

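update room
set    (link, description) = (
  select coalesce(m[1], l), coalesce(m[2], d)
  from   (select link l, description d) s
         -- left join keeps one row even when regexp_matches() returns none,
         -- so coalesce() can fall back to the current values; a plain comma
         -- join would yield no row and set both columns to NULL
         left join lateral
         regexp_matches(d, '(?in)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$') m
         on true
);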

But this will "touch" every row no matter what. It simply won't change the values of link and description when there is no match.
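For contrast, the WHERE-guarded form can be tried on a throwaway table to see which rows it rewrites. This is only a sketch: the table room_touch_demo and its rows are invented for illustration, and the explicit ::text cast and the RETURNING clause are additions that are not part of the statements above.

```sql
-- Hypothetical sample data, not from the original post.
create temporary table room_touch_demo (
  id int primary key, link text, description text
);
insert into room_touch_demo (id, description) values
  (1, 'https://example.com/dog.jpeg A dog picture'),
  (2, 'Plain text, no link');

with p(pattern) as (
  -- explicit cast (an addition) so the CTE column is typed as text
  select '(?in)^(https?://\S{4,220}\.(?:jpe?g|png))\s(.*)$'::text
)
update room_touch_demo
set    (link, description) = (select m[1], m[2]
                              from   regexp_matches(description, pattern) m)
from   p
where  description ~ pattern
returning id, link, description;  -- lists only the rows that were rewritten
```

Here only id 1 should be returned; id 2 is filtered out by the WHERE clause and is never rewritten, which is the trade-off against evaluating the regex twice.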
