I'm using Postgres and would like to remove everything after the last occurrence of '- ' or '|'. This is the query I came up with:
select regexp_replace(title, E'(- |\\|).+$', '') as title from articles
The problem is that a string like:
'Trump tweets in China - how, and why does it matter? - BBC News'
is truncated too early:
'Trump tweets in China'
How can I make it remove the suffix only after the last occurrence of '- '?
Thanks!
2 solutions
#1
You may match either a hyphen followed by a space or a pipe symbol, capture it, and then match the rest of the string as long as the captured text does not occur in it again:
(- |\|)(?:(?!\1).)+$
Replace with \1. Escape as necessary (you need to double the backslashes in E'...' strings).
Details
- (- |\|) - either a '- ' sequence or a '|' symbol, captured into Group 1
- (?:(?!\1).)+ - any char (.), 1 or more occurrences (+), that does not start a '- ' sequence or is not equal to '|', depending on what was captured into Group 1
- $ - end of string.
See the regex demo.
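One caveat when running this inside Postgres itself: as far as I know, its regex engine does not accept back references inside lookahead constraints, so (?!\1) may be rejected there even though it works in the online demo. A minimal sketch of the same "keep everything up to the last delimiter" idea, using a greedy prefix instead of the lookahead, could look like this. It is a swapped-in variant, not the pattern from this answer, and it assumes standard_conforming_strings is on (the default in current versions) so that '\|' and '\1' reach the regex engine unchanged. The inline VALUES list is only a stand-in for the articles table from the question.

```sql
-- Sketch only: a greedy-prefix variant, not the lookahead pattern above.
-- The inline VALUES list stands in for the articles table.
select title,
       -- ^(.*(- |\|)) greedily captures everything up to and including
       -- the last '- ' or '|'; replacing the whole match with \1 drops
       -- whatever follows that delimiter.
       regexp_replace(title, '^(.*(- |\|)).*$', '\1') as trimmed
from (values
        ('Trump tweets in China - how, and why does it matter? - BBC News'),
        ('No delimiter here')
     ) as articles(title);
```

Like replacing with \1 in the answer, this keeps the delimiter itself at the end of the result. Titles that contain neither delimiter do not match and come back unchanged.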
#2
You could try this:
select regexp_replace('Trump tweets in China - how, and why does it matter? - BBC News',
                      '[|-][^|-]*$', '')
It's basically saying:
- a | or a -
- followed by any number of characters that are neither a | nor a -, up to the end of the string
Result:
Trump tweets in China - how, and why does it matter?
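Applied to a column rather than a literal, the same pattern might be used as in the sketch below. The inline VALUES list and the second and third sample titles are made up for illustration, and the rtrim call is an addition of mine to drop the space left in front of the removed delimiter.

```sql
-- Sketch: the character-class pattern applied to a title column.
-- The inline VALUES list stands in for the articles table.
select title,
       -- [|-] matches the last '|' or '-', and [^|-]*$ matches the rest of
       -- the string, which contains no further '|' or '-'.
       rtrim(regexp_replace(title, '[|-][^|-]*$', '')) as trimmed
from (values
        ('Trump tweets in China - how, and why does it matter? - BBC News'),
        ('Some headline | Example Site'),
        ('No delimiter here')
     ) as articles(title);
```

Note that this pattern treats any hyphen as a delimiter, not only '- ', so a hyphenated word in the final segment would also cut the title at that hyphen.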