Truncate a string after the last occurrence of '- ' or '|'

Time: 2021-09-05 07:47:47

I'm using Postgres and would like to remove everything after the last occurrence of '- ' or '|'. This is the query I came up with:

select regexp_replace( title, E'(- |\\|).+$', '') as title from articles

The problem is that a string like:

'Trump tweets in China - how, and why does it matter? - BBC News'

is truncated too early:

'Trump tweets in China'

How can I make it remove the suffix only after the last occurrence of '- '?

Thanks!

2 solutions

#1


You can match either a hyphen followed by a space or a pipe symbol, capture it, and then match the rest of the string as long as it does not contain the captured text again:

(- |\|)(?:(?!\1).)+$

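Replace with \1 to keep the delimiter. Escape as necessary (backslashes must be doubled inside E'...' strings). Note that the PostgreSQL documentation says lookahead constraints cannot contain back references, so this pattern may be rejected by Postgres itself even though it works in PCRE-style engines.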

Details

  • (- |\|) - Group 1: either a hyphen followed by a space, or a | symbol
  • (?:(?!\1).)+ - any char (.), 1 or more occurrences (+), that does not start another occurrence of whatever was captured into Group 1 (either "- " or "|")
  • $ - end of string.

See the regex demo.
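
To see what this pattern does on the example title, here is a minimal sketch using Python's `re` module as a stand-in engine. This is an assumption made for illustration only: Python's engine is not PostgreSQL's, and the function name `keep_through_last_delimiter` is hypothetical.

```python
import re

# Pattern from the answer: a delimiter ("- " or "|"), then one or more
# characters that never start another occurrence of that same delimiter,
# up to the end of the string.
LAST_DELIM = re.compile(r"(- |\|)(?:(?!\1).)+$")


def keep_through_last_delimiter(title: str) -> str:
    """Drop the text after the last delimiter, keeping the delimiter itself."""
    # Replacing with \1 puts the captured delimiter back.
    return LAST_DELIM.sub(r"\1", title)


title = "Trump tweets in China - how, and why does it matter? - BBC News"
print(repr(keep_through_last_delimiter(title)))
```

Because the replacement is \1, the result still ends with the delimiter "- ". Replacing with an empty string instead would drop the delimiter as well.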

#2


You could try this:

select regexp_replace ('Trump tweets in China - how, and why does it matter? - BBC News',
    '[|-][^|-]*$', '')

This basically says:

  • a | or a -
  • followed by any number of characters that are neither a | nor a -, up to the end of the string

Result:

Trump tweets in China - how, and why does it matter? 
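
The same character-class pattern can be tried outside the database. The sketch below again assumes Python's `re` module as a stand-in, with a hypothetical function name `truncate_at_last_separator`. It also shows one caveat: the class matches any hyphen, not only one followed by a space, so a hyphenated word near the end of a title is cut as well.

```python
import re

# Pattern from the answer: the last "|" or "-" and everything after it.
LAST_SEP = re.compile(r"[|-][^|-]*$")


def truncate_at_last_separator(title: str) -> str:
    """Remove the last '|' or '-' together with everything that follows it."""
    return LAST_SEP.sub("", title)


title = "Trump tweets in China - how, and why does it matter? - BBC News"
print(repr(truncate_at_last_separator(title)))

# Caveat: a hyphen inside a word counts as a separator too.
print(repr(truncate_at_last_separator("Latest news - Al-Jazeera")))
```

If titles can contain hyphenated words, matching the two-character sequence "- " rather than a bare hyphen may be the safer choice.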
