I have data like '11223311' and I want every repeated character to be reduced to a single occurrence, i.e. the value above should become '123'. I am working in SAP HANA.
With the logic below, however, I get '1231' from '11223311':
SELECT REPLACE_REGEXPR('(.)\1+' IN '11223331' WITH '\1' OCCURRENCE ALL) FROM DUMMY;
2 Answers
#1
3
Your regular expression only replaces multiple consecutive occurrences of a character; that is what the \1+ directly after the (.) is doing.
You can use look-ahead to remove all characters that also occur somewhere after that match. Note that this keeps the last occurrence, not the first:
SELECT REPLACE_REGEXPR('(.)(?=.*\1)' IN '11223331' WITH '' OCCURRENCE ALL) FROM DUMMY
This returns: 231
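To make the difference between the two patterns easier to trace, here is a minimal sketch in Python rather than HANA SQL. It assumes Python's `re` module treats the backreference and the look-ahead the same way as HANA's regex engine does for these simple patterns; the function names are made up for illustration.

```python
import re

def collapse_runs(s: str) -> str:
    # (.)\1+ matches a character followed by one or more copies of itself,
    # so only *consecutive* repeats are collapsed to a single character.
    return re.sub(r"(.)\1+", r"\1", s)

def keep_last_occurrence(s: str) -> str:
    # (.)(?=.*\1) matches any character that appears again later in the
    # string; deleting those leaves only the last occurrence of each one.
    return re.sub(r"(.)(?=.*\1)", "", s)

print(collapse_runs("11223331"))         # 1231
print(keep_last_occurrence("11223331"))  # 231
```

The trailing 1 survives in the first call because it is not adjacent to the leading run of 1s. In the second call the leading 1s are removed because a 1 still follows them.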
If you want to keep the first occurrence, I don't see a way to do it with a single regex (I could be wrong, though). Using a look-behind in the same way does not work because it would need to be variable-length, which is not supported in HANA and most other implementations. \K is often recommended as an alternative, but something like (.).*\K\1 wouldn't work with replace all, because the characters before \K are still consumed by each match and cannot be matched again. If you could run the same regex in a loop, it could work, but then why not use a non-regex loop (like a user-defined HANA function) in the first place?
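The two loop-based options can be sketched as follows. This is Python, not SQLScript, and only illustrates the logic that a user-defined HANA function would have to implement; the function names and the pattern (.)(.*)\1 used in the regex loop are my own assumptions, not something taken from the HANA documentation.

```python
import re

def keep_first_occurrence(s: str) -> str:
    # Plain loop: append a character only the first time it is seen.
    seen = set()
    out = []
    for ch in s:
        if ch not in seen:
            seen.add(ch)
            out.append(ch)
    return "".join(out)

def keep_first_with_regex_loop(s: str) -> str:
    # Regex loop: each pass drops a later copy of a character while keeping
    # the earlier copy and the text in between. Repeat until nothing matches.
    while True:
        s, count = re.subn(r"(.)(.*)\1", r"\1\2", s)
        if count == 0:
            return s

print(keep_first_occurrence("11223331"))       # 123
print(keep_first_with_regex_loop("11223331"))  # 123
```

The regex loop needs several passes over the string, which is why the plain loop is the simpler choice.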
#2
0
Please try this
SELECT REPLACE_REGEXPR(concat(concat('[^','11223331'),']') IN '0123456789' WITH '' OCCURRENCE ALL)
FROM DUMMY;
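A note on what this query does: it builds the negated character class [^11223331] from the data and strips every character of the fixed string '0123456789' that is not in that class. The sketch below reproduces the idea in Python (not HANA SQL) to show two consequences. The function name and the `alphabet` parameter are hypothetical.

```python
import re

def distinct_chars_in_alphabet_order(data: str, alphabet: str = "0123456789") -> str:
    # An empty class "[^]" is not a valid pattern, so handle empty input first.
    if not data:
        return ""
    # Remove from the alphabet every character that does not occur in data.
    # re.escape guards against characters that are special inside a class.
    return re.sub("[^" + re.escape(data) + "]", "", alphabet)

print(distinct_chars_in_alphabet_order("11223331"))  # 123
print(distinct_chars_in_alphabet_order("3311"))      # 13
```

The result follows the order of the alphabet string, not the order of first occurrence in the data, and any character missing from the alphabet is silently dropped. For digit-only data where the order does not matter, that may be acceptable.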