REGEX to select the nth value from a list, allowing for nulls

Time: 2022-02-22 15:33:39

I am using REGEXP_SUBSTR() to return the nth value from a comma-separated list. This works fine when all values are present, but fails if an item is null. Here is an example that works, where all values are present and I select the 2nd occurrence of 1 or more characters that are not a comma:

SQL> select REGEXP_SUBSTR('1,2,3,4,5,6', '[^,]+', 1, 2) data
  2  from dual;

D
-
2

But when the second value is null, I actually get the third item in the list, which of course is the 2nd occurrence of 1 or more characters that are not a comma. However, I need it to return NULL, since the 2nd item is empty:

SQL> select REGEXP_SUBSTR('1,,3,4,5,6', '[^,]+', 1, 2) data
  2  from dual;

D
-
3

If I change the regex to allow zero or more characters instead of 1 or more, it also fails for positions past the null:

SQL> select REGEXP_SUBSTR('1,,3,4,5,6', '[^,]*', 1, 4) data
  2  from dual;

D
-
3

I need to allow for the null but can't seem to get the syntax right. Logically, I need to return whatever precedes the nth comma, whether data is present or not (and also handle the last value, which has no trailing comma). Any ideas?
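One way to see why `[^,]*` misbehaves is to list what each occurrence actually matches. The query below is only a diagnostic sketch: the `CONNECT BY LEVEL` row generator and the `occ` and `matched` aliases are assumptions added for illustration, and `NVL` is there just to make empty matches visible. In the session above, occurrence 4 came back as 3, which suggests that the zero-length matches at the commas are counted as occurrences of their own.

```sql
-- Diagnostic sketch: show what each occurrence of '[^,]*' matches.
-- The row generator (CONNECT BY LEVEL) and the column aliases are illustrative.
select level as occ,
       nvl(regexp_substr('1,,3,4,5,6', '[^,]*', 1, level), '(empty)') as matched
from dual
connect by level <= 6;
```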

2 answers

#1


10  

Thanks to those who replied. After perusing your answers and the answers in the link supplied, I arrived at this solution:

SQL> select REGEXP_SUBSTR('1,,3,4,5', '(.*?)(,|$)', 1, 2, NULL, 1) data
  2  from dual;

Data
----

This can be described as "look at the 2nd occurrence of an optional set of zero or more characters that is followed by a comma or the end of the line, and return the 1st subgroup (which is the data without the comma or end of the line)".

I forgot to mention that I tested with the null in various positions, with multiple nulls, selecting various positions, etc.
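Checks of that kind could be scripted along the following lines. This is a sketch rather than the tests that were actually run: the `samples` and `positions` subqueries and the sample strings are hypothetical. With this pattern, positions 2 and 3 of '1,,,4' should come back NULL and position 4 should come back as 4.

```sql
-- Illustrative test cases; "samples" and "positions" are hypothetical names.
with samples as (
  select ',2,3' as str from dual union all  -- leading empty item
  select '1,2,'        from dual union all  -- trailing empty item
  select '1,,,4'       from dual            -- consecutive empty items
),
positions as (
  select level as pos from dual connect by level <= 4
)
select s.str,
       p.pos,
       regexp_substr(s.str, '(.*?)(,|$)', 1, p.pos, NULL, 1) as element
from samples s
cross join positions p
order by s.str, p.pos;
```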

The only caveat I could find is that if the position you ask for is greater than the number of items available, it just returns NULL, so you need to be aware of that. Not a problem in my case.
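If the difference between "empty item" and "position past the end of the list" does matter, one possible guard is to compare the requested position with the item count first. The sketch below assumes `REGEXP_COUNT` is available (Oracle 11g or later); the labels 'out of range' and '(empty)' are made up for illustration. Asking for position 7 of this five-item list takes the first branch.

```sql
-- Sketch: tell "position past the end" apart from "empty item".
-- The labels 'out of range' and '(empty)' are illustrative only.
select case
         when 7 > regexp_count('1,,3,4,5', ',') + 1
           then 'out of range'
         else nvl(regexp_substr('1,,3,4,5', '(.*?)(,|$)', 1, 7, NULL, 1), '(empty)')
       end as element
from dual;
```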

EDIT: I am updating the accepted answer for the benefit of future searchers who may stumble upon this.

The next step is to encapsulate the code in a simpler, reusable function. Here is the function source:

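  FUNCTION GET_LIST_ELEMENT(string_in VARCHAR2, element_in NUMBER, delimiter_in VARCHAR2 DEFAULT ',') RETURN VARCHAR2 IS
  BEGIN
    -- Return capture group 1 (the text before the delimiter or end of line) of the element_in-th match.
    -- The delimiter is escaped with a backslash so that characters such as | or . are taken literally.
    RETURN REGEXP_SUBSTR(string_in, '(.*?)(\'||delimiter_in||'|$)', 1, element_in, NULL, 1);
  END GET_LIST_ELEMENT;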

This hides the regex complexities from developers who may not be so comfortable with them, and makes the calling code cleaner in any case. Call it like this to get the 4th element:

select get_list_element('123,222,,432,555', 4) from dual;
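To look at every item at once rather than a single position, the same pattern can be paired with a row generator. This is a sketch under assumptions: `CONNECT BY LEVEL` and `REGEXP_COUNT` (Oracle 11g or later) are additions that do not appear in the answer, and the regex is inlined so the query does not depend on the function being installed. For this list it should return 123, 222, NULL, 432 and 555 for positions 1 through 5.

```sql
-- Sketch: one row per list item, using the same pattern the function builds.
select level as pos,
       regexp_substr('123,222,,432,555', '(.*?)(,|$)', 1, level, NULL, 1) as element
from dual
connect by level <= regexp_count('123,222,,432,555', ',') + 1;
```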

#2


0  

How about something brutal like this:

select REGEXP_SUBSTR(replace('1,,3,4,5,6', ',,', ',NULL,'), '[^,]+', 1, 2) data
from dual;

That returns the string 'NULL'. You can get a real NULL using a CASE expression:

select (case when REGEXP_SUBSTR(replace('1,,3,4,5,6', ',,', ',NULL,'), '[^,]+', 1, 2) = 'NULL'
             then NULL
             else REGEXP_SUBSTR(replace('1,,3,4,5,6', ',,', ',NULL,'), '[^,]+', 1, 2)
        end)
from dual;

There may be a regexp_-only solution, but this is what first comes to mind.
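One limitation of the REPLACE approach is worth checking before relying on it. REPLACE does not rescan its own output, so when two or more empty items sit next to each other only the first pair of commas is rewritten, and an empty item at the very start or end of the list is not touched at all. The query below is a sketch with a made-up sample string: the replaced value is '1,NULL,,4', and the third occurrence of `[^,]+` is then 4, although the third item is actually empty.

```sql
-- Sketch: of three commas in a row, only the first pair is rewritten,
-- so one empty item is still skipped by '[^,]+'.
select replace('1,,,4', ',,', ',NULL,') as replaced,
       regexp_substr(replace('1,,,4', ',,', ',NULL,'), '[^,]+', 1, 3) as third_item
from dual;
```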
