Walking a path (array of nodes) in a hierarchical table to find the first non-null value

Time: 2022-08-18 22:59:52

I have spent several hours trying, without success, to write a function that filters array subscripts based on a criterion applied to the array's elements and returns an array of the matching subscripts.

The data structure I am dealing with is similar to the following sample (except with many more columns to compare and more complicated rules and mixed data types):

id hierarchy abbreviation1 abbreviation2
1  {1}       SB            GL
2  {2,1}     NULL          NULL
3  {3,2,1}   NULL          TC
4  {4,2,1}   NULL          NULL

I need to run a query that, for each record, finds the nearest non-null value of abbreviation1 and of abbreviation2 along the path toward the root, and compares them by hierarchical distance from the current record to produce a single abbreviation. For example, if the first non-null values of abbreviation1 and abbreviation2 are on the same record level, abbreviation1 takes priority; if, on the other hand, the first non-null abbreviation2 is closer to the current record than the first non-null abbreviation1, then abbreviation2 is used.

Thus the described query on the above sample table would yield:

id abbreviation
1  SB
2  SB
3  TC
4  SB

To accomplish this task I need to generate a filtered array of array subscripts (after doing an array_agg() on the abbreviation columns) that contains only the subscripts where the value in an abbreviation column is not null.

The following function, based on all the logic in my tired mind, should work but does not:

CREATE OR REPLACE FUNCTION filter_array_subscripts(rawarray anyarray,criteria anynonarray,dimension integer, reverse boolean DEFAULT False) 
  RETURNS integer[] as 
$$
DECLARE
  outarray integer[] := ARRAY[]::integer[];
  x integer;
  BEGIN
    for i in array_lower(rawarray,dimension)..array_upper(rawarray,dimension) LOOP
      IF NOT criteria IS NULL THEN
        IF NOT rawarray[i] IS NULL THEN
          IF NOT rawarray[i] = criteria THEN
            IF reverse = False THEN
              outarray := array_append(outarray,i);
            ELSE
              outarray := array_prepend(i,outarray);
            END IF;
          ELSE
            IF reverse = False THEN
              outarray := array_append(outarray,i);
            ELSE
              outarray := array_prepend(i,outarray);
            END IF;
          END IF;
        END IF;
      ELSE
        IF NOT rawarray[i] is NULL THEN
          IF reverse = False THEN
            outarray := array_append(outarray,i);
          ELSE
            outarray := array_prepend(i,outarray);
          END IF;
        END IF;
      END IF;
    END LOOP;
    RETURN outarray;
  END; 
$$ LANGUAGE plpgsql;

For example, the query below returns {5,3,1} when it should return {5,4,2,1}:

select filter_array_subscripts(array['This',NULL,'is',NULL,'insane!']::text[]
                               ,'is',1,True);

I have no idea why this does not work. I have also tried the FOREACH array iteration syntax, but I cannot figure out how to cast the iteration value to the scalar type contained within the anyarray.

What can be done to fix this?

1 Solution

#1


You can simplify this whole endeavor considerably with a recursive CTE (WITH RECURSIVE), available in PostgreSQL 8.4 or later:

Test table (makes it easier for everyone to provide test data in a form like this):

CREATE TEMP TABLE tbl (
    id int
  , hierarchy int[]
  , abbreviation1 text
  , abbreviation2 text
);

INSERT INTO tbl VALUES
 (1, '{1}',     'SB', 'GL')
,(2, '{2,1}',   NULL, NULL)
,(3, '{3,2,1}', NULL, 'TC')
,(4, '{4,2,1}', NULL, NULL);

Query:

WITH RECURSIVE x AS (
    SELECT id
         , COALESCE(abbreviation1, abbreviation2) AS abbr
         , hierarchy[2] AS parent_id
    FROM   tbl

    UNION ALL
    SELECT x.id
         , COALESCE(parent.abbreviation1, parent.abbreviation2) AS abbr
         , parent.hierarchy[2] AS parent_id
    FROM   x
    JOIN   tbl AS parent ON parent.id = x.parent_id
    WHERE  x.abbr IS NULL  -- stop at non-NULL value
    )
SELECT id, abbr
FROM   x
WHERE  abbr IS NOT NULL  -- discard intermediary NULLs
ORDER  BY id;

Returns:

id | abbr
---+-----
1  | SB
2  | SB
3  | TC
4  | SB
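
If recursion is not required, the same lookup can probably be written by walking the hierarchy array directly. The following is only a sketch, assuming PostgreSQL 9.4 or later (for unnest() ... WITH ORDINALITY) and a hypothetical copy of the test table named tbl_path; the aliases h, node_id and depth are made up for illustration. Each path is expanded into one row per node, ancestors without any abbreviation are filtered out, and DISTINCT ON keeps the nearest remaining node per id.

```sql
-- Hypothetical copy of the test table, so the sketch runs on its own.
CREATE TEMP TABLE tbl_path (
    id int
  , hierarchy int[]
  , abbreviation1 text
  , abbreviation2 text
);

INSERT INTO tbl_path VALUES
 (1, '{1}',     'SB', 'GL')
,(2, '{2,1}',   NULL, NULL)
,(3, '{3,2,1}', NULL, 'TC')
,(4, '{4,2,1}', NULL, NULL);

-- depth = 1 is the row itself, depth = 2 its parent, and so on.
SELECT DISTINCT ON (t.id)
       t.id
     , COALESCE(a.abbreviation1, a.abbreviation2) AS abbr  -- abbreviation1 wins on the same level
FROM   tbl_path t
CROSS  JOIN LATERAL unnest(t.hierarchy) WITH ORDINALITY AS h(node_id, depth)
JOIN   tbl_path a ON a.id = h.node_id
WHERE  COALESCE(a.abbreviation1, a.abbreviation2) IS NOT NULL
ORDER  BY t.id, h.depth;  -- nearest node with a value comes first
```

On the sample rows this should give SB, SB, TC, SB for ids 1 to 4. Rows whose path contains no abbreviation at all are dropped here as well.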

This presumes that every path contains at least one non-null value; rows whose path has none are dropped from the result.
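
If rows without any abbreviation on their path should be kept, one option is to LEFT JOIN the base table to the CTE result. This is a minimal sketch, not part of the original answer: tbl_gap is a hypothetical copy of the test table with one extra row (id 5) that has no abbreviation anywhere on its path.

```sql
-- Hypothetical table: the test data plus a root row without abbreviations.
CREATE TEMP TABLE tbl_gap (
    id int
  , hierarchy int[]
  , abbreviation1 text
  , abbreviation2 text
);

INSERT INTO tbl_gap VALUES
 (1, '{1}',     'SB', 'GL')
,(2, '{2,1}',   NULL, NULL)
,(3, '{3,2,1}', NULL, 'TC')
,(4, '{4,2,1}', NULL, NULL)
,(5, '{5}',     NULL, NULL);

WITH RECURSIVE x AS (
    SELECT id
         , COALESCE(abbreviation1, abbreviation2) AS abbr
         , hierarchy[2] AS parent_id
    FROM   tbl_gap

    UNION ALL
    SELECT x.id
         , COALESCE(parent.abbreviation1, parent.abbreviation2) AS abbr
         , parent.hierarchy[2] AS parent_id
    FROM   x
    JOIN   tbl_gap AS parent ON parent.id = x.parent_id
    WHERE  x.abbr IS NULL  -- stop at non-NULL value
    )
SELECT t.id, x.abbr
FROM   tbl_gap t
LEFT   JOIN x ON x.id = t.id
             AND x.abbr IS NOT NULL  -- at most one such row per id
ORDER  BY t.id;
```

Because the recursion stops at the first non-null value, each id has at most one non-null row in x, so the join should not duplicate rows. Id 5 is returned with a NULL abbr instead of disappearing.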

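
As for the function in the question: reading the code, both branches of the `rawarray[i] = criteria` test append the subscript, and NULL elements are skipped before that test is reached, so the function returns the subscripts of all non-null elements ({5,3,1} in the example). A possible rewrite is sketched below under some assumptions: the name filter_array_subscripts_sketch is hypothetical, the dimension parameter is dropped because rawarray[i] only addresses one-dimensional arrays, and IS DISTINCT FROM is used so that NULL elements count as "different from" a non-null criteria.

```sql
-- Hypothetical replacement for the question's function (1-dimensional arrays only).
CREATE OR REPLACE FUNCTION filter_array_subscripts_sketch(
    rawarray anyarray
  , criteria anynonarray
  , reverse  boolean DEFAULT false)
  RETURNS integer[] AS
$$
DECLARE
  outarray integer[] := '{}';
BEGIN
  -- COALESCE guards against empty arrays, where the bounds are NULL.
  FOR i IN COALESCE(array_lower(rawarray, 1), 1)
        .. COALESCE(array_upper(rawarray, 1), 0) LOOP
    -- NULL-safe comparison: true when exactly one side is NULL
    -- or when both are non-null and unequal.
    IF rawarray[i] IS DISTINCT FROM criteria THEN
      IF reverse THEN
        outarray := array_prepend(i, outarray);
      ELSE
        outarray := array_append(outarray, i);
      END IF;
    END IF;
  END LOOP;
  RETURN outarray;
END
$$ LANGUAGE plpgsql IMMUTABLE;

SELECT filter_array_subscripts_sketch(
         array['This',NULL,'is',NULL,'insane!']::text[], 'is', true);
```

With a NULL criteria the comparison keeps only the subscripts of non-null elements, which matches the second branch of the original function.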
