How do I return the result of a function in PostgreSQL?

Time: 2021-01-31 22:58:26

I have this function in PostgreSQL, but I don't know how to return the result of the query:

CREATE OR REPLACE FUNCTION wordFrequency(maxTokens INTEGER)
  RETURNS SETOF RECORD AS
$$
BEGIN
    SELECT text, count(*), 100 / maxTokens * count(*)
    FROM (
        SELECT text
        FROM token
        WHERE chartype = 'ALPHABETIC'
        LIMIT maxTokens
    ) as tokens
    GROUP BY text
    ORDER BY count DESC
END
$$
LANGUAGE plpgsql;

But I don't know how to return the result of the query inside the PostgreSQL function.

I found that the return type should be SETOF RECORD, right? But the return command is not right.

What is the right way to do this?

1 answer

#1


73  

Use RETURN QUERY:

CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
  RETURNS TABLE (
    txt   text   -- visible as OUT parameter inside and outside function
  , cnt   bigint
  , ratio bigint) AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.txt
        , count(*) AS cnt  -- column alias only visible inside
        , (count(*) * 100) / _max_tokens  -- I added brackets
   FROM  (
      SELECT t.txt
      FROM   token t
      WHERE  t.chartype = 'ALPHABETIC'
      LIMIT  _max_tokens
      ) t
   GROUP  BY t.txt
   ORDER  BY cnt DESC;  -- note the potential ambiguity 
END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM word_frequency(123);
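
For comparison, here is a rough sketch of what the call looks like when the function is declared with RETURNS SETOF record, as in the question. The function name word_frequency_rec is hypothetical, and the sketch assumes that token.txt is of type text so that the column definition list matches the query result.

```sql
-- Hypothetical variant of the function, declared with an anonymous record type.
CREATE OR REPLACE FUNCTION word_frequency_rec(_max_tokens int)
  RETURNS SETOF record
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.txt
        , count(*)                        -- bigint
        , (count(*) * 100) / _max_tokens  -- bigint
   FROM  (
      SELECT tk.txt
      FROM   token tk
      WHERE  tk.chartype = 'ALPHABETIC'
      LIMIT  _max_tokens
      ) t
   GROUP  BY t.txt
   ORDER  BY 2 DESC;
END
$func$;

-- Every call has to spell out column names and types:
SELECT *
FROM   word_frequency_rec(123) AS f(txt text, cnt bigint, ratio bigint);
```

With RETURNS TABLE the column definition list is part of the function itself, so the plain call shown above is enough.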

Explanation:

  • It is much more practical to explicitly define the return type than simply declaring it as record. This way you don't have to provide a column definition list with every function call. RETURNS TABLE is one way to do that. There are others. Data types of OUT parameters have to match exactly what is returned by the query.

  • Choose names for OUT parameters carefully. They are visible in the function body almost anywhere. Table-qualify columns of the same name to avoid conflicts or unexpected results. I did that for all columns in my example.

    But note the potential naming conflict between the OUT parameter cnt and the column alias of the same name. In this particular case (RETURN QUERY SELECT ...) Postgres uses the column alias over the OUT parameter either way. This can be ambiguous in other contexts, though. There are various ways to avoid any confusion:

    1. Use the ordinal position of the item in the SELECT list: ORDER BY 2 DESC. Example: Select first row in each GROUP BY group?
    2. Repeat the expression: ORDER BY count(*).
    3. (Not applicable here.) Set the configuration parameter plpgsql.variable_conflict or use the special command #variable_conflict error | use_variable | use_column per function. Example: Naming conflict between function parameter and result of JOIN with USING clause
  • Don't use "text" and "count" as column names. Both are legal to use in Postgres, but "count" is a reserved word in standard SQL and a basic function name, and "text" is a basic data type. Using them as column names can lead to confusing errors. I use txt and cnt in my examples.

  • Added a missing ; and corrected a syntax error in the header. (_max_tokens int), not (int maxTokens) - type after name.

  • While working with integer division, it's better to multiply first and divide later, to minimize the rounding error. Even better: work with numeric (or a floating point type). See below.

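The point about integer division can be checked with plain literals. This is only an illustration: the values 7 (a count) and 300 (standing in for _max_tokens) are made up.

```sql
SELECT 100 / 300 * 7             AS divide_first    -- 0: 100 / 300 is truncated to 0 first
     , (7 * 100) / 300           AS multiply_first  -- 2: truncation happens only once, at the end
     , round(7 * 100.0 / 300, 2) AS as_numeric;     -- 2.33: numeric arithmetic keeps the fraction
```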

Alternative

This is what I think your query should actually look like (calculating a relative share per token):

CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
  RETURNS TABLE (
    txt            text
  , abs_cnt        bigint
  , relative_share numeric) AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.txt
        , t.cnt
        , round((t.cnt * 100) / (sum(t.cnt) OVER ()), 2)  -- AS relative_share
   FROM  (
      SELECT t.txt
           , count(*) AS cnt
      FROM   token t
      WHERE  t.chartype = 'ALPHABETIC'
      GROUP  BY t.txt
      ORDER  BY cnt DESC
      LIMIT  _max_tokens
      ) t
   ORDER  BY t.cnt DESC;
END
$func$  LANGUAGE plpgsql;

The expression sum(t.cnt) OVER () is a window function. You could use a CTE instead of the subquery - pretty, but a subquery is typically cheaper in simple cases like this one.

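For reference, the CTE form of the same query might look roughly like this as a standalone statement. The literal 10 stands in for _max_tokens, and the token table with its txt and chartype columns is assumed as in the function above.

```sql
WITH t AS (
   SELECT tk.txt
        , count(*) AS cnt
   FROM   token tk
   WHERE  tk.chartype = 'ALPHABETIC'
   GROUP  BY tk.txt
   ORDER  BY cnt DESC
   LIMIT  10  -- stands in for _max_tokens
   )
SELECT t.txt
     , t.cnt AS abs_cnt
     , round((t.cnt * 100) / (sum(t.cnt) OVER ()), 2) AS relative_share
FROM   t
ORDER  BY t.cnt DESC;
```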

A final explicit RETURN statement is not required (but allowed) when working with OUT parameters or RETURNS TABLE (which makes implicit use of OUT parameters).

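A small sketch of the optional final RETURN, using a hypothetical function return_demo that does not depend on the token table:

```sql
CREATE OR REPLACE FUNCTION return_demo(_n int)
  RETURNS TABLE (i int, doubled int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   SELECT g, g * 2
   FROM   generate_series(1, _n) g;

   RETURN;  -- optional: the function ends here either way
END
$func$;

SELECT * FROM return_demo(3);
```

Removing the RETURN; line should not change the result, since RETURN QUERY has already queued the rows and the function ends when control reaches the final END.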

round() with two parameters only works for numeric types. count() in the subquery produces a bigint result and a sum() over this bigint produces a numeric result, so the division already yields numeric and round() works without an explicit cast.

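The type behaviour described here can be inspected with pg_typeof(). The statements below are a self-contained sketch on generated rows, not part of the original function:

```sql
-- count() returns bigint; sum() over a bigint column returns numeric.
SELECT pg_typeof(count(*)) AS count_type
     , pg_typeof(sum(n))   AS sum_type
FROM   generate_series(1::bigint, 3::bigint) AS g(n);

-- round() with a second argument accepts numeric input ...
SELECT round(2.0 / 3, 2);

-- ... so a double precision value needs a cast to numeric first.
SELECT round((2.0 / 3)::float8::numeric, 2);

-- This would fail, since there is no round(double precision, integer):
-- SELECT round((2.0 / 3)::float8, 2);
```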
