I have this function in PostgreSQL, but I don't know how to return the result of the query:
CREATE OR REPLACE FUNCTION wordFrequency(maxTokens INTEGER)
RETURNS SETOF RECORD AS
$$
BEGIN
SELECT text, count(*), 100 / maxTokens * count(*)
FROM (
SELECT text
FROM token
WHERE chartype = 'ALPHABETIC'
LIMIT maxTokens
) as tokens
GROUP BY text
ORDER BY count DESC
END
$$
LANGUAGE plpgsql;
But I don't know how to return the result of the query inside the PostgreSQL function.
I found that the return type should be SETOF RECORD, right? But the return command is not right.
What is the right way to do this?
Answer:
Use RETURN QUERY:
CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (
txt text -- visible as OUT parameter inside and outside function
, cnt bigint
, ratio bigint) AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt
, count(*) AS cnt -- column alias only visible inside
, (count(*) * 100) / _max_tokens -- I added brackets
FROM (
SELECT t.txt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
LIMIT _max_tokens
) t
GROUP BY t.txt
ORDER BY cnt DESC; -- note the potential ambiguity
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM word_frequency(123);
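For comparison, here is roughly what the call site looks like if the function keeps the RETURNS SETOF record declaration from the question. This is a sketch under stated assumptions: the function name word_frequency_rec and the sample rows are hypothetical, and the token table is assumed to have the columns txt and chartype (txt instead of text, as recommended further down).

```sql
-- Hypothetical sample data so the sketch runs on its own.
CREATE TEMP TABLE token (txt text, chartype text);
INSERT INTO token (txt, chartype) VALUES
   ('foo', 'ALPHABETIC')
 , ('foo', 'ALPHABETIC')
 , ('bar', 'ALPHABETIC')
 , ('42' , 'NUMERIC');

-- Same query, but declared as RETURNS SETOF record (no OUT parameters).
CREATE OR REPLACE FUNCTION word_frequency_rec(_max_tokens int)
  RETURNS SETOF record AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.txt, count(*), (count(*) * 100) / _max_tokens
   FROM  (
      SELECT t.txt
      FROM   token t
      WHERE  t.chartype = 'ALPHABETIC'
      LIMIT  _max_tokens
      ) t
   GROUP  BY t.txt
   ORDER  BY 2 DESC;
END
$func$ LANGUAGE plpgsql;

-- Every call has to spell out the result columns and their types:
SELECT * FROM word_frequency_rec(123) AS f(txt text, cnt bigint, ratio bigint);

-- Leaving the column definition list off fails with:
-- ERROR:  a column definition list is required for functions returning "record"
```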
Explanation:
- It is much more practical to define the return type explicitly than to simply declare it as record. That way you don't have to provide a column definition list with every function call. RETURNS TABLE is one way to do that; there are others. The data types of the OUT parameters have to match exactly what the query returns.
- Choose names for OUT parameters carefully. They are visible almost anywhere in the function body. Table-qualify columns of the same name to avoid conflicts or unexpected results. I did that for all columns in my example.
  But note the potential naming conflict between the OUT parameter cnt and the column alias of the same name. In this particular case (RETURN QUERY SELECT ...) Postgres uses the column alias over the OUT parameter either way. This can be ambiguous in other contexts, though. There are various ways to avoid any confusion:
  - Use the ordinal position of the item in the SELECT list: ORDER BY 2 DESC. Example: Select first row in each GROUP BY group?
  - Repeat the expression: ORDER BY count(*).
  - (Not applicable here.) Set the configuration parameter plpgsql.variable_conflict, or use the special command #variable_conflict error | use_variable | use_column per function. Example: Naming conflict between function parameter and result of JOIN with USING clause
- Don't use "text" and "count" as column names. Both are legal in Postgres, but "count" is a reserved word in standard SQL and the name of a basic function, and "text" is a basic data type. This can lead to confusing errors. I use txt and cnt in my examples.
- Added a missing ; and corrected a syntax error in the header: (_max_tokens int), not (int maxTokens). In Postgres the parameter type comes after the name.
- When working with integer division, it's better to multiply first and divide later, to minimize the rounding error. Even better: work with numeric (or a floating point type). See below.
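To make the naming-conflict options above concrete, here is a small sketch using the per-function #variable_conflict directive. The function conflict_demo and its VALUES rows are hypothetical. Without the directive, the unqualified cnt in the WHERE clause could mean either the OUT parameter or the column, and under the default setting (error) the call is rejected.

```sql
-- Hypothetical demo: the OUT parameter cnt has the same name as a column.
CREATE OR REPLACE FUNCTION conflict_demo()
  RETURNS TABLE (txt text, cnt bigint) AS
$func$
#variable_conflict use_column
BEGIN
   RETURN QUERY
   SELECT v.txt, v.cnt
   FROM  (VALUES ('a', 1::bigint), ('b', 5::bigint), ('c', 3::bigint)) v(txt, cnt)
   WHERE  cnt > 1     -- unqualified; use_column resolves it to the column v.cnt
   ORDER  BY 2 DESC;  -- ordinal position avoids the question altogether
END
$func$ LANGUAGE plpgsql;

SELECT * FROM conflict_demo();  -- rows: ('b', 5), ('c', 3)

-- Without the directive, calling the function raises:
-- ERROR:  column reference "cnt" is ambiguous
```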
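The effect of evaluation order on integer division is easy to see with plain literals. In this sketch, 123 stands in for _max_tokens and 45 for a hypothetical count(*) result; the original question's expression 100 / maxTokens * count(*) has the divide-first shape.

```sql
SELECT 100 / 123 * 45             AS divide_first    -- 0: 100 / 123 truncates to 0 first
     , 100 * 45 / 123             AS multiply_first  -- 36: 4500 / 123 = 36.58..., truncated
     , round(100.0 * 45 / 123, 2) AS numeric_result; -- 36.59: 100.0 is numeric, no truncation
```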
Alternative
This is what I think your query should actually look like (calculating a relative share per token):
CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
RETURNS TABLE (
txt text
, abs_cnt bigint
, relative_share numeric) AS
$func$
BEGIN
RETURN QUERY
SELECT t.txt
, t.cnt
, round((t.cnt * 100) / (sum(t.cnt) OVER ()), 2) -- AS relative_share
FROM (
SELECT t.txt
, count(*) AS cnt
FROM token t
WHERE t.chartype = 'ALPHABETIC'
GROUP BY t.txt
ORDER BY cnt DESC
LIMIT _max_tokens
) t
ORDER BY t.cnt DESC;
END
$func$ LANGUAGE plpgsql;
The expression sum(t.cnt) OVER () is a window function. You could use a CTE instead of the subquery. That may read more nicely, but a subquery is typically cheaper in simple cases like this one.
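Here is the window function on its own, run over a hypothetical inline VALUES list instead of the token table so the arithmetic can be checked by hand. sum(v.cnt) OVER () attaches the grand total (4) to every row without collapsing the rows the way a plain aggregate would:

```sql
SELECT v.txt
     , v.cnt
     , round(v.cnt * 100 / sum(v.cnt) OVER (), 2) AS relative_share
FROM  (VALUES ('foo', 3::bigint), ('bar', 1::bigint)) v(txt, cnt)
ORDER  BY v.cnt DESC;
-- foo | 3 | 75.00
-- bar | 1 | 25.00
```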
A final explicit RETURN statement is not required (but allowed) when working with OUT parameters or RETURNS TABLE (which makes implicit use of OUT parameters).
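As a minimal sketch of that rule, the hypothetical function squares_demo below assigns its OUT parameters (declared through RETURNS TABLE) and emits rows with RETURN NEXT. Control simply reaches the end of the body; no final RETURN is written.

```sql
CREATE OR REPLACE FUNCTION squares_demo(_n int)
  RETURNS TABLE (n int, sq int) AS
$func$
BEGIN
   FOR i IN 1 .. _n LOOP
      n  := i;
      sq := i * i;
      RETURN NEXT;  -- appends the current OUT parameter values as one row
   END LOOP;
   -- No RETURN here: reaching END finishes the result set.
END
$func$ LANGUAGE plpgsql;

SELECT * FROM squares_demo(3);  -- rows: (1, 1), (2, 4), (3, 9)
```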
round() with two parameters only works for numeric types. count() in the subquery produces a bigint result, and sum() over this bigint produces a numeric result, so we automatically deal with a numeric value and everything just falls into place.
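The type chain can be checked directly with pg_typeof(). This sketch uses a hypothetical two-row VALUES list in place of the subquery; the commented-out last statement shows the double precision case the paragraph warns about.

```sql
SELECT pg_typeof(count(*))          AS count_type  -- bigint
     , pg_typeof(sum(v.x))          AS sum_type    -- numeric
     , round(sum(v.x) * 100 / 3, 2) AS rounded     -- 100.00, works because sum() is numeric
FROM  (VALUES (1::bigint), (2::bigint)) v(x);

-- There is no two-argument round() for double precision, so this fails:
-- SELECT round(2.5::float8, 2);
-- ERROR:  function round(double precision, integer) does not exist
```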