I am trying to write the following query on postgresql:
我正在尝试写以下关于postgresql的查询:
select name, author_id, count(1),
(select count(1)
from names as n2
where n2.id = n1.id
and t2.author_id = t1.author_id
)
from names as n1
group by name, author_id
This would certainly work on Microsoft SQL Server but it does not at all on postegresql. I read its documentation a bit and it seems I could rewrite it as:
这当然适用于Microsoft SQL Server,但完全不适用于postegresql。我读了一些文档,似乎可以把它重写为:
select name, author_id, count(1), total
from names as n1, (select count(1) as total
from names as n2
where n2.id = n1.id
and n2.author_id = t1.author_id
) as total
group by name, author_id
But that returns the following error on postegresql: "subquery in FROM cannot refer to other relations of same query level". So I'm stuck. Does anyone know how I can achieve that?
但是,这在postegresql上返回了以下错误:“来自FROM的子查询不能引用相同查询级别的其他关系”。我卡住了。有人知道我是怎么做到的吗?
Thanks
谢谢
4 个解决方案
#1
75
I'm not sure I understand your intent perfectly, but perhaps the following would be close to what you want:
我不确定我是否完全理解你的意图,但也许下面的内容会接近你想要的:
select n1.name, n1.author_id, count_1, total_count
from (select id, name, author_id, count(1) as count_1
from names
group by id, name, author_id) n1
inner join (select id, author_id, count(1) as total_count
from names
group by id, author_id) n2
on (n2.id = n1.id and n2.author_id = n1.author_id)
Unfortunately this adds the requirement of grouping the first subquery by id as well as name and author_id, which I don't think was wanted. I'm not sure how to work around that, though, as you need to have id available to join in the second subquery. Perhaps someone else will come up with a better solution.
不幸的是,这增加了按id以及名称和author_id对第一个子查询进行分组的要求,我认为这是不需要的。但是,我不确定如何解决这个问题,因为您需要有id来连接第二个子查询。也许其他人会想出更好的解决方案。
Share and enjoy.
分享和享受。
#2
7
I am just answering here with the formatted version of the final sql I needed based on Bob Jarvis answer as posted in my comment above:
我只是根据Bob Jarvis的回答,用我需要的最终sql的格式化版本来回答这个问题。
select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
from (select id, name, author_id, count(1) as count_1
from names
group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
from names
group by author_id) n2
on (n2.author_id = n1.author_id)
#3
3
Complementing @Bob Jarvis and @dmikam answer, Postgres don't perform a good plan when you don't use LATERAL, below a simulation, in both cases the query data results are the same, but the cost are very different
补充@Bob Jarvis和@dmikam的答案,Postgres在没有使用横向(在模拟之下)的情况下不会执行一个好的计划,在这两种情况下,查询数据结果是相同的,但是成本是不同的
Table structure
表结构
CREATE TABLE ITEMS (
N INTEGER NOT NULL,
S TEXT NOT NULL
);
CREATE INDEX N_INDEX ON ITEMS(N);
INSERT INTO ITEMS
SELECT
(random()*1000000)::integer AS n,
md5(random()::text) AS s
FROM
generate_series(1,10000000);
Performing JOIN
with GROUP BY
in subquery without LATERAL
在子查询中执行与组的连接,而不进行横向查询
EXPLAIN
SELECT
I.*
FROM ITEMS I
INNER JOIN (
SELECT
COUNT(1), n
FROM ITEMS
GROUP BY N
) I2 ON I2.N = I.N
WHERE I.N IN (243477, 997947);
The results
结果
Merge Join (cost=0.87..637500.40 rows=23 width=37)
Merge Cond: (i.n = items.n)
-> Index Scan using n_index on items i (cost=0.43..101.28 rows=23 width=37)
Index Cond: (n = ANY ('{243477,997947}'::integer[]))
-> GroupAggregate (cost=0.43..626631.11 rows=861418 width=12)
Group Key: items.n
-> Index Only Scan using n_index on items (cost=0.43..593016.93 rows=10000000 width=4)
Using LATERAL
使用侧
EXPLAIN
SELECT
I.*
FROM ITEMS I
INNER JOIN LATERAL (
SELECT
COUNT(1), n
FROM ITEMS
WHERE N = I.N
GROUP BY N
) I2 ON 1=1 --I2.N = I.N
WHERE I.N IN (243477, 997947);
Results
结果
Nested Loop (cost=9.49..1319.97 rows=276 width=37)
-> Bitmap Heap Scan on items i (cost=9.06..100.20 rows=23 width=37)
Recheck Cond: (n = ANY ('{243477,997947}'::integer[]))
-> Bitmap Index Scan on n_index (cost=0.00..9.05 rows=23 width=0)
Index Cond: (n = ANY ('{243477,997947}'::integer[]))
-> GroupAggregate (cost=0.43..52.79 rows=12 width=12)
Group Key: items.n
-> Index Only Scan using n_index on items (cost=0.43..52.64 rows=12 width=4)
Index Cond: (n = i.n)
My Postgres version is PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)
我的Postgres版本是PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)
#4
1
I know this is old, but since Postgresql 9.3 there is an option to use a keyword "LATERAL" to use RELATED subqueries inside of JOINS, so the query from the question would look like:
我知道这是旧的,但是由于Postgresql 9.3有一个选项,可以使用一个关键字“LATERAL”来在join中使用相关的子查询,所以问题中的查询应该是这样的:
SELECT
name, author_id, count(*), t.total
FROM
names as n1
INNER JOIN LATERAL (
SELECT
count(*) as total
FROM
names as n2
WHERE
n2.id = n1.id
AND n2.author_id = n1.author_id
) as t ON 1=1
GROUP BY
n1.name, n1.author_id
#1
75
I'm not sure I understand your intent perfectly, but perhaps the following would be close to what you want:
我不确定我是否完全理解你的意图,但也许下面的内容会接近你想要的:
select n1.name, n1.author_id, count_1, total_count
from (select id, name, author_id, count(1) as count_1
from names
group by id, name, author_id) n1
inner join (select id, author_id, count(1) as total_count
from names
group by id, author_id) n2
on (n2.id = n1.id and n2.author_id = n1.author_id)
Unfortunately this adds the requirement of grouping the first subquery by id as well as name and author_id, which I don't think was wanted. I'm not sure how to work around that, though, as you need to have id available to join in the second subquery. Perhaps someone else will come up with a better solution.
不幸的是,这增加了按id以及名称和author_id对第一个子查询进行分组的要求,我认为这是不需要的。但是,我不确定如何解决这个问题,因为您需要有id来连接第二个子查询。也许其他人会想出更好的解决方案。
Share and enjoy.
分享和享受。
#2
7
I am just answering here with the formatted version of the final sql I needed based on Bob Jarvis answer as posted in my comment above:
我只是根据Bob Jarvis的回答,用我需要的最终sql的格式化版本来回答这个问题。
select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
from (select id, name, author_id, count(1) as count_1
from names
group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
from names
group by author_id) n2
on (n2.author_id = n1.author_id)
#3
3
Complementing @Bob Jarvis and @dmikam answer, Postgres don't perform a good plan when you don't use LATERAL, below a simulation, in both cases the query data results are the same, but the cost are very different
补充@Bob Jarvis和@dmikam的答案,Postgres在没有使用横向(在模拟之下)的情况下不会执行一个好的计划,在这两种情况下,查询数据结果是相同的,但是成本是不同的
Table structure
表结构
CREATE TABLE ITEMS (
N INTEGER NOT NULL,
S TEXT NOT NULL
);
CREATE INDEX N_INDEX ON ITEMS(N);
INSERT INTO ITEMS
SELECT
(random()*1000000)::integer AS n,
md5(random()::text) AS s
FROM
generate_series(1,10000000);
Performing JOIN
with GROUP BY
in subquery without LATERAL
在子查询中执行与组的连接,而不进行横向查询
EXPLAIN
SELECT
I.*
FROM ITEMS I
INNER JOIN (
SELECT
COUNT(1), n
FROM ITEMS
GROUP BY N
) I2 ON I2.N = I.N
WHERE I.N IN (243477, 997947);
The results
结果
Merge Join (cost=0.87..637500.40 rows=23 width=37)
Merge Cond: (i.n = items.n)
-> Index Scan using n_index on items i (cost=0.43..101.28 rows=23 width=37)
Index Cond: (n = ANY ('{243477,997947}'::integer[]))
-> GroupAggregate (cost=0.43..626631.11 rows=861418 width=12)
Group Key: items.n
-> Index Only Scan using n_index on items (cost=0.43..593016.93 rows=10000000 width=4)
Using LATERAL
使用侧
EXPLAIN
SELECT
I.*
FROM ITEMS I
INNER JOIN LATERAL (
SELECT
COUNT(1), n
FROM ITEMS
WHERE N = I.N
GROUP BY N
) I2 ON 1=1 --I2.N = I.N
WHERE I.N IN (243477, 997947);
Results
结果
Nested Loop (cost=9.49..1319.97 rows=276 width=37)
-> Bitmap Heap Scan on items i (cost=9.06..100.20 rows=23 width=37)
Recheck Cond: (n = ANY ('{243477,997947}'::integer[]))
-> Bitmap Index Scan on n_index (cost=0.00..9.05 rows=23 width=0)
Index Cond: (n = ANY ('{243477,997947}'::integer[]))
-> GroupAggregate (cost=0.43..52.79 rows=12 width=12)
Group Key: items.n
-> Index Only Scan using n_index on items (cost=0.43..52.64 rows=12 width=4)
Index Cond: (n = i.n)
My Postgres version is PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)
我的Postgres版本是PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)
#4
1
I know this is old, but since Postgresql 9.3 there is an option to use a keyword "LATERAL" to use RELATED subqueries inside of JOINS, so the query from the question would look like:
我知道这是旧的,但是由于Postgresql 9.3有一个选项,可以使用一个关键字“LATERAL”来在join中使用相关的子查询,所以问题中的查询应该是这样的:
SELECT
name, author_id, count(*), t.total
FROM
names as n1
INNER JOIN LATERAL (
SELECT
count(*) as total
FROM
names as n2
WHERE
n2.id = n1.id
AND n2.author_id = n1.author_id
) as t ON 1=1
GROUP BY
n1.name, n1.author_id