如何在select子句中执行Postgresql子查询,并在SQL Server等子句中加入join ?

时间:2021-03-03 09:30:06

I am trying to write the following query on postgresql:

我正在尝试写以下关于postgresql的查询:

select name, author_id, count(1), 
    (select count(1)
    from names as n2
    where n2.id = n1.id
        and t2.author_id = t1.author_id
    )               
from names as n1
group by name, author_id

This would certainly work on Microsoft SQL Server but it does not at all on postegresql. I read its documentation a bit and it seems I could rewrite it as:

这当然适用于Microsoft SQL Server,但完全不适用于postegresql。我读了一些文档,似乎可以把它重写为:

select name, author_id, count(1), total                     
from names as n1, (select count(1) as total
    from names as n2
    where n2.id = n1.id
        and n2.author_id = t1.author_id
    ) as total
group by name, author_id

But that returns the following error on postegresql: "subquery in FROM cannot refer to other relations of same query level". So I'm stuck. Does anyone know how I can achieve that?

但是,这在postegresql上返回了以下错误:“来自FROM的子查询不能引用相同查询级别的其他关系”。我卡住了。有人知道我是怎么做到的吗?

Thanks

谢谢

4 个解决方案

#1


75  

I'm not sure I understand your intent perfectly, but perhaps the following would be close to what you want:

我不确定我是否完全理解你的意图,但也许下面的内容会接近你想要的:

select n1.name, n1.author_id, count_1, total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select id, author_id, count(1) as total_count
              from names
              group by id, author_id) n2
  on (n2.id = n1.id and n2.author_id = n1.author_id)

Unfortunately this adds the requirement of grouping the first subquery by id as well as name and author_id, which I don't think was wanted. I'm not sure how to work around that, though, as you need to have id available to join in the second subquery. Perhaps someone else will come up with a better solution.

不幸的是,这增加了按id以及名称和author_id对第一个子查询进行分组的要求,我认为这是不需要的。但是,我不确定如何解决这个问题,因为您需要有id来连接第二个子查询。也许其他人会想出更好的解决方案。

Share and enjoy.

分享和享受。

#2


7  

I am just answering here with the formatted version of the final sql I needed based on Bob Jarvis answer as posted in my comment above:

我只是根据Bob Jarvis的回答,用我需要的最终sql的格式化版本来回答这个问题。

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
              from names
              group by author_id) n2
  on (n2.author_id = n1.author_id)

#3


3  

Complementing @Bob Jarvis and @dmikam answer, Postgres don't perform a good plan when you don't use LATERAL, below a simulation, in both cases the query data results are the same, but the cost are very different

补充@Bob Jarvis和@dmikam的答案,Postgres在没有使用横向(在模拟之下)的情况下不会执行一个好的计划,在这两种情况下,查询数据结果是相同的,但是成本是不同的

Table structure

表结构

CREATE TABLE ITEMS (
    N INTEGER NOT NULL,
    S TEXT NOT NULL
);

CREATE INDEX N_INDEX ON ITEMS(N);

INSERT INTO ITEMS
  SELECT
    (random()*1000000)::integer AS n,
    md5(random()::text) AS s
  FROM
    generate_series(1,10000000);

Performing JOIN with GROUP BY in subquery without LATERAL

在子查询中执行与组的连接,而不进行横向查询

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    GROUP BY N
) I2 ON I2.N = I.N
WHERE I.N IN (243477, 997947);

The results

结果

Merge Join  (cost=0.87..637500.40 rows=23 width=37)
  Merge Cond: (i.n = items.n)
  ->  Index Scan using n_index on items i  (cost=0.43..101.28 rows=23 width=37)
        Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..626631.11 rows=861418 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..593016.93 rows=10000000 width=4)

Using LATERAL

使用侧

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN LATERAL (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    WHERE N = I.N
    GROUP BY N
) I2 ON 1=1 --I2.N = I.N
WHERE I.N IN (243477, 997947);

Results

结果

Nested Loop  (cost=9.49..1319.97 rows=276 width=37)
  ->  Bitmap Heap Scan on items i  (cost=9.06..100.20 rows=23 width=37)
        Recheck Cond: (n = ANY ('{243477,997947}'::integer[]))
        ->  Bitmap Index Scan on n_index  (cost=0.00..9.05 rows=23 width=0)
              Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..52.79 rows=12 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..52.64 rows=12 width=4)
              Index Cond: (n = i.n)

My Postgres version is PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)

我的Postgres版本是PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)

#4


1  

I know this is old, but since Postgresql 9.3 there is an option to use a keyword "LATERAL" to use RELATED subqueries inside of JOINS, so the query from the question would look like:

我知道这是旧的,但是由于Postgresql 9.3有一个选项,可以使用一个关键字“LATERAL”来在join中使用相关的子查询,所以问题中的查询应该是这样的:

SELECT 
    name, author_id, count(*), t.total
FROM
    names as n1
    INNER JOIN LATERAL (
        SELECT 
            count(*) as total
        FROM 
            names as n2
        WHERE 
            n2.id = n1.id
            AND n2.author_id = n1.author_id
    ) as t ON 1=1
GROUP BY 
    n1.name, n1.author_id

#1


75  

I'm not sure I understand your intent perfectly, but perhaps the following would be close to what you want:

我不确定我是否完全理解你的意图,但也许下面的内容会接近你想要的:

select n1.name, n1.author_id, count_1, total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select id, author_id, count(1) as total_count
              from names
              group by id, author_id) n2
  on (n2.id = n1.id and n2.author_id = n1.author_id)

Unfortunately this adds the requirement of grouping the first subquery by id as well as name and author_id, which I don't think was wanted. I'm not sure how to work around that, though, as you need to have id available to join in the second subquery. Perhaps someone else will come up with a better solution.

不幸的是,这增加了按id以及名称和author_id对第一个子查询进行分组的要求,我认为这是不需要的。但是,我不确定如何解决这个问题,因为您需要有id来连接第二个子查询。也许其他人会想出更好的解决方案。

Share and enjoy.

分享和享受。

#2


7  

I am just answering here with the formatted version of the final sql I needed based on Bob Jarvis answer as posted in my comment above:

我只是根据Bob Jarvis的回答,用我需要的最终sql的格式化版本来回答这个问题。

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
              from names
              group by author_id) n2
  on (n2.author_id = n1.author_id)

#3


3  

Complementing @Bob Jarvis and @dmikam answer, Postgres don't perform a good plan when you don't use LATERAL, below a simulation, in both cases the query data results are the same, but the cost are very different

补充@Bob Jarvis和@dmikam的答案,Postgres在没有使用横向(在模拟之下)的情况下不会执行一个好的计划,在这两种情况下,查询数据结果是相同的,但是成本是不同的

Table structure

表结构

CREATE TABLE ITEMS (
    N INTEGER NOT NULL,
    S TEXT NOT NULL
);

CREATE INDEX N_INDEX ON ITEMS(N);

INSERT INTO ITEMS
  SELECT
    (random()*1000000)::integer AS n,
    md5(random()::text) AS s
  FROM
    generate_series(1,10000000);

Performing JOIN with GROUP BY in subquery without LATERAL

在子查询中执行与组的连接,而不进行横向查询

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    GROUP BY N
) I2 ON I2.N = I.N
WHERE I.N IN (243477, 997947);

The results

结果

Merge Join  (cost=0.87..637500.40 rows=23 width=37)
  Merge Cond: (i.n = items.n)
  ->  Index Scan using n_index on items i  (cost=0.43..101.28 rows=23 width=37)
        Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..626631.11 rows=861418 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..593016.93 rows=10000000 width=4)

Using LATERAL

使用侧

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN LATERAL (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    WHERE N = I.N
    GROUP BY N
) I2 ON 1=1 --I2.N = I.N
WHERE I.N IN (243477, 997947);

Results

结果

Nested Loop  (cost=9.49..1319.97 rows=276 width=37)
  ->  Bitmap Heap Scan on items i  (cost=9.06..100.20 rows=23 width=37)
        Recheck Cond: (n = ANY ('{243477,997947}'::integer[]))
        ->  Bitmap Index Scan on n_index  (cost=0.00..9.05 rows=23 width=0)
              Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..52.79 rows=12 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..52.64 rows=12 width=4)
              Index Cond: (n = i.n)

My Postgres version is PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)

我的Postgres版本是PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)

#4


1  

I know this is old, but since Postgresql 9.3 there is an option to use a keyword "LATERAL" to use RELATED subqueries inside of JOINS, so the query from the question would look like:

我知道这是旧的,但是由于Postgresql 9.3有一个选项,可以使用一个关键字“LATERAL”来在join中使用相关的子查询,所以问题中的查询应该是这样的:

SELECT 
    name, author_id, count(*), t.total
FROM
    names as n1
    INNER JOIN LATERAL (
        SELECT 
            count(*) as total
        FROM 
            names as n2
        WHERE 
            n2.id = n1.id
            AND n2.author_id = n1.author_id
    ) as t ON 1=1
GROUP BY 
    n1.name, n1.author_id