在PostgreSQL中,多个表之间有多个sum/count

时间:2022-11-07 11:10:19

I've searched through several suggestions on this site and haven't quite been able to get what I'm after. I suspect there's just a syntax/punctuation issue that I'm just missing.

我在这个网站上搜索了一些建议,但是还没有找到我想要的。我怀疑我只是漏掉了一个语法/标点的问题。

I work on a database using phpPgAdmin that tracks lots of information related to a population of baboons being studied. I'm trying to make a query to identify, for each individual baboon, how many tissue samples of different types we have collected for them and how many DNA samples we have of different types for each of them There are three tables that are pertinent to my problem:

我使用phpPgAdmin技术开发了一个数据库,该数据库跟踪大量与正在研究的狒狒种群有关的信息。我要做一个查询来识别每个狒狒,我们收集了多少不同类型的组织样本以及我们有多少不同类型的DNA样本有三个表与我的问题相关:

Table: "biograph" has basic info about all the animals in the group, though the name is all I care about here.

表格:“生物图谱”包含了所有动物的基本信息,尽管名字是我在这里关心的。

name | birth
-----+-----------
A21  | 1968-07-01
AAR  | 2002-03-30
ABB  | 1998-09-10
ABD  | 2005-03-15
ABE  | 1986-01-01

Table: "babtissue" tracks information, including the below three columns, about different tissues that have been collected over the years. Some lines in this table represent tissue samples that we no longer have, but are still referred to elsewhere in the database, so the "avail" column helps us screen for samples that we still have around.

表:“babtissue”记录了这些年来收集到的不同组织的信息,包括下面的三栏。该表中的一些行表示我们不再拥有的组织样本,但仍在数据库中其他地方引用,因此“有用”列帮助我们筛选仍然存在的样本。

name | sample_type | avail
-----+-------------+------
A21  | BLOOD       | Y
A21  | BLOOD       | Y
A21  | TISSUE      | N
ABB  | BLOOD       | Y
ABB  | TISSUE      | Y

Table: "dna" is similar to babtissue.

表:“dna”与babtissue相似。

name | sample_type | avail
-----+-------------+------
ABB  | GDNA        | N
ABB  | WGA         | Y
ACC  | WGA         | N
ALE  | GDNA        | Y
ALE  | GDNA        | Y

Altogether, I'm trying to write a query that will return every name from biograph and tells me in one column how many 'BLOOD', 'TISSUE', 'GDNA', and 'WGA' samples I have for each individual. Something like...

总的来说,我正在编写一个查询,它将返回生物图上的每个名字,并在一个列中告诉我每个人有多少“血液”、“组织”、“GDNA”和“WGA”样本。之类的……

name | bloodsamps | tissuesamps | gdnas | wgas | avail
-----+------------+-------------+-------+------+------
A21  | 2          | 0           | 0     | 0    | ?
AAR  | 0          | 0           | 0     | 0    | ?
ABB  | 1          | 1           | 0     | 1    | ?
ACC  | 0          | 0           | 0     | 0    | ?
ALE  | 0          | 0           | 2     | 0    | ?

(Apologies for the weird formatting above, I'm not very familiar with writing this way)

(抱歉上面的格式很奇怪,我不太熟悉这种写法)

The latest version of the query that I've tried:

我尝试过的最新版本的查询:

select b.name,  
sum(case when t.sample_type='BLOOD' and t.avail='Y' then 1 else 0 end) as bloodsamps,   
sum(case when t.sample_type='TISSUE' and t.avail='Y' then 1 else 0 end) as tissuesamps,   
sum(case when d.sample_type='GDNA' and d.avail='Y' then 1 else 0 end) as gdnas,  
sum(case when d.sample_type='WGA' and d.avail='Y' then 1 else 0 end) as wgas  
from biograph b  
left join babtissue t on b.name=t.name  
left join dna d on b.name=d.name  
where b.name is not NULL  
group by b.name  
order by b.name  

I don't receive any errors when doing it this way, but I know the numbers it gives me are wrong--too high. I figure this has something to do with my use of more than one join, and that something about my join syntax needs to change.

这样做时我不会收到任何错误,但我知道它给我的数字是错误的——太高了。我认为这与我使用不止一个连接有关,我的连接语法需要改变。

Any ideas?

什么好主意吗?

2 个解决方案

#1


4  

The numbers are too high because you're joining to babtissue and then also to dna, which is going to cause duplicates.

这个数字太高了,因为你加入了babtissue和dna,这会导致复制。

You can try to break it up. I don't know if this syntax will work for your database, but I believe that it follows ANSI standards, so give it a shot...

你可以试着打破它。我不知道这种语法是否适用于您的数据库,但我相信它遵循ANSI标准,所以请尝试一下……

SELECT
    SQ.name,
    SUM(CASE WHEN T.sample_type = 'BLOOD' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS bloodsamps,
    SUM(CASE WHEN T.sample_type = 'TISSUE' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS tissuesamps,
    SQ.gdnas,
    SQ.wgas
FROM
    (
    SELECT
        B.name,
        SUM(CASE WHEN D.sample_type = 'GDNA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS gdnas,
        SUM(CASE WHEN D.sample_type = 'WGA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS wgas
    FROM
        biograph B
    LEFT JOIN dna D ON D.name = B.name
    GROUP BY
        B.name
    ) AS SQ
LEFT JOIN babtissue T on T.name = SQ.name
WHERE SQ.name is not NULL
GROUP BY SQ.name, SQ.gdnas, SQ.wgas
ORDER BY SQ.name

Can the name really be NULL?

这个名字真的是空的吗?

#2


1  

I don't know about the "avail" column, but this should give you the other columns you're looking for:

我不知道“有用”栏,但这应该会给你其他你想要的列:

SELECT  b.name,
        COALESCE (t.bloodsamps,  0) AS bloodsamps,
        COALESCE (t.tissuesamps, 0) AS tissuesamps
        COALESCE (d.gdnas, 0) AS gdnas 
        COALESCE (d.wgas,  0) AS wgas
    FROM biograph b
    LEFT JOIN (
        SELECT  name,
                SUM(CASE WHEN sample_type = 'BLOOD'  THEN 1 ELSE 0 END) AS bloodsamps,
                SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
            FROM babtissue
            WHERE avail = 'Y'
            GROUP BY name
        ) t
        ON (t.name = b.name)
    LEFT JOIN (
        SELECT  name,
                SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
                SUM(CASE WHEN sample_type = 'WGA'  THEN 1 ELSE 0 END) AS wgas
            FROM dna
            WHERE avail = 'Y'
            GROUP BY name
        ) d
        ON (d.name = b.name)
;

#1


4  

The numbers are too high because you're joining to babtissue and then also to dna, which is going to cause duplicates.

这个数字太高了,因为你加入了babtissue和dna,这会导致复制。

You can try to break it up. I don't know if this syntax will work for your database, but I believe that it follows ANSI standards, so give it a shot...

你可以试着打破它。我不知道这种语法是否适用于您的数据库,但我相信它遵循ANSI标准,所以请尝试一下……

SELECT
    SQ.name,
    SUM(CASE WHEN T.sample_type = 'BLOOD' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS bloodsamps,
    SUM(CASE WHEN T.sample_type = 'TISSUE' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS tissuesamps,
    SQ.gdnas,
    SQ.wgas
FROM
    (
    SELECT
        B.name,
        SUM(CASE WHEN D.sample_type = 'GDNA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS gdnas,
        SUM(CASE WHEN D.sample_type = 'WGA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS wgas
    FROM
        biograph B
    LEFT JOIN dna D ON D.name = B.name
    GROUP BY
        B.name
    ) AS SQ
LEFT JOIN babtissue T on T.name = SQ.name
WHERE SQ.name is not NULL
GROUP BY SQ.name, SQ.gdnas, SQ.wgas
ORDER BY SQ.name

Can the name really be NULL?

这个名字真的是空的吗?

#2


1  

I don't know about the "avail" column, but this should give you the other columns you're looking for:

我不知道“有用”栏,但这应该会给你其他你想要的列:

SELECT  b.name,
        COALESCE (t.bloodsamps,  0) AS bloodsamps,
        COALESCE (t.tissuesamps, 0) AS tissuesamps
        COALESCE (d.gdnas, 0) AS gdnas 
        COALESCE (d.wgas,  0) AS wgas
    FROM biograph b
    LEFT JOIN (
        SELECT  name,
                SUM(CASE WHEN sample_type = 'BLOOD'  THEN 1 ELSE 0 END) AS bloodsamps,
                SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
            FROM babtissue
            WHERE avail = 'Y'
            GROUP BY name
        ) t
        ON (t.name = b.name)
    LEFT JOIN (
        SELECT  name,
                SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
                SUM(CASE WHEN sample_type = 'WGA'  THEN 1 ELSE 0 END) AS wgas
            FROM dna
            WHERE avail = 'Y'
            GROUP BY name
        ) d
        ON (d.name = b.name)
;