MySQL - two LEFT JOINs - duplicate counts

Time: 2021-10-17 15:22:53

Long-time user, first-time poster. I've found similar questions and answers, typically involving subqueries, but I'm not sure how to apply them to my situation.

I have 3 tables:

table1
id

table2
id | val (each id has 1 of 3 possible values)

table3
id | val (each id has 1 of 3 possible values)

EDIT: Example: (table1 = unique id of everyone who attended a theme park; table2 = which attraction each visitor visited first; table3 = which attraction each visitor visited second).

I want to write a query that returns 7 different counts: (1) the number of unique ids in table1; (2) for each of the 3 possible values in table2, the number of ids that have that value; (3) for each of the 3 possible values in table3, the number of ids that have that value.

My MySQL query:

SELECT
    COUNT(DISTINCT table1.id) AS x1,
    SUM(IF(table2.val='1',1,0)) AS x2,
    SUM(IF(table2.val='2',1,0)) AS x3,
    SUM(IF(table2.val='3',1,0)) AS x4,
    SUM(IF(table3.val='1',1,0)) AS x5,
    SUM(IF(table3.val='2',1,0)) AS x6,
    SUM(IF(table3.val='3',1,0)) AS x7
FROM
    table1
LEFT JOIN
    table2 ON table1.id=table2.id
LEFT JOIN
    table3 ON table1.id=table3.id

Results:

x1 = correct (because of DISTINCT)

x2,x3,x4 = correct

x5,x6,x7 = TWICE the number they should be (because I'm getting a Cartesian product?)

Any suggestions?

3 solutions

#1


1  

My guess is that your issue is that id is not unique in table1. So even though it is unique in table2/3 (according to your description), each row in table2/3 is joined to two rows in table1 and thus counted twice. This has nothing to do with the left joins; normal inner joins would have the same issue.

If MySQL (which I don't know very well) lets you use inline views the way Oracle does, then you can fix it by writing your query as:

SELECT
    count(view1.id)              AS x1,
    SUM(IF(table2.val='1',1,0))  AS x2,
    SUM(IF(table2.val='2',1,0))  AS x3,
    SUM(IF(table2.val='3',1,0))  AS x4,
    SUM(IF(table3.val='1',1,0))  AS x5,
    SUM(IF(table3.val='2',1,0))  AS x6,
    SUM(IF(table3.val='3',1,0))  AS x7
FROM
    -- inline view: one row per distinct id in table1
    (  SELECT DISTINCT table1.id
       FROM   table1
    ) view1
LEFT JOIN
    table2 ON view1.id=table2.id
LEFT JOIN
    table3 ON view1.id=table3.id
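
The effect this answer describes can be checked with a small experiment. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, replaces IF() with a portable CASE expression, and runs on made-up data in which every id is stored twice in table1 (the answer's hypothesis, not something the question confirms). Only three of the seven counts are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, val INTEGER);
    CREATE TABLE table3 (id INTEGER, val INTEGER);
    -- Hypothetical data: every id appears twice in table1.
    INSERT INTO table1 VALUES (1), (1), (2), (2);
    INSERT INTO table2 VALUES (1, 1), (2, 2);
    INSERT INTO table3 VALUES (1, 3), (2, 3);
""")

# Same join shape as the answer; {source} is either table1 or the inline view.
COUNTS = """
    SELECT COUNT(DISTINCT t1.id),
           SUM(CASE WHEN table2.val = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN table3.val = 3 THEN 1 ELSE 0 END)
    FROM {source} AS t1
    LEFT JOIN table2 ON t1.id = table2.id
    LEFT JOIN table3 ON t1.id = table3.id
"""

raw = conn.execute(COUNTS.format(source="table1")).fetchone()
deduped = conn.execute(
    COUNTS.format(source="(SELECT DISTINCT id FROM table1)")
).fetchone()

print(raw)      # (2, 2, 4): every sum is doubled by the duplicate ids
print(deduped)  # (2, 1, 2): one joined row per id
```

Note that with duplicate ids in table1 the table2 sums double as well as the table3 sums, so this hypothesis fits best when all six sums are inflated.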

#2


2  

You are getting a Cartesian result. Since you are not reporting the "1", "2" or "3" counts per id, just run a SELECT SUM() against each of those tables on its own. An aggregate with no GROUP BY always returns exactly ONE record, so you don't need any join condition: combining the one-row summaries yields a single row with no Cartesian inflation. And since your original query LEFT JOINed from table1 to the others, the ids already exist in table1, so there is no reason to re-query a distinct count in each sub-table.

SELECT
      SumForTable1.x1,
      SumForTable2.x2,
      SumForTable2.x3,
      SumForTable2.x4,
      SumForTable3.x5,
      SumForTable3.x6,
      SumForTable3.x7
   FROM
      -- each derived table aggregates one table on its own and returns one row
      ( select count(DISTINCT table1.id) AS x1
           from table1 ) SumForTable1,

      ( select SUM(IF(table2.val='1', 1, 0)) AS x2,
               SUM(IF(table2.val='2', 1, 0)) AS x3,
               SUM(IF(table2.val='3', 1, 0)) AS x4
            from table2 ) SumForTable2,

      ( select SUM(IF(table3.val='1', 1, 0)) AS x5,
               SUM(IF(table3.val='2', 1, 0)) AS x6,
               SUM(IF(table3.val='3', 1, 0)) AS x7
            from table3 ) SumForTable3
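
To see the difference between the joined query and the per-table summaries, here is a rough sketch using Python's sqlite3 module in place of MySQL (CASE stands in for IF()). The rows are invented for illustration: visitor 1 is assumed to have a duplicate row in table2, which is one way the table3 sums could get inflated. Only x1, x5 and x6 are computed to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, val INTEGER);
    CREATE TABLE table3 (id INTEGER, val INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    -- Hypothetical duplicate: id 1 has two rows in table2.
    INSERT INTO table2 VALUES (1, 1), (1, 1), (2, 2), (3, 3);
    INSERT INTO table3 VALUES (1, 1), (2, 1), (3, 2);
""")

# Shape of the question's query: table3 rows repeat once per matching table2 row.
joined = conn.execute("""
    SELECT COUNT(DISTINCT table1.id),
           SUM(CASE WHEN table3.val = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN table3.val = 2 THEN 1 ELSE 0 END)
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    LEFT JOIN table3 ON table1.id = table3.id
""").fetchone()

# Shape of this answer: each table is summarised alone, then the
# one-row results are combined, so nothing can multiply.
separate = conn.execute("""
    SELECT s1.x1, s3.x5, s3.x6
    FROM (SELECT COUNT(DISTINCT id) AS x1 FROM table1) AS s1
    CROSS JOIN (SELECT SUM(CASE WHEN val = 1 THEN 1 ELSE 0 END) AS x5,
                       SUM(CASE WHEN val = 2 THEN 1 ELSE 0 END) AS x6
                FROM table3) AS s3
""").fetchone()

print(joined)    # (3, 3, 1): x5 counts visitor 1 twice
print(separate)  # (3, 2, 1)
```

Each summary counts rows of its own table, so a duplicate row inside table2 would still be counted twice in x2..x4; the approach removes only the inflation caused by the join.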

#3


0  

I'd remove duplicates on every table:

SELECT
    count(t1.id) AS t1, 
    SUM(IF(t2.val=1,1,0)) AS t21, 
    SUM(IF(t2.val=2,1,0)) AS t22, 
    SUM(IF(t2.val=3,1,0)) AS t23, 
    SUM(IF(t3.val=1,1,0)) AS t31, 
    SUM(IF(t3.val=2,1,0)) AS t32, 
    SUM(IF(t3.val=3,1,0)) AS t33
FROM (SELECT DISTINCT * FROM table1) as t1
JOIN (SELECT DISTINCT * FROM table2) as t2 ON t1.id=t2.id 
JOIN (SELECT DISTINCT * FROM table3) as t3 ON t1.id=t3.id;
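
A small sketch of how this behaves, again using Python's sqlite3 module as a stand-in for MySQL, with CASE in place of IF() and invented rows in which table2 contains one exact duplicate row. Only three of the seven counts are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, val INTEGER);
    CREATE TABLE table3 (id INTEGER, val INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    -- Hypothetical data: the row (1, 1) is stored twice in table2.
    INSERT INTO table2 VALUES (1, 1), (1, 1), (2, 2), (3, 3);
    INSERT INTO table3 VALUES (1, 1), (2, 1), (3, 2);
""")

# DISTINCT * collapses rows that are identical in every column
# before the joins run, so the duplicate cannot multiply table3 rows.
result = conn.execute("""
    SELECT COUNT(t1.id),
           SUM(CASE WHEN t3.val = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN t3.val = 2 THEN 1 ELSE 0 END)
    FROM (SELECT DISTINCT * FROM table1) AS t1
    JOIN (SELECT DISTINCT * FROM table2) AS t2 ON t1.id = t2.id
    JOIN (SELECT DISTINCT * FROM table3) AS t3 ON t1.id = t3.id
""").fetchone()

print(result)  # (3, 2, 1)
```

Two caveats apply: DISTINCT * only removes rows that match in every column, so two rows with the same id but different val would still multiply, and the inner JOINs drop any id that is missing from table2 or table3, unlike the LEFT JOINs in the question.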
