How do I make this long SQL query a full join?

Time: 2022-01-10 10:18:11

I have the following SQL query. It is used to get stats on users who logged in only once this week. My problem is that I am missing some data. When I run a simple query to see how many users logged in only once this week, I get five rows, but this query returns only four. I assume this is because the tables are only left joined. Because I am creating tables in the query, I keep getting errors when I try to add the union statement to make it a full join. Here is the query; any help is appreciated.

SELECT a.user_id, 
   a.logins, 
   a._date, 
   COALESCE(b.loaded, 0)    loaded, 
   COALESCE(c.attempted, 0) attempted, 
   COALESCE(d.correct, 0)   correct 
FROM   (SELECT l.user_id, 
           l.in_datetime, 
           Date_format(l.in_datetime, '%d/%m/%Y') _date, 
           Count(*)                               AS logins 
    FROM   production.login l 
    GROUP  BY user_id) a 
   LEFT JOIN (SELECT user_id, 
                     Count(*) AS loaded 
              FROM   production.score s 
                     JOIN processedquestion pq 
                       ON s.attempt_id = pq.attempt_id 
              GROUP  BY user_id) b 
          ON a.user_id = b.user_id 
   LEFT JOIN (SELECT user_id, 
                     Count(*) AS attempted 
              FROM   production.score s 
                     JOIN processedquestion pq 
                       ON s.attempt_id = pq.attempt_id 
              WHERE  s.selected_answer IS NOT NULL 
              GROUP  BY user_id) c 
          ON c.user_id = b.user_id 
   LEFT JOIN (SELECT user_id, 
                     Count(*) AS correct 
              FROM   production.score s 
                     JOIN processedquestion pq 
                       ON s.attempt_id = pq.attempt_id 
              WHERE  s.selected_answer = s.correct_answer 
              GROUP  BY user_id) d 
          ON c.user_id = d.user_id 
WHERE  logins = 1 
   AND Year(a.in_datetime) = Year(Curdate()) 
   AND Week(a.in_datetime) = Week(Curdate()) 

1 solution

#1

I don't think the problem has anything to do with full joins. The issue is that you need to move your logins date filter into the table expression. The query above looks for users with only a single login in the entire table which is why you have fewer results.

Also note that your query wouldn't have run on systems that correctly disallow the return of a non-aggregate column in a grouping query. In your case you only wanted a single date so it didn't really matter; however, the correct method is to use a dummy aggregate like min() on the _date calculation. I'm calling this out because it's a source of many problems for MySQL devs.
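As a rough illustration of that point, the sketch below contrasts the two forms against the production.login table from the question. It assumes MySQL with the ONLY_FULL_GROUP_BY SQL mode enabled (the default since MySQL 5.7.5) and that user_id is not a unique key of the login table; the alias first_login is made up for the example.

```sql
-- Rejected under ONLY_FULL_GROUP_BY: in_datetime is neither listed in
-- GROUP BY nor wrapped in an aggregate, so which row's value to return
-- for each user is undefined.
select user_id, in_datetime, count(*) as logins
from production.login
group by user_id;

-- Accepted: every selected column is either grouped or aggregated.
-- first_login is a hypothetical alias used only in this sketch.
select user_id, min(in_datetime) as first_login, count(*) as logins
from production.login
group by user_id;
```

With the mode disabled, MySQL runs the first statement and picks an arbitrary in_datetime per group, which is how the original query was able to run at all.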

The single-login condition can also be expressed with a having clause, which keeps that part of the logic in one place without the need to expose a separate count column to reference later. I suppose that's possibly a matter of preference, though I would argue it makes sense to use the tools built into the language.
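To make the comparison concrete, here is a minimal sketch of both styles, using only the production.login table and user_id column from the question. The date filter is left out on purpose so the two statements differ only in where the count is tested.

```sql
-- Style 1: expose the count as a column, then filter on it from outside.
select a.user_id
from (
    select user_id, count(*) as logins
    from production.login
    group by user_id
) a
where a.logins = 1;

-- Style 2: test the aggregate directly with HAVING; no extra column
-- and no wrapping derived table are needed.
select user_id
from production.login
group by user_id
having count(*) = 1;
```

Both statements should return the same users; the second simply keeps the grouping and the condition on the group side by side.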

I've also consolidated the multiple joins into a single derived table, which should make it a lot simpler to follow.

select
    ...
from
    (
        select user_id, min(date_format(in_datetime, '%d/%m/%Y')) _date
        from production.login
        where year(in_datetime) = year(curdate()) and week(in_datetime) = week(curdate())
        group by user_id
        having count(*) = 1
    ) users
        left outer join
    (
        select
            s.user_id, /* I qualified with s but not sure that was the right table */
            count(*) as loaded,
            count(s.selected_answer) as attempted,
            count(case when s.selected_answer = s.correct_answer then 1 end) as correct
        from production.score s inner join processedquestion pq
            on pq.attempt_id = s.attempt_id
        group by s.user_id
    ) questions
        on questions.user_id = users.user_id
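The three counts in the questions subquery rely on count(expr) skipping NULLs, which is what lets one pass over the joined rows replace the three separate subqueries. A small self-contained sketch of that behaviour follows; the three inline rows and the alias demo are invented purely for illustration and do not come from the real score table.

```sql
-- Three made-up rows: one correct answer, one wrong answer, one unanswered.
select
    count(*)                as loaded,     -- counts every row
    count(selected_answer)  as attempted,  -- skips rows where the answer is NULL
    count(case when selected_answer = correct_answer then 1 end) as correct
                                           -- CASE yields NULL unless the answers match
from (
    select 'A' as selected_answer, 'A' as correct_answer
    union all select 'B', 'A'
    union all select null, 'C'
) demo;
-- loaded = 3, attempted = 2, correct = 1
```

The unanswered row drops out of both attempted and correct because a comparison with NULL is never true, so the CASE falls through to NULL.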

I have no idea how large your logins table is but the query might run more efficiently if you were to calculate a start and end date and use in_datetime between <start_of_week> and <end_of_week> rather than a check based on extracting the year and week parts. And actually I think you'll have worse problems when you use this in the first week of January.
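One way the range-based filter could look is sketched below. It assumes weeks start on Monday, which differs from the default WEEK() mode where weeks start on Sunday, and it swaps between for a half-open range (>= start, < next start) so that logins late on the last day of the week are not cut off. Treat the date arithmetic as an assumption to adapt, not as the only way to compute the bounds.

```sql
-- WEEKDAY() returns 0 for Monday through 6 for Sunday, so subtracting
-- that many days from today gives this week's Monday at 00:00:00.
-- For Sunday-start weeks, use (dayofweek(curdate()) - 1) instead.
select user_id, min(date_format(in_datetime, '%d/%m/%Y')) as _date
from production.login
where in_datetime >= curdate() - interval weekday(curdate()) day
  and in_datetime <  curdate() - interval weekday(curdate()) day + interval 7 day
group by user_id
having count(*) = 1;
```

Because in_datetime is compared directly rather than passed through year() and week(), an index on that column can be used for the filter, and a week that straddles 1 January is treated as a single week instead of being split across two years.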
