I have the following SQL query. It is used to get stats on users who logged in only once this week. My problem is that some data is missing: when I run a simple query to count the users who logged in only once this week I get five rows, but this query returns only four. I assume this is because the tables are only left joined. Because I am building derived tables inside the query, I keep getting errors when I try to add a UNION to turn it into a full join. Here is the query; any help is appreciated.
SELECT a.user_id,
a.logins,
a._date,
COALESCE(b.loaded, 0) loaded,
COALESCE(c.attempted, 0) attempted,
COALESCE(d.correct, 0) correct
FROM (SELECT l.user_id,
l.in_datetime,
Date_format(l.in_datetime, '%d/%m/%Y') _date,
Count(*) AS logins
FROM production.login l
GROUP BY user_id) a
LEFT JOIN (SELECT user_id,
Count(*) AS loaded
FROM production.score s
JOIN processedquestion pq
ON s.attempt_id = pq.attempt_id
GROUP BY user_id) b
ON a.user_id = b.user_id
LEFT JOIN (SELECT user_id,
Count(*) AS attempted
FROM production.score s
JOIN processedquestion pq
ON s.attempt_id = pq.attempt_id
WHERE s.selected_answer IS NOT NULL
GROUP BY user_id) c
ON c.user_id = b.user_id
LEFT JOIN (SELECT user_id,
Count(*) AS correct
FROM production.score s
JOIN processedquestion pq
ON s.attempt_id = pq.attempt_id
WHERE s.selected_answer = s.correct_answer
GROUP BY user_id) d
ON c.user_id = d.user_id
WHERE logins = 1
AND Year(a.in_datetime) = Year(Curdate())
AND Week(a.in_datetime) = Week(Curdate())
1 solution
#1
I don't think the problem has anything to do with full joins. The issue is that you need to move your login date filter into the table expression. The query above counts logins across the entire table, so it only finds users with a single login ever, which is why you get fewer results.
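To make the difference concrete, here is a minimal sketch that runs both query shapes against an in-memory SQLite database. SQLite stands in for MySQL here, and the simplified login table, the sample rows, and the fixed week bounds are all assumptions made for illustration. User 2 has exactly one login this week plus an older one, which is the case the original query loses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table login (user_id integer, in_datetime text)")
conn.executemany(
    "insert into login values (?, ?)",
    [
        (1, "2024-03-05 09:00:00"),  # one login this week, none before
        (2, "2024-03-06 10:00:00"),  # one login this week ...
        (2, "2024-01-10 08:00:00"),  # ... plus an older login
        (3, "2024-03-04 12:00:00"),  # two logins this week
        (3, "2024-03-07 12:00:00"),
    ],
)

# Hypothetical fixed bounds standing in for "this week" (start inclusive, end exclusive)
week = ("2024-03-03 00:00:00", "2024-03-10 00:00:00")

# Original shape: count logins over the whole table, filter by date afterwards
outer_filter = conn.execute(
    """
    select user_id
    from (select user_id, count(*) as logins, min(in_datetime) as in_datetime
          from login
          group by user_id) a
    where logins = 1 and in_datetime >= ? and in_datetime < ?
    order by user_id
    """,
    week,
).fetchall()

# Suggested shape: restrict to this week first, then count
inner_filter = conn.execute(
    """
    select user_id
    from (select user_id, count(*) as logins
          from login
          where in_datetime >= ? and in_datetime < ?
          group by user_id) a
    where logins = 1
    order by user_id
    """,
    week,
).fetchall()

print(outer_filter)  # user 2 is missing: two logins overall
print(inner_filter)  # user 2 is present: one login within the week
```

With the filter on the outside, user 2 is counted as having two logins and dropped. With the filter inside the table expression, only this week's rows are counted.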
Also note that your query wouldn't have run on systems that correctly disallow the return of a non-aggregate column in a grouping query. In your case you only wanted a single date so it didn't really matter; however, the correct method is to use a dummy aggregate like min() on the _date calculation. I'm calling this out because it's a source of many problems for MySQL devs.
The single login condition can also be expressed with having, which has the benefit of keeping that part of the logic in one place without the need to expose a separate count column to reference later. I suppose that's possibly a matter of preference, though I would argue it makes sense to use the tools built into the language.
I've also consolidated the multiple joins into a single derived table, which should make it a lot simpler to follow.
select
...
from
(
select user_id, min(date_format(in_datetime, '%d/%m/%Y')) _date
from production.login
where year(in_datetime) = year(curdate()) and week(in_datetime) = week(curdate())
group by user_id
having count(*) = 1
) users
left outer join
(
select
s.user_id, /* I qualified with s but not sure that was the right table */
count(*) as loaded,
count(s.selected_answer) as attempted,
count(case when s.selected_answer = s.correct_answer then 1 end) as correct
from production.score s inner join processedquestion pq
on pq.attempt_id = s.attempt_id
group by s.user_id
) questions
on questions.user_id = users.user_id
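The three counts in the questions subquery rely on count() skipping NULLs. A minimal sketch of that behaviour, again using an in-memory SQLite database as a stand-in for MySQL: the cut-down score table and its rows are assumptions for illustration, and the join to processedquestion is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table score (user_id integer, selected_answer text, correct_answer text)"
)
conn.executemany(
    "insert into score values (?, ?, ?)",
    [
        (1, "a", "a"),   # attempted and correct
        (1, "b", "c"),   # attempted but wrong
        (1, None, "d"),  # loaded, never answered
        (2, None, "a"),  # loaded, never answered
    ],
)

rows = conn.execute(
    """
    select user_id,
           count(*) as loaded,                  -- every row
           count(selected_answer) as attempted, -- non-NULL answers only
           count(case when selected_answer = correct_answer then 1 end) as correct
    from score
    group by user_id
    order by user_id
    """
).fetchall()

print(rows)  # [(1, 3, 2, 1), (2, 1, 0, 0)]
```

User 2 still gets a row with zeros for attempted and correct, so a single grouped pass covers what the three separate subqueries produced.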
I have no idea how large your logins table is, but the query might run more efficiently if you were to calculate a start and end date and use in_datetime between <start_of_week> and <end_of_week> rather than a check based on extracting the year and week parts. And actually I think you'll have worse problems when you use this in the first week of January, since a week that straddles the new year spans two different year values.
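One way to compute those bounds in application code and pass them in as query parameters is sketched below. The helper name week_bounds is hypothetical, and the Sunday week start is an assumption that should be adjusted to whatever week definition you actually use. It returns a half-open range (start inclusive, end exclusive) rather than a literal between, so a login at exactly midnight on the boundary is not counted in two weeks.

```python
from datetime import date, datetime, time, timedelta


def week_bounds(today: date) -> tuple[datetime, datetime]:
    """Return (start, end) of the Sunday-based week containing `today`.

    The range is half-open: start is inclusive, end is exclusive.
    """
    # date.weekday() is Monday=0 .. Sunday=6, so shift it to Sunday=0
    days_since_sunday = (today.weekday() + 1) % 7
    start = datetime.combine(today - timedelta(days=days_since_sunday), time.min)
    return start, start + timedelta(days=7)


# A week that straddles the new year is still one contiguous range
start, end = week_bounds(date(2025, 1, 1))
print(start, end)  # 2024-12-29 00:00:00 2025-01-05 00:00:00
```

The filter then becomes in_datetime >= start and in_datetime < end with those two values bound as parameters. That compares the column directly instead of wrapping it in year() and week(), which is generally what lets an index on in_datetime be used.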