I've got a table that contains (let's say) all the times when a user looked at a specific webpage. Users can of course look at a page more than once, so there can be multiple entries for users and pages, like so:
nid  time  user  page_id
25   8000  4     467
24   7000  1     482
23   6000  1     484
22   5000  1     482
21   4000  5     467
20   3000  4     467
I want a query that returns one row for every page viewed by every user, with the catch that if a user looked at a page more than once, I get only the row for the most recent view (i.e., the one with the largest value of time). Thus, I should get this:
nid  time  user  page_id
25   8000  4     467
24   7000  1     482
23   6000  1     484
21   4000  5     467
We lose row 22 because user 1 looked at page 482 at a later time, and we lose row 20 because user 4 looked at page 467 at a later time.
I almost have this figured out, but I can't quite crack it, while also convincing myself that the results I'm getting will be generally correct and not just an accident of my test cases. I keep going back and forth between GROUP BY or DISTINCT queries and embedded queries, and then my brain explodes. Any suggestions? Thanks!
3 Solutions
#1
20
If you need the full row you can use this:
SELECT fullTable.nid as nid,
       recent.time as time,
       fullTable.user as user,
       fullTable.page_id as page_id
FROM TableName fullTable
INNER JOIN (SELECT MAX(t1.time) as time, t1.user, t1.page_id
            FROM TableName t1
            GROUP BY user, page_id) recent
        ON recent.time = fullTable.time AND
           recent.user = fullTable.user AND
           recent.page_id = fullTable.page_id
ORDER BY time DESC
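If you select a column that is neither listed in the GROUP BY clause nor wrapped in an aggregate function, MySQL may return any value of that column from within the group (or reject the query outright when the ONLY_FULL_GROUP_BY SQL mode is enabled). In your case the values within a group differ (nid is different on every row), so you can't put that column directly in the SELECT clause; you need a join.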
You can read more about how MySQL handles non-grouped columns in the MySQL reference manual.
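To check the join query above against the sample data from the question, here is a minimal sketch. It assumes SQLite through Python's standard sqlite3 module rather than MySQL, and the table name page_views and the helper latest_views are made up for illustration. The ORDER BY is qualified as recent.time only to keep it unambiguous in SQLite.

```python
import sqlite3

# Sample rows from the question: (nid, time, user, page_id).
ROWS = [
    (25, 8000, 4, 467),
    (24, 7000, 1, 482),
    (23, 6000, 1, 484),
    (22, 5000, 1, 482),
    (21, 4000, 5, 467),
    (20, 3000, 4, 467),
]

# The join query from the answer; page_views is a hypothetical table name.
LATEST_VIEWS_SQL = """
SELECT fullTable.nid     AS nid,
       recent.time       AS time,
       fullTable.user    AS user,
       fullTable.page_id AS page_id
FROM page_views fullTable
INNER JOIN (SELECT MAX(t1.time) AS time, t1.user, t1.page_id
            FROM page_views t1
            GROUP BY t1.user, t1.page_id) recent
        ON recent.time = fullTable.time
       AND recent.user = fullTable.user
       AND recent.page_id = fullTable.page_id
ORDER BY recent.time DESC
"""


def latest_views(rows):
    """Load rows into an in-memory SQLite table and run the join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_views "
        "(nid INTEGER, time INTEGER, user INTEGER, page_id INTEGER)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(LATEST_VIEWS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_views(ROWS):
        print(row)
```

One caveat: if the same user has two rows for the same page with an identical time, both rows match the join and both are returned, so a tie-breaker (for example on nid) would be needed in that situation.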
If you don't need the nid field, you can use this simpler query instead:
SELECT MAX(time) as time, user, page_id
FROM TableName
GROUP BY user, page_id
ORDER BY time DESC
#2
1
Try this:
SELECT *
FROM <YOUR_TABLE>
WHERE (user, page_id, time) IN
(
    SELECT user, page_id, MAX(time) time
    FROM <YOUR_TABLE>
    GROUP BY user, page_id
)
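The row-value IN query above can be tried out the same way. This is only a sketch under a few assumptions: it runs on SQLite via Python's sqlite3 module (row values need a reasonably recent SQLite, 3.15 or later as far as I know), the table name page_views and the helper latest_views_row_value are hypothetical, and an ORDER BY is added purely to make the output order deterministic.

```python
import sqlite3

# Sample rows from the question: (nid, time, user, page_id).
ROWS = [
    (25, 8000, 4, 467),
    (24, 7000, 1, 482),
    (23, 6000, 1, 484),
    (22, 5000, 1, 482),
    (21, 4000, 5, 467),
    (20, 3000, 4, 467),
]

# Same shape as the answer's query; page_views is a hypothetical table name
# and the ORDER BY is an addition for a stable result order.
ROW_VALUE_SQL = """
SELECT nid, time, user, page_id
FROM page_views
WHERE (user, page_id, time) IN
(
    SELECT user, page_id, MAX(time)
    FROM page_views
    GROUP BY user, page_id
)
ORDER BY time DESC
"""


def latest_views_row_value(rows):
    """Keep only rows whose (user, page_id, time) is the group's latest."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_views "
        "(nid INTEGER, time INTEGER, user INTEGER, page_id INTEGER)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(ROW_VALUE_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_views_row_value(ROWS):
        print(row)
```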
#3
0
SELECT nid, MAX(time), user, page_id
FROM TableName
GROUP BY nid, user, page_id
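It may be worth checking how this query behaves when nid is unique per row, as it is in the question's sample data: each row then forms its own group, so nothing gets collapsed. A minimal sketch, assuming SQLite via Python's sqlite3 module, a hypothetical table name page_views and helper grouped_by_nid, and an ORDER BY added only for a stable output order:

```python
import sqlite3

# Sample rows from the question: (nid, time, user, page_id).
ROWS = [
    (25, 8000, 4, 467),
    (24, 7000, 1, 482),
    (23, 6000, 1, 484),
    (22, 5000, 1, 482),
    (21, 4000, 5, 467),
    (20, 3000, 4, 467),
]

# The query from this answer; page_views is a hypothetical table name and
# the ORDER BY is an addition to make the result order deterministic.
GROUP_BY_NID_SQL = """
SELECT nid, MAX(time), user, page_id
FROM page_views
GROUP BY nid, user, page_id
ORDER BY nid DESC
"""


def grouped_by_nid(rows):
    """Run the GROUP BY nid, user, page_id query on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_views "
        "(nid INTEGER, time INTEGER, user INTEGER, page_id INTEGER)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(GROUP_BY_NID_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    result = grouped_by_nid(ROWS)
    # Every input row comes back, including nid 22 and nid 20.
    print(len(result), "rows returned for", len(ROWS), "input rows")
```

On the sample data this returns all six rows, including the older views with nid 22 and nid 20, which is not the result the question asks for.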