MySQL group by和max返回错误的行

时间:2020-12-03 12:28:10

I have two tables and I try to find the "post" with the highest score per day.

我有两张桌子,我试着找到每天得分最高的“帖子”。

CREATE TABLE IF NOT EXISTS `posts_points` (
  `post_id` int(10) unsigned NOT NULL,
  `comments` smallint(5) unsigned NOT NULL,
  `likes` smallint(5) unsigned NOT NULL,
  `favorites` smallint(5) unsigned NOT NULL,
   PRIMARY KEY (`post_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;


CREATE TABLE IF NOT EXISTS `posts` (
  `profile_id` int(10) unsigned NOT NULL,
  `post_id` int(10) unsigned NOT NULL,
  `pubdate_utc` datetime NOT NULL,
  PRIMARY KEY (`post_id`),
  KEY `profile_id` (`profile_id`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

I have tried the query below. It returns the correct score but the other columns are just random rows. What am I doing wrong ?

我已经尝试过以下查询。它返回正确的分数,但其他列只是随机行。我究竟做错了什么 ?

SELECT p.post_id, p.profile_id
   , MAX(t1.score)
   , DATE_FORMAT(t1.pubdate_utc, '%d %b') post_date
   , DATE(t1.pubdate_utc) mydate
FROM
(
   SELECT p.profile_id, p.post_id, p.pubdate_utc
      , (pp.comments + pp.likes + pp.favorites) AS score
   FROM posts p 
   INNER JOIN posts_points pp ON p.post_id = pp.post_id
) t1
INNER JOIN posts p ON t1.post_id = p.post_id
   AND t1.pubdate_utc = p.pubdate_utc
GROUP BY mydate
ORDER BY mydate DESC
LIMIT 18;

5 个解决方案

#1


2  

I run into this problem all the time. When MySQL runs an aggregate function, for any non-aggregated columns, it simply pulls the first data it runs across for that group, whether it is from the MAX row or not. So what you have to do is order the data in an inner query such that the maxes are first in their groups. See if this works for you:

我一直遇到这个问题。当MySQL运行聚合函数时,对于任何非聚合列,它只是拉出它为该组运行的第一个数据,无论它是否来自MAX行。因此,您需要做的是在内部查询中对数据进行排序,使得maxes在其组中是第一位的。看看这是否适合你:

SELECT t.post_id,
       t.profile_id,
       t.score,
       t.pubdate_utc
FROM (SELECT p.profile_id,
             p.post_id,
             p.pubdate_utc,
             (pp.comments + pp.likes + pp.favorites) score
      FROM posts p
      JOIN posts_points pp ON p.post_id = pp.post_id
      WHERE p.pubdate_utc >= DATE_ADD(DATE(NOW()), INTERVAL -17 DAY)
      ORDER BY score DESC
     ) t
GROUP BY DATE(t.pubdate_utc) DESC
;

Notice that I use no MAX function here. Ordering by score descending and then grouping by date in the outer query will pull up the highest score by date. Also notice that I put the WHERE clause in the inner query. Inner queries like this (tho sometimes necessary) are not very efficient, since they have no indexes for the outer query to optimize on, so make sure your inner result set is as small as it can be. Lastly, notice the GROUP BY DATE(t.pubdate_utc). If I did not reduce it down to just the date information, there would be a lot more than 18 results, as times are also counted then.

请注意,我在这里没有使用MAX功能。按分数降序排序,然后在外部查询中按日期分组将按日期提取最高分。另请注意,我将WHERE子句放在内部查询中。像这样的内部查询(有时是必要的)不是很有效,因为它们没有外部查询的优化索引,所以要确保你的内部结果集尽可能小。最后,请注意GROUP BY DATE(t.pubdate_utc)。如果我没有将其减少到仅仅是日期信息,那么将会有超过18个结果,因为时间也被计算在内。

Edit: Changed to INTERVAL -17 DAY to give up to 18 results instead of 19.

编辑:更改为INTERVAL -17 DAY,最多可提供18个结果,而不是19个。

#2


0  

Column1  Column2
C        d
A        any thing
D        y
B        z  

If you order this data by Column1 then it looks like this .....orderby just orders the first column....

如果您通过Column1订购此数据,那么它看起来像这样..... orderby只是命令第一列....

Column1  Column2
A        any thing
B        z            
C        d
D        y

#3


0  

is a little tricky to understand what you want to do.

了解你想做什么有点棘手。

The words (columns (posts, comments, favorites) and PK), I understood that you update the values ​​increasing, and does not record each vote.

单词(列(帖子,评论,收藏)和PK),我知道你更新的值增加,并不记录每个投票。

This select returns the data from the post, and the score, by ordering bigger point, limited to 18.

此选择返回帖子中的数据和分数,通过排序更大的点,限制为18。

        SELECT P.post_id,
               P.profile_id,
               (PP.comments + PP.likes + PP.favorites) AS score,
               DATE_FORMAT (P.pubdate_utc, '%d %b') AS post_data,
               DATE (P.pubdate_utc) AS mydate
          FROM posts P
    INNER JOIN posts_points PP
            ON (= P.post_id PP.post_id)
      ORDER BY 3 DESC
         LIMIT 18;

If you want to select the most votes of the day, you must record the likes / favs different, needs data in that table (posts_points).

如果要选择当天最多的投票,则必须记录该表中的likes / favs,需要数据(posts_points)。

#4


0  

Wow! Tricky. For example, there is always the possibility of ties for max.

哇!棘手。例如,总是存在最大关系的可能性。

The solution below creates an intermediate list of day's max_scores, then gets all posts whose scores are equal to the max for their day. It returns ties, so you may get two rows for a given day. I beg your forgiveness that I can't test this, so give feedback, and I'm sure we can get this to do what you need.

下面的解决方案创建了一天的max_scores的中间列表,然后获得其分数等于当天最大值的所有帖子。它会返回关系,因此您可以在给定的一天获得两行。我请求你原谅我无法测试,所以给予反馈,我相信我们可以做到这一点来做你需要的。

SELECT p.profile_id, p.post_id, p.pubdate_utc
, DATE_FORMAT(p.pubdate_utc, '%d %b') AS post_date
, DATE(p.pubdate_utc) AS mydate
, (pp.comments + pp.likes + pp.favorites) AS score
FROM posts p 
INNER JOIN posts_points pp ON p.post_id = pp.post_id
INNER JOIN 
(
    SELECT p.pubdate_utc AS max_date, 
    (pp.comments + pp.likes + pp.favorites) AS max_score
    FROM posts p2 
    INNER JOIN posts_points pp2 ON p2.post_id = pp2.post_id
) m ON score = m.max_score
AND mydate = m.max_date
ORDER BY mydate DESC
LIMIT 18;

#5


0  

you can see this query .Inner query is firstly fetch those rows which have same post_id in both table than sum (pp.comments + pp.likes + pp.favorites) as score .Outer Query is fetch max score and doing group by on the date wise....

你可以看到这个查询。内部查询首先获取那些在两个表中具有相同post_id的行而不是sum(pp.comments + pp.likes + pp.favorites)作为得分.Outer Query是获取最大分数并且在约会......

SELECT post_id, profile_id
   , MAX(score)
   , DATE_FORMAT(pubdate_utc, '%d %b') post_date
   , DATE(pubdate_utc) as mydate
FROM
(
   SELECT p.profile_id, p.post_id, p.pubdate_utc
      , (pp.comments + pp.likes + pp.favorites) AS score
   FROM posts p 
   INNER JOIN posts_points pp ON p.post_id = pp.post_id
) 
GROUP BY pubdate_utc
ORDER BY pubdate_utc DESC

#1


2  

I run into this problem all the time. When MySQL runs an aggregate function, for any non-aggregated columns, it simply pulls the first data it runs across for that group, whether it is from the MAX row or not. So what you have to do is order the data in an inner query such that the maxes are first in their groups. See if this works for you:

我一直遇到这个问题。当MySQL运行聚合函数时,对于任何非聚合列,它只是拉出它为该组运行的第一个数据,无论它是否来自MAX行。因此,您需要做的是在内部查询中对数据进行排序,使得maxes在其组中是第一位的。看看这是否适合你:

SELECT t.post_id,
       t.profile_id,
       t.score,
       t.pubdate_utc
FROM (SELECT p.profile_id,
             p.post_id,
             p.pubdate_utc,
             (pp.comments + pp.likes + pp.favorites) score
      FROM posts p
      JOIN posts_points pp ON p.post_id = pp.post_id
      WHERE p.pubdate_utc >= DATE_ADD(DATE(NOW()), INTERVAL -17 DAY)
      ORDER BY score DESC
     ) t
GROUP BY DATE(t.pubdate_utc) DESC
;

Notice that I use no MAX function here. Ordering by score descending and then grouping by date in the outer query will pull up the highest score by date. Also notice that I put the WHERE clause in the inner query. Inner queries like this (tho sometimes necessary) are not very efficient, since they have no indexes for the outer query to optimize on, so make sure your inner result set is as small as it can be. Lastly, notice the GROUP BY DATE(t.pubdate_utc). If I did not reduce it down to just the date information, there would be a lot more than 18 results, as times are also counted then.

请注意,我在这里没有使用MAX功能。按分数降序排序,然后在外部查询中按日期分组将按日期提取最高分。另请注意,我将WHERE子句放在内部查询中。像这样的内部查询(有时是必要的)不是很有效,因为它们没有外部查询的优化索引,所以要确保你的内部结果集尽可能小。最后,请注意GROUP BY DATE(t.pubdate_utc)。如果我没有将其减少到仅仅是日期信息,那么将会有超过18个结果,因为时间也被计算在内。

Edit: Changed to INTERVAL -17 DAY to give up to 18 results instead of 19.

编辑:更改为INTERVAL -17 DAY,最多可提供18个结果,而不是19个。

#2


0  

Column1  Column2
C        d
A        any thing
D        y
B        z  

If you order this data by Column1 then it looks like this .....orderby just orders the first column....

如果您通过Column1订购此数据,那么它看起来像这样..... orderby只是命令第一列....

Column1  Column2
A        any thing
B        z            
C        d
D        y

#3


0  

is a little tricky to understand what you want to do.

了解你想做什么有点棘手。

The words (columns (posts, comments, favorites) and PK), I understood that you update the values ​​increasing, and does not record each vote.

单词(列(帖子,评论,收藏)和PK),我知道你更新的值增加,并不记录每个投票。

This select returns the data from the post, and the score, by ordering bigger point, limited to 18.

此选择返回帖子中的数据和分数,通过排序更大的点,限制为18。

        SELECT P.post_id,
               P.profile_id,
               (PP.comments + PP.likes + PP.favorites) AS score,
               DATE_FORMAT (P.pubdate_utc, '%d %b') AS post_data,
               DATE (P.pubdate_utc) AS mydate
          FROM posts P
    INNER JOIN posts_points PP
            ON (= P.post_id PP.post_id)
      ORDER BY 3 DESC
         LIMIT 18;

If you want to select the most votes of the day, you must record the likes / favs different, needs data in that table (posts_points).

如果要选择当天最多的投票,则必须记录该表中的likes / favs,需要数据(posts_points)。

#4


0  

Wow! Tricky. For example, there is always the possibility of ties for max.

哇!棘手。例如,总是存在最大关系的可能性。

The solution below creates an intermediate list of day's max_scores, then gets all posts whose scores are equal to the max for their day. It returns ties, so you may get two rows for a given day. I beg your forgiveness that I can't test this, so give feedback, and I'm sure we can get this to do what you need.

下面的解决方案创建了一天的max_scores的中间列表,然后获得其分数等于当天最大值的所有帖子。它会返回关系,因此您可以在给定的一天获得两行。我请求你原谅我无法测试,所以给予反馈,我相信我们可以做到这一点来做你需要的。

SELECT p.profile_id, p.post_id, p.pubdate_utc
, DATE_FORMAT(p.pubdate_utc, '%d %b') AS post_date
, DATE(p.pubdate_utc) AS mydate
, (pp.comments + pp.likes + pp.favorites) AS score
FROM posts p 
INNER JOIN posts_points pp ON p.post_id = pp.post_id
INNER JOIN 
(
    SELECT p.pubdate_utc AS max_date, 
    (pp.comments + pp.likes + pp.favorites) AS max_score
    FROM posts p2 
    INNER JOIN posts_points pp2 ON p2.post_id = pp2.post_id
) m ON score = m.max_score
AND mydate = m.max_date
ORDER BY mydate DESC
LIMIT 18;

#5


0  

you can see this query .Inner query is firstly fetch those rows which have same post_id in both table than sum (pp.comments + pp.likes + pp.favorites) as score .Outer Query is fetch max score and doing group by on the date wise....

你可以看到这个查询。内部查询首先获取那些在两个表中具有相同post_id的行而不是sum(pp.comments + pp.likes + pp.favorites)作为得分.Outer Query是获取最大分数并且在约会......

SELECT post_id, profile_id
   , MAX(score)
   , DATE_FORMAT(pubdate_utc, '%d %b') post_date
   , DATE(pubdate_utc) as mydate
FROM
(
   SELECT p.profile_id, p.post_id, p.pubdate_utc
      , (pp.comments + pp.likes + pp.favorites) AS score
   FROM posts p 
   INNER JOIN posts_points pp ON p.post_id = pp.post_id
) 
GROUP BY pubdate_utc
ORDER BY pubdate_utc DESC