How can I speed up a MySQL query when I only need the latest entry of each group?

So I was working on a project about a transportation system. There are buses reporting to a server. Every bus inserts a new row into a table called Events every 2 seconds, so it is a large table.

Each bus has a unique busID. I want to get a table which contains all buses but only their latest report.

Here are things I tried:

First, I thought I could use ORDER BY time DESC LIMIT 20. It turns out that it sorts the entire table first and only then applies the LIMIT... which actually makes sense, how else could it sort?
Then I googled and found out that it is much faster to sort on an indexed column. So I did ORDER BY id DESC LIMIT 20; it gave me the latest 20 entries pretty fast.
However, I don't really need the latest 20 entries; I need the latest entry from each bus. So I thought about combining GROUP BY bus with ORDER BY id somehow, but couldn't figure that out...
Next, I read another post on this site about speeding things up when you only need the max value of a column in each group. So finally I came up with SELECT driver,busID,route,timestamp,MAX(id) FROM Events GROUP BY bus. However, it seems that using MAX(id) does not really help...
I also thought about first using ORDER BY id LIMIT (some number) to make a sub-table, then finding the newest entry of each bus within that sub-table. The problem is that the tablet on a bus that sends the reports might accidentally go offline and stop inserting new rows. So I don't know how large the sub-table should be so that it contains at least the latest entry of each bus...

So I am kind of running out of ideas... I am still a noob in MySQL, so maybe there are better functions to use? Or maybe I am overcomplicating things? I thought it wouldn't be so hard to do at the beginning...

Any advice would be greatly appreciated.

I also read the post "Retrieving the last record in each group", which is brilliant! But it still takes forever in my case...

CREATE TABLE `Events` (
  `id` bigint(20) NOT NULL auto_increment,
  `driver` varchar(200) collate utf8_unicode_ci default NULL,
  `bus` varchar(200) collate utf8_unicode_ci default NULL,
  `route` varchar(50) collate utf8_unicode_ci default NULL,
  `time` datetime default NULL,
  `clientTime` datetime default NULL,
  `latitude` decimal(30,20) default NULL,
  `longitude` decimal(30,20) default NULL,
  `accuracy` int(11) default NULL,
  `speed` decimal(30,20) default NULL,
  `heading` decimal(30,20) default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=66528487 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Thank you all for helping me! But it is time to talk to my professor! Maybe I am not supposed to do that, hmm...

3 solutions

#1

You have to use indexes: id is a primary key and is already indexed, so sorting by id should be fast, but bus and time are not indexed. I would add a composite unique index like this:

alter table Events add unique index idx_bus_time (bus, time);

This should make the following query much faster:

select bus, max(time)
from Events
group by bus

Then you can easily get the latest info for each bus:

select e.*
from Events e INNER JOIN (
  select bus, max(time) max_time
  from Events
  group by bus) l on e.bus=l.bus AND e.time=l.max_time
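
Since the question already tried MAX(id), here is the same join keyed on id instead of time. This is only a sketch: it assumes that a larger id always means a newer report for the same bus, and idx_bus_id is a made-up index name. Joining back on the primary key also avoids returning two rows for a bus that has two reports with the same time.

```sql
-- Assumption: ids are assigned in insertion order, so MAX(id) per bus
-- identifies that bus's newest report.
-- idx_bus_id is a hypothetical index name.
ALTER TABLE Events ADD INDEX idx_bus_id (bus, id);

-- Find the highest id per bus, then fetch those rows by primary key.
SELECT e.*
FROM Events e
INNER JOIN (
  SELECT bus, MAX(id) AS max_id
  FROM Events
  GROUP BY bus
) l ON e.id = l.max_id;
```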

Another thing you can do to improve performance is to create a busses table:

create table busses (
  id int primary key auto_increment,
  bus varchar(200)
)

Then alter the original table to use a bus_id INT instead of the bus VARCHAR(200), and index the bus_id and time columns together.
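
A rough sketch of what that migration could look like. The column name bus_id and the index name idx_busid_time are assumptions for illustration, and it assumes busses.bus uses the same collation as Events.bus. On a table this large, the backfill would probably need to be done in batches.

```sql
-- Hypothetical sketch: bus_id and idx_busid_time are assumed names.

-- 1. Fill the lookup table with the distinct bus names already seen.
INSERT INTO busses (bus)
SELECT DISTINCT bus FROM Events WHERE bus IS NOT NULL;

-- 2. Add the integer key to Events and backfill it from the lookup table.
ALTER TABLE Events ADD COLUMN bus_id INT NULL;

UPDATE Events e
JOIN busses b ON b.bus = e.bus
SET e.bus_id = b.id;

-- 3. Index the pair so MAX(time) per bus_id can be read from the index.
ALTER TABLE Events ADD INDEX idx_busid_time (bus_id, `time`);

-- 4. Latest row per bus, using the narrower integer key.
SELECT e.*
FROM Events e
INNER JOIN (
  SELECT bus_id, MAX(`time`) AS max_time
  FROM Events
  GROUP BY bus_id
) l ON e.bus_id = l.bus_id AND e.`time` = l.max_time;
```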

#2

I would rather keep it simple.

I would add one column to the table, e.g. latest_record.

For a particular bus_id's latest record (event), the latest_record field would hold the value 0.

When another event arrives for the same bus_id, before inserting it I would update the previous latest event's latest_record to 1; the newly arrived event then gets latest_record = 0.

Now you just have to create an index on latest_record, and you can find the latest entry of every bus_id by filtering on latest_record = '0' in the WHERE clause.
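
Roughly, the flag approach could be sketched like this. It uses the bus column from the posted schema in place of bus_id; idx_latest_record and the literal values ('bus-42' and so on) are placeholders, not real data. Wrapping the update and the insert in one transaction keeps other sessions from seeing a bus with no latest row in between.

```sql
-- Hypothetical sketch: latest_record follows the answer (0 = latest, 1 = older);
-- idx_latest_record and the sample values are assumptions.
-- Existing rows default to 1, so a bus only shows up after its next report
-- unless the flag is backfilled.
ALTER TABLE Events
  ADD COLUMN latest_record TINYINT NOT NULL DEFAULT 1,
  ADD INDEX idx_latest_record (latest_record);

-- For every incoming report:
START TRANSACTION;

-- Demote the previous latest row of this bus.
UPDATE Events
SET latest_record = 1
WHERE bus = 'bus-42' AND latest_record = 0;

-- Insert the new report as the latest row.
INSERT INTO Events (driver, bus, route, `time`, latest_record)
VALUES ('some driver', 'bus-42', 'route-7', NOW(), 0);

COMMIT;

-- Latest report of every bus.
SELECT * FROM Events WHERE latest_record = 0;
```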

#3

The solution would be so simple if you could simply INSERT into a new table that contains one row per bus -- the current status of the bus.
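
One way to read this suggestion is an upsert into a small status table alongside the normal insert into Events. The table name BusStatus, its column selection, and the sample values below are assumptions for illustration.

```sql
-- Hypothetical table: one row per bus, holding only its current status.
CREATE TABLE BusStatus (
  bus       VARCHAR(200) COLLATE utf8_unicode_ci NOT NULL,
  driver    VARCHAR(200) COLLATE utf8_unicode_ci DEFAULT NULL,
  route     VARCHAR(50)  COLLATE utf8_unicode_ci DEFAULT NULL,
  `time`    DATETIME DEFAULT NULL,
  latitude  DECIMAL(30,20) DEFAULT NULL,
  longitude DECIMAL(30,20) DEFAULT NULL,
  PRIMARY KEY (bus)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- On every report: insert the bus if it is new, otherwise overwrite its row.
-- (VALUES() in this clause is deprecated in recent MySQL 8.0 releases
-- but is still accepted.)
INSERT INTO BusStatus (bus, driver, route, `time`, latitude, longitude)
VALUES ('bus-42', 'some driver', 'route-7', NOW(), 40.0, -75.0)
ON DUPLICATE KEY UPDATE
  driver    = VALUES(driver),
  route     = VALUES(route),
  `time`    = VALUES(`time`),
  latitude  = VALUES(latitude),
  longitude = VALUES(longitude);

-- The "latest report of every bus" query becomes a scan of a tiny table.
SELECT * FROM BusStatus;
```

The read no longer touches Events at all, at the cost of one extra write per report.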
