SQL LIMIT to get the latest records

Time: 2022-04-10 08:50:44

I am writing a script that lists 25 items from each of 12 categories. The database structure is:

tbl_items
---------------------------------------------
item_id | item_name | item_value | timestamp 
---------------------------------------------

tbl_categories
-----------------------------
cat_id | item_id | timestamp
-----------------------------

There are around 600,000 rows in the table tbl_items. I am using this SQL query:

SELECT e.item_id, e.item_value
  FROM tbl_items AS e
  JOIN tbl_categories AS cat WHERE e.item_id = cat.item_id AND cat.cat_id = 6001
  LIMIT 25

I run the same query in a loop for cat_id from 6000 to 6012, but I want the latest records of every category. If I use something like:

SELECT e.item_id, e.item_value
  FROM tbl_items AS e
  JOIN tbl_categories AS cat WHERE e.item_id = cat.item_id AND cat.cat_id = 6001
  ORDER BY e.timestamp
  LIMIT 25

...the query runs for approximately 10 minutes, which is not acceptable. Can I use LIMIT in a better way to get the latest 25 records for each category?

Can anyone help me achieve this without ORDER BY? Any ideas or help will be highly appreciated.

EDIT

tbl_items

+---------------------+--------------+------+-----+---------+-------+
| Field               | Type         | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+-------+
| item_id             | int(11)      | NO   | PRI | 0       |       |
| item_name           | longtext     | YES  |     | NULL    |       |
| item_value          | longtext     | YES  |     | NULL    |       |
| timestamp           | datetime     | YES  |     | NULL    |       |
+---------------------+--------------+------+-----+---------+-------+

tbl_categories

+----------------+------------+------+-----+---------+-------+
| Field          | Type       | Null | Key | Default | Extra |
+----------------+------------+------+-----+---------+-------+
| cat_id         | int(11)    | NO   | PRI | 0       |       |
| item_id        | int(11)    | NO   | PRI | 0       |       |
| timestamp      | datetime   | YES  |     | NULL    |       |
+----------------+------------+------+-----+---------+-------+

3 solutions

#1

First of all:

There seems to be an N:M relation between items and categories: an item may belong to several categories. I say this because tbl_categories has an item_id foreign key.

If it is not an N:M relationship, you should consider changing the design. If it is a 1:N relationship, where a category has several items, then the items table must contain the category foreign key instead.

Working with N:M:

I have rewritten your query to use an inner join instead of a cross join:

  SELECT e.item_id, e.item_value
  FROM 
     tbl_items AS e
  JOIN 
     tbl_categories AS cat 
        on e.item_id = cat.item_id
  WHERE  
     cat.cat_id = 6001
  ORDER BY 
     e.timestamp DESC  -- DESC so the latest rows come first
  LIMIT 25

To optimize performance, the required index is:

create index idx_1 on tbl_categories( cat_id, item_id)

An index on tbl_items is not mandatory, because the primary key is already indexed. An index that contains timestamp does not help as much. To be sure, you can try an index on tbl_items with item_id and timestamp, so that the timestamp values can be read from the index without accessing the table:

create index idx_2 on tbl_items( item_id, timestamp)

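To improve performance, you can replace the loop over categories with a single query. Each row is ranked within its category using ROW_NUMBER() (a window function, available in MySQL 8.0 and later), and only the newest 25 rows per category are kept:

  SELECT T.cat_id, T.item_id, T.item_value
  FROM
  (SELECT cat.cat_id, e.item_id, e.item_value,
          -- rank rows inside each category, newest first
          ROW_NUMBER() OVER (PARTITION BY cat.cat_id
                             ORDER BY e.timestamp DESC) AS rn
   FROM 
     tbl_items AS e
   JOIN 
     tbl_categories AS cat 
        ON e.item_id = cat.item_id
   WHERE
     cat.cat_id BETWEEN 6001 AND 6012
  ) T
  WHERE  
     T.rn <= 25
  ORDER BY
     T.cat_id, T.item_id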
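
For anyone who wants to try the "latest N per category" idea on toy data first, here is a minimal sketch. It is not the asker's setup: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the helper names `build_demo_db` and `latest_per_category` are hypothetical, and the ten sample rows are invented. It assumes an SQLite build with window functions (3.25 or later); on MySQL the same ROW_NUMBER() approach needs 8.0 or later.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory copy of the two tables with ten invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_items (item_id INTEGER PRIMARY KEY, item_name TEXT,
                                item_value TEXT, timestamp TEXT);
        CREATE TABLE tbl_categories (cat_id INTEGER, item_id INTEGER,
                                     timestamp TEXT,
                                     PRIMARY KEY (cat_id, item_id));
    """)
    for i in range(1, 11):
        ts = f"2022-01-{i:02d} 00:00:00"      # newer item_id means newer timestamp
        cat = 6001 if i % 2 else 6002          # odd ids go to 6001, even ids to 6002
        conn.execute("INSERT INTO tbl_items VALUES (?, ?, ?, ?)",
                     (i, f"name{i}", f"value{i}", ts))
        conn.execute("INSERT INTO tbl_categories VALUES (?, ?, ?)", (cat, i, ts))
    return conn


def latest_per_category(conn, first_cat, last_cat, n):
    """Return up to n newest (cat_id, item_id, item_value) rows per category."""
    sql = """
        SELECT cat_id, item_id, item_value
        FROM (
            SELECT cat.cat_id, e.item_id, e.item_value,
                   ROW_NUMBER() OVER (
                       PARTITION BY cat.cat_id
                       ORDER BY e.timestamp DESC, e.item_id DESC
                   ) AS rn
            FROM tbl_items AS e
            JOIN tbl_categories AS cat ON e.item_id = cat.item_id
            WHERE cat.cat_id BETWEEN ? AND ?
        ) AS ranked
        WHERE rn <= ?
        ORDER BY cat_id, rn
    """
    return conn.execute(sql, (first_cat, last_cat, n)).fetchall()


if __name__ == "__main__":
    for row in latest_per_category(build_demo_db(), 6001, 6012, 3):
        print(row)
```

The extra `e.item_id DESC` in the window ordering is only a tie-breaker so that rows with identical timestamps are ranked deterministically.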

Please try these queries and come back with your comments so we can refine them if necessary.

#2

Can you add indexes? If you add an index on the timestamp and the other appropriate columns, the ORDER BY won't take 10 minutes.
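
One way to check whether an index actually helps is to create it and look at the query plan. The sketch below does this with Python's built-in sqlite3 module as a stand-in for MySQL, so the plan text will differ from what MySQL's EXPLAIN reports. The index name `idx_items_ts`, the helper names and the twenty sample rows are all assumptions made for illustration.

```python
import sqlite3

# The per-category query from the question, with newest rows first.
QUERY = """
    SELECT e.item_id, e.item_value
    FROM tbl_items AS e
    JOIN tbl_categories AS cat ON e.item_id = cat.item_id
    WHERE cat.cat_id = ?
    ORDER BY e.timestamp DESC
    LIMIT ?
"""


def make_db():
    """Build an in-memory database with a timestamp index and invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_items (item_id INTEGER PRIMARY KEY, item_name TEXT,
                                item_value TEXT, timestamp TEXT);
        CREATE TABLE tbl_categories (cat_id INTEGER, item_id INTEGER,
                                     timestamp TEXT,
                                     PRIMARY KEY (cat_id, item_id));
        -- Hypothetical index: lets the engine read items in timestamp order.
        CREATE INDEX idx_items_ts ON tbl_items (timestamp);
    """)
    conn.executemany(
        "INSERT INTO tbl_items (item_id, item_value, timestamp) VALUES (?, ?, ?)",
        [(i, f"value{i}", f"2022-02-{i:02d} 00:00:00") for i in range(1, 21)],
    )
    # Even ids land in category 6001, odd ids in 6002.
    conn.executemany(
        "INSERT INTO tbl_categories (cat_id, item_id) VALUES (?, ?)",
        [(6001 + i % 2, i) for i in range(1, 21)],
    )
    return conn


def newest(conn, cat_id, n):
    """Run the query and return the n newest (item_id, item_value) rows."""
    return conn.execute(QUERY, (cat_id, n)).fetchall()


def plan(conn, cat_id, n):
    """Return the human-readable steps of the engine's query plan."""
    cur = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (cat_id, n))
    return [row[-1] for row in cur]


if __name__ == "__main__":
    conn = make_db()
    print(newest(conn, 6001, 3))
    for step in plan(conn, 6001, 3):
        print(step)
```

The plan is printed rather than assumed, because which index the optimizer picks depends on the engine, its version and the table statistics.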

#3

Leaving all other factors aside, I can tell you that the main reason the query is so slow is that the result involves longtext columns.

BLOB and TEXT fields in MySQL are mostly meant to store complete files, textual or binary. For InnoDB tables they are stored separately from the row data. Each time a query involves sorting (explicitly or for a GROUP BY), MySQL is sure to use disk for the sorting, because it cannot know in advance how large any value is.

As a rule of thumb: if a query needs to return more than a single row of a column, the type of that column should almost never be TEXT or BLOB; use VARCHAR or VARBINARY instead.
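
If the column types cannot be changed, one workaround that is sometimes suggested is a "deferred join": sort and limit using only the narrow columns (item_id and timestamp), then join back to fetch the large item_value for the surviving rows. This is a swapped-in technique, not something proposed in the answer above, and whether it helps on a given MySQL version has to be measured. The sketch uses Python's built-in sqlite3 module as a stand-in; the helper names and sample data are hypothetical.

```python
import sqlite3

# Inner query: pick the newest ids using narrow columns only.
# Outer query: fetch the large item_value just for those ids.
DEFERRED_JOIN = """
    SELECT e.item_id, e.item_value
    FROM (
        SELECT i.item_id, i.timestamp
        FROM tbl_items AS i
        JOIN tbl_categories AS cat ON i.item_id = cat.item_id
        WHERE cat.cat_id = ?
        ORDER BY i.timestamp DESC
        LIMIT ?
    ) AS newest
    JOIN tbl_items AS e ON e.item_id = newest.item_id
    ORDER BY newest.timestamp DESC
"""


def make_db():
    """Build an in-memory database whose item_value column holds large strings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_items (item_id INTEGER PRIMARY KEY, item_name TEXT,
                                item_value TEXT, timestamp TEXT);
        CREATE TABLE tbl_categories (cat_id INTEGER, item_id INTEGER,
                                     timestamp TEXT,
                                     PRIMARY KEY (cat_id, item_id));
    """)
    for i in range(1, 9):
        conn.execute(
            "INSERT INTO tbl_items (item_id, item_value, timestamp) VALUES (?, ?, ?)",
            (i, "x" * 1000, f"2022-03-{i:02d} 00:00:00"),
        )
        # Items 1..6 belong to category 6001, items 7..8 to 6002.
        conn.execute(
            "INSERT INTO tbl_categories (cat_id, item_id) VALUES (?, ?)",
            (6001 if i <= 6 else 6002, i),
        )
    return conn


def fetch_newest(conn, cat_id, n):
    """Return the n newest (item_id, item_value) rows of one category."""
    return conn.execute(DEFERRED_JOIN, (cat_id, n)).fetchall()


if __name__ == "__main__":
    for item_id, value in fetch_newest(make_db(), 6001, 2):
        print(item_id, len(value))
```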

UPD

If you cannot alter the table, the query will hardly be fast with the current indexes and column types. In any case, here is a similar question with a popular solution to your problem: How to SELECT the newest four items per category?
