如何在LIMIT子句中使用大偏移量加速MySQL查询?

时间:2021-10-12 03:56:28

I'm getting performance problems when LIMITing a mysql SELECT with a large offset:

在使用大偏移量限制mysql SELECT时,我遇到了性能问题:

SELECT * FROM table LIMIT m, n;

If the offset m is, say, larger than 1,000,000, the operation is very slow.

如果偏移量m大于1,000,000,则操作非常慢。

I do have to use limit m, n; I can't use something like id > 1,000,000 limit n.

我必须使用limit m,n;我不能使用id> 1,000,000 limit n之类的东西。

How can I optimize this statement for better performance?

如何优化此声明以获得更好的性能?

6 个解决方案

#1


13  

Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.

也许您可以创建一个索引表,该表提供与目标表中的键相关的顺序键。然后,您可以将此索引表连接到目标表,并使用where子句更有效地获取所需的行。

#create table to store sequences
CREATE TABLE seq (
   seq_no int not null auto_increment,
   id int not null,
   primary key(seq_no),
   unique(id)
);

#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;

#now get 1000 rows from offset 1000000
SELECT mytable.* 
FROM mytable 
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;

#2


9  

There's a blog post somewhere on the internet on how you should best make the selection of the rows to show should be as compact as possible, thus: just the ids; and producing the complete results should in turn fetch all the data you want for only the rows you selected.

互联网上有一篇博客文章,说明如何最好地选择要显示的行应该尽可能紧凑,因此:只有ids;然后生成完整的结果应该只为您选择的行获取所需的所有数据。

Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):

因此,SQL可能是这样的(未经测试,我不确定它实际上会做什么好事):

select A.* from table A 
  inner join (select id from table order by whatever limit m, n) B
  on A.id = B.id
order by A.whatever

If your SQL engine is too primitive to allow this kind of SQL statements, or it doesn't improve anything, against hope, it might be worthwhile to break this single statement into multiple statements and capture the ids into a data structure.

如果你的SQL引擎过于原始而不允许这种类型的SQL语句,或者它没有改进任何东西,而不是希望,那么将这个单个语句分解为多个语句并将id捕获到数据结构中可能是值得的。

Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.

更新:我找到了我正在谈论的博客文章:这是关于编码恐怖的杰夫阿特伍德的“所有抽象都是失败的抽象”。

#3


4  

If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or could formulate a WHERE clause using the min and max ids from the first query.)

如果记录很大,则缓慢可能来自加载数据。如果id列被索引,那么只选择它会快得多。然后,您可以使用IN子句为相应的id执行第二次查询(或者可以使用第一个查询中的min和max id来表示WHERE子句。)

slow:

慢:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000

fast:

快速:

SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000

SELECT * FROM table WHERE id IN (1,2,3...10)

#4


2  

Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there is no row gaps.

Paul Dixon的答案确实是问题的解决方案,但您必须维护序列表并确保没有行间隙。

If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.

如果这是可行的,更好的解决方案是简单地确保原始表没有行间隙,并从id 1开始。然后使用id进行分页来获取行。

SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;

and so on...

等等...

#5


2  

I don't think there's any need to create a separate index if your table already has one. If so, then you can order by this primary key and then use values of the key to step through:

如果您的表已有一个索引,我认为不需要创建单独的索引。如果是这样,那么您可以通过此主键进行排序,然后使用键的值来逐步执行:

SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;

Another optimisation would be not to use SELECT * but just the ID so that it can simply read the index and doesn't have to then locate all the data (reduce IO overhead). If you need some of the other columns then perhaps you could add these to the index so that they are read with the primary key (which will most likely be held in memory and therefore not require a disc lookup) - although this will not be appropriate for all cases so you will have to have a play.

另一个优化不是使用SELECT *而是使用ID,这样它就可以简单地读取索引,而不必定位所有数据(减少IO开销)。如果您需要其他一些列,那么也许您可以将这些列添加到索引中,以便使用主键(最有可能保存在内存中,因此不需要光盘查找)读取它们 - 尽管这不合适对于所有情况,所以你必须要玩。

I wrote an article with more details:

我写了一篇文章,详细介绍:

http://www.4pmp.com/2010/02/scalable-mysql-avoid-offset-for-large-tables/

http://www.4pmp.com/2010/02/scalable-mysql-avoid-offset-for-large-tables/

#6


0  

I have run into this problem recently. The problem was two parts to fix. First I had to use an inner select in my FROM clause that did my limiting and offsetting for me on the primary key only:

我最近遇到过这个问题。问题是需要解决的两个部分。首先,我必须在我的FROM子句中使用内部选择,仅在主键上对我进行限制和偏移:

$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId}  ORDER BY title ) as t");  

Then I could use that as the from part of my query:

然后我可以使用它作为我的查询的一部分:

'titles.id',
                            'title_eisbns_concat.eisbns_concat', 
                            'titles.pub_symbol', 
                            'titles.title', 
                            'titles.subtitle', 
                            'titles.contributor1', 
                            'titles.publisher', 
                            'titles.epub_date', 
                            'titles.ebook_price', 
                            'publisher_licenses.id as pub_license_id', 
                            'license_types.shortname',
                            $coversQuery
                        )
                        ->from($subQuery)
                        ->leftJoin('titles',  't.id',  '=', 'titles.id')
                        ->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol') 
                        ->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id') 
                        ->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id') 
                        ->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')

The first time I created this query I had used the OFFSET and LIMIT in MySql. This worked fine until I got past page 100 then the offset started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySql hasn't sped up OFFSET but between seems to reel it back in.

我第一次创建这个查询时,我在MySql中使用了OFFSET和LIMIT。这工作正常,直到我超过第100页然后偏移开始变得无法忍受的缓慢。在我的内部查询中将其更改为BETWEEN会为任何页面加速。我不确定为什么MySql没有加速OFFSET,但介于两者之间。

#1


13  

Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.

也许您可以创建一个索引表,该表提供与目标表中的键相关的顺序键。然后,您可以将此索引表连接到目标表,并使用where子句更有效地获取所需的行。

#create table to store sequences
CREATE TABLE seq (
   seq_no int not null auto_increment,
   id int not null,
   primary key(seq_no),
   unique(id)
);

#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;

#now get 1000 rows from offset 1000000
SELECT mytable.* 
FROM mytable 
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;

#2


9  

There's a blog post somewhere on the internet on how you should best make the selection of the rows to show should be as compact as possible, thus: just the ids; and producing the complete results should in turn fetch all the data you want for only the rows you selected.

互联网上有一篇博客文章,说明如何最好地选择要显示的行应该尽可能紧凑,因此:只有ids;然后生成完整的结果应该只为您选择的行获取所需的所有数据。

Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):

因此,SQL可能是这样的(未经测试,我不确定它实际上会做什么好事):

select A.* from table A 
  inner join (select id from table order by whatever limit m, n) B
  on A.id = B.id
order by A.whatever

If your SQL engine is too primitive to allow this kind of SQL statements, or it doesn't improve anything, against hope, it might be worthwhile to break this single statement into multiple statements and capture the ids into a data structure.

如果你的SQL引擎过于原始而不允许这种类型的SQL语句,或者它没有改进任何东西,而不是希望,那么将这个单个语句分解为多个语句并将id捕获到数据结构中可能是值得的。

Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.

更新:我找到了我正在谈论的博客文章:这是关于编码恐怖的杰夫阿特伍德的“所有抽象都是失败的抽象”。

#3


4  

If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or could formulate a WHERE clause using the min and max ids from the first query.)

如果记录很大,则缓慢可能来自加载数据。如果id列被索引,那么只选择它会快得多。然后,您可以使用IN子句为相应的id执行第二次查询(或者可以使用第一个查询中的min和max id来表示WHERE子句。)

slow:

慢:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000

fast:

快速:

SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000

SELECT * FROM table WHERE id IN (1,2,3...10)

#4


2  

Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there is no row gaps.

Paul Dixon的答案确实是问题的解决方案,但您必须维护序列表并确保没有行间隙。

If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.

如果这是可行的,更好的解决方案是简单地确保原始表没有行间隙,并从id 1开始。然后使用id进行分页来获取行。

SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;

and so on...

等等...

#5


2  

I don't think there's any need to create a separate index if your table already has one. If so, then you can order by this primary key and then use values of the key to step through:

如果您的表已有一个索引,我认为不需要创建单独的索引。如果是这样,那么您可以通过此主键进行排序,然后使用键的值来逐步执行:

SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;

Another optimisation would be not to use SELECT * but just the ID so that it can simply read the index and doesn't have to then locate all the data (reduce IO overhead). If you need some of the other columns then perhaps you could add these to the index so that they are read with the primary key (which will most likely be held in memory and therefore not require a disc lookup) - although this will not be appropriate for all cases so you will have to have a play.

另一个优化不是使用SELECT *而是使用ID,这样它就可以简单地读取索引,而不必定位所有数据(减少IO开销)。如果您需要其他一些列,那么也许您可以将这些列添加到索引中,以便使用主键(最有可能保存在内存中,因此不需要光盘查找)读取它们 - 尽管这不合适对于所有情况,所以你必须要玩。

I wrote an article with more details:

我写了一篇文章,详细介绍:

http://www.4pmp.com/2010/02/scalable-mysql-avoid-offset-for-large-tables/

http://www.4pmp.com/2010/02/scalable-mysql-avoid-offset-for-large-tables/

#6


0  

I have run into this problem recently. The problem was two parts to fix. First I had to use an inner select in my FROM clause that did my limiting and offsetting for me on the primary key only:

我最近遇到过这个问题。问题是需要解决的两个部分。首先,我必须在我的FROM子句中使用内部选择,仅在主键上对我进行限制和偏移:

$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId}  ORDER BY title ) as t");  

Then I could use that as the from part of my query:

然后我可以使用它作为我的查询的一部分:

'titles.id',
                            'title_eisbns_concat.eisbns_concat', 
                            'titles.pub_symbol', 
                            'titles.title', 
                            'titles.subtitle', 
                            'titles.contributor1', 
                            'titles.publisher', 
                            'titles.epub_date', 
                            'titles.ebook_price', 
                            'publisher_licenses.id as pub_license_id', 
                            'license_types.shortname',
                            $coversQuery
                        )
                        ->from($subQuery)
                        ->leftJoin('titles',  't.id',  '=', 'titles.id')
                        ->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol') 
                        ->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id') 
                        ->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id') 
                        ->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')

The first time I created this query I had used the OFFSET and LIMIT in MySql. This worked fine until I got past page 100 then the offset started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySql hasn't sped up OFFSET but between seems to reel it back in.

我第一次创建这个查询时,我在MySql中使用了OFFSET和LIMIT。这工作正常,直到我超过第100页然后偏移开始变得无法忍受的缓慢。在我的内部查询中将其更改为BETWEEN会为任何页面加速。我不确定为什么MySql没有加速OFFSET,但介于两者之间。