Optimizing a SELECT query with ORDER BY, OFFSET and LIMIT in PostgreSQL

Posted: 2021-09-06 04:07:17

This is my table schema:

Column       |          Type          |                      Modifiers                      
-------------+------------------------+------------------------------------------------------
id           | integer                | not null default nextval('message_id_seq'::regclass)
date_created | bigint                 |
content      | text                   |
user_name    | character varying(128) |
user_id      | character varying(128) |
user_type    | character varying(8)   |
user_ip      | character varying(128) |
user_avatar  | character varying(128) |
chatbox_id   | integer                | not null
Indexes:
    "message_pkey" PRIMARY KEY, btree (id)
    "idx_message_chatbox_id" btree (chatbox_id)
    "indx_date_created" btree (date_created)
Foreign-key constraints:
    "message_chatbox_id_fkey" FOREIGN KEY (chatbox_id) REFERENCES chatboxes(id) ON UPDATE CASCADE ON DELETE CASCADE

This is the query:

SELECT * 
FROM message 
WHERE chatbox_id=$1 
ORDER BY date_created 
OFFSET 0 
LIMIT 20;

($1 will be replaced by the actual ID)

It runs pretty well, but when the table reaches 3.7 million records, all SELECT queries start consuming a lot of CPU and RAM, and the whole system goes down. I have to temporarily back up all the current messages and truncate the table. I am not sure what is going on, because everything was fine when I had about 2 million records.

I am using PostgreSQL Server 9.1.5 with the default configuration.


Update: output of EXPLAIN ANALYZE

Limit  (cost=0.00..6.50 rows=20 width=99) (actual time=0.107..0.295 rows=20 loops=1)
  ->  Index Scan Backward using indx_date_created on message  (cost=0.00..3458.77 rows=10646 width=99) (actual time=0.105..0.287 rows=20 loops=1)
        Filter: (chatbox_id = 25065)
Total runtime: 0.376 ms
(4 rows)

Update: server specification

Intel Xeon 5620 8x2.40GHz+HT
12GB DDR3 1333 ECC
SSD Intel X25-E Extreme 64GB

Final solution

Finally I can go above 3 million messages. I had to optimize the PostgreSQL configuration as wildplasser suggested, and also create a new index as A.H. suggested.
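The post does not show the exact changes, but a minimal sketch of that combination might look like the following. The index follows A.H.'s answer below; the index name and the configuration values are illustrative assumptions for a 12GB server, not the poster's actual settings.

-- Composite index (name is arbitrary): serves the WHERE filter and the
-- ORDER BY in a single index scan, so no separate sort is needed
create index idx_message_chatbox_date on message(chatbox_id, date_created);

# postgresql.conf -- illustrative values; the original settings were not posted
shared_buffers = 2GB          # 9.1's tiny default wastes most of the 12GB of RAM
work_mem = 16MB               # per-sort memory; keeps ORDER BY out of disk files
effective_cache_size = 8GB    # hints the planner that index scans are cheap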

2 Answers

#1 (score: 8)

You could try to give PostgreSQL a better index for that query. I propose something like this:

create index invent_suitable_name on message(chatbox_id, date_created);

or

create index invent_suitable_name on message(chatbox_id, date_created desc);
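Either variant lets the planner jump straight to the rows for one chatbox, already in date_created order, so the LIMIT 20 is satisfied without a separate sort step. For illustration (the second query is an assumed newest-first variant, not from the question):

-- The original query: with the composite index in place, this becomes a
-- plain index scan plus LIMIT; no Sort node appears in the plan
SELECT *
FROM message
WHERE chatbox_id = $1
ORDER BY date_created
OFFSET 0
LIMIT 20;

-- If pages are usually shown newest-first, the "desc" index matches this shape
SELECT *
FROM message
WHERE chatbox_id = $1
ORDER BY date_created DESC
LIMIT 20;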

#2 (score: 3)

Try adding an index on (chatbox_id, date_created). For this particular query it will give you maximum performance.

For the case when postgres "starts consuming a lot of CPU and RAM", try to get more details. It could be a bug (with the default configuration postgres normally doesn't consume much RAM).

UPD: My guess for the reason of the bad performance:

At some point the table becomes too big for a full scan to collect accurate statistics. After another ANALYZE, PostgreSQL got bad statistics for the table and, as a result, a bad plan that consisted of:

  1. Index scan on chatbox_id;
  2. Ordering of the returned records to get the top 20.

Because of the default configuration and the large number of records returned in step 1, postgres was forced to sort in files on disk. As a result: bad performance.
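One way to confirm such a spill (an illustrative check, not part of the original answer) is to look at the sort method reported by EXPLAIN ANALYZE, and to raise work_mem for the session while testing:

-- Substitute a literal chatbox id for $1 when running this by hand
EXPLAIN ANALYZE
SELECT *
FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
LIMIT 20;

-- A Sort node in the output reports where the sort ran, e.g.:
--   Sort Method: external merge  Disk: 10240kB   (spilled to disk)
--   Sort Method: quicksort  Memory: 2048kB       (stayed in RAM)

-- Illustrative session-level fix while testing:
SET work_mem = '32MB';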

UPD2: EXPLAIN ANALYZE shows a 0.376 ms runtime and a good plan. Can you give details about a case with bad performance?
