This is my table schema
    Column    |          Type           |                       Modifiers
--------------+-------------------------+-------------------------------------------------------
 id           | integer                 | not null default nextval('message_id_seq'::regclass)
 date_created | bigint                  |
 content      | text                    |
 user_name    | character varying(128)  |
 user_id      | character varying(128)  |
 user_type    | character varying(8)    |
 user_ip      | character varying(128)  |
 user_avatar  | character varying(128)  |
 chatbox_id   | integer                 | not null
Indexes:
    "message_pkey" PRIMARY KEY, btree (id)
    "idx_message_chatbox_id" btree (chatbox_id)
    "indx_date_created" btree (date_created)
Foreign-key constraints:
    "message_chatbox_id_fkey" FOREIGN KEY (chatbox_id) REFERENCES chatboxes(id) ON UPDATE CASCADE ON DELETE CASCADE
This is the query
SELECT *
FROM message
WHERE chatbox_id=$1
ORDER BY date_created
OFFSET 0
LIMIT 20;
($1 will be replaced by the actual ID)
It runs pretty well, but when it reaches 3.7 million records, all SELECT queries start consuming a lot of CPU and RAM, and then the whole system goes down. I have to temporarily back up all the current messages and truncate that table. I am not sure what is going on, because everything is OK when I have about 2 million records.
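The backup/truncate step is roughly like this (the file path is only an example, and COPY to a server-side file needs superuser rights):
COPY message TO '/tmp/message_backup.csv' WITH CSV HEADER;   -- dump the current rows
TRUNCATE TABLE message;                                       -- then empty the table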
I am using PostgreSQL Server 9.1.5 with default options.
Update: the output of EXPLAIN ANALYZE
Limit (cost=0.00..6.50 rows=20 width=99) (actual time=0.107..0.295 rows=20 loops=1)
-> Index Scan Backward using indx_date_created on message (cost=0.00..3458.77 rows=10646 width=99) (actual time=0.105..0.287 rows=20 loops=1)
Filter: (chatbox_id = 25065)
Total runtime: 0.376 ms
(4 rows)
Update: server specification
Intel Xeon 5620 8x2.40GHz+HT
12GB DDR3 1333 ECC
SSD Intel X25-E Extreme 64GB
Final solution
Finally I can go above 3 million messages. I had to optimize the PostgreSQL configuration as wildplasser suggested and also create a new index as A.H. suggested.
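Roughly, the changes look like this (the index follows A.H.'s answer; the configuration values are only illustrative for a 12GB machine, not the exact ones I settled on):
create index idx_message_chatbox_id_date_created
    on message (chatbox_id, date_created);

-- postgresql.conf (the 9.1 defaults are very conservative):
-- shared_buffers = 2GB              -- default is only 32MB
-- work_mem = 32MB                   -- keeps sorts in memory instead of on disk
-- effective_cache_size = 8GB        -- tells the planner how much the OS can cache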
2 Answers
#1
8
You could try to give PostgreSQL a better index for that query. I propose something like this:
create index invent_suitable_name on message(chatbox_id, date_created);
or
create index invent_suitable_name on message(chatbox_id, date_created desc);
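If you want to check that the planner actually picks the new index up, a quick sketch (25065 is the chatbox id from your EXPLAIN output):
ANALYZE message;   -- refresh statistics so the planner sees the new index
EXPLAIN
SELECT * FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
OFFSET 0
LIMIT 20;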
#2
3
Try adding an index for chatbox_id, date_created. For this particular query it will give you maximum performance.
As for the case when postgres "starts consuming a lot of CPU and RAM", try to get more details. It could be a bug (with the default configuration postgres normally doesn't consume much RAM).
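For example, a rough sketch of where to look (on 9.1 the pg_stat_activity columns are procpid and current_query; 25065 is the chatbox id from your EXPLAIN output):
SELECT procpid, waiting, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;            -- what is running and for how long

EXPLAIN (ANALYZE, BUFFERS)       -- capture the plan plus buffer I/O
SELECT * FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
OFFSET 0
LIMIT 20;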
UPD: My guess for the reason of the bad performance:
At some point in time the table became too big for a full scan to collect accurate statistics. After another ANALYZE, PostgreSQL got bad statistics for the table. As a result it got a bad plan that consisted of:
- Index scan on chatbox_id;
- Ordering of the returned records to get the top 20.
Because of the default configs and the large number of records returned in step 1, postgres was forced to do the sorting in files on disk. As a result: bad performance.
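If that is what happens, two things to try (the values are illustrative, not tuned recommendations):
SET work_mem = '64MB';                -- let the sort fit in memory for this session

ALTER TABLE message ALTER COLUMN chatbox_id SET STATISTICS 1000;   -- larger sample
ANALYZE message;                      -- refresh the planner statistics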
UPD2: EXPLAIN ANALYZE shows a 0.376 ms runtime and a good plan. Can you give details about a case with bad performance?