2013-05-29: Updated question with latest configuration and extra info. Earlier I was testing in a virtualbox image. Now I am testing on the productive server, which reflects the real world much better. Question should be fully clear now. If you have helped me before, read again carefully
2013-05-29:更新的问题与最新的配置和额外的信息。早些时候,我在一个虚拟框映像中进行测试。现在我正在生产服务器上进行测试,它能更好地反映真实世界。问题现在应该完全清楚了。如果你以前帮助过我,请再仔细阅读
Currently I have found a query that is very slow in PostgreSQL, though I do not understand how it can be slow. I scaled it down a bit so it's much smaller to post here (and much faster, but still slow!).
目前,我在PostgreSQL中发现了一个非常慢的查询,尽管我不理解它是如何变慢的。我把它缩小了一点,所以放在这里要小得多(而且要快得多,但还是很慢!)
Little background: In this project I have adverts that belong to users. Users are part of an area within the country. An area can have multiple child areas, so the area table is a tree. A network is assigned to an area. When filtering on a network it should filter on that area and all its area childs in the tree. Because I can't query against a endless tree I have table that flattens this full tree.
小背景:在这个项目中,我有属于用户的广告。用户是这个国家的一部分。区域可以有多个子区域,因此区域表是树。一个网络被分配给一个区域。当对网络进行过滤时,它应该对该区域及其树中的所有区域进行过滤。因为我无法查询一个无限的树,所以我有一个表格可以将整个树变平。
So with 1 query (SELECT area_id FROM network_area_flatdeep WHERE network_id = 1)
I get all areas that belong to network 1: 63, 64, 65, 66, 67, 68, 69, 70
通过一个查询(从network_area_flatdeep中选择area_id,其中network_id = 1),我得到属于网络1的所有区域:63、64、65、66、67、68、69、70
This makes querying very easy.
这使得查询非常容易。
The slow query (Before testing everything has been VACUUM ANALYZED):
慢速查询(在测试所有内容之前都进行了真空分析):
EXPLAIN ANALYZE SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN member m6_
ON a0_.user_id = m6_.id
INNER JOIN area a7_
ON m6_.area_id = a7_.id
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
AND a7_.id IN (SELECT area_id FROM network_area_flatdeep WHERE network_id IN (1))
ORDER BY a0_.created_date DESC
LIMIT 60;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=41.10..34695.53 rows=60 width=12) (actual time=9.327..134.581 rows=60 loops=1)
-> Nested Loop (cost=41.10..2291276.69 rows=3967 width=12) (actual time=9.326..134.534 rows=60 loops=1)
Join Filter: (a7_.id = m6_.area_id)
Rows Removed by Join Filter: 22566
-> Nested Loop (cost=41.10..821630.16 rows=317633 width=24) (actual time=0.049..39.638 rows=22666 loops=1)
-> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..762000.64 rows=317633 width=16) (actual time=0.013..4.357 rows=2834 loops=1)
Filter: (status = 1)
Rows Removed by Filter: 21
-> Materialize (cost=41.10..73.38 rows=15 width=8) (actual time=0.000..0.004 rows=8 loops=2834)
-> Nested Loop (cost=41.10..73.30 rows=15 width=8) (actual time=0.031..0.073 rows=8 loops=1)
-> HashAggregate (cost=41.10..41.18 rows=8 width=4) (actual time=0.023..0.026 rows=8 loops=1)
-> Bitmap Heap Scan on network_area_flatdeep (cost=4.37..41.06 rows=15 width=4) (actual time=0.011..0.015 rows=8 loops=1)
Recheck Cond: (network_id = 1)
-> Bitmap Index Scan on idx_c29e880034128b91 (cost=0.00..4.36 rows=15 width=0) (actual time=0.007..0.007 rows=8 loops=1)
Index Cond: (network_id = 1)
-> Index Only Scan using area_pkey on area a7_ (cost=0.00..4.01 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=8)
Index Cond: (id = network_area_flatdeep.area_id)
Heap Fetches: 8
-> Index Scan using member_pkey on member m6_ (cost=0.00..4.61 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=22666)
Index Cond: (id = a0_.user_id)
Filter: (status = 1)
Rows Removed by Filter: 0
Total runtime: 134.698 ms
(23 rows)
The subquery itself is:
子查询本身:
EXPLAIN ANALYZE SELECT area_id FROM network_area_flatdeep WHERE network_id IN (1);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on network_area_flatdeep (cost=4.37..41.06 rows=15 width=4) (actual time=0.020..0.024 rows=8 loops=1)
Recheck Cond: (network_id = 1)
-> Bitmap Index Scan on idx_c29e880034128b91 (cost=0.00..4.36 rows=15 width=0) (actual time=0.012..0.012 rows=8 loops=1)
Index Cond: (network_id = 1)
Total runtime: 0.051 ms
(5 rows)
which results in : 63, 64, 65, 66, 67, 68, 69, 70
结果是:63 64 65 66 67 68 69 70 ?
So I've tried to hard code in the ids. And expected it to be faster (but is not):
我试着在id中硬编码。而且预计会更快(但不是):
EXPLAIN ANALYZE SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN member m6_
ON a0_.user_id = m6_.id
INNER JOIN area a7_
ON m6_.area_id = a7_.id
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
AND a7_.id IN (63, 64, 65, 66, 67, 68, 69, 70)
ORDER BY a0_.created_date DESC
LIMIT 60;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=17558.82..17558.97 rows=60 width=12) (actual time=56.594..56.670 rows=60 loops=1)
-> Sort (cost=17558.82..17560.07 rows=498 width=12) (actual time=56.593..56.621 rows=60 loops=1)
Sort Key: a0_.created_date
Sort Method: top-N heapsort Memory: 27kB
-> Nested Loop (cost=0.00..17541.62 rows=498 width=12) (actual time=0.047..53.808 rows=4478 loops=1)
-> Nested Loop (cost=0.00..3903.99 rows=286 width=4) (actual time=0.027..17.492 rows=8004 loops=1)
-> Seq Scan on area a7_ (cost=0.00..144.78 rows=8 width=4) (actual time=0.007..0.823 rows=8 loops=1)
Filter: (id = ANY ('{63,64,65,66,67,68,69,70}'::integer[]))
Rows Removed by Filter: 5081
-> Index Scan using idx_70e4fa78bd0f409c on member m6_ (cost=0.00..468.38 rows=152 width=8) (actual time=0.011..1.208 rows=1000 loops=8)
Index Cond: (area_id = a7_.id)
Filter: (status = 1)
Rows Removed by Filter: 2
-> Index Scan using idx_54f1f40ba76ed395 on advert a0_ (cost=0.00..47.49 rows=19 width=16) (actual time=0.002..0.003 rows=1 loops=8004)
Index Cond: (user_id = m6_.id)
Filter: (status = 1)
Rows Removed by Filter: 1
Total runtime: 56.744 ms
(18 rows)
Time: 57.995 ms
I tried to put the subquery in the INNER JOIN:
我试着将子查询放在内部连接中:
EXPLAIN ANALYZE SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN member m6_
ON a0_.user_id = m6_.id
INNER JOIN area a7_
ON m6_.area_id = a7_.id
AND m6_.area_id IN (SELECT area_id FROM network_area_flatdeep WHERE network_id IN (1))
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
ORDER BY a0_.created_date DESC
LIMIT 60;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.37..34966.73 rows=60 width=12) (actual time=2.957..42.443 rows=60 loops=1)
-> Nested Loop (cost=4.37..2311599.10 rows=3967 width=12) (actual time=2.956..42.394 rows=60 loops=1)
-> Nested Loop Semi Join (cost=4.37..2302173.51 rows=3967 width=20) (actual time=2.949..42.099 rows=60 loops=1)
Join Filter: (m6_.area_id = network_area_flatdeep.area_id)
Rows Removed by Join Filter: 22333
-> Nested Loop (cost=0.00..2230853.09 rows=316797 width=16) (actual time=0.028..18.612 rows=2829 loops=1)
-> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..762000.64 rows=317633 width=16) (actual time=0.012..3.802 rows=2834 loops=1)
Filter: (status = 1)
Rows Removed by Filter: 21
-> Index Scan using member_pkey on member m6_ (cost=0.00..4.61 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=2834)
Index Cond: (id = a0_.user_id)
Filter: (status = 1)
Rows Removed by Filter: 0
-> Materialize (cost=4.37..41.14 rows=15 width=4) (actual time=0.000..0.004 rows=8 loops=2829)
-> Bitmap Heap Scan on network_area_flatdeep (cost=4.37..41.06 rows=15 width=4) (actual time=0.009..0.015 rows=8 loops=1)
Recheck Cond: (network_id = 1)
-> Bitmap Index Scan on idx_c29e880034128b91 (cost=0.00..4.36 rows=15 width=0) (actual time=0.006..0.006 rows=8 loops=1)
Index Cond: (network_id = 1)
-> Index Only Scan using area_pkey on area a7_ (cost=0.00..2.37 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=60)
Index Cond: (id = m6_.area_id)
Heap Fetches: 60
Total runtime: 42.538 ms
(22 rows)
I tried to get rid of the subquery and make a proper JOIN statement to network_area_flatdeep
(I STRONGLY PREFER THIS VERSION):
我试图摆脱子查询,并对network_area_flatdeep(我非常喜欢这个版本)做一个适当的连接语句:
EXPLAIN ANALYZE SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN member m6_
ON a0_.user_id = m6_.id
INNER JOIN area a7_
ON m6_.area_id = a7_.id
INNER JOIN network_area_flatdeep n14_
ON a7_.id = n14_.area_id
AND ( n14_.network_id IN ( 1 ) )
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
ORDER BY a0_.created_date DESC
LIMIT 60;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=30031.18..30031.33 rows=60 width=12) (actual time=62.968..63.045 rows=60 loops=1)
-> Sort (cost=30031.18..30033.51 rows=934 width=12) (actual time=62.967..62.991 rows=60 loops=1)
Sort Key: a0_.created_date
Sort Method: top-N heapsort Memory: 27kB
-> Nested Loop (cost=0.00..29998.92 rows=934 width=12) (actual time=0.157..60.280 rows=4478 loops=1)
-> Nested Loop (cost=0.00..4401.66 rows=536 width=4) (actual time=0.029..20.488 rows=8004 loops=1)
-> Nested Loop (cost=0.00..120.69 rows=15 width=8) (actual time=0.015..0.084 rows=8 loops=1)
-> Index Scan using idx_c29e880034128b91 on network_area_flatdeep n14_ (cost=0.00..60.47 rows=15 width=4) (actual time=0.009..0.019 rows=8 loops=1)
Index Cond: (network_id = 1)
-> Index Only Scan using area_pkey on area a7_ (cost=0.00..4.01 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=8)
Index Cond: (id = n14_.area_id)
Heap Fetches: 8
-> Index Scan using idx_70e4fa78bd0f409c on member m6_ (cost=0.00..283.88 rows=152 width=8) (actual time=0.011..1.278 rows=1000 loops=8)
Index Cond: (area_id = a7_.id)
Filter: (status = 1)
Rows Removed by Filter: 2
-> Index Scan using idx_54f1f40ba76ed395 on advert a0_ (cost=0.00..47.57 rows=19 width=16) (actual time=0.003..0.003 rows=1 loops=8004)
Index Cond: (user_id = m6_.id)
Filter: (status = 1)
Rows Removed by Filter: 1
Total runtime: 63.125 ms
(21 rows)
Changing IN
to =
doesn't helper either.
改为=也没有帮助。
I tried using EXISTS as suggested by wilderplasser with an answer below.
我试着按照wilderplasser的建议使用exist,并给出如下答案。
When I either remove the ORDER BY or the network criteria, the query is fast (2ms). But why can't they play nice together?
当我删除ORDER BY或network criteria时,查询是快速的(2ms)。但是为什么他们不能友好相处呢?
Now then... when I change the network_id from '1' to '10'. The query is extremely fast like I want. 10 is a root network that contains ALL areas. So apperently the LESS results the JOIN needs to filter out, the faster the query is.
现在……当我将network_id从“1”更改为“10”时。查询速度非常快。10是一个包含所有区域的根网络。因此,很显然,连接需要过滤的结果越少,查询就越快。
The structure looks like this:
结构是这样的:
area (mapped to network 10 which contains 5089 areas in flat table and is FAST)
| |---- area 1 (network 1 which contains 8 areas in flat table and is SLOW)
| |--- more areas
|-- ALOT MORE areas
|-- etc
Changing network 1 to 10 gives: 3.265 ms
将网络1更改为10会得到:3.265 ms
To summarize the difference:
总结的区别:
network 1 has 8 areas in network_area_flatdeep
network 10 has 5089 areas in network_area_flatdeep
网络1在network_area_flatdeep有8个区域,网络10在network_area_flatdeep有5089个区域
So INNER JOIN using network 1 is very slow, INNER JOIN using network 10 is very fast. Same behavior with subqueries.
所以使用网络1的内部连接非常慢,使用网络10的内部连接非常快。同样的行为与子查询。
Advert table
广告表
353804 rows
Table "public.advert"
Column | Type | Modifiers | Storage | Stats target | Description
-----------------------------+--------------------------------+-----------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('advert_id_seq'::regclass) | plain | |
user_id | integer | not null | plain | |
advert_category_id | integer | not null | plain | |
currency_id | integer | not null | plain | |
advert_kind_id | integer | not null | plain | |
advert_price_id | integer | | plain | |
external_source_id | integer | | plain | |
status | integer | not null | plain | |
type | integer | not null | plain | |
title | character varying(60) | not null | extended | |
description | text | not null | extended | |
price | numeric(19,2) | default NULL::numeric | main | |
accepting_bids | boolean | not null | plain | |
promoted | boolean | not null | plain | |
edited_date | timestamp(0) without time zone | default NULL::timestamp without time zone | plain | |
created_date | timestamp(0) without time zone | not null | plain | |
archived_date | timestamp(0) without time zone | default NULL::timestamp without time zone | plain | |
views | integer | not null | plain | |
checked_date | timestamp(0) without time zone | default NULL::timestamp without time zone | plain | |
archived_by_cron | boolean | not null | plain | |
unarchived_by_cron | boolean | not null | plain | |
containting_forbidden_words | boolean | not null | plain | |
external_id | character varying(255) | default NULL::character varying | extended | |
new_product | boolean | not null | plain | |
Indexes:
"advert_pkey" PRIMARY KEY, btree (id)
"advert_external_idx_uq" UNIQUE, btree (external_id, external_source_id)
"advert_archived_date_idx" btree (archived_date)
"advert_checked_date_idx" btree (checked_date)
"advert_created_date_idx" btree (created_date)
"advert_edited_date_idx" btree (edited_date)
"advert_external_id_idx" btree (external_id)
"advert_price_idx" btree (price)
"advert_status_idx" btree (status)
"advert_type_idx" btree (type)
"advert_views_idx" btree (views)
"idx_54f1f40b38248176" btree (currency_id)
"idx_54f1f40b54b67d66" btree (advert_price_id)
"idx_54f1f40b9a2e6cff" btree (advert_kind_id)
"idx_54f1f40ba76ed395" btree (user_id)
"idx_54f1f40bb167b375" btree (external_source_id)
"idx_54f1f40bd4436821" btree (advert_category_id)
Foreign-key constraints:
"fk_54f1f40b38248176" FOREIGN KEY (currency_id) REFERENCES currency(id) ON DELETE RESTRICT
"fk_54f1f40b54b67d66" FOREIGN KEY (advert_price_id) REFERENCES advertprice(id) ON DELETE RESTRICT
"fk_54f1f40b9a2e6cff" FOREIGN KEY (advert_kind_id) REFERENCES advertkind(id) ON DELETE RESTRICT
"fk_54f1f40ba76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE CASCADE
"fk_54f1f40bb167b375" FOREIGN KEY (external_source_id) REFERENCES externalsource(id) ON DELETE RESTRICT
"fk_54f1f40bd4436821" FOREIGN KEY (advert_category_id) REFERENCES advertcategory(id) ON DELETE RESTRICT
Referenced by:
TABLE "advert_photo" CONSTRAINT "fk_1c939974d07eccb6" FOREIGN KEY (advert_id) REFERENCES advert(id) ON DELETE CASCADE
TABLE "banner" CONSTRAINT "fk_6f9db8e7d07eccb6" FOREIGN KEY (advert_id) REFERENCES advert(id) ON DELETE SET NULL
TABLE "advertbid" CONSTRAINT "fk_fccdba75d07eccb6" FOREIGN KEY (advert_id) REFERENCES advert(id) ON DELETE CASCADE
Has OIDs: no
Area table:
区域表:
5089 rows
Table "public.area"
Column | Type | Modifiers | Storage | Stats target | Description
------------+---------+---------------------------------------------------+---------+--------------+-------------
id | integer | not null default nextval('area_id_seq'::regclass) | plain | |
network_id | integer | | plain | |
parent_id | integer | | plain | |
selectable | boolean | not null | plain | |
Indexes:
"area_pkey" PRIMARY KEY, btree (id)
"idx_d7943d6834128b91" btree (network_id)
"idx_d7943d68727aca70" btree (parent_id)
Foreign-key constraints:
"fk_d7943d6834128b91" FOREIGN KEY (network_id) REFERENCES network(id) ON DELETE RESTRICT
"fk_d7943d68727aca70" FOREIGN KEY (parent_id) REFERENCES area(id) ON DELETE CASCADE
Referenced by:
TABLE "network_area_flat" CONSTRAINT "fk_10aae5b2bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE CASCADE
TABLE "area_language" CONSTRAINT "fk_17d42f7dbd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE CASCADE
TABLE "area_zip_code" CONSTRAINT "fk_62a3bf90bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE CASCADE
TABLE "member" CONSTRAINT "fk_70e4fa78bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE RESTRICT
TABLE "network_area_flatdeep" CONSTRAINT "fk_c29e8800bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE CASCADE
TABLE "area" CONSTRAINT "fk_d7943d68727aca70" FOREIGN KEY (parent_id) REFERENCES area(id) ON DELETE CASCADE
Has OIDs: no
Member table:
成员表:
182450 rows
Table "public.member"
Column | Type | Modifiers | Storage | Stats target | Description
-----------------------+--------------------------------+-----------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('member_id_seq'::regclass) | plain | |
language_id | integer | not null | plain | |
area_id | integer | not null | plain | |
company_id | integer | | plain | |
external_source_id | integer | | plain | |
email | character varying(255) | not null | extended | |
password | character varying(40) | not null | extended | |
status | integer | not null | plain | |
name | character varying(150) | not null | extended | |
zip_code | character varying(20) | | extended | |
phone_number | character varying(120) | default NULL::character varying | extended | |
using_email_service | boolean | not null | plain | |
edited_date | timestamp(0) without time zone | default NULL::timestamp without time zone | plain | |
created_date | timestamp(0) without time zone | not null | plain | |
hiding_on_own_network | boolean | not null | plain | |
staff | boolean | not null | plain | |
superuser | boolean | not null | plain | |
external_id | character varying(255) | default NULL::character varying | extended | |
last_login_date | timestamp(0) without time zone | default NULL::timestamp without time zone | plain | |
deleted_adverts | integer | not null | plain | |
Indexes:
"member_pkey" PRIMARY KEY, btree (id)
"user_email_idx_uq" UNIQUE, btree (email)
"user_external_idx_uq" UNIQUE, btree (external_id, external_source_id)
"idx_70e4fa7882f1baf4" btree (language_id)
"idx_70e4fa78979b1ad6" btree (company_id)
"idx_70e4fa78b167b375" btree (external_source_id)
"idx_70e4fa78bd0f409c" btree (area_id)
"user_external_id_idx" btree (external_id)
"user_name_idx" btree (name)
"user_status_idx" btree (status)
Foreign-key constraints:
"fk_70e4fa7882f1baf4" FOREIGN KEY (language_id) REFERENCES language(id) ON DELETE RESTRICT
"fk_70e4fa78979b1ad6" FOREIGN KEY (company_id) REFERENCES company(id) ON DELETE SET NULL
"fk_70e4fa78b167b375" FOREIGN KEY (external_source_id) REFERENCES externalsource(id) ON DELETE RESTRICT
"fk_70e4fa78bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE RESTRICT
Referenced by:
TABLE "user_link" CONSTRAINT "fk_4c2dd538a76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE CASCADE
TABLE "advert" CONSTRAINT "fk_54f1f40ba76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE CASCADE
TABLE "banner" CONSTRAINT "fk_6f9db8e7a76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE SET NULL
TABLE "user_admin_module_permission" CONSTRAINT "fk_74fee7cea76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE CASCADE
TABLE "user_admin_resource_permission" CONSTRAINT "fk_c9fcf279a76ed395" FOREIGN KEY (user_id) REFERENCES member(id) ON DELETE CASCADE
Has OIDs: no
Network_area_flatdeep table:
Network_area_flatdeep表:
10177 rows
Table "public.network_area_flatdeep"
Column | Type | Modifiers | Storage | Stats target | Description
--------------+--------------------------------+--------------------------------------------------------------------+---------+--------------+-------------
id | integer | not null default nextval('network_area_flatdeep_id_seq'::regclass) | plain | |
network_id | integer | not null | plain | |
area_id | integer | not null | plain | |
created_date | timestamp(0) without time zone | not null | plain | |
Indexes:
"network_area_flatdeep_pkey" PRIMARY KEY, btree (id)
"area_flatdeep_idx_uq" UNIQUE, btree (area_id, network_id, created_date)
"idx_c29e880034128b91" btree (network_id)
"idx_c29e8800bd0f409c" btree (area_id)
Foreign-key constraints:
"fk_c29e880034128b91" FOREIGN KEY (network_id) REFERENCES network(id) ON DELETE CASCADE
"fk_c29e8800bd0f409c" FOREIGN KEY (area_id) REFERENCES area(id) ON DELETE CASCADE
Has OIDs: no
Short server config:
短的服务器配置:
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
name | current_setting | source
----------------------------+--------------------+----------------------
shared_buffers | 1800MB | configuration file
work_mem | 4MB | configuration file
Conclusion:
结论:
The more data (less filtered out), the faster the query is. I'm a bit clueless now. How can PostgreSQL be so slow on this? Again: 40ms doesn't look extremely slow, but with the real query with a big SELECT statement, queries are 1.5s most of the time and can take up to 3s.
数据越多(过滤得越少),查询就越快。我现在有点糊涂了。PostgreSQL怎么能这么慢呢?再一次:40ms看起来并不是特别慢,但是在真正的查询中有一个大的SELECT语句,查询在大多数情况下都是1。5,并且可以接收到3s。
All filtering is done on indexes. The ordering is done on an index too.
所有的过滤都在索引上完成。排序也是在索引上完成的。
Does anyone have an idea how to improve this simple filter I clearly need to separate data?
有人知道如何改进这个简单的过滤器吗?我需要分离数据。
4 个解决方案
#1
1
What kind of plan do you get when you remove the apparently needless (and thus distracting) join on area?
当你移除明显不必要的(因而分散注意力的)区域连接时,你会得到什么样的计划?
SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN member m6_
ON a0_.user_id = m6_.id
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
AND m6_.area_id IN (SELECT area_id FROM network_area_flatdeep WHERE network_id IN (1))
ORDER BY a0_.created_date DESC
LIMIT 60;
#2
2
Try to replace the IN(subquery)
by the corresponding EXISTS(correlated subquery)
尝试用相应的存在(相关子查询)替换IN(子查询)
SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN member m6_
ON a0_.user_id = m6_.id
INNER JOIN area a7_
ON m6_.area_id = a7_.id
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
AND EXISTS (
SELECT*
FROM network_area_flatdeep xx
WHERE xx.area_id = m6_.area_id
AND xx.network_id IN (2)
)
ORDER BY a0_.created_date DESC
LIMIT 60
;
An you do need of course primary keys and relavant foreign keys. I don'rt know about network_area_flatdeep.area_id
, if it is not a PK or FK, you might need an index on it (or make it part of a composite PK)
你当然需要主键和相关的外键。我不太了解network_area_flatdeep。area_id,如果不是PK或FK,可能需要一个索引(或者使它成为复合PK的一部分)
UPDATE: a testbed with synthetic data:
更新:有合成数据的试验台:
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE Table area
( id SERIAL not null PRIMARY KEY
, zzzz varchar
);
CREATE Table member
( id SERIAL not null PRIMARY KEY
, language_id integer not null DEFAULT 0
, area_id integer not null REFERENCES area(id)
, company_id integer
, external_source_id integer
, status integer not null DEFAULT 0
, name varchar not null
, zip_code varchar not null
, phone_number varchar default NULL
, using_email_service boolean not null DEFAULT False
, edited_date timestamp(0) without time zone default NULL
, created_date timestamp(0) without time zone not null
, hiding_on_own_network boolean not null DEFAULT False
, staff boolean not null DEFAULT False
, superuser boolean not null DEFAULT False
, external_id varchar default NULL
);
CREATE TABLE advert
( id SERIAL NOT NULL PRIMARY KEY
, user_id integer NOT NULL REFERENCES member(id)
, advert_category_id integer NOT NULL DEFAULT 0
, currency_id integer NOT NULL DEFAULT 0
, advert_kind_id integer NOT NULL DEFAULT 0
, advert_price_id integer
, external_source_id integer
, status integer NOT NULL DEFAULT 0
, type integer NOT NULL DEFAULT 0
, title varchar NOT NULL
, description text NOT NULL
, price numeric(10,2) default NULL
, accepting_bids boolean NOT NULL DEFAULT False
, promoted boolean NOT NULL DEFAULT False
, edited_date timestamp(0) without time zone default NULL
, created_date timestamp(0) without time zone NOT NULL
, archived_date timestamp(0) without time zone default NULL
, views integer NOT NULL DEFAULT 0
, checked_date timestamp(0) without time zone default NULL
, archived_by_cron boolean NOT NULL DEFAULT False
, unarchived_by_cron boolean NOT NULL DEFAULT False
, containting_forbidden_words boolean NOT NULL DEFAULT False
, external_id varchar default NULL
);
CREATE INDEX advert_created_date_idx ON advert (created_date);
CREATE Table network_area_flatdeep
-- the surrogate key in a junction table is questionable
( id SERIAL not null PRIMARY KEY
, network_id integer not null -- REFERENCES network(id) ON DELETE CASCADE
, area_id integer not null REFERENCES area(id) ON DELETE CASCADE
, created_date timestamp(0) without time zone not null
-- the date in the below constraint is questionable
, CONSTRAINT area_flatdeep_idx_uq UNIQUE (area_id, network_id, created_date)
);
CREATE INDEX idx_c29e880034128b91 ON network_area_flatdeep(network_id);
CREATE INDEX idx_c29e8800bd0f409c ON network_area_flatdeep(area_id);
INSERT INTO area ( zzzz)
SELECT 'Zzzz_' || gs::text
FROM generate_series(1,39) gs
;
INSERT INTO network_area_flatdeep
-- the surrogate key in a junction table is questionable
( network_id , area_id , created_date)
SELECT gs % 7 , aa.id, now()
FROM generate_series(1,76) gs
JOIN area aa ON aa.id = 1+gs % 39
;
INSERT INTO member
( area_id , name , zip_code, created_date)
SELECT aa.id
, 'Member_'|| gs::text
, 'Code_'|| gs::text
, now()
FROM generate_series(1, 10086) gs
JOIN area aa ON aa.id = 1 + gs % 39
;
INSERT INTO advert( user_id , title, description, edited_date , created_date)
SELECT 1+ (gs* 321) % 10086
, 'Tit_'|| gs::text
, 'Desc_'|| gs::text
, now()
, now() - (random() * 10000 * '1 sec'::interval)
from generate_series(1,47569 ) gs
;
UPDATE member SET status = 1 WHERE random() < .3;
UPDATE advert SET status = 1 WHERE random() < .3;
VACUUM ANALYZE member;
VACUUM ANALYZE advert;
VACUUM ANALYZE area;
VACUUM ANALYZE network_area_flatdeep;
My query executes in 12ms (PG-9.1), with the same plan as the OPs. (so this must be a configuration issue)
我的查询在12ms (PG-9.1)中执行,与项目操作系统相同。(所以这一定是配置问题)
#3
0
A small variation on one of your queries
您的一个查询的一个小变化
SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN member m6_
ON a0_.user_id = m6_.id
INNER JOIN area a7_
ON m6_.area_id = a7_.id
INNER JOIN (
SELECT area_id
FROM network_area_flatdeep
WHERE network_id IN (2)
) network_area_flatdeep n14_
ON a7_.id = n14_.area_id
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
ORDER BY a0_.created_date DESC
LIMIT 60;
#4
0
What happens if you turn the query around? So first you select the areas and then you join in the members and adverts on that one. You can also limit the join by adding the WHERE part on the JOIN
directly.
如果你把查询转过来会发生什么?首先你选择区域,然后你加入其中的成员和广告。您还可以通过直接在连接上添加WHERE部分来限制连接。
SELECT a0_.id AS id0
FROM area a7_
LEFT JOIN member m6_
ON a7_.id = m6_.area_id AND m6_.status IN ( 1 )
LEFT JOIN advert a0_
ON m6_.id = a0_.user_id
WHERE a7_.id IN (SELECT area_id FROM network_area_flatdeep WHERE network_id IN (1))
AND a0_.status IN ( 1 )
ORDER BY a0_.created_date DESC
LIMIT 60;
#1
1
What kind of plan do you get when you remove the apparently needless (and thus distracting) join on area?
当你移除明显不必要的(因而分散注意力的)区域连接时,你会得到什么样的计划?
SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN member m6_
ON a0_.user_id = m6_.id
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
AND m6_.area_id IN (SELECT area_id FROM network_area_flatdeep WHERE network_id IN (1))
ORDER BY a0_.created_date DESC
LIMIT 60;
#2
2
Try to replace the IN(subquery)
by the corresponding EXISTS(correlated subquery)
尝试用相应的存在(相关子查询)替换IN(子查询)
SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN member m6_
ON a0_.user_id = m6_.id
INNER JOIN area a7_
ON m6_.area_id = a7_.id
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
AND EXISTS (
SELECT*
FROM network_area_flatdeep xx
WHERE xx.area_id = m6_.area_id
AND xx.network_id IN (2)
)
ORDER BY a0_.created_date DESC
LIMIT 60
;
An you do need of course primary keys and relavant foreign keys. I don'rt know about network_area_flatdeep.area_id
, if it is not a PK or FK, you might need an index on it (or make it part of a composite PK)
你当然需要主键和相关的外键。我不太了解network_area_flatdeep。area_id,如果不是PK或FK,可能需要一个索引(或者使它成为复合PK的一部分)
UPDATE: a testbed with synthetic data:
更新:有合成数据的试验台:
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE Table area
( id SERIAL not null PRIMARY KEY
, zzzz varchar
);
CREATE Table member
( id SERIAL not null PRIMARY KEY
, language_id integer not null DEFAULT 0
, area_id integer not null REFERENCES area(id)
, company_id integer
, external_source_id integer
, status integer not null DEFAULT 0
, name varchar not null
, zip_code varchar not null
, phone_number varchar default NULL
, using_email_service boolean not null DEFAULT False
, edited_date timestamp(0) without time zone default NULL
, created_date timestamp(0) without time zone not null
, hiding_on_own_network boolean not null DEFAULT False
, staff boolean not null DEFAULT False
, superuser boolean not null DEFAULT False
, external_id varchar default NULL
);
CREATE TABLE advert
( id SERIAL NOT NULL PRIMARY KEY
, user_id integer NOT NULL REFERENCES member(id)
, advert_category_id integer NOT NULL DEFAULT 0
, currency_id integer NOT NULL DEFAULT 0
, advert_kind_id integer NOT NULL DEFAULT 0
, advert_price_id integer
, external_source_id integer
, status integer NOT NULL DEFAULT 0
, type integer NOT NULL DEFAULT 0
, title varchar NOT NULL
, description text NOT NULL
, price numeric(10,2) default NULL
, accepting_bids boolean NOT NULL DEFAULT False
, promoted boolean NOT NULL DEFAULT False
, edited_date timestamp(0) without time zone default NULL
, created_date timestamp(0) without time zone NOT NULL
, archived_date timestamp(0) without time zone default NULL
, views integer NOT NULL DEFAULT 0
, checked_date timestamp(0) without time zone default NULL
, archived_by_cron boolean NOT NULL DEFAULT False
, unarchived_by_cron boolean NOT NULL DEFAULT False
, containting_forbidden_words boolean NOT NULL DEFAULT False
, external_id varchar default NULL
);
CREATE INDEX advert_created_date_idx ON advert (created_date);
CREATE Table network_area_flatdeep
-- the surrogate key in a junction table is questionable
( id SERIAL not null PRIMARY KEY
, network_id integer not null -- REFERENCES network(id) ON DELETE CASCADE
, area_id integer not null REFERENCES area(id) ON DELETE CASCADE
, created_date timestamp(0) without time zone not null
-- the date in the below constraint is questionable
, CONSTRAINT area_flatdeep_idx_uq UNIQUE (area_id, network_id, created_date)
);
CREATE INDEX idx_c29e880034128b91 ON network_area_flatdeep(network_id);
CREATE INDEX idx_c29e8800bd0f409c ON network_area_flatdeep(area_id);
INSERT INTO area ( zzzz)
SELECT 'Zzzz_' || gs::text
FROM generate_series(1,39) gs
;
INSERT INTO network_area_flatdeep
-- the surrogate key in a junction table is questionable
( network_id , area_id , created_date)
SELECT gs % 7 , aa.id, now()
FROM generate_series(1,76) gs
JOIN area aa ON aa.id = 1+gs % 39
;
INSERT INTO member
( area_id , name , zip_code, created_date)
SELECT aa.id
, 'Member_'|| gs::text
, 'Code_'|| gs::text
, now()
FROM generate_series(1, 10086) gs
JOIN area aa ON aa.id = 1 + gs % 39
;
INSERT INTO advert( user_id , title, description, edited_date , created_date)
SELECT 1+ (gs* 321) % 10086
, 'Tit_'|| gs::text
, 'Desc_'|| gs::text
, now()
, now() - (random() * 10000 * '1 sec'::interval)
from generate_series(1,47569 ) gs
;
UPDATE member SET status = 1 WHERE random() < .3;
UPDATE advert SET status = 1 WHERE random() < .3;
VACUUM ANALYZE member;
VACUUM ANALYZE advert;
VACUUM ANALYZE area;
VACUUM ANALYZE network_area_flatdeep;
My query executes in 12ms (PG-9.1), with the same plan as the OPs. (so this must be a configuration issue)
我的查询在12ms (PG-9.1)中执行,与项目操作系统相同。(所以这一定是配置问题)
#3
0
A small variation on one of your queries
您的一个查询的一个小变化
SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN member m6_
ON a0_.user_id = m6_.id
INNER JOIN area a7_
ON m6_.area_id = a7_.id
INNER JOIN (
SELECT area_id
FROM network_area_flatdeep
WHERE network_id IN (2)
) network_area_flatdeep n14_
ON a7_.id = n14_.area_id
WHERE a0_.status IN ( 1 )
AND m6_.status IN ( 1 )
ORDER BY a0_.created_date DESC
LIMIT 60;
#4
0
What happens if you turn the query around? So first you select the areas and then you join in the members and adverts on that one. You can also limit the join by adding the WHERE part on the JOIN
directly.
如果你把查询转过来会发生什么?首先你选择区域,然后你加入其中的成员和广告。您还可以通过直接在连接上添加WHERE部分来限制连接。
SELECT a0_.id AS id0
FROM area a7_
LEFT JOIN member m6_
ON a7_.id = m6_.area_id AND m6_.status IN ( 1 )
LEFT JOIN advert a0_
ON m6_.id = a0_.user_id
WHERE a7_.id IN (SELECT area_id FROM network_area_flatdeep WHERE network_id IN (1))
AND a0_.status IN ( 1 )
ORDER BY a0_.created_date DESC
LIMIT 60;