Summary: On Postgres 9.3.15, the same query gets very different query plans on my dev and production machines, with production being 300x slower!
I realize LIMIT and OFFSET aren't great in PostgreSQL, but that doesn't explain why the query is fast on my dev machine and slow on production.
Any suggestions? I've tried changing cpu_tuple_cost (from 0.1 up to 0.5) with no improvement.
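(For reference, this is the kind of per-session experiment I was doing; the values are just the ones I tried, not recommendations:)

```sql
-- Temporarily raise the planner's per-row CPU cost estimate for this
-- session only, re-plan the query, then restore the default.
SET cpu_tuple_cost = 0.5;
EXPLAIN SELECT "designs".* FROM "designs"
WHERE "designs"."user_id" IN (SELECT "users"."id" FROM "users" WHERE code_id = 393)
ORDER BY "designs"."updated_at" DESC LIMIT 20 OFFSET 0;
RESET cpu_tuple_cost;
```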
My production server (Azure: 4 CPUs, 16 GB RAM) takes 1100 ms to run this query:
prod=# SELECT "designs".* FROM "designs" WHERE "designs"."user_id" IN (SELECT "users"."id" FROM "users" WHERE (code_id=393)) ORDER BY updated_at desc, "designs"."updated_at" DESC LIMIT 20 OFFSET 0;
Time: 1175.486 ms
Meanwhile my dev server (VirtualBox on a laptop, 2 GB RAM) takes 4 ms to run the same query, on the same database.
dev=# SELECT "designs".* FROM "designs" WHERE "designs"."user_id" IN (SELECT "users"."id" FROM "users" WHERE (code_id=393)) ORDER BY updated_at desc, "designs"."updated_at" DESC LIMIT 20 OFFSET 0;
Time: 4.249 ms
The production query plan is this:
prod=# explain SELECT "designs".* FROM "designs" WHERE "designs"."user_id" IN (SELECT "users"."id" FROM "users" WHERE (code_id=393)) ORDER BY updated_at desc, "designs"."updated_at" DESC LIMIT 20 OFFSET 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=169.00..113691.20 rows=20 width=966)
-> Nested Loop Semi Join (cost=169.00..51045428.02 rows=8993 width=966)
-> Index Scan Backward using design_modification_date_idx on designs (cost=85.00..1510927.32 rows=538151 width=966)
-> Index Scan using "User_UserUID_key" on users (cost=84.00..92.05 rows=1 width=4)
Index Cond: (id = designs.user_id)
Filter: (code_id = 393)
(6 rows)
Time: 1.165 ms
The dev query plan is this:
dev=# explain SELECT "designs".* FROM "designs" WHERE "designs"."user_id" IN (SELECT "users"."id" FROM "users" WHERE (code_id=393)) ORDER BY updated_at desc, "designs"."updated_at" DESC LIMIT 20 OFFSET 0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Limit (cost=5686.78..5686.83 rows=20 width=964)
-> Sort (cost=5686.78..5689.41 rows=1052 width=964)
Sort Key: designs.updated_at
-> Nested Loop (cost=0.71..5658.79 rows=1052 width=964)
-> Index Scan using code_idx on users (cost=0.29..192.63 rows=67 width=4)
Index Cond: (code_id = 393)
-> Index Scan using "Design_idx_owneruid" on designs (cost=0.42..73.58 rows=16 width=964)
Index Cond: (user_id = users.id)
(8 rows)
Time: 0.736 ms
Edit: OK, after loading a fresh copy of the production data, I found the query plans to be the same (so it was a data issue, sorry!). The query is still slow, though. Any thoughts on what could be done to improve it? I've tried adding indexes on designs(updated_at, user_id) and users(id, code_id), to no avail.
Output of EXPLAIN (ANALYZE, BUFFERS):
Limit (cost=0.72..10390.79 rows=20 width=962) (actual time=1485.810..22025.828 rows=20 loops=1)
Buffers: shared hit=883264 read=164340
-> Nested Loop Semi Join (cost=0.72..4928529.42 rows=9487 width=962) (actual time=1485.809..22025.809 rows=20 loops=1)
Buffers: shared hit=883264 read=164340
-> Index Scan Backward using design_modification_date_idx on designs (cost=0.42..1442771.50 rows=538270 width=962) (actual time=1.737..18444.598 rows=263043 loops=1)
Buffers: shared hit=108266 read=149409
-> Index Scan using "User_UserUID_key" on users (cost=0.29..6.48 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=263043)
Index Cond: (id = designs.user_id)
Filter: (code_id = 393)
Rows Removed by Filter: 1
Buffers: shared hit=774998 read=14931
Total runtime: 22027.477 ms
(12 rows)
EDIT: additional EXPLAIN output for the suggested query:
dev=# explain (analyze) SELECT designs.*
FROM designs
JOIN (SELECT *
FROM users
WHERE code_id=393
OFFSET 0
) users
ON designs.user_id = users.id
ORDER BY updated_at desc
LIMIT 20;
Limit (cost=0.72..13326.65 rows=20 width=962) (actual time=2597.877..95734.152 rows=20 loops=1)
-> Nested Loop (cost=0.72..6321154.70 rows=9487 width=962) (actual time=2597.877..95734.135 rows=20 loops=1)
Join Filter: (designs.user_id = users.id)
Rows Removed by Join Filter: 143621402
-> Index Scan Backward using design_modification_date_idx on designs (cost=0.42..1410571.52 rows=538270 width=962) (actual time=0.024..5217.228 rows=263043 loops=1)
-> Materialize (cost=0.29..1562.31 rows=608 width=4) (actual time=0.000..0.146 rows=546 loops=263043)
-> Subquery Scan on users (cost=0.29..1559.27 rows=608 width=4) (actual time=0.021..1.516 rows=546 loops=1)
-> Index Scan using code_idx on users users_1 (cost=0.29..1553.19 rows=608 width=602) (actual time=0.020..1.252 rows=546 loops=1)
Index Cond: (code_id = 393)
Total runtime: 95734.353 ms
(10 rows)
2 Answers
#1
1
Here is how I read this. Again, ANALYZE and BUFFERS output may be helpful, but here I don't think it's necessary.
In your dev db, the planner expects to find 67 users, so it selects those first, then sorts, then applies the limit and offset. For the amount of data examined, this is fast.
On production, it assumes one user per id but a much larger number of designs per user, so it scans designs along the ordering criteria first (backward) and filters on user. This makes some sense once you realize the scan can stop after it finds 20 rows. But the data statistics make this a bad plan, and you get a scan that checks a large number of extra records to find the relevant ones.
So that's my guess as to what is happening. Make sure you understand why before you try to fix it.
Now, if you were to create a (user_id, code_id) index on the users table, you would likely get a significant speedup, because the code_id filter could be checked during the index scan phase instead of requiring tuple lookups.
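Sketched as DDL (the index name is illustrative; assuming the join column on users is id, as the plans show):

```sql
-- Composite index: the semi-join probe on users can evaluate the
-- code_id filter from the index entries rather than the heap.
CREATE INDEX users_id_code_id_idx ON users (id, code_id);
```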
Another option might be to create an index on (modification_date, user_id) on the designs table. However, this seems like a longer shot to me.
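A sketch of that index, assuming the modification_date mentioned here corresponds to the updated_at column shown in the plans (the index name is illustrative):

```sql
-- ORDER BY column first, join key second: a backward scan in
-- updated_at order can filter on user_id without heap lookups.
CREATE INDEX designs_updated_at_user_id_idx ON designs (updated_at, user_id);
```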
#2
0
The problem is that the users with code_id = 393 are mostly related to designs with low updated_at, so PostgreSQL has to scan 263043 rows from designs before it has found 20 that satisfy the condition.
Since PostgreSQL does not have cross-table statistics, it does not know that its plan to avoid a sort by using the appropriate index will scan far more rows than the few it expects.
You could rewrite the query and use the old, ugly trick with OFFSET 0, which does not change the query semantics but prevents PostgreSQL from considering the questionable optimization:
SELECT designs.*
FROM designs
JOIN (SELECT *
FROM users
WHERE code_id=393
OFFSET 0 /* avoid optimizations beyond using an index for code_id */
) u
ON designs.user_id = u.id
ORDER BY updated_at desc
LIMIT 20;
That should give you the desired fast plan.
If that is not enough to push PostgreSQL toward choosing the good plan, you could help it further by dropping the design_modification_date_idx index, if that is an option.
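If dropping it is viable, that is simply:

```sql
-- Only do this if nothing else relies on scanning designs in
-- modification-date order via this index.
DROP INDEX design_modification_date_idx;
```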