Postgres Materialize causes poor DELETE query performance

Time: 2021-05-12 03:47:20

I have a DELETE query that I need to run on PostgreSQL 9.0.4. I am finding that it performs well until the subselect returns 524,289 rows.

For instance, at 524,288 rows no Materialize node is used and the cost looks pretty good:

explain DELETE FROM table1 WHERE pointLevel = 0 AND userID NOT IN
(SELECT userID FROM table2 fetch first 524288 rows only);
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete  (cost=13549.49..17840.67 rows=21 width=6)
   ->  Index Scan using jslps_userid_nopt on table1  (cost=13549.49..17840.67 rows=21 width=6)
         Filter: ((NOT (hashed SubPlan 1)) AND (pointlevel = 0))
         SubPlan 1
           ->  Limit  (cost=0.00..12238.77 rows=524288 width=8)
                 ->  Seq Scan on table2  (cost=0.00..17677.92 rows=757292 width=8)
(6 rows)

However, as soon as I hit 524,289 rows, a Materialize node comes into play and the DELETE query becomes much more costly:

explain DELETE FROM table1 WHERE pointLevel = 0 AND userID NOT IN
(SELECT userID FROM table2 fetch first 524289 rows only);
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Delete  (cost=0.00..386910.33 rows=21 width=6)
   ->  Index Scan using jslps_userid_nopt on table1  (cost=0.00..386910.33 rows=21 width=6)
         Filter: ((pointlevel = 0) AND (NOT (SubPlan 1)))
         SubPlan 1
           ->  Materialize  (cost=0.00..16909.24 rows=524289 width=8)
                 ->  Limit  (cost=0.00..12238.79 rows=524289 width=8)
                       ->  Seq Scan on table2  (cost=0.00..17677.92 rows=757292 width=8)
(7 rows)

I worked around the issue by using a JOIN in the sub-select query instead:

SELECT s.userid 
FROM table1 s 
LEFT JOIN table2 p ON s.userid=p.userid
WHERE p.userid IS NULL AND s.pointlevel=0

However, I am still interested in understanding why the Materialize step decreases performance so drastically.

1 Answer

#1


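My guess is that at rows=524289 the estimated size of the subquery result no longer fits in the memory the planner allows for it. The plan then changes from a hashed subplan ("hashed SubPlan 1") to a plain subplan over a Materialize node, whose stored rows are rescanned for each candidate row of table1 and may spill to disk. Hence the dramatic increase in cost.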
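
The threshold in the question can be checked with some arithmetic. The sketch below rests on two assumptions that are not stated in the question: that work_mem is set to 16MB, and that the planner budgets roughly 24 bytes of per-row overhead on top of the 8-byte row width shown in the plans (width=8). Under those assumptions, 524,288 rows is exactly the last row count that fits.

```sql
-- Assumptions (not from the question): work_mem = 16MB, and an allowance of
-- 24 bytes of per-row overhead in addition to the 8-byte row width.
SELECT n                                AS subquery_rows,
       n * (8 + 24)                     AS estimated_bytes,
       16 * 1024 * 1024                 AS work_mem_bytes,
       n * (8 + 24) <= 16 * 1024 * 1024 AS fits_in_work_mem
FROM (VALUES (524288), (524289)) AS t(n);
```

With these numbers, 524288 rows come to 16777216 bytes, which equals the assumed limit, while 524289 rows come to 16777248 bytes and exceed it.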

You can read more about configuring the memory settings here: http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html
If you experiment with work_mem, you will see the query plan change.
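
A minimal sketch of that experiment, reusing the query from the question. The 32MB value is only an illustrative assumption, not a recommendation. SET changes the setting for the current session only, and plain EXPLAIN plans the DELETE without executing it.

```sql
SHOW work_mem;            -- current limit for this session

SET work_mem = '32MB';    -- illustrative value, affects this session only

-- Plain EXPLAIN only plans the statement; no rows are deleted.
EXPLAIN DELETE FROM table1 WHERE pointLevel = 0 AND userID NOT IN
(SELECT userID FROM table2 fetch first 524289 rows only);

RESET work_mem;           -- return to the configured default
```

If the larger limit is sufficient, the Filter line should again show NOT (hashed SubPlan 1) and the Materialize node should disappear.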

However, using a join is a much better way to speed up the query, since you are limiting the number of rows at the source itself instead of simply selecting the first XYZ rows and then performing checks against them.
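
As a sketch of how the DELETE itself could be written in that style, NOT EXISTS expresses the same anti-join as the LEFT JOIN ... IS NULL workaround from the question. The aliases s and p are taken from that workaround. This assumes userid is never NULL in either table: NOT IN matches nothing once the subquery returns a NULL, whereas NOT EXISTS does not behave that way, so the two forms can differ when NULLs are present.

```sql
-- Delete level-0 rows from table1 that have no matching userid in table2.
-- Assumes userid is NOT NULL in both tables (see the note above).
DELETE FROM table1 s
WHERE s.pointlevel = 0
  AND NOT EXISTS (
        SELECT 1
        FROM table2 p
        WHERE p.userid = s.userid
      );
```

Running EXPLAIN on this statement first is a cheap way to confirm that the planner picks an anti-join rather than a per-row subplan.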
