Max record using the MAX function or DESC in Postgres

Time: 2020-11-30 18:03:23

I need to fetch the latest user from the User table. Which of the queries below has the best performance in Postgres for doing this?

Select MAX(u.id) from User u;

or


Select u.id from User u order by u.id desc limit 1;

2 solutions

#1



This is an elaboration of the comment.


If you have an index on user(id), then both formulations should use that index. I'm pretty sure they would have essentially the same execution plan.


If you don't have a (b-tree) index, then I think the max() version will be faster. I think it will read the data once and extract the max() in a single pass. The order by will have to sort all the records.


Sometimes databases have some very specific optimizations that might apply (such as an optimization that might recognize a special case with limit and order by). I don't think any apply in this case.
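One way to check the no-index case rather than guess is to run both formulations against a scratch table and read the plans. The sketch below assumes a throwaway temp table called scratch_users (a hypothetical name, not the asker's table) filled from generate_series; the row count is arbitrary.

```sql
-- Hypothetical scratch table with no index on id.
CREATE TEMP TABLE scratch_users AS
SELECT g AS id
FROM generate_series(1, 100000) AS g;

-- Give the planner up-to-date statistics for the new table.
ANALYZE scratch_users;

-- Aggregate formulation.
EXPLAIN ANALYZE
SELECT max(id) FROM scratch_users;

-- Sort-and-limit formulation.
EXPLAIN ANALYZE
SELECT id FROM scratch_users ORDER BY id DESC LIMIT 1;
```

In the second plan, the Sort node's "Sort Method" line reports how the sort was actually carried out, so that is the detail to read before concluding that every record was fully sorted.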


#2



This may depend on your PostgreSQL version, but I tested the two approaches on a representative table (which is what you should do):


explain analyze select max(id) from versions;
                                                                             QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.21..0.21 rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.08..0.21 rows=1 width=4) (actual time=0.031..0.031 rows=1 loops=1)
           ->  Index Only Scan Backward using index_versions_on_id on versions  (cost=0.08..98474.35 rows=787172 width=4) (actual time=0.030..0.030 rows=1 loops=1)
                 Index Cond: (id IS NOT NULL)
                 Heap Fetches: 1
 Planning time: 0.143 ms
 Execution time: 0.062 ms
(8 rows)

explain analyze select id from versions order by id desc limit 1;
                                                                         QUERY PLAN                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.08..0.21 rows=1 width=4) (actual time=0.025..0.025 rows=1 loops=1)
   ->  Index Only Scan Backward using index_versions_on_id on versions  (cost=0.08..98080.76 rows=787172 width=4) (actual time=0.024..0.024 rows=1 loops=1)
         Heap Fetches: 1
 Planning time: 0.099 ms
 Execution time: 0.044 ms
(5 rows)

This was on PostgreSQL 9.4.5, using a unique index on a table with 860,000 rows.
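To repeat this kind of measurement on your own server, something like the following should work. It is only a sketch: versions_demo and versions_demo_id_idx are made-up names, the data is synthetic, and the timings you get will differ from the ones above.

```sql
-- Hypothetical stand-in for the "versions" table, sized like the one tested.
CREATE TEMP TABLE versions_demo AS
SELECT g AS id
FROM generate_series(1, 860000) AS g;

CREATE UNIQUE INDEX versions_demo_id_idx ON versions_demo (id);

-- Refresh planner statistics and the visibility map,
-- which index-only scans rely on.
VACUUM ANALYZE versions_demo;

EXPLAIN ANALYZE SELECT max(id) FROM versions_demo;
EXPLAIN ANALYZE SELECT id FROM versions_demo ORDER BY id DESC LIMIT 1;
```

Run each statement a few times before comparing, since sub-millisecond differences between single runs are easily noise.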

This showed that the order by technique was marginally faster (0.044 ms versus 0.062 ms execution time), but for me that is not enough to decide that you should use that method. Performance is not everything, and I prefer the semantics of the max() approach.
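As an illustration of where the two formulations differ in meaning, and not only in speed, here is a small sketch. The table demo_ids is hypothetical, and its id column is deliberately nullable; with a primary key only the empty-table case applies.

```sql
CREATE TEMP TABLE demo_ids (id integer);

-- Empty table: max() still returns one row (containing NULL),
-- while ORDER BY ... LIMIT 1 returns no rows at all.
SELECT max(id) FROM demo_ids;
SELECT id FROM demo_ids ORDER BY id DESC LIMIT 1;

INSERT INTO demo_ids VALUES (1), (2), (NULL);

-- max() ignores NULLs, so this returns 2.
SELECT max(id) FROM demo_ids;

-- DESC sorts NULLs first by default, so this returns the NULL row.
SELECT id FROM demo_ids ORDER BY id DESC LIMIT 1;

-- Asking for NULLS LAST brings the result back in line with max(): 2.
SELECT id FROM demo_ids ORDER BY id DESC NULLS LAST LIMIT 1;
```

This is consistent with the max() plan above, which carries an "id IS NOT NULL" index condition that the order by plan does not.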
