Postgres: ORDER BY on an immutable function index does not use the index

Time: 2020-11-25 22:48:33

I have a simple table.

CREATE TABLE posts
(
    id uuid NOT NULL,
    vote_up_count integer,
    vote_down_count integer,
    CONSTRAINT post_pkey PRIMARY KEY(id)
);

I have an IMMUTABLE function that does simple (but could be complex) arithmetic.

CREATE OR REPLACE FUNCTION score(
    ups integer,
    downs integer)
  RETURNS integer AS
$BODY$
    select $1 - $2
$BODY$
  LANGUAGE sql IMMUTABLE
  COST 100;
ALTER FUNCTION score(integer, integer)
  OWNER TO postgres;

I create an index on the posts table that uses my function.

CREATE INDEX posts_score_index ON posts(score(vote_up_count, vote_down_count), date_created);

When I EXPLAIN the following query, it doesn't seem to be using the index.

SELECT * FROM posts ORDER BY score(vote_up_count, vote_down_count), date_created

Sort  (cost=1.02..1.03 rows=1 width=310)
  Output: id, date_created, last_edit_date, slug, sub_id, user_id, user_ip, type, title, content, url, domain, send_replies, vote_up_count, vote_down_count, verdict, approved_by, removed_by, verdict_message, number_of_reports, ignore_reports, number_of_com (...)
  Sort Key: ((posts.vote_up_count - posts.vote_down_count)), posts.date_created
  ->  Seq Scan on public.posts  (cost=0.00..1.01 rows=1 width=310)
        Output: id, date_created, last_edit_date, slug, sub_id, user_id, user_ip, type, title, content, url, domain, send_replies, vote_up_count, vote_down_count, verdict, approved_by, removed_by, verdict_message, number_of_reports, ignore_reports, number_ (...)

How do I get my ORDER BY to use an index from an IMMUTABLE function that could have some very complex arithmetic?

EDIT: Based on @Егор-Рогов's suggestions, I changed the query a bit to see whether I could get it to use an index. Still no luck.

set enable_seqscan=off;
EXPLAIN VERBOSE select date_created from posts ORDER BY (hot(vote_up_count, vote_down_count, date_created),date_created);

Here is the output.

Sort  (cost=10000000001.06..10000000001.06 rows=1 width=16)
  Output: date_created, (ROW(round((((log((GREATEST(abs((vote_up_count - vote_down_count)), 1))::double precision) * sign(((vote_up_count - vote_down_count))::double precision)) + ((date_part('epoch'::text, date_created) - 1134028003::double precision) / 4 (...)
  Sort Key: (ROW(round((((log((GREATEST(abs((posts.vote_up_count - posts.vote_down_count)), 1))::double precision) * sign(((posts.vote_up_count - posts.vote_down_count))::double precision)) + ((date_part('epoch'::text, posts.date_created) - 1134028003::dou (...)
  ->  Seq Scan on public.posts  (cost=10000000000.00..10000000001.05 rows=1 width=16)
        Output: date_created, ROW(round((((log((GREATEST(abs((vote_up_count - vote_down_count)), 1))::double precision) * sign(((vote_up_count - vote_down_count))::double precision)) + ((date_part('epoch'::text, date_created) - 1134028003::double precision (...)

EDIT2: It seems the index was not being used because of the second ORDER BY expression, date_created.
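
One detail that can be read from the plan above: its Sort Key begins with ROW(...), which is what PostgreSQL produces when the ORDER BY list is wrapped in parentheses. The parenthesized list is a single row-valued expression rather than two separate sort keys, so it does not have the same shape as a two-column index. The comparison below is a sketch that uses the score function from earlier in the question, since the definition of hot and any index on it are not shown here; the plans you get will depend on your data and settings.

```sql
-- Parenthesized list: one sort key of composite (row) type.
EXPLAIN
SELECT date_created
FROM posts
ORDER BY (score(vote_up_count, vote_down_count), date_created);

-- Plain list: two sort keys, in the same order as the index columns.
EXPLAIN
SELECT date_created
FROM posts
ORDER BY score(vote_up_count, vote_down_count), date_created;
```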

1 answer

I can see a couple of things that discourage the planner from using the index.

1. Look at this line in the explain output:

Seq Scan on public.posts  (cost=0.00..1.01 rows=1 width=310)

It shows that the planner estimates only one row in the table. In that case an index scan makes no sense, because a sequential scan followed by a sort is cheaper.
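
To see where that estimate comes from, you can look at the statistics stored for the table. This is a minimal sketch; it assumes the table is named posts as in the question and that no other relation of that name exists in another schema.

```sql
-- Stored row and page estimates for the table.
-- These are refreshed by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'posts';
```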

Try adding more rows to the table, run ANALYZE, and try again. You can also test this by temporarily disabling sequential scans with set enable_seqscan=off;.
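
A rough way to try this on a test database is sketched below. It makes several assumptions that are not in the question: gen_random_uuid() is available (built in since PostgreSQL 13, otherwise provided by the pgcrypto extension), date_created is a timestamp column, and every column not listed is nullable or has a default. The row count of 100000 is arbitrary.

```sql
-- Fill the table with random test data (assumptions as described above).
INSERT INTO posts (id, vote_up_count, vote_down_count, date_created)
SELECT gen_random_uuid(),
       (random() * 1000)::integer,
       (random() * 1000)::integer,
       now() - random() * interval '365 days'
FROM generate_series(1, 100000);

-- Refresh planner statistics so the row estimate reflects the new data.
ANALYZE posts;

-- Diagnostic only: discourage sequential scans for this session.
SET enable_seqscan = off;

EXPLAIN
SELECT * FROM posts
ORDER BY score(vote_up_count, vote_down_count), date_created;

-- Restore the default planner behaviour.
RESET enable_seqscan;
```

Adding a LIMIT to the query is another thing worth trying, because returning only the first few rows in index order tends to make an ordered index scan cheaper than sorting the whole table.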

2. You use your function to sort the results, so the planner may decide to use the index to get tuple IDs in the correct order. But it then needs to fetch each tuple from the table to get the values of all columns (because of select *).

You can make the index more attractive to the planner by adding all the necessary columns to it, which makes it possible to avoid visiting the table at all. This is called an index-only scan.

CREATE INDEX posts_score_index ON posts(
  score(vote_up_count, vote_down_count),
  date_created,
  id,             -- do you actually need it in result set?
  vote_up_count,  -- do you actually need it in result set?
  vote_down_count -- do you actually need it in result set?
);

Also make sure you run VACUUM after inserting, updating, or deleting rows so that the visibility map is up to date.
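
As a sketch of how to check this, assuming the wider index above has been created, select only columns that the index contains and look at the plan. Whether the planner actually picks an Index Only Scan depends on table size, statistics, and how much of the table is marked all-visible, so treat the result as something to inspect rather than a guaranteed outcome.

```sql
-- Update the visibility map and the planner statistics in one pass.
VACUUM (ANALYZE) posts;

-- Every column referenced here is stored in the index,
-- so an index-only scan is at least possible.
EXPLAIN
SELECT id, vote_up_count, vote_down_count, date_created
FROM posts
ORDER BY score(vote_up_count, vote_down_count), date_created;
```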

The downside, of course, is the increased index size.
