How can I speed up this query?

Date: 2022-09-20 14:02:51

At the moment I am trying to learn how to use database indices efficiently and would appreciate some expert input. I do not have any performance issues currently; I would just like to know how you would handle the indices for this query:

SELECT B.event, 
       COALESCE(B.system, C.surname || ' ' || C.forename) AS name, 
       C.label, 
       B.timestamp
FROM A            
  INNER JOIN B ON A.event=B.event
  INNER JOIN C ON B.state=C.id
  LEFT OUTER JOIN D ON B.hur=D.id             
WHERE A.id IN(12,13,14,15,...) 
  ORDER BY B.event, B.timestamp

A.id, C.id and D.id are already primary keys.

UPDATE: Normally I would create INDEX(A.event) and INDEX(B.event, B.timestamp). Is this correct? And what about B.event, B.state and B.hur?

7 answers

#1


3  

Rewrite your query like this:

SELECT  B.event, 
        COALESCE(B.system, C.surname || ' ' || C.forename) AS name, 
        C.label, 
        B.timestamp
FROM    B            
INNER JOIN
        C
ON      C.id = B.state
LEFT OUTER JOIN
        D
ON      D.id = B.hur
WHERE   B.event IN
        (
        SELECT  event
        FROM    A
        WHERE   A.id IN (12, 13, 14, 15)
        )
ORDER BY
        B.event, B.timestamp

and create a composite index on B (event, timestamp).
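
To try the rewrite on a scratch database, here is a minimal sketch using Python's built-in sqlite3 module. Only the table and column names come from the question; the column types, the sample rows and the index name idx_b_event_timestamp are assumptions made up for illustration, and SQLite's planner will not necessarily behave like the one in your real database.

```python
import sqlite3

# Hypothetical miniature schema: only the table and column names come from
# the question; types, sample rows and the index name are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, event INTEGER);
    CREATE TABLE C (id INTEGER PRIMARY KEY, surname TEXT, forename TEXT, label TEXT);
    CREATE TABLE D (id INTEGER PRIMARY KEY);
    CREATE TABLE B (event INTEGER, system TEXT, state INTEGER, hur INTEGER,
                    timestamp TEXT);
    -- The composite index suggested in the answer.
    CREATE INDEX idx_b_event_timestamp ON B (event, timestamp);

    INSERT INTO A VALUES (12, 100), (13, 101), (99, 102);
    INSERT INTO C VALUES (1, 'Doe', 'Jane', 'open'), (2, 'Roe', 'Rick', 'closed');
    INSERT INTO D VALUES (1);
    INSERT INTO B VALUES
        (101, NULL,   2, NULL, '2022-01-02'),
        (100, 'cron', 1, 1,    '2022-01-03'),
        (100, NULL,   1, NULL, '2022-01-01'),
        (102, NULL,   1, NULL, '2022-01-04');
""")

REWRITTEN = """
    SELECT B.event,
           COALESCE(B.system, C.surname || ' ' || C.forename) AS name,
           C.label,
           B.timestamp
    FROM B
    INNER JOIN C ON C.id = B.state
    LEFT OUTER JOIN D ON D.id = B.hur
    WHERE B.event IN (SELECT event FROM A WHERE A.id IN (12, 13))
    ORDER BY B.event, B.timestamp
"""

# Rows for events 100 and 101 only, ordered by event and then timestamp.
rows = conn.execute(REWRITTEN).fetchall()
for row in rows:
    print(row)

# Print SQLite's plan so you can check whether the index is picked up.
# The last column of each plan row is the human-readable description.
for step in conn.execute("EXPLAIN QUERY PLAN " + REWRITTEN):
    print(step[-1])
```

Note that the IN form returns each B row at most once, whereas the original join would repeat a B row if several of the selected A rows shared the same event, so the two queries only match when A.event is unique among the selected ids.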

#2


3  

I usually take these steps when trying to speed up my queries:

  1. Analyze the execution plan.
  2. Try to create (covering) indexes to eliminate table scans.
  3. Try to create (covering) indexes to eliminate index scans.

As for your query, you would not go wrong with creating indexes on

  • A.event
  • B.event
  • B.state
  • B.hur

#3


2  

You could add indexes to everything in the WHERE and ORDER BY clauses, i.e. A.event, B.event and B.timestamp.

#4


2  

Run EXPLAIN ANALYZE on the query and read the output. If that doesn't help, paste the EXPLAIN ANALYZE output into explain.depesz.com and check what it "says".

#5


2  

Note that the order of the fields in an index matters.

An index is, in a sense, a search tree. If you index (B.event, B.state) then the tree will group together all records with the same "event" value, then order them by the "state" field within each group.

If you were then to query that index for "b.state = x", the index would be of little use, because it is ordered by "event" first.
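
The effect of column order can be sketched without a database at all. The snippet below is only an analogy, assuming a sorted Python list of hypothetical (event, state) pairs as a stand-in for the search tree. A lookup on the leading column narrows to one contiguous slice with binary search, while a lookup on the trailing column alone has to check every entry.

```python
from bisect import bisect_left

# Hypothetical (event, state) pairs, kept sorted the way a composite
# index on (event, state) would order its entries.
index = sorted([(1, 7), (1, 3), (2, 7), (2, 5), (3, 3), (3, 7)])

def lookup_by_event(event):
    """Leading column: two binary searches bound one contiguous slice."""
    lo = bisect_left(index, (event,))
    hi = bisect_left(index, (event + 1,))
    return index[lo:hi]

def lookup_by_state(state):
    """Trailing column only: matches are scattered, so every entry is checked."""
    return [entry for entry in index if entry[1] == state]

print(lookup_by_event(2))  # [(2, 5), (2, 7)]
print(lookup_by_state(7))  # [(1, 7), (2, 7), (3, 7)]
```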


In your example:
- Filter A by its "id" field
- Join A.event to B.event
- Join B.state to C.id
- Join B.hur to D.id
- Order by B.event, B.timestamp

Note that the optimiser will look at the statistics of your tables and indexes, and may then rearrange the order of the joins. The result will be the same, but a different join order may give different performance, and the optimiser's job is to try to find the fastest plan.

In your case I would expect B.event's order to be extremely important, simply because that is the order of the resulting output and it is the field you effectively filter by (through the join to A).

Next you join B.state to C.id. Having an index on C.id is good; it makes the join faster. But equally, having the B table data in a suitable order may also make the join faster.

But having an index on B.event and a separate index on B.state may yield little. The B.state index is likely to be next to pointless once the B.event index is being used. If you combine the two into one index (b.event, then b.state), the execution plan may find a way to use the b.state part of the index.

Finally, if you put all the fields the query needs in the index, the index does get bigger, but the query may never actually need to look at the table, because the information is already in the index. The time taken to go from an index to the table to find the 'missing' fields is similar to that of a join. So for read performance, adding extra fields to the index can be of significant benefit.
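
Here is a small sketch of that effect, again using SQLite from Python as a stand-in. The table contents and the index names idx_narrow and idx_wide are invented for illustration. With the narrow index the plan still has to visit the table to fetch state; once state is added to the index, SQLite reports a covering index. Other databases expose the same idea under different names (for example, an index-only scan), and the exact plan text varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE B (event INTEGER, system TEXT, state INTEGER, "
             "hur INTEGER, timestamp TEXT)")
# Hypothetical sample data: 50 events with 10 rows each.
conn.executemany("INSERT INTO B VALUES (?, ?, ?, ?, ?)",
                 [(e, None, e % 3, None, f"2022-01-{d:02d}")
                  for e in range(1, 51) for d in range(1, 11)])

QUERY = "SELECT event, timestamp, state FROM B WHERE event = 7 ORDER BY timestamp"

def plan(sql):
    """Return the human-readable lines of SQLite's query plan for sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Narrow index: finds the rows, but "state" must be fetched from the table.
conn.execute("CREATE INDEX idx_narrow ON B (event, timestamp)")
narrow_plan = plan(QUERY)

# Wide index: every column the query reads is in the index itself.
conn.execute("DROP INDEX idx_narrow")
conn.execute("CREATE INDEX idx_wide ON B (event, timestamp, state)")
wide_plan = plan(QUERY)

print(narrow_plan)
print(wide_plan)
```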

I'm wittering on now, but the summary is this:
- Usually, separate indexes on separate fields don't get used together
- For composite indexes, the order in which you specify the fields makes a difference
- Adding 'extra' fields to the index makes it bigger, but can also make queries faster
- The order of the execution plan matters more than the order of your query
- But the indexes you have can determine the order of the execution plan

This kind of work has no categorical answers. It is so dependent on your data that it's closer to an art.

One option is to overload the tables with indexes, look at the resulting execution plan, and delete the indexes that are not used.

But even there a caveat applies. Because the execution plan depends on the data (and the table statistics), it is very important to have real-world data in the tables. While the tables have 10s or 100s of rows, one execution plan may be fastest. But when you get millions of rows the execution plan can change, and so benefit from different indexes.

#6


1  

I would add indexes to anything that is joined on, or that appears in the WHERE clause or the ORDER BY clause.

In this case add indexes on the following (assuming the ID fields are primary keys and already indexed):

  1. A.event
  2. B.event
  3. B.state
  4. B.hur
  5. B.event, B.timestamp (combined index of both fields)

The 5th one, being a composite index, should speed up the ORDER BY.

You need to balance the number of indexes against any performance drop when inserting records into the table (the more indexes you add to a table, the slower inserts and updates will be, because every index needs to be updated too).

#7


0  

SELECT B.event, B.system, COALESCE(C.surname) || ' ' || COALESCE(C.forename) AS name,    C.label, B.timestamp
FROM A            
INNER JOIN B ON A.event=B.event
INNER JOIN C ON B.state=C.id
LEFT OUTER JOIN D ON B.hur=D.id             
WHERE A.event = ANY(:visits) 
ORDER BY B.event, B.timestamp

Also, the ORDER BY can slow things down badly if it cannot use an index. Make sure these are indexed:

A.event
B.event
B.state
C.id
B.timestamp
