How to improve query performance with multiple joins

Time: 2022-06-07 00:21:53

I have a query (with the purpose of making a view) which is using a few joins to get each column. Performance degrades quickly (exponentially?) for each set of joins added.

What would be a good approach to make this query faster? Please see comments within the query.

If it helps, this is using the WordPress DB schema.

Here is a screenshot of EXPLAIN: [image]

PRODUCTS TABLE

+--+----+
|id|name|
+--+----+
|1 |test|
+--+----+

METADATA TABLE

+----------+--------+-----+
|product_id|meta_key|value|
+----------+--------+-----+
|1         |price   |9.99 |
+----------+--------+-----+
|1         |sku     |ABC  |
+----------+--------+-----+

TERM_RELATIONSHIPS TABLE

+---------+----------------+
|object_id|term_taxonomy_id|
+---------+----------------+
|1        |1               |
+---------+----------------+
|1        |2               |
+---------+----------------+

TERM_TAXONOMY TABLE

+----------------+-------+--------+
|term_taxonomy_id|term_id|taxonomy|
+----------------+-------+--------+
|1               |1      |size    |
+----------------+-------+--------+
|2               |2      |stock   |
+----------------+-------+--------+

TERMS TABLE

+-------+-----+
|term_id|name |
+-------+-----+
|1      |500mg|
+-------+-----+
|2      |10   |
+-------+-----+

QUERY

SELECT 
  products.id,
  products.name,
  price.value AS price,
  sku.value AS sku,
  size.name AS size
FROM products

/* These joins are performing quickly */

INNER JOIN `metadata` AS price ON products.id = price.product_id AND price.meta_key = 'price'
INNER JOIN `metadata` AS sku ON products.id = sku.product_id AND sku.meta_key = 'sku'

/* Here's the part that is really slowing it down - I run this chunk about 5 times with different strings to match */

INNER JOIN `term_relationships` AS tr ON products.id = tr.object_id
  INNER JOIN `term_taxonomy` AS tt
  ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = 'size'
    INNER JOIN `terms` AS size
    ON tt.term_id = size.term_id

7 answers

#1


9  

Your performance issue is most likely caused by the join with the 'term_taxonomy' table. All other joins seem to use the primary key (on which you probably already have working indexes).

So my suggestion is to add a compound index on term_taxonomy_id and term_id (or, if you must, taxonomy), like this:

CREATE UNIQUE INDEX idx_term_taxonomy_id_taxonomy
ON term_taxonomy( term_taxonomy_id, taxonomy);

Hope this will help you.

#2


1  

Make sure that every column used in an "ON" condition is indexed. This will significantly improve the speed.
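
A minimal sketch of what indexing the ON columns could look like for the schema in the question. It runs against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained; SQLite standing in for MySQL, the index names, and the column order inside each index are assumptions, not something taken from the original post. The printed plan is only there to inspect which indexes the planner picks.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE metadata (product_id INTEGER, meta_key TEXT, value TEXT);
CREATE TABLE term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);
CREATE TABLE term_taxonomy (term_taxonomy_id INTEGER PRIMARY KEY,
                            term_id INTEGER, taxonomy TEXT);
CREATE TABLE terms (term_id INTEGER PRIMARY KEY, name TEXT);

INSERT INTO products VALUES (1, 'test');
INSERT INTO metadata VALUES (1, 'price', '9.99'), (1, 'sku', 'ABC');
INSERT INTO term_relationships VALUES (1, 1), (1, 2);
INSERT INTO term_taxonomy VALUES (1, 1, 'size'), (2, 2, 'stock');
INSERT INTO terms VALUES (1, '500mg'), (2, '10');

-- Hypothetical index names: one index per join condition in the query.
CREATE INDEX idx_metadata_product_key ON metadata (product_id, meta_key);
CREATE INDEX idx_tr_object_taxonomy ON term_relationships (object_id, term_taxonomy_id);
CREATE INDEX idx_tt_taxonomy ON term_taxonomy (taxonomy, term_taxonomy_id);
""")

# The query from the question, unchanged apart from the removed backticks.
query = """
SELECT products.id, products.name,
       price.value AS price, sku.value AS sku, size.name AS size
FROM products
INNER JOIN metadata AS price
    ON products.id = price.product_id AND price.meta_key = 'price'
INNER JOIN metadata AS sku
    ON products.id = sku.product_id AND sku.meta_key = 'sku'
INNER JOIN term_relationships AS tr ON products.id = tr.object_id
INNER JOIN term_taxonomy AS tt
    ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = 'size'
INNER JOIN terms AS size ON tt.term_id = size.term_id
"""

rows = conn.execute(query).fetchall()
print(rows)

# Show the plan so you can check which indexes are actually used.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step[-1])

# List the indexes that now exist.
index_names = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")}
```

On MySQL the equivalent check would be to run EXPLAIN on the real query before and after adding the indexes and compare the reported key for each table.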

#3


0  

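    -- SQL Server (T-SQL) syntax; this will not run on MySQL as written.
    DECLARE @query AS NVARCHAR(MAX)
    -- String literals inside the dynamic SQL need doubled single quotes.
    -- INTO must come before FROM. A local #t created inside EXEC is dropped
    -- when EXEC returns, so a global temp table (##t) is used instead.
    SET @query = ('SELECT 
    products.id,
    products.name,
    price.value AS price,
    sku.value AS sku,
    size.name AS size
    INTO ##t
    FROM products
    INNER JOIN metadata AS price ON products.id = price.product_id AND price.meta_key = ''price''
    INNER JOIN metadata AS sku ON products.id = sku.product_id AND sku.meta_key = ''sku''
    INNER JOIN term_relationships AS tr ON products.id = tr.object_id
    INNER JOIN term_taxonomy AS tt
    ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = ''size''
    INNER JOIN terms AS size
    ON tt.term_id = size.term_id')

    EXEC(@query);
    SELECT * FROM ##t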

I hope the approach above reduces the run time. Alternatively, creating a temporary table with all the fields you select, and then updating it by joining it to each of the other tables, might also be effective. I am not sure about it, though, and I am waiting for your result too, as your question seems interesting.

#4


0  

Try this:

SELECT p.id, p.name, MAX(CASE m.meta_key WHEN 'price' THEN m.value ELSE '' END) AS price, 
       MAX(CASE m.meta_key WHEN 'sku' THEN m.value ELSE '' END) AS sku, s.name AS size
FROM products p 
INNER JOIN `metadata` AS m ON p.id = m.product_id  
INNER JOIN `term_relationships` AS tr ON p.id = tr.object_id 
INNER JOIN `term_taxonomy` AS tt ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = 'size'
INNER JOIN `terms` AS s ON tt.term_id = s.term_id
GROUP BY p.id;

If you still find that your query is slow, then add the EXPLAIN plan of my query so I can find which columns need an index.
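
The same MAX(CASE ...) idea can probably be applied to the taxonomy part of the question, which is the chunk that gets repeated about five times. The sketch below joins term_relationships, term_taxonomy and terms only once and pivots the taxonomies into columns. It uses an in-memory SQLite database via Python's sqlite3 as a stand-in for MySQL (an assumption), and 'stock' is taken from the sample data as the second taxonomy.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);
CREATE TABLE term_taxonomy (term_taxonomy_id INTEGER PRIMARY KEY,
                            term_id INTEGER, taxonomy TEXT);
CREATE TABLE terms (term_id INTEGER PRIMARY KEY, name TEXT);

INSERT INTO products VALUES (1, 'test');
INSERT INTO term_relationships VALUES (1, 1), (1, 2);
INSERT INTO term_taxonomy VALUES (1, 1, 'size'), (2, 2, 'stock');
INSERT INTO terms VALUES (1, '500mg'), (2, '10');
""")

# One pass over the three term tables; each taxonomy becomes a column.
query = """
SELECT p.id, p.name,
       MAX(CASE WHEN tt.taxonomy = 'size'  THEN t.name END) AS size,
       MAX(CASE WHEN tt.taxonomy = 'stock' THEN t.name END) AS stock
FROM products p
INNER JOIN term_relationships tr ON p.id = tr.object_id
INNER JOIN term_taxonomy tt
    ON tr.term_taxonomy_id = tt.term_taxonomy_id
   AND tt.taxonomy IN ('size', 'stock')
INNER JOIN terms t ON tt.term_id = t.term_id
GROUP BY p.id, p.name
"""

rows = conn.execute(query).fetchall()
print(rows)
```

One difference from the chained inner joins: a product that has a size but no stock term still appears here, with NULL in the stock column, so add a HAVING clause if every taxonomy must be present.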

#5


0  

The script below is written in SQL Server syntax. You can adapt it to MySQL and give it a try:

SELECT 
  P.id,
  P.name,
  PIVOT_METADATA.price,
  PIVOT_METADATA.sku,
  size.name AS size
FROM products P (NOLOCK)

INNER JOIN term_relationships AS tr (NOLOCK)
    ON P.id = tr.object_id

INNER JOIN term_taxonomy AS tt (NOLOCK)
    ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = 'size'

INNER JOIN terms AS size (NOLOCK)
    ON tt.term_id = size.term_id

INNER JOIN METADATA (NOLOCK)
    PIVOT
    (
        MAX(value)
        FOR [meta_key] IN (price,sku)
    )AS PIVOT_METADATA
    ON P.id = PIVOT_METADATA.product_id

What I feel could be the bottleneck in your query: you are joining Metadata twice. Since there are 1-to-many relationships in your tables, the two Metadata joins alone don't hurt, but as you join more tables after that, the number of rows grows because of the 1-to-many relationships, and hence the performance drops.
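
To illustrate the row multiplication described above, here is a small sketch that counts rows as 1-to-many tables are joined without any filter on meta_key or taxonomy. It uses the sample data from the question in an in-memory SQLite database (a stand-in for MySQL, which is an assumption); the helper name count_rows is hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE metadata (product_id INTEGER, meta_key TEXT, value TEXT);
CREATE TABLE term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);

INSERT INTO products VALUES (1, 'test');
INSERT INTO metadata VALUES (1, 'price', '9.99'), (1, 'sku', 'ABC');
INSERT INTO term_relationships VALUES (1, 1), (1, 2);
""")


def count_rows(sql):
    """Run a COUNT(*) query and return the single number it yields."""
    return conn.execute(sql).fetchone()[0]


products_only = count_rows("SELECT COUNT(*) FROM products")

# One product with two metadata rows gives two joined rows.
with_metadata = count_rows("""
    SELECT COUNT(*) FROM products p
    INNER JOIN metadata m ON p.id = m.product_id
""")

# Adding a second 1-to-many join multiplies again: 2 x 2 rows.
with_both = count_rows("""
    SELECT COUNT(*) FROM products p
    INNER JOIN metadata m ON p.id = m.product_id
    INNER JOIN term_relationships tr ON p.id = tr.object_id
""")

print(products_only, with_metadata, with_both)  # 1 2 4
```

With a single product the numbers are tiny, but the intermediate row count is the product of the per-table fan-outs, which is why each extra unfiltered 1-to-many join gets more expensive.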

What I've tried to achieve: I'm making sure that as many relationships as possible are 1-to-1. To do this, I've done a pivot on Metadata and made price and sku into columns. Now each product id has only one row in the Metadata pivot. Also, I've made sure that I join this pivot at the very end.

Give it a try. Please share the expected performance, the number of records you have, and the performance you get with my answer.
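
MySQL has no PIVOT operator, so one way to adapt the idea is a derived table that collapses metadata to one row per product before it is joined. The sketch below shows that variant; it is not the answer's exact script. It runs on an in-memory SQLite database via Python's sqlite3 as a stand-in for MySQL (an assumption), and the alias pivot_metadata is only illustrative.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE metadata (product_id INTEGER, meta_key TEXT, value TEXT);
CREATE TABLE term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);
CREATE TABLE term_taxonomy (term_taxonomy_id INTEGER PRIMARY KEY,
                            term_id INTEGER, taxonomy TEXT);
CREATE TABLE terms (term_id INTEGER PRIMARY KEY, name TEXT);

INSERT INTO products VALUES (1, 'test');
INSERT INTO metadata VALUES (1, 'price', '9.99'), (1, 'sku', 'ABC');
INSERT INTO term_relationships VALUES (1, 1), (1, 2);
INSERT INTO term_taxonomy VALUES (1, 1, 'size'), (2, 2, 'stock');
INSERT INTO terms VALUES (1, '500mg'), (2, '10');
""")

# metadata is reduced to one row per product inside the derived table,
# so joining it cannot multiply the rows of the outer query.
query = """
SELECT p.id, p.name, pivot_metadata.price, pivot_metadata.sku,
       size.name AS size
FROM products p
INNER JOIN term_relationships tr ON p.id = tr.object_id
INNER JOIN term_taxonomy tt
    ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = 'size'
INNER JOIN terms AS size ON tt.term_id = size.term_id
INNER JOIN (
    SELECT product_id,
           MAX(CASE WHEN meta_key = 'price' THEN value END) AS price,
           MAX(CASE WHEN meta_key = 'sku'   THEN value END) AS sku
    FROM metadata
    WHERE meta_key IN ('price', 'sku')
    GROUP BY product_id
) AS pivot_metadata ON p.id = pivot_metadata.product_id
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Whether this beats the two filtered metadata joins depends on the data volume and indexes, so it is worth comparing both with EXPLAIN on the real tables.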

#6


0  

METADATA_TABLE and TERM_RELATIONSHIP_TABLE do not have a primary key. When these tables hold a large number of records, your query performance will suffer.

Checkpoints to increase your performance.

  1. All tables should have a primary key. This is because the rows in the table will then be physically sorted by it.
  2. For small queries involving few tables, keeping a primary key on each table would be enough. If you still wish to improve performance, create non-clustered indexes on columns such as the object_id field of the term_relationships table. Non-clustered indexes should be created on the columns that take part in join operations.

However, note that non-clustered indexes should be kept to a minimum on tables that see many inserts and updates. This is not a simple question and can't be answered based on run time alone. Other factors affect the answer, especially if the environment where a stored procedure is running is heavily transactional.
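
A rough sketch of the primary-key checkpoint, using composite keys on the two tables named above. The key choices are assumptions: (product_id, meta_key) only works if each product has at most one row per key, which the sample data satisfies but a real WordPress meta table may not. SQLite via Python's sqlite3 is used as a stand-in for MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed keys: one value per (product_id, meta_key) and one row per
-- (object_id, term_taxonomy_id) pair.
CREATE TABLE metadata (
    product_id INTEGER NOT NULL,
    meta_key   TEXT    NOT NULL,
    value      TEXT,
    PRIMARY KEY (product_id, meta_key)
);
CREATE TABLE term_relationships (
    object_id        INTEGER NOT NULL,
    term_taxonomy_id INTEGER NOT NULL,
    PRIMARY KEY (object_id, term_taxonomy_id)
);

INSERT INTO metadata VALUES (1, 'price', '9.99'), (1, 'sku', 'ABC');
INSERT INTO term_relationships VALUES (1, 1), (1, 2);
""")

# The key also enforces uniqueness: a second 'price' row for product 1
# is rejected instead of silently doubling the join output.
try:
    conn.execute("INSERT INTO metadata VALUES (1, 'price', '10.99')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

price_rows = conn.execute(
    "SELECT value FROM metadata WHERE product_id = 1 AND meta_key = 'price'"
).fetchall()
print(duplicate_rejected, price_rows)
```

Both keys lead with the column used in the join condition (product_id, object_id), so the join lookups can use the key directly.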

You can find more here

#7


0  

I would suggest the following:

  • Consider reducing the number of joins at the business level;
  • If that is not possible from the "top" (business level), and the data does not need to be real-time, I would suggest preparing a memory table (I know this solution is not ideal) and selecting your data directly from it.
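
A minimal sketch of the precomputed-table idea, assuming the data can be refreshed periodically rather than read live. The table name product_summary is hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL, where the answer's "memory table" most likely means a table using the MEMORY storage engine.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE metadata (product_id INTEGER, meta_key TEXT, value TEXT);
CREATE TABLE term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);
CREATE TABLE term_taxonomy (term_taxonomy_id INTEGER PRIMARY KEY,
                            term_id INTEGER, taxonomy TEXT);
CREATE TABLE terms (term_id INTEGER PRIMARY KEY, name TEXT);

INSERT INTO products VALUES (1, 'test');
INSERT INTO metadata VALUES (1, 'price', '9.99'), (1, 'sku', 'ABC');
INSERT INTO term_relationships VALUES (1, 1), (1, 2);
INSERT INTO term_taxonomy VALUES (1, 1, 'size'), (2, 2, 'stock');
INSERT INTO terms VALUES (1, '500mg'), (2, '10');

-- Run the expensive joins once and store the result (hypothetical name).
CREATE TABLE product_summary AS
SELECT products.id AS id, products.name AS name,
       price.value AS price, sku.value AS sku, size.name AS size
FROM products
INNER JOIN metadata AS price
    ON products.id = price.product_id AND price.meta_key = 'price'
INNER JOIN metadata AS sku
    ON products.id = sku.product_id AND sku.meta_key = 'sku'
INNER JOIN term_relationships AS tr ON products.id = tr.object_id
INNER JOIN term_taxonomy AS tt
    ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = 'size'
INNER JOIN terms AS size ON tt.term_id = size.term_id;
""")

# Readers now hit the flat table and pay for no joins at all.
rows = conn.execute(
    "SELECT id, name, price, sku, size FROM product_summary").fetchall()
print(rows)
```

The trade-off is staleness: the summary has to be rebuilt or updated whenever the underlying tables change, which is why this only fits data that is not needed in real time.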

In my experience:

  • Joins are a performance killer: the bigger your data, the more pain you will feel;
  • Try to get rid of joins rather than tuning a query that keeps them, unless you have to keep them. I usually try to fix these issues from "top" to "bottom";
  • As a last resort, if none of the above works, I would consider "map/reduce + fulltext search", if it is worth the effort.

(Forgive me for not providing a solution that directly improves your query performance.)
