SQL query joining multiple tables - too slow (8 tables)

Time: 2022-05-25 04:21:21

I'm trying to join 8 tables into one result in order to build an index used by another application. My query looks like this (my MySQL skills are very amateur):

SELECT t1_id, t2_name, t3_name, t4_name, t5_name, 
       t6_name, t7_name, t8_name, t9_name 
FROM t1 
  LEFT JOIN t2 ON (t1_id = t2_id) 
  LEFT JOIN t3 ON (t3_id = t1_id) 
  LEFT JOIN t4 ON (t4_id = t1_id)
  LEFT JOIN t5 ON (t5_id = t1_id)
  LEFT JOIN t6 ON (t6_id = t1_id) 
  LEFT JOIN t7 ON (t7_id = t1_id)
  LEFT JOIN t8 ON (t8_id = t1_id)
  LEFT JOIN t9 ON (t9_id = t1_id)

I can't even see the query results when I execute it. Is there any way to speed it up? :) Any kind of help is appreciated, but it would be best to keep it to a single query (a rule imposed by the outside application).

Thanks in advance.

8 answers

#1


50  

I had a similar problem with several lookup tables joined to a large table, with all id fields indexed. To monitor the effect of the joins on query execution time, I ran my query several times (limited to the first 100 rows), adding a join to one more table each time. Up to 12 joined tables, there was no significant change in execution time. With the 13th table the execution time jumped to 1 second; with the 14th, 4 seconds; with the 15th, 20 seconds; with the 16th, 90 seconds.
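
The measurement described above can be sketched against the question's tables as follows. This is only an illustration, assuming the t1 ... t9 schema from the question; the answer's own tables and queries are not shown, so the statements below are not the ones actually timed.

```sql
-- Step 1: time the query with a single join, limited to 100 rows.
SELECT t1_id, t2_name
FROM t1
  LEFT JOIN t2 ON (t1_id = t2_id)
LIMIT 100;

-- Step 2: add one more table and time it again.
SELECT t1_id, t2_name, t3_name
FROM t1
  LEFT JOIN t2 ON (t1_id = t2_id)
  LEFT JOIN t3 ON (t3_id = t1_id)
LIMIT 100;

-- Repeat, adding one LEFT JOIN per run, and note where the time jumps.
```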

Keijro's suggestion to use correlated subqueries instead of joins, e.g.

SELECT t1_id,
        (select t2_name from t2 where t1_id = t2_id),
        (select t3_name from t3 where t1_id = t3_id),
        (select t4_name from t4 where t1_id = t4_id),
        (select t5_name from t5 where t1_id = t5_id),
        (select t6_name from t6 where t1_id = t6_id),
        (select t7_name from t7 where t1_id = t7_id),
        (select t8_name from t8 where t1_id = t8_id),
        (select t9_name from t9 where t1_id = t9_id)
FROM t1

improved query performance dramatically. In fact, the subqueries did not seem to lengthen the execution time at all (the query was almost instantaneous).

I am a little surprised, as I thought correlated subqueries performed worse than joins.

#2


28  

Depending on how much data is in the tables, you may need to place indexes on the columns being joined on. Slow queries often come down to a missing index in the right place.
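
As a rough sketch of what that could look like for the question's schema: the statements below assume that t2_id, t3_id and so on are not already primary keys or otherwise indexed, and the index names are made up for illustration.

```sql
-- Hypothetical index names; one index per column used in a join condition.
CREATE INDEX ix_t2_id ON t2 (t2_id);
CREATE INDEX ix_t3_id ON t3 (t3_id);
-- ... repeat for t4 through t9

-- List the indexes that now exist on a table.
SHOW INDEX FROM t2;
```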

Also:

LEFT JOINs can be slower than INNER JOINs (though this depends on exactly what you're doing). Can you accomplish what you're looking for with inner joins?
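
For illustration, an inner-join version of the first two joins from the question might look like the sketch below. It only returns the same rows as the LEFT JOIN version if every row of t1 has a match in each joined table; any t1 row without a match is dropped from the result.

```sql
-- Same join conditions as the question, but unmatched t1 rows are excluded.
SELECT t1_id, t2_name, t3_name
FROM t1
  INNER JOIN t2 ON (t1_id = t2_id)
  INNER JOIN t3 ON (t3_id = t1_id);
```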

#3


5  

How much data are we talking about? It might be that you have a lot of data, and since the WHERE clause is applied at the end of query processing, you are joining huge volumes of data before filtering it.

In that case it's better to filter the data as early as possible: if you can restrict the rows from t1 in an inner select first, all the other joins will run against a more limited set of data.

SELECT <your fields>
FROM (
    SELECT * FROM t1 WHERE t1_id = t1_value
) t1
INNER JOIN t2
    ON t1.t1_id = t2.t2_id
...

If it's not a large amount of data, check that your indexes are correct, then check server-level things: index fragmentation, disk queues, etc.

#4


4  

It would help a bit if you could post the EXPLAIN plan of the query.
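
In MySQL the plan can be obtained by prefixing the statement with EXPLAIN. The sketch below uses a shortened form of the question's query; the full nine-table statement works the same way. In the output, a row whose type is ALL and whose key is NULL means that table is being scanned in full without using an index.

```sql
-- Shows one row per table with the access type, candidate keys and chosen key.
EXPLAIN
SELECT t1_id, t2_name, t3_name
FROM t1
  LEFT JOIN t2 ON (t1_id = t2_id)
  LEFT JOIN t3 ON (t3_id = t1_id);
```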

But first of all, do you have indexes on all the fields used in the joins? Something like CREATE INDEX ix_t2_id ON t2 (t2_id, t2_name);

Instead of the joins you could do something like

SELECT t1_id, 
    (select t2_name from t2 where t1_id = t2_id), 
    (select t3_name from t3 where t1_id = t3_id), 
    (select t4_name from t4 where t1_id = t4_id), 
    (select t5_name from t5 where t1_id = t5_id), 
    (select t6_name from t6 where t1_id = t6_id), 
    (select t7_name from t7 where t1_id = t7_id), 
    (select t8_name from t8 where t1_id = t8_id), 
    (select t9_name from t9 where t1_id = t9_id) 
FROM t1 

But with a good query planner, that shouldn't perform any differently from the joins.

#5


1  

If you need all the rows of t1, and you left join on the primary key (I guess it's also the clustered index) of the other tables, there is no way to improve the speed of the query.

To improve performance you need to either reduce the result set or resort to a nasty trick (e.g. make a denormalized copy of the data).
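
A denormalized copy could be built roughly as follows. This is a sketch under assumptions: t1_flat and ix_t1_flat_id are hypothetical names, and only two of the eight joins are shown. Building the copy still runs the slow join once, and the copy has to be rebuilt or updated whenever the source tables change.

```sql
-- Materialize the joined result once (t1_flat is a made-up table name).
CREATE TABLE t1_flat AS
SELECT t1_id, t2_name, t3_name
FROM t1
  LEFT JOIN t2 ON (t1_id = t2_id)
  LEFT JOIN t3 ON (t3_id = t1_id);

-- Index the copy so lookups by id stay fast.
ALTER TABLE t1_flat ADD INDEX ix_t1_flat_id (t1_id);

-- The other application then reads the flat table with no joins at all.
SELECT t1_id, t2_name, t3_name FROM t1_flat;
```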

#6


1  

From your query plan I can conclude that the tables referred to as s, n and q do not have an index on the field they are being joined on.

Since there are a lot of rows in these tables (about 400,000 rows in their cartesian product) and MySQL's only way to do JOINs is using NESTED LOOPS, it will really take forever.

Create an index on these tables or define the joined field as a PRIMARY KEY.
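
The full names of the tables aliased s, n and q are not shown in this thread, so the sketch below uses t2 from the question as a stand-in. Declaring a primary key only works if the column is unique and contains no NULLs, which the first statement checks.

```sql
-- Any rows returned here are duplicate ids that would block a primary key.
SELECT t2_id, COUNT(*) AS occurrences
FROM t2
GROUP BY t2_id
HAVING COUNT(*) > 1;

-- Assumption: t2_id is unique and never NULL.
ALTER TABLE t2 ADD PRIMARY KEY (t2_id);
```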

#7


0  

As far as I can see, t1 is the table being joined with all the others. Instead of putting them in a single query with so many joins, you could try a UNION of separate queries, something like this:

SELECT  t1_id, t2_name 
FROM    t1 LEFT JOIN t2 ON (t1_id = t2_id)
union 
SELECT  t1_id, t3_name 
FROM    t1 LEFT JOIN t3 ON (t1_id = t3_id)

However, in that case the result will not have 8 name columns but just 1. Not sure whether that is an option for you.

One more thing, which you must do in whatever solution you implement: create appropriate indexes on all your tables. The best practice is to index the columns most frequently used in joins or WHERE clauses.

#8


-1  

Depending on your version of SQL server, simply putting your query into a stored procedure may make a big difference. Try this only after you have tried the other optimizations. (Yes, I know there are cached execution plans and other internal server optimizations, but in my practical real-world experience, stored procedures can execute faster.)
