I am very new to MySQL and am trying to use LEFT JOINs for the first time. The results returned are what I expect, but the query takes far too long.
I have 3 tables that I am joining, and when I run the queries separately the data is returned in 0.0002 seconds. With the joins, the query takes up to 300 seconds.
evmbd has 6012 rows that I need to return, and I just want one column from each of the xlba and 3pla tables, joined on the naa column.
I have been reading up on optimizing LEFT JOIN queries and have added an INDEX on the naa column of all 3 tables.
SELECT e.cluster,e.vm,e.TotalCapacityGB,e.naa,p.array,x.array
FROM evmbd AS e
LEFT JOIN xlba AS x ON x.naa like replace(e.naa,'naa.','') AND x.date = (select max(date) from xlba )
LEFT JOIN 3pla AS p ON p.naa = replace(e.naa,'naa.','') AND p.date = (select max(date) from 3pla )
WHERE e.date = (select max(date) from evmbd)
GROUP BY e.cluster, e.vm
ORDER BY e.cluster, e.vm
Any help would be amazing, as would any documentation on advanced queries.
3 Solutions
#1
1
I don't know your tables, but maybe this could be helpful:
-- MySQL user variables need no DECLARE; SET creates them.
SET @MAX_XLBA = (SELECT MAX(date) FROM xlba);
SET @MAX_3PLA = (SELECT MAX(date) FROM 3pla);
SET @MAX_EVMBD = (SELECT MAX(date) FROM evmbd);
-- SUBSTRING positions are 1-based, so starting at 5 skips the 4-character 'naa.' prefix.
SELECT e.cluster, e.vm, e.TotalCapacityGB, e.naa, p.array, x.array
FROM evmbd AS e
LEFT JOIN xlba AS x ON x.naa = TRIM(SUBSTRING(e.naa, 5)) AND x.date = @MAX_XLBA
LEFT JOIN 3pla AS p ON p.naa = TRIM(SUBSTRING(e.naa, 5)) AND p.date = @MAX_3PLA
WHERE e.date = @MAX_EVMBD;
If the dates are not calculated while the query is running, performance may improve. The same goes for the REPLACE you were doing.
Regards
#2
1
When you join on an expression such as REPLACE() or LIKE, the server may not be able to use the index on naa, so it has to evaluate the expression against every candidate row to see if it matches. Additionally, the second comparison in each join is a subselect. In the worst case, for every row in e you are testing against every row in x, and for every one of those you are testing against every row in p, with the subselects possibly being re-evaluated along the way. You are certainly giving the server a workout. There are a bunch of things you can do to speed this up.
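One way to check whether the joins actually use the naa indexes is to prefix the query with EXPLAIN. This is only a sketch: it is the query from the question with GROUP BY and ORDER BY left off for brevity, and the backticks around `3pla` are a precaution rather than something the original used.

```sql
-- Ask MySQL for the execution plan instead of running the query.
EXPLAIN
SELECT e.cluster, e.vm, e.TotalCapacityGB, e.naa, p.array, x.array
FROM evmbd AS e
LEFT JOIN xlba AS x
       ON x.naa LIKE REPLACE(e.naa, 'naa.', '')
      AND x.date = (SELECT MAX(date) FROM xlba)
LEFT JOIN `3pla` AS p
       ON p.naa = REPLACE(e.naa, 'naa.', '')
      AND p.date = (SELECT MAX(date) FROM `3pla`)
WHERE e.date = (SELECT MAX(date) FROM evmbd);
```

In the output, a `type` of ALL together with a NULL `key` on the row for x or p means that table is being scanned in full for each row of e, which would explain the slowdown.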
1. Make the naa fields consistent across all the tables. Remove the "naa." prefix on e, or add it to the others, so you can do a direct comparison instead of a LIKE or REPLACE.
2. Prefetch the max dates you need instead of running subselects. Rodrigo had the right idea on that.
3. If you can't do either of those, select the truncated values and matching dates into temp tables, and run your select with joins off of that.
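The third option could look roughly like this. It is a sketch under assumptions: the temporary table names (latest_e, latest_x, latest_p) and the naa_key column are made up for illustration, naa is assumed to be a VARCHAR-like column that can be indexed without a prefix length, and `array` is assumed to be the only column needed from xlba and 3pla.

```sql
-- Latest snapshot of evmbd, with the join key already stripped of 'naa.'.
CREATE TEMPORARY TABLE latest_e AS
SELECT cluster, vm, TotalCapacityGB, naa,
       REPLACE(naa, 'naa.', '') AS naa_key   -- hypothetical helper column
FROM evmbd
WHERE `date` = (SELECT MAX(`date`) FROM evmbd);

-- Latest snapshot of each lookup table.
CREATE TEMPORARY TABLE latest_x AS
SELECT naa, `array` FROM xlba
WHERE `date` = (SELECT MAX(`date`) FROM xlba);

CREATE TEMPORARY TABLE latest_p AS
SELECT naa, `array` FROM `3pla`
WHERE `date` = (SELECT MAX(`date`) FROM `3pla`);

-- Index the join keys so each lookup can be an index probe.
ALTER TABLE latest_x ADD INDEX (naa);
ALTER TABLE latest_p ADD INDEX (naa);

-- Plain equality joins, with no functions or subselects in the ON clauses.
SELECT e.cluster, e.vm, e.TotalCapacityGB, e.naa, p.`array`, x.`array`
FROM latest_e AS e
LEFT JOIN latest_x AS x ON x.naa = e.naa_key
LEFT JOIN latest_p AS p ON p.naa = e.naa_key;
```

The temporary tables disappear when the session ends, so nothing in the original tables is changed.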
Doing the first of these will probably eliminate most of the slowdown, but it depends on how many rows match your joins.
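A minimal sketch of that first option, assuming it is acceptable to rewrite the stored values in evmbd (take a backup first) and that the prefix is always the literal string 'naa.':

```sql
-- Strip the 'naa.' prefix once, so evmbd.naa matches xlba.naa and 3pla.naa.
UPDATE evmbd
SET naa = REPLACE(naa, 'naa.', '')
WHERE naa LIKE 'naa.%';

-- The joins can then compare the columns directly.
SELECT e.cluster, e.vm, e.TotalCapacityGB, e.naa, p.`array`, x.`array`
FROM evmbd AS e
LEFT JOIN xlba AS x
       ON x.naa = e.naa
      AND x.`date` = (SELECT MAX(`date`) FROM xlba)
LEFT JOIN `3pla` AS p
       ON p.naa = e.naa
      AND p.`date` = (SELECT MAX(`date`) FROM `3pla`)
WHERE e.`date` = (SELECT MAX(`date`) FROM evmbd);
```

Whatever loads new rows into evmbd would need the same change, otherwise the prefix comes back with the next import.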
For the second option, Rodrigo's post gives an example, but you don't need the trim functions once the naa values are consistent. Also, for optimum performance, you should index the date field as well.
-- Assumes the naa values have already been made consistent across the tables.
SET @MAX_XLBA = (SELECT MAX(date) FROM xlba);
SET @MAX_3PLA = (SELECT MAX(date) FROM 3pla);
SET @MAX_EVMBD = (SELECT MAX(date) FROM evmbd);
SELECT e.cluster, e.vm, e.TotalCapacityGB, e.naa, p.array, x.array
FROM evmbd AS e
LEFT JOIN xlba AS x ON x.naa = e.naa AND x.date = @MAX_XLBA
LEFT JOIN 3pla AS p ON p.naa = e.naa AND p.date = @MAX_3PLA
WHERE e.date = @MAX_EVMBD;
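For the indexing advice, one possible sketch is a composite index on (date, naa) for each joined table, so that both conditions in an ON clause can be resolved from a single index. The index names are made up, and this assumes naa is a VARCHAR-like column that can be indexed without a prefix length.

```sql
-- Lets MAX(date) and the WHERE filter on evmbd be answered from an index.
CREATE INDEX idx_evmbd_date ON evmbd (`date`);

-- date first, so MAX(date) can read the end of the index;
-- naa second, so the join can match on both columns at once.
CREATE INDEX idx_xlba_date_naa ON xlba (`date`, naa);
CREATE INDEX idx_3pla_date_naa ON `3pla` (`date`, naa);
```

Running EXPLAIN on the query before and after creating the indexes shows whether the optimizer actually picks them up.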
#3
-1
For projects with huge amounts of data I had the same problem in SQLite. Too many joins slow the program down a lot. There, I used the following solutions: