I am working on someone else's PHP code and seeing this pattern over and over:
(pseudocode)
result = SELECT blah1, blah2, foreign_key FROM foo WHERE key = bar
if result.foreign_key > 0
    other_result = SELECT something FROM foo2 WHERE key = result.foreign_key
end
The code needs to branch if there is no related row in the other table, but couldn't this be done better by doing a LEFT JOIN in a single SELECT statement? Am I missing some performance benefit? Portability issue? Or am I just nitpicking?
13 solutions
#1
There is not enough information to really answer the question. I've worked on applications where decreasing the query count for one reason and increasing the query count for another reason both gave performance improvements. In the same application!
For certain combinations of table size, database configuration, and how often the foreign table is queried, doing the two queries can be much faster than a LEFT JOIN. But experience and testing are the only things that will tell you that. MySQL with moderately large tables seems to be susceptible to this, IME. Performing three single-table queries can often be much faster than one query JOINing the three tables. I've seen speedups of an order of magnitude.
#2
This is definitely wrong. You are going over the wire a second time for no reason. DBs are very fast within their problem space. Joining tables is part of that, and you'll see more of a performance degradation from the second query than from the join. Unless your tablespace is hundreds of millions of records, this is not a good idea.
#3
I'm with you - a single SQL statement would be better.
#4
There's a danger of treating your SQL DBMS as if it were an ISAM file system, selecting from a single table at a time. It might be cleaner to use a single SELECT with the outer join. On the other hand, detecting null in the application code and deciding what to do based on null vs non-null is also not completely clean.
One advantage of a single statement - you have fewer round trips to the server - especially if the SQL is prepared dynamically each time the other result is needed.
On average, then, a single SELECT statement is better. It gives the optimizer something to do and saves it getting too bored as well.
#5
It seems to me that what you're saying is fairly valid - why fire off two calls to the database when one will do - unless both records are needed independently as objects(?)
Of course, while it might not be as simple code-wise to pull it all back in one call from the database and separate the fields out into the two separate objects, it does mean that you're only dependent on the database for one call rather than two...
This would be nicer to read as a query:
SELECT a.blah1, a.blah2, b.something
FROM foo a
LEFT JOIN foo2 b ON a.foreign_key = b.key
WHERE a.key = bar;
And this way you can check you got a result in one go and have the database do all the heavy lifting in one query rather than two...
Yeah, I think it seems like what you're saying is correct.
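To make the single-query version concrete, here is a small runnable sketch using Python's sqlite3 module in place of the original PHP. The sample rows and the `describe` helper are made up for illustration. It also selects `b.key`, which is not in the query above: checking that column for NULL separates "no related row" from "a related row whose `something` happens to be NULL".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo2 (key INTEGER PRIMARY KEY, something TEXT);
    CREATE TABLE foo (key INTEGER PRIMARY KEY, blah1 TEXT, blah2 TEXT,
                      foreign_key INTEGER);
    INSERT INTO foo2 VALUES (10, 'related');
    INSERT INTO foo VALUES (1, 'x', 'y', 10);  -- has a related foo2 row
    INSERT INTO foo VALUES (2, 'p', 'q', 0);   -- no related row
""")

QUERY = """
    SELECT a.blah1, a.blah2, b.key, b.something
    FROM foo a
    LEFT JOIN foo2 b ON a.foreign_key = b.key
    WHERE a.key = ?
"""


def describe(bar):
    """One round trip; the branch happens on the NULL-padded join columns."""
    row = conn.execute(QUERY, (bar,)).fetchone()
    if row is None:
        return "no foo row"
    blah1, blah2, related_key, something = row
    if related_key is None:
        # LEFT JOIN found no match, so every foo2 column came back as NULL.
        return f"{blah1}/{blah2} without related row"
    return f"{blah1}/{blah2} with {something}"


for bar in (1, 2, 3):
    print(bar, describe(bar))
```

The application still has to branch, as answer #4 points out, but the branch now tests a column in the fetched row instead of deciding whether to issue a second query.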
#6
The most likely explanation is that the developer simply doesn't know how outer joins work. This is very common, even among developers who are quite experienced in their own specialty.
There's also a widespread myth that "queries with joins are slow." So many developers blindly avoid joins at all costs, even to the extreme of running multiple queries where one would be better.
The myth of avoiding joins is like saying we should avoid writing loops in our application code, because running a line of code multiple times is obviously slower than running it once. To say nothing of the "overhead" of ++i and testing i<20 during every iteration!
#7
You are completely correct that the single query is the way to go. To add some value to the other answers offered, let me add this axiom: "Use the right tool for the job: the database server should handle the querying work, and the code should handle the procedural work."
The key idea behind this concept is that the compiler/query optimizers can do a better job if they know the entire problem domain instead of half of it.
#8
Considering that one database hit gives you all the data you need, a single SQL statement would perform better 99% of the time. I'm not sure whether the connections are being created dynamically in this case, but if so, doing so is expensive. Even if the process is reusing existing connections, the DBMS is not getting the chance to optimize the queries in the best way, and the code is not really making use of the relationships.
The only reason I could see for making the calls like this for performance is if the data retrieved through the foreign key is large and is only needed in some cases. But in the sample you describe, the code simply grabs the related row whenever it exists, so that is not the case here and nothing is gained in performance.
#9
The only "gotcha" to all of this is if the result set to work with contains a lot of joins, or even nested joins.
I've had two or three instances now where the query I inherited was a single statement with so many joins in it that it would take the database a good minute to prepare the statement.
I went back into the procedure, leveraged some table variables (or temporary tables) and broke the query down into a lot of the smaller single select type statements and constructed the final result set in this manner.
This update dramatically improved the response time, down to a few seconds, because it was easier to do a lot of simple "one shots" to retrieve the necessary data.
I'm not trying to object for objection's sake here, just to point out that the code may have been broken down to such a granular level to address a similar issue.
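The staging approach described in this answer might look roughly like the sketch below. This is not the author's procedure: it uses SQLite through Python's sqlite3 module, a temporary table instead of table variables, and a third table `foo3` with made-up columns (`foo3_key`, `detail`), all of which are assumptions for illustration. With three tiny tables there is nothing to gain in speed; the point is only to show that the staged result matches the single multi-join query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo  (key INTEGER PRIMARY KEY, blah1 TEXT, foreign_key INTEGER);
    CREATE TABLE foo2 (key INTEGER PRIMARY KEY, something TEXT, foo3_key INTEGER);
    CREATE TABLE foo3 (key INTEGER PRIMARY KEY, detail TEXT);
    INSERT INTO foo  VALUES (1, 'a', 10), (2, 'b', 0), (3, 'c', 11);
    INSERT INTO foo2 VALUES (10, 's10', 100), (11, 's11', 0);
    INSERT INTO foo3 VALUES (100, 'd100');
""")

# Everything in one statement with chained outer joins.
single = conn.execute("""
    SELECT a.key, a.blah1, b.something, c.detail
    FROM foo a
    LEFT JOIN foo2 b ON a.foreign_key = b.key
    LEFT JOIN foo3 c ON b.foo3_key = c.key
    ORDER BY a.key
""").fetchall()

# Step 1: materialize the first join into a temporary table.
conn.execute("""
    CREATE TEMP TABLE stage AS
    SELECT a.key, a.blah1, b.something, b.foo3_key
    FROM foo a
    LEFT JOIN foo2 b ON a.foreign_key = b.key
""")

# Step 2: a simple query against the staged rows builds the final result.
staged = conn.execute("""
    SELECT s.key, s.blah1, s.something, c.detail
    FROM stage s
    LEFT JOIN foo3 c ON s.foo3_key = c.key
    ORDER BY s.key
""").fetchall()

print(staged)
```

Whether staging helps depends on how the engine plans the original statement, which is again something only measurement on the real data can show.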
#10
A single SQL query would perform better, as the SQL server (which sometimes isn't in the same location) only needs to handle one request. If you use multiple SQL queries, you introduce a lot of overhead:
Executing more CPU instructions, sending a second query to the server, creating a second thread on the server, possibly executing more CPU instructions on the server, destroying the second thread on the server, and sending the second result back.
There might be exceptional cases where the performance could be better, but for simple things you can't reach better performance by doing a bit more work.
#11
Doing a simple two-table join is usually the best way to approach this kind of problem. However, depending on the state of the tables and indexing, there are certain cases where it may be better to do the two SELECT statements. Typically I haven't run into this problem until I started approaching 3-5 joined tables, not just 2.
Just make sure you have covering indexes on both tables so that you aren't scanning the disk for all records; that is the biggest performance hit a database gets (in my limited experience).
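As a rough illustration of a covering index, the sketch below asks SQLite (via Python's sqlite3 module) how it would run the lookup on `foo2`. The schema is an assumption: `key` is deliberately not the primary key here, the `id` and `notes` columns and the index name `foo2_key_something` are invented, and other engines report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo2 (id INTEGER PRIMARY KEY, key INTEGER,
                       something TEXT, notes TEXT);
    -- The index holds both the lookup column and the selected column,
    -- so the query below never has to read the table rows themselves.
    CREATE INDEX foo2_key_something ON foo2 (key, something);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT something FROM foo2 WHERE key = ?", (1,)
).fetchall()

# The last column of each plan row is the human-readable description.
plan_text = " | ".join(row[-1] for row in plan)
print(plan_text)
```

If the query also selected `notes`, the index would no longer cover it and the engine would have to visit the table for each match.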
#12
You should always try to minimize the number of queries to the database when you can. Your example is a perfect fit for a single query. This way you will later be able to cache more easily, or handle more requests at the same time, because instead of always using 2-3 queries that each require a connection, you will need only 1 each time.
#13
There are many cases that require different solutions, and it isn't possible to explain them all at once.
A join scans both tables and loops to match each record of the first table against the second table. A simple SELECT query will be faster in many cases, as it only needs the primary/unique key (if one exists) to search the data internally.
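How a join is actually executed depends on the engine and the available keys, so it is worth asking the database itself. The sketch below does that for the LEFT JOIN from this thread using SQLite through Python's sqlite3 module. It assumes `key` is the primary key of both tables, which the question does not state. In this particular setup SQLite reports key lookups on both tables instead of full scans; MySQL or another engine, or a schema without those keys, may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo2 (key INTEGER PRIMARY KEY, something TEXT);
    CREATE TABLE foo (key INTEGER PRIMARY KEY, blah1 TEXT, blah2 TEXT,
                      foreign_key INTEGER);
""")

plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT a.blah1, a.blah2, b.something
    FROM foo a
    LEFT JOIN foo2 b ON a.foreign_key = b.key
    WHERE a.key = ?
    """,
    (1,),
).fetchall()

# One description per table access; the wording varies between SQLite versions.
steps = [row[-1] for row in plan]
for step in steps:
    print(step)
```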