I am rewriting some poorly written SQL queries that overuse sub-queries, and I am looking for best practices on when to use sub-queries.
Any help would be appreciated.
3 Answers
#1
42
Subqueries are usually fine unless they are dependent subqueries (also known as correlated subqueries). If you are only using independent subqueries and they use appropriate indexes, they should run quickly. A dependent subquery can cause performance problems because it typically needs to be run once for each row in the outer query: if your outer query has 1000 rows, the subquery will be run 1000 times. An independent subquery, on the other hand, typically only needs to be evaluated once.
If you're not sure what is meant by a subquery being dependent or independent, here's a rule of thumb: if you can take the subquery, remove it from its context, run it, and get a result set, then it's an independent subquery.
If you get an error (typically an unknown table or column) because it refers to tables outside of the subquery, then it's a dependent subquery.
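The rule of thumb above can be tried out directly. The following is a minimal sketch, assuming SQLite (through Python's `sqlite3` module) as a stand-in database and a hypothetical `users`/`votes` schema; the helper `runs_standalone` is made up for illustration, and other servers will report the failure with a different error type.

```python
import sqlite3

# Hypothetical schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE votes (user_id INTEGER, votedate TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO votes VALUES (1, '2020-01-01'), (1, '2020-02-01');
""")

# Independent: refers only to its own table, so it runs on its own.
independent = "SELECT user_id FROM votes"
# Dependent (correlated): refers to u.id, which exists only in an outer query.
dependent = "SELECT MAX(votedate) FROM votes WHERE user_id = u.id"

def runs_standalone(sql):
    """Return True if the subquery executes outside of its outer query."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.OperationalError:
        # SQLite reports the unresolved outer reference (no such column).
        return False

independent_ok = runs_standalone(independent)
dependent_ok = runs_standalone(dependent)
print(independent_ok, dependent_ok)
```

The first subquery returns rows on its own, while the second fails because `u` is only defined by the outer query it was lifted from.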
The general rule, of course, has a few exceptions. For example:
- Many optimizers can take a dependent subquery and find a way to run it efficiently as a JOIN. For example, a NOT EXISTS query might result in an ANTI JOIN query plan, so it will not necessarily be any slower than writing the query with a JOIN.
- MySQL has a bug where an independent subquery inside an IN expression is incorrectly identified as a dependent subquery, so a suboptimal query plan is used. This is apparently fixed in the newest versions of MySQL.
If performance is an issue, measure your specific queries and see what works best for you.
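One way to do that measurement is sketched below. It assumes SQLite via Python's `sqlite3` module, a hypothetical `users`/`votes` schema with made-up data, and a made-up helper named `measure`; on another server you would use its own timing and plan tools instead of `EXPLAIN QUERY PLAN`.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE votes (user_id INTEGER, votedate TEXT);
    CREATE INDEX idx_votes_user ON votes (user_id);
""")
# Made-up data: 1000 users, 5000 votes spread over the first 500 users.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1000)])
conn.executemany("INSERT INTO votes VALUES (?, ?)",
                 [(i % 500, f"2020-01-{i % 28 + 1:02d}") for i in range(5000)])

# Two ways of asking "which users have voted?".
candidates = {
    "in_subquery": "SELECT nickname FROM users "
                   "WHERE id IN (SELECT user_id FROM votes)",
    "exists": "SELECT nickname FROM users u WHERE EXISTS "
              "(SELECT 1 FROM votes v WHERE v.user_id = u.id)",
}

def measure(sql, repeats=5):
    """Return (best wall-clock seconds, row count, query plan lines)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return best, len(rows), plan

results = {name: measure(sql) for name, sql in candidates.items()}
for name, (seconds, count, plan) in results.items():
    print(f"{name}: {count} rows, best of 5 = {seconds:.6f}s")
    for step in plan:
        print("   ", step)
```

Timings on a toy in-memory table say little about a production workload, so treat this as a template to run against your real schema and data.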
#2
5
There is no silver bullet here. Each usage has to be assessed independently. In some cases correlated subqueries are plainly inefficient; the one below is better written as a JOIN:
select nickname, (select top 1 votedate from votes where user_id=u.id order by 1 desc)
from users u
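The answer does not show the JOIN form, so here is one possible rewrite as a sketch. It assumes SQLite via Python's `sqlite3` module (so `LIMIT 1` replaces `top 1`), made-up sample rows, and a LEFT JOIN with `MAX` so that users without votes still appear, as they do in the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE votes (user_id INTEGER, votedate TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO votes VALUES
        (1, '2020-01-01'), (1, '2020-03-01'), (2, '2020-02-01');
""")

# Correlated form: the inner SELECT is evaluated per row of users.
correlated = """
    SELECT nickname,
           (SELECT votedate FROM votes
             WHERE user_id = u.id
             ORDER BY votedate DESC LIMIT 1) AS last_vote
      FROM users u
     ORDER BY u.id
"""

# JOIN form: one pass over the join, aggregated per user.
joined = """
    SELECT u.nickname, MAX(v.votedate) AS last_vote
      FROM users u
      LEFT JOIN votes v ON v.user_id = u.id
     GROUP BY u.id, u.nickname
     ORDER BY u.id
"""

correlated_rows = conn.execute(correlated).fetchall()
joined_rows = conn.execute(joined).fetchall()
print(correlated_rows)
print(joined_rows)
```

Both queries return the latest vote date per user, with NULL for a user who has never voted. Whether the JOIN is actually faster still depends on the server and the data.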
On the other hand, EXISTS and NOT EXISTS queries will often win out over JOINs.
select ...
where NOT EXISTS (.....)
is normally faster than
select ...
FROM A LEFT JOIN B
where B.ID is null
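To make the two elided forms concrete, here is a sketch that fills them in with hypothetical details: the tables `A` and `B` come from the snippet above, but the columns `name` and `a_id` and the sample rows are assumptions, and SQLite via Python's `sqlite3` module is used only to make it runnable. It shows that both forms return the same rows; it does not demonstrate which one is faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, a_id INTEGER);
    INSERT INTO A VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO B VALUES (10, 1), (11, 1), (12, 3);
""")

# Rows of A that have no matching row in B, written with NOT EXISTS.
not_exists = """
    SELECT A.name FROM A
     WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.a_id = A.ID)
     ORDER BY A.ID
"""

# The same question written as a LEFT JOIN that keeps only unmatched rows.
left_join = """
    SELECT A.name
      FROM A LEFT JOIN B ON B.a_id = A.ID
     WHERE B.ID IS NULL
     ORDER BY A.ID
"""

not_exists_rows = conn.execute(not_exists).fetchall()
left_join_rows = conn.execute(left_join).fetchall()
print(not_exists_rows)
print(left_join_rows)
```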
Yet even these generalizations can be untrue for a particular schema and data distribution.
#3
4
Unfortunately, the answer greatly depends on the SQL server you're using. In theory, joins are better from a pure relational-theory point of view: they let the server do the right thing under the hood and give it more control, and thus can be faster in the end, provided the server is implemented well. In practice, some SQL servers perform better if you trick them into optimizing their queries through sub-queries and the like.