Let's say I have this query:
select * from table1 r where r.x = 5
Does the speed of this query depend on the number of rows that are present in table1?
#1
There are many factors that affect the speed of a query; the number of rows is one of them.
Others include:
- index strategy (if you index column "x", you will see better performance than if it's not indexed)
- server load
- data caching - once you've executed a query, the data it read is held in the data cache, so subsequent reruns will be much quicker because the data comes from memory rather than disk, at least until it is evicted from the cache.
- execution plan caching - to a lesser extent. Once a query is executed for the first time, the execution plan SQL Server comes up with is cached for a period of time so that future executions can reuse it.
- server hardware
- the way you've written the query (often one of the biggest contributors to poor performance!), e.g. processing rows one at a time with a cursor instead of using a set-based operation
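The cursor-versus-set-based point can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a self-contained stand-in for SQL Server (an assumption; costs differ between engines) and a hypothetical column y alongside x in table1. Both functions leave the table in the same state: the first issues one UPDATE per matching row, the way a cursor loop would, while the second uses a single set-based UPDATE.

```python
import sqlite3


def make_db(n_rows: int) -> sqlite3.Connection:
    """Create an in-memory table1 with x = i % 10 and a hypothetical counter column y."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER)")
    conn.executemany(
        "INSERT INTO table1 (x, y) VALUES (?, ?)",
        ((i % 10, 0) for i in range(n_rows)),
    )
    return conn


def update_row_by_row(conn: sqlite3.Connection) -> int:
    """Cursor-style: collect matching ids, then issue one UPDATE per row."""
    ids = [row[0] for row in conn.execute("SELECT id FROM table1 WHERE x = 5")]
    for row_id in ids:
        conn.execute("UPDATE table1 SET y = y + 1 WHERE id = ?", (row_id,))
    return len(ids)  # number of separate UPDATE statements executed


def update_set_based(conn: sqlite3.Connection) -> int:
    """Set-based: a single statement updates every matching row."""
    cur = conn.execute("UPDATE table1 SET y = y + 1 WHERE x = 5")
    return cur.rowcount  # rows modified by that one statement


if __name__ == "__main__":
    a, b = make_db(10_000), make_db(10_000)
    print("row-by-row UPDATE statements:", update_row_by_row(a))
    print("rows changed by one set-based UPDATE:", update_set_based(b))
```

With SQLite running in-process the per-statement overhead is small, so treat this as a picture of the pattern rather than a benchmark; when such a loop runs in a client application against a server, each statement typically also adds a network round trip.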
For databases with a large number of rows in tables, partitioning is usually worth considering (SQL Server 2005 onwards has built-in support in Enterprise Edition). Partitioning splits the data into smaller units. Generally, smaller units = smaller tables = smaller indexes = better performance.
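The principle behind partitioning can be sketched in plain Python. This is not how SQL Server implements it; it only shows partition elimination: rows are split into ranges of a key (here, hypothetically, x itself in ranges of 10 values, loosely mirroring SQL Server's range-based partition functions), so a lookup on that key scans only the one partition that can contain matches.

```python
from collections import defaultdict

PARTITION_WIDTH = 10  # hypothetical: each partition holds a range of 10 x values


def partition_rows(rows):
    """Range-partition (id, x) rows: x in [0, 10) -> 0, [10, 20) -> 1, ..."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[1] // PARTITION_WIDTH].append(row)
    return partitions


def lookup(partitions, x):
    """Scan only the partition whose range can contain x.

    Returns (matching rows, number of rows examined).
    """
    candidates = partitions.get(x // PARTITION_WIDTH, [])
    matches = [row for row in candidates if row[1] == x]
    return matches, len(candidates)


rows = [(i, i % 100) for i in range(100_000)]  # x takes the values 0..99
partitions = partition_rows(rows)
matches, examined = lookup(partitions, 5)
print(f"{len(matches)} matches, {examined} of {len(rows)} rows examined")
```

Here the lookup for x = 5 examines 10,000 of the 100,000 rows (the partition holding x values 0 to 9) instead of all of them.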
#2
Yes, and it can be very significant.
If there are 100 million rows and no usable index, SQL Server has to go through each of them and check whether it matches. That takes a lot more time than checking 10 rows.
You probably want an index on the 'x' column. In that case SQL Server might check the index rather than going through all the rows, which can be significantly faster because it might not even need to check all the values in the index.
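The scan-versus-index choice is visible in the query plan. SQL Server shows it in its execution plans; as a self-contained stand-in, the sketch below uses SQLite through Python's sqlite3 module and its EXPLAIN QUERY PLAN output. The index name idx_table1_x and the column y are hypothetical. The exact plan wording varies between SQLite versions, but it reports a SCAN without the index and a SEARCH using the index once it exists.

```python
import sqlite3

QUERY = "SELECT * FROM table1 r WHERE r.x = 5"


def plan(conn: sqlite3.Connection) -> str:
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return "; ".join(row[-1] for row in rows)  # detail is the last column


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (x INTEGER, y TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    ((i % 1000, "row") for i in range(10_000)),
)

print("without index:", plan(conn))  # expect a full table SCAN
conn.execute("CREATE INDEX idx_table1_x ON table1 (x)")
print("with index:   ", plan(conn))  # expect a SEARCH using idx_table1_x
```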
On the other hand, if 100 million rows match x = 5, the query will be slower than if only 10 rows match, even with an index, because every matching row has to be read and returned.
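A minimal sketch of why the number of matches matters even with an index: the index is modelled here as a sorted Python list searched with the standard bisect module (a simplification for illustration, not how SQL Server stores a B-tree). Locating the first and last match takes O(log N) steps, but reading the matches out takes time proportional to the number of matches K. Both hypothetical tables below have the same size, 1,000,010 keys.

```python
from bisect import bisect_left, bisect_right


def index_lookup(sorted_keys, value):
    """Return the run of keys equal to value.

    Finding the boundaries is O(log N); copying the matches out is O(K).
    """
    lo = bisect_left(sorted_keys, value)
    hi = bisect_right(sorted_keys, value)
    return sorted_keys[lo:hi]


few = sorted([5] * 10 + list(range(6, 1_000_006)))   # 10 rows match x = 5
many = sorted([5] * 1_000_000 + list(range(6, 16)))  # 1,000,000 rows match x = 5

print(len(index_lookup(few, 5)), len(index_lookup(many, 5)))
```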
#3
Almost always yes. The real question is: how quickly does the query slow down as the table grows? The answer: not by much if r.x is indexed, and by a large amount if it is not.
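To put rough numbers on "not much" versus "a large amount", the sketch below counts the work done by a full scan and by a binary search over sorted keys. The binary search is a simplified stand-in for an index seek (an assumption; a real B-tree has a much higher fan-out, so it touches even fewer pages).

```python
def full_scan(keys, target):
    """Without an index every key is compared: returns (matches, comparisons)."""
    matches = sum(1 for key in keys if key == target)
    return matches, len(keys)


def binary_search_comparisons(sorted_keys, target):
    """Lower-bound binary search; returns how many probes it needed."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return probes


for n in (1_000, 1_000_000):
    keys = list(range(n))  # already sorted, so it can double as the "index"
    _, scanned = full_scan(keys, 5)
    probes = binary_search_comparisons(keys, 5)
    print(f"N={n:>9,}: full scan {scanned:>9,} comparisons, index search {probes} probes")
```

Growing the table 1,000 times multiplies the scan's work by 1,000, while the search needs at most about log2(N) + 1 probes (at most 10 for N = 1,000 and 20 for N = 1,000,000). The trade-off is that the sorted structure has to be built once and maintained on every write.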
#4
Not the number of rows per se (though it matters to a certain degree), but the amount of data (columns) returned is what can make a query slow. The data also needs to be transferred from the backend to the frontend.
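One way to see the effect of the selected columns is to compare how much data comes back from SELECT * versus a narrower column list. The sketch below uses SQLite via Python's sqlite3 and a hypothetical wide text column named payload; the byte count is only a rough proxy for what would travel from backend to frontend, not a measurement of any wire protocol.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, x INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO table1 (x, payload) VALUES (?, ?)",
    ((i % 10, "z" * 500) for i in range(10_000)),  # hypothetical 500-character column
)


def result_size(sql: str) -> tuple[int, int]:
    """Return (row count, approximate bytes) of the rows a query returns."""
    rows = conn.execute(sql).fetchall()
    approx_bytes = sum(len(str(value).encode()) for row in rows for value in row)
    return len(rows), approx_bytes


print("SELECT *   :", result_size("SELECT * FROM table1 r WHERE r.x = 5"))
print("SELECT r.id:", result_size("SELECT r.id FROM table1 r WHERE r.x = 5"))
```

Both queries return the same 1,000 rows, but the SELECT * result is more than a hundred times larger by this estimate, almost all of it from the payload column.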
#5
The answer is yes, but it is not the only factor. If you apply appropriate optimization and tuning, the performance drop can be negligible. The main performance factors are:
- Indexing (clustered or non-clustered)
- Data Caching
- Table Partitioning
- Execution Plan caching
- Data Distribution
- Hardware specs
There are other factors, but these are the main ones. Even how you design your schema affects performance.
#6
You should assume that your query always depends on the number of rows. In fact, you should assume the worst case: linear, O(N), for the example you provided, and worse than linear for more complex queries (a naive nested-loop join of two tables, for instance, is O(N*M)). There are database-specific manuals filled with tricks to help you avoid the worst case, but SQL itself is a language and doesn't specify how to execute your query. Instead, the database implementation decides how to execute any given query: if you have indexed a column or set of columns, you will typically get O(log(N)) performance for a simple lookup; if the system has effective query caching, you might get an O(1) response. Here is a good introductory article: High scalability: SQL and computational complexity
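The O(1) case for query caching can be sketched as a dictionary keyed on the SQL text and its parameters: a repeated query is answered by a hash lookup instead of touching the table. This is a hypothetical application-level cache (the CachedDB class is illustrative) built on Python's sqlite3, not a description of any particular database's internal cache; clearing the whole cache on every write is a simplifying assumption.

```python
import sqlite3


class CachedDB:
    """Hypothetical wrapper that caches query results in a dictionary."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.cache = {}  # (sql, params) -> list of result rows
        self.hits = 0

    def query(self, sql: str, params: tuple = ()):
        key = (sql, params)
        if key in self.cache:  # average O(1) hash lookup, independent of table size
            self.hits += 1
            return self.cache[key]
        rows = self.conn.execute(sql, params).fetchall()  # the real (O(N) or O(log N)) work
        self.cache[key] = rows
        return rows

    def execute_write(self, sql: str, params: tuple = ()):
        self.conn.execute(sql, params)
        self.cache.clear()  # drop everything so stale results are never served


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (x INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", ((i % 10,) for i in range(1_000)))
db = CachedDB(conn)

q = "SELECT COUNT(*) FROM table1 r WHERE r.x = ?"
print(db.query(q, (5,)), db.query(q, (5,)), db.hits)  # second call is a cache hit
db.execute_write("INSERT INTO table1 VALUES (5)")
print(db.query(q, (5,)), db.hits)  # cache was cleared, so this runs the query again
```

Only the repeated lookup is O(1); the first execution of each distinct query, and the first one after any write, still costs whatever the underlying query costs.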