CASE 1: I have a table with 30 columns and I query using 4 columns in the WHERE clause.
CASE 2: I have a table with 6 columns and I query using 4 columns in the WHERE clause.
What is the difference in performance between the two cases?
For example, I have these tables:
-- Table A: seven columns; e, g and h are not referenced by the query
CREATE TABLE A
(
b varchar(10),
c varchar(10),
d varchar(10),
e varchar(10),
f varchar(10),
g varchar(10),
h varchar(10)
);
SELECT b, c, d
FROM A
WHERE f = 'foo';
-- Table B: the same columns as A, minus g and h
CREATE TABLE B
(
b varchar(10),
c varchar(10),
d varchar(10),
e varchar(10),
f varchar(10)
);
SELECT b, c, d
FROM B
WHERE f = 'foo';
Tables A and B have the same structure apart from the number of columns. The column used in the WHERE condition is the same, and the columns in the SELECT list are the same. The only difference is that table A has some extra columns that are not used in either the SELECT list or the WHERE condition. In that case, is there any difference in performance between the two queries?
5 answers
#1
11
Does the number of columns affect query performance?
Yes. The main benefit of returning fewer columns in a SELECT is that SQL Server may be able to avoid reading from the table (heap or clustered index) altogether, if it can retrieve all the selected data from an index (as key columns and/or included columns, in the case of a covering index).
Obviously, the columns used in the predicate (the WHERE filter), i.e. f in your example, MUST be among the key columns of the index, and the index must be sufficiently selective, for it to be used in the first place.
There is also a secondary benefit in returning fewer columns from a SELECT: it reduces I/O overhead, especially if there is a slow network between the database server and the app consuming the data. In other words, it is good practice to return only the columns you actually need, and to avoid using SELECT *.
Edit: In response to the OP's updated post:
With no indexes at all, both queries will do table scans. Given that table B has fewer columns than table A, the rows per page (density) will be higher on B, so B will be marginally quicker, as SQL Server will need to fetch fewer pages.
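To put rough numbers on the density argument, here is a back-of-the-envelope estimate in Python. It is only a sketch: it assumes every varchar(10) value is fully populated with 10 single-byte characters, uses the usual SQL Server record layout (4-byte row header, 2-byte column count, NULL bitmap, 2-byte variable-column count plus a 2-byte offset per variable-length column), and ignores fill factor, fragmentation and row-versioning overhead. The function names are made up for illustration.

```python
import math

PAGE_BYTES = 8192         # SQL Server data page size (8 KB)
PAGE_HEADER_BYTES = 96    # fixed page header
SLOT_BYTES = 2            # slot-array entry kept per row


def row_bytes(n_varchar_cols: int, bytes_per_value: int = 10) -> int:
    """Approximate on-page size of a row made only of varchar columns."""
    row_header = 4                               # status bits + fixed-length size
    column_count = 2
    null_bitmap = math.ceil(n_varchar_cols / 8)  # one bit per column
    var_count = 2                                # number of variable-length columns
    var_offsets = 2 * n_varchar_cols             # one end offset per column
    data = n_varchar_cols * bytes_per_value      # assumption: every value is full
    return row_header + column_count + null_bitmap + var_count + var_offsets + data


def rows_per_page(n_varchar_cols: int) -> int:
    usable = PAGE_BYTES - PAGE_HEADER_BYTES
    return usable // (row_bytes(n_varchar_cols) + SLOT_BYTES)


def pages_needed(n_rows: int, n_varchar_cols: int) -> int:
    return math.ceil(n_rows / rows_per_page(n_varchar_cols))


if __name__ == "__main__":
    for name, cols in (("A", 7), ("B", 5)):
        print(name, rows_per_page(cols), "rows/page,",
              pages_needed(1_000_000, cols), "pages for 1,000,000 rows")
```

Under those assumptions, A fits about 85 rows per page and B about 114, so a full scan of one million rows would read roughly 11,765 pages for A against 8,772 for B.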
However, with indexes as below:
- Index on A(f) INCLUDE (b,c,d)
- Index on B(f) INCLUDE (b,c,d)
The performance should be identical for the two queries (assuming the same data in both tables), given that SQL Server will hit the indexes, which are now of similar column widths and row densities.
Edit
Some other plans:
- Index on B(f) with no other key or INCLUDE columns, or with an incomplete set of INCLUDE columns (i.e. one or more of b, c or d are missing):
SQL Server will likely need to do a Key Lookup or RID Lookup: even if the index is used, it must "join" back to the table to retrieve the columns in the SELECT list that are missing from the index. (The lookup type depends on whether the table has a clustered index or is a heap.)
- Straight nonclustered index on B(f,b,c,d)
This will still be very performant, as the index will be used and the table avoided, but it won't be quite as good as the covering index with INCLUDE columns, because the index tree will be less dense: the additional key columns are stored at the non-leaf levels as well as the leaf level.
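The difference between an index that covers the query and one that does not can be seen in a query plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, purely because it is easy to run. SQLite has no INCLUDE clause, so the covering case here is the composite index B(f,b,c,d), and its plan wording differs from SQL Server's Key/RID Lookup operators. The index names are hypothetical.

```python
import sqlite3


def plan_for(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE B (b VARCHAR(10), c VARCHAR(10), d VARCHAR(10), "
    "e VARCHAR(10), f VARCHAR(10))"
)
query = "SELECT b, c, d FROM B WHERE f = 'foo'"

# Case 1: index on the filter column only. The index finds the rows,
# but b, c and d still have to be fetched from the table.
conn.execute("CREATE INDEX ix_b_f ON B (f)")
narrow_plan = plan_for(conn, query)

# Case 2: composite index that also carries b, c and d. The query can be
# answered from the index alone.
conn.execute("DROP INDEX ix_b_f")
conn.execute("CREATE INDEX ix_b_f_bcd ON B (f, b, c, d)")
covering_plan = plan_for(conn, query)
conn.close()

if __name__ == "__main__":
    print("index on f only:   ", narrow_plan)
    print("index on f,b,c,d:  ", covering_plan)
```

With the index on f alone, the plan reports an ordinary index search, which implies a trip back to the table for b, c and d. With the composite index it reports a covering index, meaning the table is never touched. In SQL Server, the equivalent check is to look for a Key Lookup or RID Lookup operator in the execution plan.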
#2
4
Test it and see!
There will be a performance difference; however, 99% of the time you won't notice it. Usually you won't even be able to detect it!
You can't even guarantee that the table with fewer columns will be quicker. If it's bothering you, then try it and see.
Technical rubbish: (from the perspective of Microsoft SQL Server)
Assuming that in all other respects (indexes, row counts, the data contained in the 6 common columns, etc.) the tables are identical, the only real difference will be that the larger table is spread over more pages on disk and in memory.
SQL Server only attempts to read the data it absolutely requires; however, it always loads an entire page at a time (8 KB). Even if exactly the same amount of data is required as the output of the query, if that data is spread over more pages then more I/O is required.
That said, SQL Server is incredibly efficient with its data access, so you are very unlikely to see a noticeable impact on performance except in extreme circumstances.
Besides, it is also likely that your query will be run against an index rather than the table anyway, so with indexes of exactly the same size the difference is likely to be zero.
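In the spirit of "test it and see", the page-count effect is easy to observe. This sketch again uses SQLite via Python as a convenient stand-in: its default page size and row format differ from SQL Server's 8 KB pages, so only the direction of the result carries over, not the numbers. The helper name pages_used and the row count are arbitrary choices.

```python
import os
import sqlite3
import tempfile


def pages_used(columns, n_rows=20000):
    """Create a table with the given columns, fill it, and return the
    number of database pages the file occupies."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "t.db"))
        col_defs = ", ".join(f"{c} VARCHAR(10)" for c in columns)
        conn.execute(f"CREATE TABLE t ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        row = tuple("x" * 10 for _ in columns)   # every value fully populated
        conn.executemany(
            f"INSERT INTO t VALUES ({placeholders})",
            (row for _ in range(n_rows)),
        )
        conn.commit()
        (count,) = conn.execute("PRAGMA page_count").fetchone()
        conn.close()
        return count


# Same rows, same data in the shared columns; A just carries g and h as well.
pages_a = pages_used("bcdefgh")
pages_b = pages_used("bcdef")

if __name__ == "__main__":
    print("A (7 columns):", pages_a, "pages")
    print("B (5 columns):", pages_b, "pages")
```

The wider table occupies more pages for the same number of rows, which is exactly the extra I/O a table scan would pay. Once both tables have an identical index that covers the query, this difference no longer matters for that query.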
#3
2
There will be no performance difference based on column position. The construction of the table is a different story, though: number of rows, indexes, number of columns, etc.
The scenario you describe, comparing the position of a column in the two tables, is almost like comparing apples to oranges, because so many variables differ besides the column position.
#4
2
Unless you have a very wide column set difference with no index being used (and thus a table scan), you should see little difference in performance. That being said, it is always useful/beneficial to return as few columns as possible to satisfy your needs. The catch is that there is more to gain from returning all the columns you need in one query than from making a second database fetch for the other columns.
- Get what you need
- Avoid a second database query on the same table for the same rows
- Use an index on the filter column(s) (those that restrict rows in the WHERE clause)
- Restrict the columns returned to those you need, to improve database server memory efficiency/paging
#5
1
It depends on the width of the table (bytes per row), how many rows are in the table, and whether there are indexes on the columns used by the query. There is no definitive answer without that information. However, the more columns in the table, the more likely it is to be wide. But the effect of a proper index is much more significant than the effect of the table size.