Temp tables and SQL SELECT performance

Date: 2022-07-25 23:40:50

Why does using temp tables with a SELECT statement improve the logical I/O count? Wouldn't it increase the number of hits to the database instead of decreasing it? Is this because the 'problem' is broken down into sections? I'd like to know what's going on behind the scenes.

4 answers

#1


3  

There's no general answer. It depends on how the temp table is being used.

The temp table may reduce IO by caching rows produced by a complex filter/join that are used multiple times later in the batch. This way, the DB can avoid hitting the base tables multiple times when only a subset of the records is needed.
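As a rough sketch of that reuse pattern, here is a version using Python's built-in `sqlite3` module as a stand-in engine (the question doesn't name a database, so that choice is an assumption). The `customers`, `orders`, and `eu_big_orders` tables are hypothetical names made up for the illustration.

```python
import sqlite3

# Assumption: SQLite stands in for whatever engine the question is about.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 150.0), (3, 2, 300.0), (4, 3, 200.0);

    -- Run the filter/join once and keep only the rows needed later.
    CREATE TEMP TABLE eu_big_orders AS
        SELECT o.id, o.customer_id, o.amount
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        WHERE c.region = 'EU' AND o.amount >= 100;
""")

# Both follow-up statements read the small temp table, not the base tables.
total = conn.execute("SELECT SUM(amount) FROM eu_big_orders").fetchone()[0]
per_customer = conn.execute(
    "SELECT customer_id, COUNT(*) FROM eu_big_orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

print(total, per_customer)
```

Whether this actually saves I/O depends on how expensive the original join is and how many times the temp table is read afterwards.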

The temp table may increase IO by storing records that are never used later in the query, or by taking up a lot of space in the engine's cache that could have been better used by other data.

Creating a temp table whose entire contents are used only once is usually slower than including the temp table's query in the main query: the query optimizer can't see past the temp table, so it forces a (probably) unnecessary spool of the data instead of letting it stream from the source tables.
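To make the two shapes concrete, here is a small sketch (again assuming SQLite through Python's `sqlite3`, with a hypothetical `orders` table). It only shows that the two forms return the same rows. It does not measure I/O, and how much the inlined form helps is engine-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 150.0), (3, 2, 300.0);
""")

# Variant 1: materialize first, then read the temp table exactly once.
conn.execute(
    "CREATE TEMP TABLE big_orders AS "
    "SELECT customer_id, amount FROM orders WHERE amount >= 100"
)
via_temp = conn.execute(
    "SELECT customer_id, SUM(amount) FROM big_orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# Variant 2: the same logic inlined as a derived table, so the optimizer
# sees the whole query and is free to stream rows from the base table.
inline = conn.execute("""
    SELECT customer_id, SUM(amount)
    FROM (SELECT customer_id, amount FROM orders WHERE amount >= 100) AS b
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(via_temp == inline)
```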

#2


1  

I'm going to assume by temp tables you mean a sub-select in a WHERE clause. (This is referred to as a semijoin operation and you can usually see that in the text execution plan for your query.)

When the query optimizer encounters a sub-select/temp table, it makes some assumptions about what to do with that data. Essentially, the optimizer will create an execution plan that performs a join on the sub-select's result set, reducing the number of rows that need to be read from the other tables. Since there are fewer rows, the query engine can read fewer pages from disk/memory and reduce the amount of I/O required.
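A minimal sketch of such a sub-select in a WHERE clause, assuming SQLite via Python's `sqlite3` and hypothetical `customers`/`orders` tables. SQLite's `EXPLAIN QUERY PLAN` output is formatted differently from the text execution plan mentioned above, so treat the printed plan as illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 2, 300.0), (3, 3, 200.0), (4, 3, 20.0);
""")

# The sub-select restricts which orders qualify; each order appears at most
# once in the result, which is what makes this a semijoin rather than a join.
query = """
    SELECT id, amount
    FROM orders
    WHERE customer_id IN (SELECT id FROM customers WHERE region = 'EU')
    ORDER BY id
"""
rows = conn.execute(query).fetchall()

# Ask the engine how it intends to evaluate the statement.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step[-1])
```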

#3


0  

AFAIK, at least with MySQL, temp tables are kept in RAM, making SELECTs much faster than anything that hits the hard disk.

#4


0  

There is a class of problems where building the result in a collection structure on the database side is much preferable to returning the result's parts to the client, with a round trip for each part.

For example: arbitrary-depth recursive relationships (boss of).
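One way to sketch the "boss of" case without a round trip per level is a recursive common table expression, where the engine keeps the intermediate rows on its side. This is a swapped-in technique rather than an explicit temp table, and it assumes SQLite via Python's `sqlite3` with a hypothetical `employees` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, boss_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ada', NULL), (2, 'Bob', 1), (3, 'Cy', 2), (4, 'Dee', 2), (5, 'Eve', 1);
""")

# One statement walks the whole hierarchy below a given boss; the client
# makes a single call no matter how deep the tree is.
reports = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 1 FROM employees WHERE boss_id = ?
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees AS e
        JOIN chain AS c ON e.boss_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
""", (1,)).fetchall()

print(reports)
```

On engines or versions without recursive queries, the same result is typically built by looping inside a stored procedure and appending each level to a temp table.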

There's another class of query problems where the data is not and will not be indexed in a manner that makes the query run efficiently. Pulling results into a collection structure, which can be indexed in a custom way, will reduce the logical IO for these queries.
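A hedged sketch of that idea: copy the relevant rows into a temp table and give it an index the base table lacks. SQLite via Python's `sqlite3` is assumed, and the `events` table, `clicks` temp table, and `clicks_by_user` index are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 50, "click" if i % 3 else "view") for i in range(1000)],
)

# The base table has no index on user_id, so pull the subset of interest
# into a temp table and index it for this batch of lookups.
conn.execute(
    "CREATE TEMP TABLE clicks AS "
    "SELECT id, user_id FROM events WHERE kind = 'click'"
)
conn.execute("CREATE INDEX clicks_by_user ON clicks (user_id)")

# Check that lookups by user_id now go through the new index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM clicks WHERE user_id = ?", (7,)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
count = conn.execute(
    "SELECT COUNT(*) FROM clicks WHERE user_id = ?", (7,)
).fetchone()[0]

print(plan_text)
```

Building the index costs I/O too, so this tends to pay off only when the temp table is probed many times.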
