Why does adding an unnecessary ToList() drastically speed up this LINQ query?

Time: 2022-09-20 13:58:09

Why does forcing materialization using ToList() make my query orders of magnitude faster when, if anything, it should do the exact opposite?

1) Calling First() immediately

    // "Context" is an Entity Framework DB-first model

    var query = from x in Context.Users
                where x.Username.ToLower().Equals(User.Identity.Name.ToLower())
                select x;

    var User = query.First();

    //  ** The above takes 30+ seconds to run **

2) Calling First() after calling ToList():

    var query = from x in Context.Users
                where x.Username.ToLower().Equals(User.Identity.Name.ToLower())
                select x;

    var User = query.ToList().First();     // Added ToList() before First()

    // ** Now it takes < 1 second to run! **

Update and Resolution

After getting the generated SQL, the only difference is, as expected, the addition of TOP (1) in the first query. As Andyz Smith says in his answer below, the root cause is that the SQL Server optimizer, in this particular case, chooses a worse execution plan when TOP (1) is added. Thus the problem has nothing to do with LINQ (which did the right thing by adding TOP (1)) and everything to do with the idiosyncrasies of SQL Server.
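
One way to capture the generated SQL for both variants is sketched below. It assumes Entity Framework 6, where `DbContext.Database.Log` accepts an `Action<string>`. The `User` and `DemoContext` classes and the `"someuser"` value are hypothetical stand-ins for the DB-first model and `User.Identity.Name` from the question, and a reachable SQL Server database is assumed.

```csharp
using System;
using System.Data.Entity;   // assumes the EntityFramework 6 NuGet package
using System.Linq;

// Hypothetical entity standing in for the question's Users table.
public class User
{
    public int Id { get; set; }
    public string Username { get; set; }
}

// Hypothetical context; the question uses a DB-first model named "Context".
public class DemoContext : DbContext
{
    public DbSet<User> Users { get; set; }
}

public static class SqlLoggingSketch
{
    public static void Main()
    {
        // Stand-in for User.Identity.Name.ToLower() in the question.
        var name = "someuser".ToLower();

        using (var context = new DemoContext())
        {
            // EF6 writes every command it sends to this delegate.
            context.Database.Log = Console.Write;

            var query = context.Users.Where(x => x.Username.ToLower() == name);

            // Variant 1: the limit is part of the SQL, so expect SELECT TOP (1).
            // FirstOrDefault avoids an exception when no row matches.
            var direct = query.FirstOrDefault();

            // Variant 2: the whole filtered result set is fetched,
            // then the first element is picked in memory.
            var viaList = query.ToList().FirstOrDefault();
        }
    }
}
```

Comparing the two logged statements should show whether TOP (1) is the only difference, which is the starting point for looking at the execution plans on the server.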

3 Answers

#1


0  

So, the optimizer chooses a bad way to run the query.

Since you can't add optimizer hints to the SQL to force the optimizer to choose a better plan, I see two options.

  1. Add a covering index or indexed view on all the columns that are retrieved or included in the select. Pretty ludicrous, but I think it will work, because that index will make it easy peasy for the optimizer to choose a better plan.

  2. Always prematurely materialize queries that include First, Last, or Take. This is dangerous because, as the data grows, the break-even point between pulling all the data locally and then calling First(), versus running the query with TOP on the server, is going to change.

http://geekswithblogs.net/Martinez/archive/2013/01/30/why-sql-top-may-slow-down-your-query-and-how.aspx

https://groups.google.com/forum/m/#!topic/microsoft.public.sqlserver.server/L2USxkyV1uw

http://connect.microsoft.com/SQLServer/feedback/details/781990/top-1-is-not-considered-as-a-factor-for-query-optimization

TOP slows down query

Why does TOP or SET ROWCOUNT make my query so slow?

#2


11  

I can only think of one reason... To test it, can you please remove the Where clause and re-run the test? Comment here if the result is the first statement being faster, and I will explain why.

Edit
In the LINQ statement's Where clause, you are using the .ToLower() method of the string. My guess is that LINQ does not have a built-in conversion to SQL for this method, so the resultant SQL is something like

    SELECT *
    FROM Users

Now, we know that LINQ lazy loads, but it also knows that since it has not evaluated the WHERE clause, it needs to load the elements to do the comparison.

Hypothesis
The first query is lazy loading EVERY element in the result set. It is then doing the .ToLower() comparison and returning the first result. This results in n requests to the server and a huge performance overhead. I cannot be sure without seeing the SQL trace log.

The second statement calls ToList, which requests the data in one batch before doing the ToLower comparison, resulting in only one request to the server.

Alternative Hypothesis
If the profiler shows only one server execution, try executing the same query with the TOP 1 clause and see if it takes as long. As per this post (Why is doing a top(1) on an indexed column in SQL Server slow?), the TOP clause can sometimes mess with the SQL Server optimiser and stop it from using the correct indices.

Curiosity edit
Try changing the LINQ to:

    var query = from x in Context.Users
                where x.Username.Equals(User.Identity.Name, StringComparison.OrdinalIgnoreCase)
                select x;

Credit to @Scott for finding the way to do a case-insensitive comparison in LINQ. Give it a go and see if it is faster.

#3


3  

The SQL won't be the same, as LINQ is lazy loading. So your call to .ToList() will force .NET to evaluate the expression, then select the First() item in memory.

Whereas the other option should add TOP 1 to the SQL.

E.G.

    var query = from x in Context.Users
                where x.Username.ToLower().Equals(User.Identity.Name.ToLower())
                select x;

    // SQL executed here
    var User = query.First();

and

    var query = from x in Context.Users
                where x.Username.ToLower().Equals(User.Identity.Name.ToLower())
                select x;

    // SQL executed here!
    var list = query.ToList();
    // First() now runs in memory against the materialized list
    var User = list.First();

As below, the first query should be faster! I would suggest running SQL Profiler to see what's going on. The speed of the queries will depend on your data structure, number of records, indexes, etc.
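
The deferred-execution behaviour described in this answer can be observed without a database. The following is a minimal sketch using LINQ to Objects only, so it is an analogy rather than what Entity Framework does: the `Usernames` source and the `pulled` counter are hypothetical, and with EF the equivalent difference shows up as TOP (1) in the SQL versus fetching the full result set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExecutionDemo
{
    // Counts how many elements the source has handed out.
    static int pulled;

    // Hypothetical in-memory stand-in for the Users table.
    static IEnumerable<string> Usernames()
    {
        foreach (var name in new[] { "alice", "Bob", "carol", "dave" })
        {
            pulled++;
            yield return name;
        }
    }

    static void Main()
    {
        // Building the query runs nothing: the iterator body has not started.
        var query = Usernames().Where(n => n.ToLower() == "bob");
        Console.WriteLine($"after building the query: pulled = {pulled}"); // pulled = 0

        // First() stops as soon as one element matches.
        pulled = 0;
        var first = query.First();
        Console.WriteLine($"First(): {first}, pulled = {pulled}"); // Bob, pulled = 2

        // ToList() enumerates the whole source before First() picks an element.
        pulled = 0;
        var viaList = query.ToList().First();
        Console.WriteLine($"ToList().First(): {viaList}, pulled = {pulled}"); // Bob, pulled = 4
    }
}
```

In this in-memory case First() does strictly less work, which is why the slowdown in the question is surprising and points at the server-side plan rather than at LINQ.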

The timing of your test will also alter the results. As a couple of people have mentioned in comments, the first time you hit EF it needs to initialise and load the metadata, so if you run these together, the first one should always be slow.
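
A rough way to keep that one-off start-up cost out of the comparison is to time a cold call and a warm call separately. The sketch below uses only `Stopwatch`. `runQuery` is a hypothetical stand-in that simulates a slow first call with `Thread.Sleep`, and would be replaced by the real EF query.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class TimingSketch
{
    // Runs the action once and returns how long it took.
    static TimeSpan Measure(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    static void Main()
    {
        // Hypothetical stand-in for the EF call: the first invocation pays a
        // simulated initialisation cost, later invocations do not.
        bool initialised = false;
        Action runQuery = () =>
        {
            if (!initialised)
            {
                Thread.Sleep(200); // simulated metadata loading
                initialised = true;
            }
            Thread.Sleep(10);      // simulated query time
        };

        // The cold run includes start-up cost; the warm run does not.
        Console.WriteLine($"cold: {Measure(runQuery).TotalMilliseconds:F0} ms");
        Console.WriteLine($"warm: {Measure(runQuery).TotalMilliseconds:F0} ms");
    }
}
```

Comparing only warm runs of each variant, ideally in both orders, gives a fairer picture than timing them once back to back.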

Here's some more info on EF performance considerations

Notice the line:

Model and mapping metadata used by the Entity Framework is loaded into a MetadataWorkspace. This metadata is cached globally and is available to other instances of ObjectContext in the same application domain.

&

Because an open connection to the database consumes a valuable resource, the Entity Framework opens and closes the database connection only as needed. You can also explicitly open the connection. For more information, see Managing Connections and Transactions in the Entity Framework.
