LINQ性能 - 延迟v / s立即执行

时间:2022-08-23 11:37:12

I have seen that sometimes the performance of LINQ to Objects queries can be improved significantly if they forced to execute immediately by using .ToArray(), but can't quite understand why. For example, in the sample below, the execution of the function Deferred() is much slower than the function Immediate(), and grows exponentially with the value of limit (perhaps it was exponential in both functions, but execution time with Immediate() was too little for me to say definitively).

我已经看到,如果使用.ToArray()强制立即执行LINQ to Objects查询的性能,有时可以显着提高它们的性能,但是不能理解为什么。例如,在下面的示例中,函数Deferred()的执行比函数Immediate()慢得多,并且随着limit的值呈指数增长(也许它在两个函数中都是指数,但执行时间与Immediate()对我来说太明确无误地说。

public void Deferred()
{
    var all = Range(limit);
    var even = from e in EvenRange(limit) where all.Contains(e) select e;
    var odd = from o in OddRange(limit) where !even.Contains(o) select o;

    var query = from q in odd select q;

    foreach(var i in query) { var j = i+1; }
}

public void Immediate()
{
    var all = Range(limit);
    var even = (from e in EvenRange(limit) where all.Contains(e) select e)
        .ToArray();
    var odd = (from o in OddRange(limit) where !even.Contains(o) select o)
        .ToArray();

    var query = (from q in odd select q).ToArray();

    foreach(var i in query) { var j = i+1; }
}

public static IEnumerable<int> OddRange(int stop)
{
    for (int i = 1; i < stop; i+=2) yield return i;
}

public static IEnumerable<int> EvenRange(int stop)
{
    for (int i = 2; i < stop; i+=2) yield return i;
}

public static IEnumerable<int> Range(int stop)
{
    for (int i = 0; i < stop; ++i) yield return i;
}

What causes this slowdown? Is it just the extra code that the compiler generates and has to run every time I access the enumerable, or does Deferred() also go into some extra loops, changing the "Big O" complexity of the query?

是什么导致这种放缓?它只是编译器生成的额外代码,每次访问可枚举时都必须运行,或者Deferred()是否也会进入一些额外的循环,从而改变查询的“大O”复杂性?

4 个解决方案

#1


As far as I know IEnumerable doesn't cache the result, so your clause involving even.Contains (which must check all elements of even) forces a complete re-enumeration of even, thus showing the performance behavior you've noticed. Transforming even in an array or in a List should show you good performance when enumerating odd. Other .NET languages (such as F#) include methods to cache the results of an enumeration (if you're interested, you may look at F#'s Seq.cache method)

据我所知,IEnumerable不会缓存结果,所以你的子句涉及even.Contains(必须检查所有偶数元素)强制完全重新枚举偶数,从而显示你注意到的性能行为。即使在数组或列表中进行转换也应该在枚举奇数时显示出良好的性能。其他.NET语言(如F#)包含缓存枚举结果的方法(如果您有兴趣,可以查看F#的Seq.cache方法)

#2


As emaster70 explained, the Deferred version recalcuates the even collection for every call to even.Contains. That makes it degrade very quickly for higher values of limit.

正如emaster70所解释的那样,Deferred版本会为每次调用even.Contains重新计算均匀集合。这使得它对于更高的极限值非常快地降级。

The Immediate version is better, and if you also make the all collection immediate, like this:

立即版本更好,如果你也立即使所有集合,像这样:

var all = Range(limit).ToArray();

it gets about three times faster.

它的速度提高了大约三倍。

However, the Contains method for an array is an O(n) operation, so that doesn't scale well either. If the condition is equality, you should use a join instead of looking up each value using Contains.

但是,数组的Contains方法是O(n)操作,因此也不能很好地扩展。如果条件相等,则应使用连接而不是使用Contains查找每个值。

If you need to make the lookup, you should use a HashSet instead of an array, as the Contains method for that is an O(1) operation.

如果需要进行查找,则应使用HashSet而不是数组,因为Contains方法是O(1)操作。

For a limit of 10000, this is 100 times faster than the Immediate version:

对于10000的限制,这比立即版本快100倍:

public void Joined() {
    var all = Range(limit);
    var even = from e in EvenRange(limit) join a in all on e equals a select e;
    var evenSet = new HashSet<int>(even);
    var odd = from o in OddRange(limit) where !evenSet.Contains(o) select o;

    var query = from q in odd select q;

    foreach (var i in query) { var j = i + 1; }
}

#3


In addition to what emaster70 said, when reading values from an array an entire cache line is fetched into the system cache meaning that when looking into the arrays it goes much faster than if you had to fetch from main memory.

除了emaster70所说的,当从数组中读取值时,整个高速缓存行被提取到系统高速缓存中,这意味着在查看数组时它比从主存储器获取要快得多。

#4


In this code

在这段代码中

var all = Range(limit);    
var even = from e in EvenRange(limit) where all.Contains(e) select e;    
var odd = from o in OddRange(limit) where !even.Contains(o) select o;

since IEnumerables constructed from iterator blocks or LINQ queries are lazy, computing each element of 'odd' requires recomputing the entire 'even' sequence. As a result, the big-O of this code is awful. It may be instructive to use a very small 'limit' and put a Console.WriteLine() inside the loop in Range(), and watch the behavior.

因为从迭代器块或LINQ查询构造的IEnumerables是惰性的,所以计算'odd'的每个元素需要重新计算整个'even'序列。结果,这段代码的大O很糟糕。使用非常小的“限制”并将一个Console.WriteLine()放在Range()中的循环中并观察行为可能是有益的。

#1


As far as I know IEnumerable doesn't cache the result, so your clause involving even.Contains (which must check all elements of even) forces a complete re-enumeration of even, thus showing the performance behavior you've noticed. Transforming even in an array or in a List should show you good performance when enumerating odd. Other .NET languages (such as F#) include methods to cache the results of an enumeration (if you're interested, you may look at F#'s Seq.cache method)

据我所知,IEnumerable不会缓存结果,所以你的子句涉及even.Contains(必须检查所有偶数元素)强制完全重新枚举偶数,从而显示你注意到的性能行为。即使在数组或列表中进行转换也应该在枚举奇数时显示出良好的性能。其他.NET语言(如F#)包含缓存枚举结果的方法(如果您有兴趣,可以查看F#的Seq.cache方法)

#2


As emaster70 explained, the Deferred version recalcuates the even collection for every call to even.Contains. That makes it degrade very quickly for higher values of limit.

正如emaster70所解释的那样,Deferred版本会为每次调用even.Contains重新计算均匀集合。这使得它对于更高的极限值非常快地降级。

The Immediate version is better, and if you also make the all collection immediate, like this:

立即版本更好,如果你也立即使所有集合,像这样:

var all = Range(limit).ToArray();

it gets about three times faster.

它的速度提高了大约三倍。

However, the Contains method for an array is an O(n) operation, so that doesn't scale well either. If the condition is equality, you should use a join instead of looking up each value using Contains.

但是,数组的Contains方法是O(n)操作,因此也不能很好地扩展。如果条件相等,则应使用连接而不是使用Contains查找每个值。

If you need to make the lookup, you should use a HashSet instead of an array, as the Contains method for that is an O(1) operation.

如果需要进行查找,则应使用HashSet而不是数组,因为Contains方法是O(1)操作。

For a limit of 10000, this is 100 times faster than the Immediate version:

对于10000的限制,这比立即版本快100倍:

public void Joined() {
    var all = Range(limit);
    var even = from e in EvenRange(limit) join a in all on e equals a select e;
    var evenSet = new HashSet<int>(even);
    var odd = from o in OddRange(limit) where !evenSet.Contains(o) select o;

    var query = from q in odd select q;

    foreach (var i in query) { var j = i + 1; }
}

#3


In addition to what emaster70 said, when reading values from an array an entire cache line is fetched into the system cache meaning that when looking into the arrays it goes much faster than if you had to fetch from main memory.

除了emaster70所说的,当从数组中读取值时,整个高速缓存行被提取到系统高速缓存中,这意味着在查看数组时它比从主存储器获取要快得多。

#4


In this code

在这段代码中

var all = Range(limit);    
var even = from e in EvenRange(limit) where all.Contains(e) select e;    
var odd = from o in OddRange(limit) where !even.Contains(o) select o;

since IEnumerables constructed from iterator blocks or LINQ queries are lazy, computing each element of 'odd' requires recomputing the entire 'even' sequence. As a result, the big-O of this code is awful. It may be instructive to use a very small 'limit' and put a Console.WriteLine() inside the loop in Range(), and watch the behavior.

因为从迭代器块或LINQ查询构造的IEnumerables是惰性的,所以计算'odd'的每个元素需要重新计算整个'even'序列。结果,这段代码的大O很糟糕。使用非常小的“限制”并将一个Console.WriteLine()放在Range()中的循环中并观察行为可能是有益的。