Under the hood – Linq’s Where*Iterators
Many of the LINQ extension methods have ‘optimisations’ which essentially violate the Liskov Substution Principal and peek
for stronger types on the collections passed as parameters, in order to apply performance improvements. An example is this optimisation is the WhereListIterator and WhereArrayIterator classes.
At first glance, chaining Where
clauses in LINQ seems like a bad idea:
// Approach 1 : Chained Wheres
var result = Foos
.Where(f => predicate1(f))
.Where(f => predicate2(f))
...
In a naive implementation (where each Where filter needed to be executed to completion, in sequence), the expectation would be a complexity of
O(N) + O(N-x)
total iterations, where x is the number of elements eliminated by predicate1.
Seemingly a more optimal O(N)
approach is to apply both predicates would be in a single pass:
// Approach 2 : Predicates combined into a single pass
var result = Foos
.Where(f => predicate1(f) && predicate2(f))
However, it turns out that this fear is unfounded: there are specialised branches in IEnumerable.Where
which check the source collection for the presence of an existing Iterator. If the type is an iterator, the second predicate can simply chained to the source’s iterator, rather than starting a new iterator.
From the source:
Note that both approaches will still benefit from short circuit evaluation of the predicates. In either case, if predicate1(f) is false for an element, then evaluation of predicate2(f) for that element will be skipped.
This avoids the need for a second iterator – i.e. chaining multiple .Where
expressions will require just the one iteration.
This optimisation is of considerable importance in the use of conditional parameters to dynamically compose an IEnumerable or IQueryable, e.g. patterns using an approach like:
var query = Foos.Where(f => f.IsActive); // Always applied
if (parameter1.HasValue)
{
query = query.Where(f => f.SomeField == parameter1.Value); // Conditionally applied
}
... Other conditional parameters
var results = await query.ToListAsync();
Footnote
The performance of the approaches in Option1 and Option2 are still not identical, as there are clearly 2 lambdas created for predicate1 and predicate2, so when executed in memory, there’s still* the potential for additional function call overhead on each loop when executing the predicates (i.e. combining, rather than chaining the lambdas is still more performant in memory, although the IQueryable / Expression tree overloads will likely map to equivalent SQL)
* The function call overhead could be eliminated if the lambdas were inlined. However, it’s unlikely as to whether lambdas will be inlined by the JITter.