IEnumerable Is Not A List!

James Charlesworth

March 18, 2020 · 4 min read

Have a close look at the following C# function and see if you can spot what's wrong:

public void LogLongAndShortNames(IEnumerable<string> names)
{
    foreach (var name in names.Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    foreach (var name in names.Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}

It's simple, right? We are taking in a list of names, looping through the long names and writing them to the console, then looping through the short names and writing them.

Except we aren't.

That names parameter is not a List<string> at all. It is an IEnumerable<string>, and they are not the same thing.

From the Microsoft docs

List<T> Class

Represents a strongly typed list of objects that can be accessed by index.

… and …

IEnumerable<T> Interface

Exposes the enumerator, which supports a simple iteration over a collection of a specified type.

The important distinction here is that while a List<T> is an actual collection of objects stored somewhere in memory, IEnumerable<T> is simply something that exposes an enumerator. Nothing more. There is nothing about a class implementing IEnumerable<T> that says it must contain the same data each time you enumerate it. There is nothing that specifies the data is even held in memory. It just exposes the enumerator, which supports a simple iteration over a collection of a specified type.
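To make that distinction concrete, here is a minimal sketch of an IEnumerable<int> that holds no data at all. The LazyNumbers class, its CountTo method and the TimesEnumerated counter are hypothetical names invented for this illustration. The values are produced on demand, and the iterator body runs again every time the sequence is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyNumbers
{
    // Counts how many times the iterator body has started running.
    public static int TimesEnumerated;

    // Hypothetical helper: nothing is stored in a collection;
    // each value is computed only when the caller asks for it.
    public static IEnumerable<int> CountTo(int max)
    {
        TimesEnumerated++;
        for (var i = 1; i <= max; i++)
            yield return i;
    }

    public static void Main()
    {
        var numbers = CountTo(3);            // no code inside CountTo has run yet
        Console.WriteLine(TimesEnumerated);  // 0
        Console.WriteLine(numbers.Sum());    // 6
        Console.WriteLine(numbers.Count());  // 3
        Console.WriteLine(TimesEnumerated);  // 2, one run per enumeration
    }
}
```

Sum() and Count() each ask for a fresh enumerator, so the work is done twice even though CountTo was only called once.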

Let's take another look at that function, but this time I'm going to turn ReSharper on and let it analyse the code for me.

public void LogLongAndShortNames(IEnumerable<string> names)
{
    // ReSharper underlines "names" here...
    foreach (var name in names.Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    // ...and again here
    foreach (var name in names.Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}

ReSharper puts a squiggly line under both uses of names, warning us about a "Possible multiple enumeration of IEnumerable".

Since the names parameter is specified as just something that exposes an enumerator, the function may not work entirely as expected. For example, what would happen if I passed in an instance of the following:

public class AlternatingEnumerable : IEnumerable<string>
{
    private readonly List<string> _setA = new List<string> {"aaa", "bbb", "ccc"};
    private readonly List<string> _setB = new List<string> { "aaaaaaaaaaaa", "bbbbbbbbbbbbbbb", "ccccccccccccccc" };
    private bool flip = false;

    public IEnumerator<string> GetEnumerator()
    {
        return (flip = !flip) == true ? _setA.GetEnumerator() : _setB.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Here AlternatingEnumerable is an implementation of IEnumerable<string> that returns a different set of strings each time you enumerate it. Here is the result...

LogLongAndShortNames(new [] { "aaaaaaaaaaa", "bbb" });
// outputs...
// aaaaaaaaaaa is a long name
// bbb is a short name
LogLongAndShortNames(new AlternatingEnumerable());
// outputs nothing!

Weird, I know, but entirely possible. If you are writing that LogLongAndShortNames(...) function for other developers to call, you can bet money that one day somebody will pass in something weird and your function will not act the way you expected.

Now you might be thinking this AlternatingEnumerable is not realistic, but there are real world situations like this. What if your enumerable was reading from a file and the file were changed between each enumeration? What if it were reading from an un-buffered HTTP stream that can only be read once? You can't ever rely on the data being the same each time it is enumerated, or even being able to enumerate it more than once in the first place. You should write your code so that is not an issue.
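As a rough sketch of the "can only be read once" case, the example below wraps a TextReader in an iterator. The OneShotSource class and its ReadLines method are hypothetical names for this illustration, and a StringReader stands in for a real forward-only stream such as an HTTP response body.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class OneShotSource
{
    // Hypothetical helper: yields lines from a reader that can only be consumed once.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
            yield return line;
    }

    public static void Main()
    {
        // StringReader is an assumption standing in for a forward-only stream.
        var names = ReadLines(new StringReader("Al\nBartholomew\nCy"));

        Console.WriteLine(names.Count(x => x.Length > 10)); // 1
        Console.WriteLine(names.Count(x => x.Length < 5));  // 0, the reader is already exhausted
    }
}
```

The second query silently sees an empty sequence, even though "Al" and "Cy" are both short names. Nothing throws, which is what makes this kind of bug hard to spot.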

So why do we love lists and arrays?

Attribute Substitution may be to blame for this. When we deal with interfaces in our code we often subconsciously replace the abstract interface with something more concrete for the purposes of holding all the logic in our minds. The following pattern is common in the .NET enterprise software world

public class MyService : IMyService
{ }

public class MyOtherService
{
    public MyOtherService(IMyService myService)
    {
        // It is common, but incorrect, to think of myService as an instance of the concrete class MyService
    }
}

Our minds don't deal so well with the idea that myService is an abstract behaviour and not an actual object backed by specific code we can read. An interface can have so many different implementations that it's just too much to consider, so we like to think of it as a real class with a known implementation.

So when it comes to IEnumerable<T> we equally like to think of it as something simple, like a list or an array, as opposed to what it really is: something abstract and vague. The problem is, it is not good practice to just accept a List<T> or an array as an input parameter to our function.

Accept the Most Generic Type

It is generally (and correctly) accepted that you should accept the most generic type and return the most specific type when writing function signatures. After all you wouldn't want to write your methods like this

public void Foo(List<string> strings, Guid[] guids)
{
    // Bad signature
}

A much better method signature would be

public void Foo(IEnumerable<string> strings, IEnumerable<Guid> guids)
{
    // Better signature
}

It's better because now we aren't restricting the two parameters to being a list and an array. Anything that implements IEnumerable<T> can be passed to our method, so callers no longer have to copy their data into a list or an array first, and our code has become far more reusable!

But just because a List<string> can be passed as an IEnumerable<string> does not mean every IEnumerable<string> is a list.

IEnumerable as a function

So hopefully I've now convinced you that you can't treat a parameter of type IEnumerable<T> as a list or an array, so what should you treat it as? Well, the best answer is that you should treat it as a function: a function that can give you enumerators. This means you can free yourself from horrible 1990s-style code and start writing pure functions that act on enumerables. For example, take this method:

public List<string> GetInterweavedResults(List<string> sourceA, List<string> sourceB)
{
    var results = new List<string>();
    for (int i = 0; i < sourceA.Count; i++)
    {
        results.Add(sourceA[i]);
        results.Add(sourceB[i]);
    }
    return results;
}

The method above takes two lists and returns a new list that contains alternating items from the source lists.

Sure it works, but it's long-winded and horrible because we are working with lists. But look what happens if we change to IEnumerable<T> and treat the inputs as functions, not lists:

public IEnumerable<string> GetInterweavedResults(IEnumerable<string> sourceA, IEnumerable<string> sourceB)
{
    using (var enumeratorA = sourceA.GetEnumerator())
    using (var enumeratorB = sourceB.GetEnumerator())
    {
        while (enumeratorA.MoveNext() && enumeratorB.MoveNext())
        {
            yield return enumeratorA.Current;
            yield return enumeratorB.Current;
        }
    }
}

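This code still does the same job (it simply stops as soon as either source runs out, rather than throwing when sourceB is shorter), but it treats the inputs as what they are: functions that return an enumerator.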
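One consequence of working with enumerators is that the sources no longer need to be collections at all. The sketch below repeats the enumerator-based method, with each source supplying its own enumerator, and feeds it a never-ending sequence. The InterweaveDemo class and the Labels method are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InterweaveDemo
{
    // Enumerator-based interleaving: one enumerator per source.
    public static IEnumerable<string> GetInterweavedResults(
        IEnumerable<string> sourceA, IEnumerable<string> sourceB)
    {
        using (var enumeratorA = sourceA.GetEnumerator())
        using (var enumeratorB = sourceB.GetEnumerator())
        {
            while (enumeratorA.MoveNext() && enumeratorB.MoveNext())
            {
                yield return enumeratorA.Current;
                yield return enumeratorB.Current;
            }
        }
    }

    // Hypothetical never-ending source: "item 1", "item 2", ...
    public static IEnumerable<string> Labels()
    {
        for (var i = 1; ; i++)
            yield return $"item {i}";
    }

    public static void Main()
    {
        var letters = new[] { "a", "b", "c" };
        var result = GetInterweavedResults(letters, Labels());
        Console.WriteLine(string.Join(", ", result));
        // a, item 1, b, item 2, c, item 3
    }
}
```

The list-based version could never accept Labels(), because an infinite sequence cannot be copied into a List<string> first.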

Let's revisit our original function, but with the names parameter renamed to getNames to reflect thinking of it as a function:

// Parameter renamed to "getNames" to reflect its functional nature
public void LogLongAndShortNames(IEnumerable<string> getNames)
{
    foreach (var name in getNames.Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    foreach (var name in getNames.Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}

Suddenly the multiple enumeration seems much less acceptable. After all, you wouldn't call an actual Func<string[]> multiple times, would you?

// Parameter replaced with an actual function
public void LogLongAndShortNames(Func<string[]> getNames)
{
    foreach (var name in getNames().Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    foreach (var name in getNames().Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}

Here it is much more apparent that calling getNames() twice is bad design, yet the code is surprisingly similar to our original snippet. If we were asked to refactor this version of the method, we would probably cache the result of the function and end up with something like the below:

public void LogLongAndShortNames(Func<string[]> getNames)
{
    // Call the function only once!
    var namesArray = getNames();
    foreach (var name in namesArray.Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    foreach (var name in namesArray.Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}

Replacing IEnumerable<string> back in gives us the correct way to write the function at the top of the page

public void LogLongAndShortNames(IEnumerable<string> nameSource)
{
    var namesArray = nameSource.ToArray();
    foreach (var name in namesArray.Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    foreach (var name in namesArray.Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}
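As a quick check, here is a sketch that feeds the awkward AlternatingEnumerable from earlier into the ToArray() version. The NameLogger wrapper class and its Main method are hypothetical scaffolding added only so the example runs on its own.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class AlternatingEnumerable : IEnumerable<string>
{
    private readonly List<string> _setA = new List<string> { "aaa", "bbb", "ccc" };
    private readonly List<string> _setB = new List<string> { "aaaaaaaaaaaa", "bbbbbbbbbbbbbbb", "ccccccccccccccc" };
    private bool _flip;

    // Alternates between the two sets on every call.
    public IEnumerator<string> GetEnumerator()
    {
        _flip = !_flip;
        return _flip ? _setA.GetEnumerator() : _setB.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class NameLogger
{
    public static void LogLongAndShortNames(IEnumerable<string> nameSource)
    {
        // Enumerate the source exactly once and work on the snapshot.
        var namesArray = nameSource.ToArray();
        foreach (var name in namesArray.Where(x => x.Length > 10))
            Console.WriteLine($"{name} is a long name");
        foreach (var name in namesArray.Where(x => x.Length < 5))
            Console.WriteLine($"{name} is a short name");
    }

    public static void Main()
    {
        LogLongAndShortNames(new AlternatingEnumerable());
        // aaa is a short name
        // bbb is a short name
        // ccc is a short name
    }
}
```

Both loops now see the same snapshot, so the output is at least consistent with a single enumeration of the source. The trade-off is that ToArray() copies everything into memory, which may not suit very large or unbounded sequences.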

Conclusion

So in conclusion I'd like to stress that, yes, you should write your methods to "return the most specific type, accept the most generic type", but this doesn't mean just blindly reducing everything down to IEnumerable<T>. And remember to treat any IEnumerable<T> you see as a function that can return an enumerator, not a list of things.
