Iterators in C#, IEnumerable<T>, and IAsyncEnumerable<T>
Spencer Schneidenbach
—January 16, 2020
TL;DR: Iterators - the feature that lets you use the yield keyword in methods that return IEnumerable<T> - do magic under the hood and are significantly different from non-iterator code that returns IEnumerable<T>.
In addition, until recently iterators did not support async/await, but IAsyncEnumerable<T> has changed that - you can now use yield inside your async code.
Let's talk about iterators in C# (i.e. any method that uses yield return) - a concept that I find still confuses developers - and then discuss the recently added IAsyncEnumerable<T> and why it's useful.
Iterator basics
Iterators in C# - the ability to use yield to return elements from a method that is declared as returning IEnumerable<T> - have been around for a while, but I still find that there are developers who don't know what they are or how to use them. Most commonly, they know of the yield keyword but don't know the implications of using it.
The best way I've found to demonstrate the differences between code that uses yield and code that doesn't is to talk about the IL that's generated. I'm no IL expert by any means, but I can at least compare two sets of IL and figure out which one I think is more complicated.
Let's take this method for example:
public IEnumerable<string> GetStrings() {
    return new[] {
        "Spencer",
        "Schneidenbach",
        "Louie"
    };
}
The IL that's generated is very straightforward - create an array, store each element in it, and return it:
.method public hidebysig
instance class [System.Private.CoreLib]System.Collections.Generic.IEnumerable`1<string> GetStrings () cil managed
{
.maxstack 4
.locals init (
[0] class [System.Private.CoreLib]System.Collections.Generic.IEnumerable`1<string>
)
IL_0000: nop
IL_0001: ldc.i4.3
IL_0002: newarr [System.Private.CoreLib]System.String
IL_0007: dup
IL_0008: ldc.i4.0
IL_0009: ldstr "Spencer"
IL_000e: stelem.ref
IL_000f: dup
IL_0010: ldc.i4.1
IL_0011: ldstr "Schneidenbach"
IL_0016: stelem.ref
IL_0017: dup
IL_0018: ldc.i4.2
IL_0019: ldstr "Louie"
IL_001e: stelem.ref
IL_001f: stloc.0
IL_0020: br.s IL_0022
IL_0022: ldloc.0
IL_0023: ret
}
Now, what about its iterator cousin? It's still pretty straightforward looking on the surface:
public IEnumerable<string> GetStrings() {
    yield return "Spencer";
    yield return "Schneidenbach";
    yield return "Louie";
}
But under the hood it looks a liiiiittle different. That's because the compiler generates a state machine to track which elements have been returned from the method. For instance, if you call GetStrings().First(), the state machine suspends after it yields its first element, and the rest of the method runs only if the caller requests more elements after the first. You can find a good explanation of this in Microsoft's documentation.
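To make the suspension visible, here's a minimal sketch of the same iterator with some logging added. The LazyIteratorDemo class and its Log list are names I made up for illustration; the only thing being demonstrated is that the method body doesn't run until someone enumerates it, and that First() stops after the first yield.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyIteratorDemo
{
    // Hypothetical helper: records which parts of the iterator body have run.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<string> GetStrings()
    {
        Log.Add("before Spencer");
        yield return "Spencer";
        Log.Add("before Schneidenbach");
        yield return "Schneidenbach";
        Log.Add("before Louie");
        yield return "Louie";
    }

    public static void Main()
    {
        var strings = GetStrings();                 // none of the body has run yet
        Console.WriteLine(Log.Count);               // prints 0

        var first = strings.First();                // runs only up to the first yield
        Console.WriteLine(first);                   // prints Spencer
        Console.WriteLine(string.Join(", ", Log));  // prints before Spencer
    }
}
```

Calling GetStrings().ToList() instead would drive the state machine all the way to the end, and all three log entries would show up.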
Iterators and async/await
Iterators are an important and useful abstraction over data streams - it's generally good to process data as you retrieve it if that's possible. However, until recently iterators had one big problem: they did not support async/await natively.
Then, C# 8 came along and brought with it IAsyncEnumerable<T>.
Previously, you had to write some pretty nasty code to get iterators to work in normally async code (please don't do this):
public IEnumerable<string> GetStrings()
{
    var httpClient = new HttpClient();
    var websites = new[] {
        "https://google.com",
        "https://microsoft.com",
        "https://schneids.net"
    };
    foreach (var website in websites)
    {
        var requestTask = httpClient.GetAsync(website);
        var request = requestTask.GetAwaiter().GetResult(); //bad: blocks the calling thread
        yield return request.Content.ReadAsStringAsync().Result; //WORSE: blocks and wraps errors in AggregateException
    }
}
You could forgo the iterator and use async all the way down:
async Task<IEnumerable<string>> GetStrings()
{
    var websites = new[] {
        "https://schneids.net",
        "https://google.com",
        "https://microsoft.com"
    };
    var httpClient = new HttpClient();
    var list = new List<string>();
    foreach (var website in websites)
    {
        var resp = await httpClient.GetAsync(website);
        list.Add(await resp.Content.ReadAsStringAsync());
    }
    return list;
}
The problem with this code is that it requires you to build up a list in memory and return the data all at once, rather than handing each result to the caller as soon as it arrives.
Of course there were other options, like using the Reactive Extensions - a perfectly valid option and one I've used before. However, it's nicer to have async iterators in the language. Now, we can have the best of all worlds: code that is very simple to read and understand yet also very powerful:
async IAsyncEnumerable<string> GetWebsitesAsync()
{
    var websites = new[] {
        "https://schneids.net",
        "https://google.com",
        "https://microsoft.com"
    };
    var httpClient = new HttpClient();
    foreach (var website in websites) {
        var req = await httpClient.GetAsync(website);
        yield return await req.Content.ReadAsStringAsync();
    }
}
Which can be consumed by doing await foreach:
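await foreach (var website in GetWebsitesAsync()) {
    // Print at most the first 100 characters; a shorter page would make Substring(0, 100) throw
    Console.WriteLine(website.Substring(0, Math.Min(100, website.Length)));
}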
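If you want to try the pattern without hitting the network, here's a self-contained sketch. Everything in it is a stand-in I invented for illustration: AsyncIteratorDemo, GetPagesAsync, and FakeDownloadAsync are hypothetical names, and Task.Delay plays the role of the HTTP request. It needs C# 8 and .NET Core 3.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncIteratorDemo
{
    // Hypothetical stand-in for an HTTP call: waits briefly, then returns fake content.
    static async Task<string> FakeDownloadAsync(string url)
    {
        await Task.Delay(50);
        return $"<html>content of {url}</html>";
    }

    // An async iterator: each page is yielded as soon as its "download" completes.
    public static async IAsyncEnumerable<string> GetPagesAsync(IEnumerable<string> urls)
    {
        foreach (var url in urls)
        {
            yield return await FakeDownloadAsync(url);
        }
    }

    public static async Task Main()
    {
        var urls = new[] { "https://schneids.net", "https://google.com" };

        // The loop body runs once per page, while later pages are still pending.
        await foreach (var page in GetPagesAsync(urls))
        {
            Console.WriteLine(page);
        }
    }
}
```

Just like the synchronous iterator, nothing in GetPagesAsync runs until the await foreach starts pulling elements from it.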
Stuart Lang has a great post on IAsyncEnumerable<T> which covers it differently from me - different perspectives are always good :) Surprisingly, the official Microsoft docs don't seem to have a lot on IAsyncEnumerable<T> yet, but at least their async/await docs are good!