Did you know that the C# foreach statement is your enemy in games development?

Giuseppe 'Pino' De Francesco, Principal Solutions Architect
CERTIFIED EXPERT
40+ years in software and security
Performance in games development is paramount: every microsecond counts when everything has to fit in less than 33 ms (ideally 16 ms). The C# foreach statement can be a hidden performance killer, and here I explain why.

When developers land in the games industry, they have to change their mindset about performance. In this industry we have to perform a lot of operations in less than 33 milliseconds (30 FPS, frames per second), ideally tuning the logic and the art assets to reach 60 FPS on standalone platforms (Windows/Linux/Mac) and consoles (Xbox One/PS4). That means rendering the scene, computing physics and running the game logic in no more than 16 milliseconds! Not an easy task, and that's why in our industry every CPU tick counts.


So, what about the foreach statement? Well, this one is really bad: it burns extra CPU ticks on every loop just to let the programmer write less code! You think I'm exaggerating? Let's have a look at some code and see.


Let's open Visual Studio (I originally tested this in VS2008 Professional, then in VS2010 Professional and VS2015 Enterprise, and tested it again in VS2017 Enterprise with .NET 4.6.2 to produce the compiled code below), create a simple C# console app, and write the following very simple code:

 

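    using System;
    using System.Collections.Generic;

    public class DemoRefType
    {
        // Despite its name, this list holds Object references (boxed ints).
        public List<Object> intList = new List<Object>();

        // foreach with an int loop variable: every element is cast (unboxed) to int.
        public void Costly()
        {
            Object a = 0;
            foreach (int x in intList)
                a = x;
        }

        // Indexed for loop: every element is read as an Object reference, no cast.
        public void Cheap()
        {
            Object a = 0;
            for (int i = 0; i < intList.Count; i++)
                a = intList[i];
        }
    }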


That's an easy one, right? These two methods do almost the same job (the foreach version also casts every element to int), but one costs more CPU ticks than the other... let's see why. I use ILSpy (http://ilspy.net/) to look into the compiled code, so let's analyze the IL (intermediate language) I get after Visual Studio builds it (the result has been unchanged over the years!).


Let's start with the Cheap method:

.method public hidebysig 
    instance void Cheap () cil managed 
{
    // Method begins at RVA 0x2140
    // Code size 36 (0x24)
    .maxstack 2
    .locals init (
        [0] int32 i
    )

    IL_0000: ldc.i4.0
    IL_0001: stloc.0
    IL_0002: br.s IL_0015
    // loop start (head: IL_0015)
        IL_0004: ldarg.0
        IL_0005: ldfld class [mscorlib]System.Collections.Generic.List`1<object> performanceDemo.DemoRefType::intList
        IL_000a: ldloc.0
        IL_000b: callvirt instance !0 class [mscorlib]System.Collections.Generic.List`1<object>::get_Item(int32)
        IL_0010: pop
        IL_0011: ldloc.0
        IL_0012: ldc.i4.1
        IL_0013: add
        IL_0014: stloc.0
        IL_0015: ldloc.0
        IL_0016: ldarg.0
        IL_0017: ldfld class [mscorlib]System.Collections.Generic.List`1<object> performanceDemo.DemoRefType::intList
        IL_001c: callvirt instance int32 class [mscorlib]System.Collections.Generic.List`1<object>::get_Count()
        IL_0021: blt.s IL_0004
    // end loop

    IL_0023: ret
} // end of method DemoRefType::Cheap

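So, nothing odd in the above: it's pretty much what I would expect, a simple loop that calls get_Count and get_Item on each iteration. Since a is never read, the compiler doesn't even store the reference; it just discards it (pop at IL_0010). Nothing more.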


Now let's have a look at the IL we get from the Costly method:

.method public hidebysig 
    instance void Costly () cil managed 
{
    // Method begins at RVA 0x20ec
    // Code size 53 (0x35)
    .maxstack 1
    .locals init (
        [0] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<object>
    )

    IL_0000: ldarg.0
    IL_0001: ldfld class [mscorlib]System.Collections.Generic.List`1<object> performanceDemo.DemoRefType::intList
    IL_0006: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0> class [mscorlib]System.Collections.Generic.List`1<object>::GetEnumerator()
    IL_000b: stloc.0
    .try
    {
        IL_000c: br.s IL_001b
        // loop start (head: IL_001b)
            IL_000e: ldloca.s 0
            IL_0010: call instance !0 valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<object>::get_Current()
            IL_0015: unbox.any [mscorlib]System.Int32
            IL_001a: pop

            IL_001b: ldloca.s 0
            IL_001d: call instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<object>::MoveNext()
            IL_0022: brtrue.s IL_000e
        // end loop

        IL_0024: leave.s IL_0034
    } // end .try
    finally
    {
        IL_0026: ldloca.s 0
        IL_0028: constrained. valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<object>
        IL_002e: callvirt instance void [mscorlib]System.IDisposable::Dispose()
        IL_0033: endfinally
    } // end handler

    IL_0034: ret
} // end of method DemoRefType::Costly


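Well, well, well... it's 53 bytes of IL instead of 36, and it contains some quite nasty logic. Instead of simply indexing the list, the loop asks it for an enumerator (GetEnumerator at IL_0006) and then, on every iteration, calls MoveNext (IL_001d) and get_Current (IL_0010), with MoveNext also checking that the list hasn't been modified in the meantime. In this build the enumerator is a List<T>.Enumerator struct kept in a local (note the valuetype in .locals init and the constrained. prefix at IL_0028), so it does not land on the heap. The Mono compiler shipped with older Unity versions, however, boxed it, creating garbage for the GC (Garbage Collector) on every foreach, and that is where much of foreach's bad reputation in games comes from.


Is that it? Not really! We can also see (IL_0015) an unbox.any: the loop variable is declared as int while the list holds Object, so every element is type-checked and unboxed, and an element that isn't an int throws an InvalidCastException. To be fair, that cost comes from the cast rather than from foreach itself, since a for loop reading (int)intList[i] would pay it too. Finally, note how the loop is wrapped in a try/finally whose only job is to guarantee that the enumerator gets disposed (IL_0028 to IL_002e) even if something goes wrong inside the loop, such as that invalid cast: not really code we would write in the first place... and still we get it just by using a foreach.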
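To make the Costly IL easier to read, here is roughly what the compiler turns that foreach into, written back as C#. This is a hand-written sketch of the standard foreach expansion for a struct enumerator, not decompiler output; the names ForeachLowering, LoweredForeach and ForWithSameCast are made up for illustration, and the methods sum the values instead of discarding them so there is something to check.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachLowering
{
    // Roughly what 'foreach (int x in list) total += x;' expands to.
    public static int LoweredForeach(List<object> list)
    {
        int total = 0;
        // GetEnumerator returns List<object>.Enumerator, a struct kept in a local,
        // so no heap allocation happens here.
        List<object>.Enumerator e = list.GetEnumerator();
        try
        {
            while (e.MoveNext())            // one call per iteration, also checks the list version
            {
                int x = (int)e.Current;     // the unbox.any: caused by the int loop variable
                total += x;
            }
        }
        finally
        {
            e.Dispose();                    // the try/finally exists to guarantee this call
        }
        return total;
    }

    // A for loop with the same cast pays the same unbox.any on every element.
    public static int ForWithSameCast(List<object> list)
    {
        int total = 0;
        for (int i = 0; i < list.Count; i++)
            total += (int)list[i];          // unbox.any here too
        return total;
    }

    public static void Main()
    {
        var list = new List<object> { 1, 2, 3 };
        Console.WriteLine(LoweredForeach(list));   // 6
        Console.WriteLine(ForWithSameCast(list));  // 6
    }
}
```

Both methods throw an InvalidCastException if the list contains anything that is not a boxed int, which is exactly the situation the finally clause is there to survive.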


So, imagine having a few of these in your game logic, executing every frame... and real code is never as simple as this example, so the generated IL is usually more involved than what is shown above.
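In real game code, collections are often passed around as IEnumerable<T>, and that is one case where foreach over a List<T> does put garbage on the heap even with a modern compiler: foreach can then only call IEnumerable<T>.GetEnumerator(), which hands back the List<T>.Enumerator struct boxed as an IEnumerator<T>. The sketch below (the class and method names are hypothetical) only shows that the enumerator obtained through the interface is a boxed value type; measuring the actual bytes allocated is better left to a profiler.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorBoxing
{
    // Through the interface: when 'items' is a non-empty List<int>, foreach gets the
    // List<int>.Enumerator struct boxed as IEnumerator<int>, i.e. one heap
    // allocation per loop (not per element).
    public static int SumViaInterface(IEnumerable<int> items)
    {
        int total = 0;
        foreach (int v in items)
            total += v;
        return total;
    }

    // Through the concrete type: foreach binds to List<int>.GetEnumerator(),
    // which returns the struct directly, so the enumerator is not boxed.
    public static int SumViaList(List<int> items)
    {
        int total = 0;
        foreach (int v in items)
            total += v;
        return total;
    }

    // True when the enumerator obtained through the interface is a boxed value type.
    public static bool InterfaceEnumeratorIsBoxedStruct(List<int> items)
    {
        using (IEnumerator<int> e = ((IEnumerable<int>)items).GetEnumerator())
        {
            return e.GetType().IsValueType;
        }
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3, 4 };
        Console.WriteLine(SumViaList(list));                        // 10
        Console.WriteLine(SumViaInterface(list));                   // 10
        Console.WriteLine(InterfaceEnumeratorIsBoxedStruct(list));  // True
    }
}
```

Keeping per-frame loops on the concrete List<T> type (or using an indexed for loop) avoids that hidden allocation.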


We already struggle to keep our games above 30 FPS while presenting beautiful artwork (really costly to render) and lots of nice VFX (visual effects, definitely costly), and we all love to rely on the underlying physics engine to improve the overall gaming experience: all that costs quite a lot. So when it comes to the game logic we write, every clock cycle is valuable and we cannot afford to waste any of them. Let's remember two rules of thumb:

 

  • Language helpers that make it easier to code come with a performance cost
  • Always verify the efficiency of your coding habits by looking at the generated IL code (and, where it matters, by profiling)
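Reading the IL tells you what the compiler emitted, not how long it takes once the JIT is done with it, so a quick timing check is a useful companion. Below is a minimal Stopwatch sketch, assuming a List<int> (so no boxing is involved) and using made-up names (LoopTiming, SumFor, SumForeach, Time). Numbers will vary with runtime, build configuration and platform (Unity's Mono or IL2CPP behave differently from desktop .NET), so treat them as indicative only; a dedicated tool such as BenchmarkDotNet gives far more reliable results.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LoopTiming
{
    public static long SumFor(List<int> list)
    {
        long total = 0;
        for (int i = 0; i < list.Count; i++)
            total += list[i];
        return total;
    }

    public static long SumForeach(List<int> list)
    {
        long total = 0;
        foreach (int x in list)
            total += x;
        return total;
    }

    // Runs 'body' once to warm up (so JIT compilation is not timed), then 'repeats'
    // more times; returns elapsed milliseconds and the sum of all results, so the
    // work cannot be treated as unused.
    public static (double Ms, long Checksum) Time(Func<List<int>, long> body, List<int> list, int repeats)
    {
        long checksum = body(list);
        var sw = Stopwatch.StartNew();
        for (int r = 0; r < repeats; r++)
            checksum += body(list);
        sw.Stop();
        return (sw.Elapsed.TotalMilliseconds, checksum);
    }

    public static void Main()
    {
        var list = new List<int>();
        for (int i = 0; i < 100_000; i++) list.Add(i);

        var f = Time(SumFor, list, 200);
        var fe = Time(SumForeach, list, 200);
        Console.WriteLine($"for:     {f.Ms:F2} ms");
        Console.WriteLine($"foreach: {fe.Ms:F2} ms");
    }
}
```

Both loops are called through the same delegate, so the call overhead is the same on each side and the difference comes from the loop bodies themselves.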

In the games industry we all aim to improve gamers' experiences, making them as immersive as technically possible. Gamers are quite demanding, so let's make sure we always keep performance testing at the top of our coding practice, because losing even one frame per second can count against a game in the market.

 

© Copyright Giuseppe "Pino" De Francesco - 2016