When a developer lands in the games industry, they have to change their mindset about performance. In this industry we have to perform a lot of operations in less than 33 milliseconds (30 FPS, frames per second), possibly tuning the logic and the art assets to achieve 60 FPS on standalone (Windows/Linux/Mac) and consoles (Xbox One/PS4), and that means rendering the scene content and computing physics and game logic in no more than 16 milliseconds! Not really an easy task; that's why in our industry every CPU tick really counts.
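To make those budgets concrete, here is a minimal sketch of the arithmetic behind them. The class and method names (FrameBudget, MillisecondsPerFrame) are made up for illustration only.

```csharp
using System;

public static class FrameBudget
{
    // Time available for one frame, in milliseconds, at a target frame rate.
    public static double MillisecondsPerFrame(double framesPerSecond)
    {
        return 1000.0 / framesPerSecond;
    }

    public static void Main()
    {
        // 30 FPS leaves roughly 33.3 ms per frame, 60 FPS roughly 16.7 ms.
        Console.WriteLine("30 FPS: " + MillisecondsPerFrame(30) + " ms");
        Console.WriteLine("60 FPS: " + MillisecondsPerFrame(60) + " ms");
    }
}
```

Rendering, physics and game logic all have to fit inside that single number, which is why the rest of this article looks so closely at one loop.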
So, what about the foreach statement? Well, this one is really bad, burning hundreds of CPU ticks just to allow the programmer to write less code! You think I'm exaggerating here? Let's have a look at some code to see the evidence.
Let's open Visual Studio (originally tested in VS2008 Professional, then VS2010 Professional, then VS2015 Enterprise, and tested again with VS2017 Enterprise and .NET 4.6.2 to produce the compiled code below), create a simple C# console app, and write the following very simple code in it:
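using System;
using System.Collections.Generic;

public class DemoRefType
{
    // Declared as a list of Object, even though the name suggests ints.
    public List<Object> intList = new List<Object>();

    public void Costly()
    {
        Object a = 0;
        // The int loop variable forces a cast (unbox) of every element.
        foreach (int x in intList)
            a = x;
    }

    public void Cheap()
    {
        Object a = 0;
        // Plain indexed loop: reads each reference, no cast involved.
        for (int i = 0; i < intList.Count; i++)
            a = intList[i];
    }
}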
That's an easy one, right? These two methods perform the same job, but one costs a lot more in terms of CPU ticks... let's see why. I use ILSpy (http://ilspy.net/) to look into the compiled code, so let's analyze the IL (intermediate language) I get after Visual Studio builds it (the result has not changed over the years!).
Let's start with the Cheap method:
.method public hidebysig
instance void Cheap () cil managed
{
// Method begins at RVA 0x2140
// Code size 36 (0x24)
.maxstack 2
.locals init (
[0] int32 i
)
IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: br.s IL_0015
// loop start (head: IL_0015)
IL_0004: ldarg.0
IL_0005: ldfld class [mscorlib]System.Collections.Generic.List`1<object> performanceDemo.DemoRefType::intList
IL_000a: ldloc.0
IL_000b: callvirt instance !0 class [mscorlib]System.Collections.Generic.List`1<object>::get_Item(int32)
IL_0010: pop
IL_0011: ldloc.0
IL_0012: ldc.i4.1
IL_0013: add
IL_0014: stloc.0
IL_0015: ldloc.0
IL_0016: ldarg.0
IL_0017: ldfld class [mscorlib]System.Collections.Generic.List`1<object> performanceDemo.DemoRefType::intList
IL_001c: callvirt instance int32 class [mscorlib]System.Collections.Generic.List`1<object>::get_Count()
IL_0021: blt.s IL_0004
// end loop
IL_0023: ret
} // end of method DemoRefType::Cheap
So, nothing odd in the above, it's pretty much what I would expect: a simple loop and a straight move of reference value, nothing more.
Now let's have a look at what we get in IL from the Costly method:
.method public hidebysig
instance void Costly () cil managed
{
// Method begins at RVA 0x20ec
// Code size 53 (0x35)
.maxstack 1
.locals init (
[0] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<object>
)
IL_0000: ldarg.0
IL_0001: ldfld class [mscorlib]System.Collections.Generic.List`1<object> performanceDemo.DemoRefType::intList
IL_0006: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0> class [mscorlib]System.Collections.Generic.List`1<object>::GetEnumerator()
IL_000b: stloc.0
.try
{
IL_000c: br.s IL_001b
// loop start (head: IL_001b)
IL_000e: ldloca.s 0
IL_0010: call instance !0 valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<object>::get_Current()
IL_0015: unbox.any [mscorlib]System.Int32
IL_001a: pop
IL_001b: ldloca.s 0
IL_001d: call instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<object>::MoveNext()
IL_0022: brtrue.s IL_000e
// end loop
IL_0024: leave.s IL_0034
} // end .try
finally
{
IL_0026: ldloca.s 0
IL_0028: constrained. valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<object>
IL_002e: callvirt instance void [mscorlib]System.IDisposable::Dispose()
IL_0033: endfinally
} // end handler
IL_0034: ret
} // end of method DemoRefType::Costly
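Well, well, well... it's many lines longer and it contains some more involved logic. As we can see, it obtains a generic enumerator (IL_0006) that finally gets disposed (IL_0026 to IL_0033). Note that for List<T> this enumerator is a value type (see the valuetype local at index 0), so it is kept in a local rather than allocated on the heap and does not by itself create load on the GC (Garbage Collector); the allocation shows up when the same list is enumerated through an interface such as IEnumerable<T>, because the enumerator then gets boxed. Is that it? Not really! We can also see (IL_0015) the unbox.any operation: it is there because the loop variable is declared as int while the list holds Object, so every element is type-checked and unboxed, which is work the Cheap method never does. Every iteration also goes through MoveNext(), which checks that the list has not been modified in the meantime. Please also note how the whole loop is wrapped in a try/finally, so that the enumerator is disposed even if something goes wrong (here, most likely an invalid cast): not really code we would write in the first place... and still we get it just by using a foreach.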
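As a rough sketch of what the IL above corresponds to, the foreach loop can be written out by hand as an enumerator driven by a while loop inside a try/finally. The class ForeachExpansion and its method names are hypothetical, and the loops sum the values (instead of assigning them to a local) only so that the three variants can be checked against each other.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansion
{
    // Approximately what `foreach (int x in list)` expands to.
    public static int SumExpanded(List<object> list)
    {
        int sum = 0;
        // List<T>.Enumerator is a struct, so it lives in this local.
        List<object>.Enumerator e = list.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                int x = (int)e.Current; // the cast behind unbox.any
                sum += x;
            }
        }
        finally
        {
            e.Dispose(); // always runs, even if the cast throws
        }
        return sum;
    }

    // The same loop written with foreach.
    public static int SumForeach(List<object> list)
    {
        int sum = 0;
        foreach (int x in list)
            sum += x;
        return sum;
    }

    // Indexed loop doing the same cast, for a like-for-like comparison.
    public static int SumFor(List<object> list)
    {
        int sum = 0;
        for (int i = 0; i < list.Count; i++)
            sum += (int)list[i];
        return sum;
    }

    public static void Main()
    {
        var list = new List<object> { 1, 2, 3 };
        Console.WriteLine(SumExpanded(list)); // 6
        Console.WriteLine(SumForeach(list));  // 6
        Console.WriteLine(SumFor(list));      // 6
    }
}
```

Writing the expansion out makes it easier to see which parts come from foreach itself (the enumerator, MoveNext and the try/finally) and which come from the int loop variable (the cast).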
So, imagine having a few of these in your game logic, executing at every frame... obviously it's never code as simple as in this example, so the result will be way nastier than the one shown above.
We already struggle so much to keep our games above 30 FPS while presenting beautiful artwork (really costly to render) and a lot of nice VFX (visual effects, definitely costly), and we all love to rely on the underlying physics engine to improve the overall gaming experience: all that costs quite a lot... so when it comes to the game logic we have to write, every clock cycle and CPU tick is valuable. We cannot afford to waste any of them, so let's remember two rules of thumb:
In the games industry we all aim at improving gamers' experiences, making them as immersive as technically possible. Gamers are quite demanding, so let's make sure that we always keep performance testing at the top of our coding practice, because losing even one frame per second can be a failure factor from a market perspective.
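For that kind of performance testing, a minimal timing sketch along these lines may be a starting point. It uses System.Diagnostics.Stopwatch; the LoopTiming class, the Measure helper and the list size are assumptions made for illustration, and the numbers it prints will vary by machine, runtime and build configuration, so treat them as a hint and measure inside your own game with a profiler.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LoopTiming
{
    public static long SumForeach(List<object> list)
    {
        long sum = 0;
        foreach (int x in list)
            sum += x;
        return sum;
    }

    public static long SumFor(List<object> list)
    {
        long sum = 0;
        for (int i = 0; i < list.Count; i++)
            sum += (int)list[i];
        return sum;
    }

    // Runs the action `iterations` times and returns the elapsed Stopwatch ticks.
    public static long Measure(Action action, int iterations)
    {
        action(); // warm-up call, so JIT compilation is not part of the timing
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Main()
    {
        var list = new List<object>();
        for (int i = 0; i < 100000; i++)
            list.Add(i);

        long sink = 0; // accumulates results so the loops have an observable effect
        long foreachTicks = Measure(() => sink += SumForeach(list), 100);
        long forTicks = Measure(() => sink += SumFor(list), 100);

        Console.WriteLine("foreach: " + foreachTicks + " ticks");
        Console.WriteLine("for:     " + forTicks + " ticks");
        Console.WriteLine("checksum: " + sink);
    }
}
```

Build in Release mode and run outside the debugger, otherwise the timings mostly reflect the debug overhead rather than the loops.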