Sunday, November 22, 2015

C# Value Types, Stack and .NET Intermediate Language

Most basic built-in types in C#, such as integers, doubles and other numeric types, and booleans (but notably not string), are so-called value types, as opposed to reference types, such as arrays and classes. One important difference between the two is how they are stored in memory.

Local variables of value types are typically stored on the stack. So when we, for example, declare and assign a local integer like this:

int x = 18;

The value 18 is placed on the stack. When this variable goes out of scope (like when the method where it is declared has finished executing), it is popped off the stack and discarded. This is a very efficient mechanism, but it makes such values very short-lived: to share one with other code, it has to be copied.

If we want to pass such a value to a different method, the value is pushed onto the stack, where the other method picks it up as its own copy. That method performs operations on the copy and, when done, discards it. Then we are back in our original method, which may perform other actions on the original value and, when done, discards that value from the stack in turn.
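
The copy semantics described above can be observed from plain C#. The following is a minimal sketch; the class PassByValueDemo and the helper AddOne are hypothetical names chosen for illustration only.

```csharp
using System;

public static class PassByValueDemo
{
    // Hypothetical helper: it receives its own copy of the argument
    // and changes only that copy.
    public static int AddOne(int number)
    {
        number = number + 1; // modifies the callee's copy, not the caller's variable
        return number;
    }

    public static void Main()
    {
        int x = 18;
        int result = AddOne(x);

        Console.WriteLine(x);      // prints 18: the caller's value is unchanged
        Console.WriteLine(result); // prints 19
    }
}
```

Because the int is passed by value, nothing AddOne does to its parameter is visible in Main.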



Let's check this by examining the process in the Intermediate Language Disassembler (ildasm.exe), which can be found in the .NET Software Development Kit (SDK). On my computer it is located in

"C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.6 Tools\ildasm.exe"

Intermediate Language (IL) code is produced when we compile our source code. At run time this code is translated into native machine instructions, which are then executed by the processor.
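
As an aside, the compiled IL is ordinary data stored in the assembly, so it can also be read at run time through reflection. The sketch below is only an illustration and not the approach used in this post: the class IlBytesDemo and the helper GetSquareIl are hypothetical names, and it assumes a runtime where method bodies are available to reflection. The exact bytes depend on the build configuration, so no output is shown.

```csharp
using System;
using System.Reflection;

public static class IlBytesDemo
{
    // The method whose IL we want to inspect.
    public static int GetSquare(int number)
    {
        return number * number;
    }

    // Hypothetical helper: returns the raw IL bytes of GetSquare
    // as they are stored in the compiled assembly.
    public static byte[] GetSquareIl()
    {
        MethodInfo method = typeof(IlBytesDemo).GetMethod("GetSquare");
        MethodBody body = method.GetMethodBody();
        return body.GetILAsByteArray();
    }

    public static void Main()
    {
        // Prints the IL as hexadecimal bytes separated by dashes.
        // The byte 5A is the mul opcode and 2A is ret.
        Console.WriteLine(BitConverter.ToString(GetSquareIl()));
    }
}
```

A disassembler such as ildasm turns these same bytes into the readable instruction names used in the rest of this post.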

So let's see what Intermediate Language (IL) code is produced from this simple C# code:

public static void Main()
{
    int x = 18;
    int square = GetSquare(x);
}

private static int GetSquare(int number)
{
    return number * number;
}

We build this code and open the resulting dll or executable in ildasm (I also chose to show source code lines as comments). This is what our Main method looks like in IL:

.method public hidebysig static void  Main() cil managed
{
  // Code size       12 (0xc)
  .maxstack  1
  .locals init ([0] int32 x,
           [1] int32 square)
//000012:         {
  IL_0000:  nop
//000013:             int x = 18;
  IL_0001:  ldc.i4.s   18
  IL_0003:  stloc.0
//000014:             int square = GetSquare(x);
  IL_0004:  ldloc.0
  IL_0005:  call       int32 HappyCoding.ValueTypes::GetSquare(int32)
  IL_000a:  stloc.1
//000015:         }
  IL_000b:  ret
} // end of method ValueTypes::Main

IL may be a bit difficult to read in the beginning; there is a good tutorial on it here: http://www.codeguru.com/csharp/.net/net_general/il/article.php/c4635/MSIL-Tutorial.htm

The IL syntax highlighting is provided by this useful Visual Studio extension: IL Support

So this is what's happening here:

  1. The .maxstack  1 directive indicates that the maximum stack depth used in our code is 1, meaning there won't be more than one value on the stack at any time during the execution of our code.
  2. The .locals init directive declares local variables accessible through an index, so the variable x will be known in further code as variable 0, while square will be known as 1. The init keyword requests that the variables be initialized to a default value before the method executes.
  3. nop just means: no operation (do nothing)
  4. ldc.i4.s 18 pushes the value 18 as a 32-bit (4-byte) integer onto the stack. ldc stands for load constant onto the stack (push), and i4 stands for a 4-byte integer, known as int in C# and int32 in IL. The .s suffix marks the short form, in which the constant is encoded as a single signed byte (-128 to 127). If the constant were between 0 and 8, the instruction would have the value built into its name, as in: ldc.i4.7
  5. stloc.0 pops the value from the stack into local variable 0 (which is the index of our variable x). stloc stands for store (pop) to local variable. So in order to assign a constant value to a local variable, we need two commands: push the constant value onto the stack and pop it from the stack into the local variable.
  6. Now we are ready to call our GetSquare method. We start by loading onto the stack the value of local variable 0 (which is x): ldloc.0 
  7. Then the GetSquare function is called:  call int32 HappyCoding.ValueTypes::GetSquare(int32)
    (we'll look at the execution of that call a bit later)
  8. The return value of the function call is then popped from the stack into the local variable 1 (which is square): stloc.1
  9. Finally we return from our Main method, without any value because the return type is void: ret
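
To make the different ldc.i4 forms from step 4 more concrete, here is a minimal sketch of how the shortest form for an int constant could be chosen. The class LoadConstantForms and the method ShortestForm are hypothetical names for illustration; this is not the compiler's actual code, and ildasm may print large operands in hexadecimal rather than decimal.

```csharp
using System;

public static class LoadConstantForms
{
    // Hypothetical helper: returns the shortest ldc.i4 form for a constant.
    public static string ShortestForm(int value)
    {
        // -1 has its own one-byte instruction.
        if (value == -1) return "ldc.i4.m1";

        // 0 through 8 are built into the instruction name.
        if (value >= 0 && value <= 8) return "ldc.i4." + value;

        // Anything that fits in a signed byte uses the short form.
        if (value >= sbyte.MinValue && value <= sbyte.MaxValue) return "ldc.i4.s " + value;

        // Everything else needs the full 4-byte operand.
        return "ldc.i4 " + value;
    }

    public static void Main()
    {
        Console.WriteLine(ShortestForm(7));    // prints ldc.i4.7
        Console.WriteLine(ShortestForm(18));   // prints ldc.i4.s 18
        Console.WriteLine(ShortestForm(1000)); // prints ldc.i4 1000
        Console.WriteLine(ShortestForm(-1));   // prints ldc.i4.m1
    }
}
```

This is why our constant 18 shows up as ldc.i4.s 18: it is too large for the built-in forms but still fits in one signed byte.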

Let us now see what happens in the GetSquare function:

.method private hidebysig static int32  GetSquare(int32 number) cil managed
{
  // Code size       9 (0x9)
  .maxstack  2
  .locals init ([0] int32 V_0)
//000018:         {
  IL_0000:  nop
//000019:             return number * number;
  IL_0001:  ldarg.0
  IL_0002:  ldarg.0
  IL_0003:  mul
  IL_0004:  stloc.0
  IL_0005:  br.s       IL_0007
//000020:         }
  IL_0007:  ldloc.0
  IL_0008:  ret
} // end of method ValueTypes::GetSquare

  1. We see the familiar directives: the max stack depth will be 2 and there is one local variable V_0. But we do not create any local variable in code! We just return the product. So the compiler creates a local variable for us and calls it V_0. This is typical of an unoptimized debug build, where the compiler stores the return value in a temporary local.
  2. By repeating ldarg.0 two times, the program loads the value of the first argument of our function onto the stack twice. So now the stack contains two copies of the same value (which was passed to our function as its first and only argument).
  3. Next the multiplication command mul is called, which multiplies the two topmost values on the stack, giving us the square of our argument. Internally the mul command pops the two values from the stack, multiplies them and pushes the result back onto the stack. You can read more about it here.
  4. stloc.0 pops the result from the stack into the local variable 0 (remember this variable is created for us by the compiler)
  5. br.s IL_0007 stands for branch to target and transfers control to a target instruction, in our case to IL_0007, which here is simply the next instruction
  6. At this point ldloc.0 loads the value of our local variable 0 onto the stack again
  7. And we return from our function: ret, with the return value already on the stack, to be picked up in the Main function.
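
The steps above can be replayed by hand in C#. The following is only a sketch: it uses a Stack<int> as a stand-in for the evaluation stack, and the class EvaluationStackSketch and the method GetSquareByHand are hypothetical names. The real runtime compiles IL to native code rather than interpreting it like this.

```csharp
using System;
using System.Collections.Generic;

public static class EvaluationStackSketch
{
    // Replays the IL of GetSquare shown above, one instruction per step.
    public static int GetSquareByHand(int number)
    {
        var stack = new Stack<int>(); // stands in for the evaluation stack
        int v0 = 0;                   // local variable 0 (V_0), zeroed by .locals init

        stack.Push(number);           // ldarg.0
        stack.Push(number);           // ldarg.0 (stack depth is now 2, matching .maxstack 2)

        int right = stack.Pop();      // mul: pop the two topmost values...
        int left = stack.Pop();
        stack.Push(unchecked(left * right)); // ...and push their product (mul does not check overflow)

        v0 = stack.Pop();             // stloc.0
                                      // br.s IL_0007: jump to the next instruction
        stack.Push(v0);               // ldloc.0
        return stack.Pop();           // ret: the top of the stack becomes the return value
    }

    public static void Main()
    {
        Console.WriteLine(GetSquareByHand(18)); // prints 324
    }
}
```

Note that the stack never holds more than two values at once, which is exactly what the .maxstack 2 directive announces.
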

To sum up, we saw that value types are processed directly on the evaluation stack: their values are pushed, popped and copied into local variables without any heap allocation.


Instances of reference types are allocated on the heap, which is a different area of memory. When we create an array of 5 elements like this:

int[] arr = new int[5];

the space for the 5 integers is allocated on the heap, while the variable arr only holds a reference to it. When our array goes out of scope, this memory is not discarded immediately. The .NET garbage collector will eventually reclaim it, once it determines that the memory is no longer reachable. Reference types involve greater overhead, but they have the advantage that the same object can be shared with other methods and classes.
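
A minimal sketch of what this sharing means in practice is shown below; the class ReferenceSharingDemo and the helper SetFirst are hypothetical names used for illustration only.

```csharp
using System;

public static class ReferenceSharingDemo
{
    // Hypothetical helper: receives a copy of the reference,
    // which still points to the caller's array on the heap.
    public static void SetFirst(int[] values, int newValue)
    {
        values[0] = newValue;
    }

    public static void Main()
    {
        int[] arr = new int[5]; // the five integers live on the heap
        int[] alias = arr;      // copies the reference, not the array

        alias[0] = 42;
        Console.WriteLine(arr[0]);   // prints 42: both variables refer to the same array

        SetFirst(arr, 7);
        Console.WriteLine(alias[0]); // prints 7: the method changed the shared array
    }
}
```

Compare this with passing an int, where the called method only ever sees its own copy of the value.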

We shall look at reference types in more detail in my next post.
