Sunday, November 22, 2015

C# Value Types, Stack and .NET Intermediate Language

Most of the basic built-in types in C#, such as integers, doubles and the other numeric types, as well as booleans, but notably not string, are so-called value types, as opposed to reference types, such as arrays and classes. The difference between the two is how they are stored in memory.

Local variables of value types are stored on the stack. So when we, for example, do an integer assignment like this:

int x = 18;

The value 18 is pushed onto the stack. When this variable goes out of scope (for example, when the method where it is declared has finished executing), it is popped off the stack and discarded. This is a very efficient mechanism, but it makes value types very short-lived and hence less suitable for sharing between classes.

If we want to pass such a value to a different method, the value is pushed onto the stack and picked up by the other method, which makes its own copy, performs operations on that copy and discards it when done. Then we are back in our original method, which may perform further actions on the original value, and when it is done, that value too is discarded from the stack.
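
To make this concrete, here is a small example of my own (not from the original post): the called method gets its own copy of the value, so changes it makes are not visible to the caller.

using System;

class CopySemanticsDemo
{
    static void Increment(int value)
    {
        // This only changes the method's local copy of the argument
        value++;
    }

    static void Main()
    {
        int x = 18;
        Increment(x);
        Console.WriteLine(x); // still prints 18: the caller's value is untouched
    }
}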



Let's see if we can observe this process in the Intermediate Language Disassembler (ildasm.exe), which ships with the .NET Software Development Kit (SDK). On my computer it is located in

"C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.6 Tools\ildasm.exe"

Intermediate Language (IL) code is produced when we compile our source code. At run time this code is translated into native machine instructions, which are then executed by the processor.

So let's see what Intermediate Language (IL) code is produced from this simple C# code:

public static void Main()
{
    int x = 18;
    int square = GetSquare(x);
}
 
private static int GetSquare(int number)
{
    return number * number;
}

We build this code and open the resulting dll or executable in ildasm (I also chose to show source code lines as comments). This is what our Main method looks like in IL:

.method public hidebysig static void  Main() cil managed
{
  // Code size       12 (0xc)
  .maxstack  1
  .locals init ([0] int32 x,
           [1] int32 square)
//000012:         {
  IL_0000:  nop
//000013:             int x = 18;
  IL_0001:  ldc.i4.s   18
  IL_0003:  stloc.0
//000014:             int square = GetSquare(x);
  IL_0004:  ldloc.0
  IL_0005:  call       int32 HappyCoding.ValueTypes::GetSquare(int32)
  IL_000a:  stloc.1
//000015:         }
  IL_000b:  ret
} // end of method ValueTypes::Main

It may be a bit difficult to read IL in the beginning; there is a good tutorial on that here: http://www.codeguru.com/csharp/.net/net_general/il/article.php/c4635/MSIL-Tutorial.htm

The IL syntax highlighting is provided by this useful Visual Studio extension: IL Support
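
As a side note (a sketch of my own, not from the original post): if you only want the raw IL bytes rather than a full disassembly, reflection can hand them to you at run time. The snippet below assumes it runs in the same assembly as the HappyCoding.ValueTypes class from the example.

using System;
using System.Reflection;

class IlBytesDemo
{
    static void Main()
    {
        // GetSquare is private and static, so we need the matching binding flags
        MethodInfo method = typeof(HappyCoding.ValueTypes)
            .GetMethod("GetSquare", BindingFlags.NonPublic | BindingFlags.Static);

        byte[] il = method.GetMethodBody().GetILAsByteArray();
        Console.WriteLine(BitConverter.ToString(il)); // the raw IL opcodes as hex bytes
    }
}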

So this is what's happening here:

  1. The .maxstack  1 directive indicates that the maximum stack depth used in our code is 1, meaning there won't be more than one value on the stack at any time during the execution of our code.
  2. The .locals init directive declares local variables accessible through an index, so the variable x will be known in further code as variable 0, while square will be known as 1. The init keyword requests that the variables be initialized to a default value before the method executes.
  3. nop just means: no operation (do nothing)
  4. ldc.i4.s 18 pushes the value 18 onto the stack as a 32-bit (4-byte) integer. ldc stands for load constant onto the stack (push), i4 stands for a 4-byte integer, also known as int or int32 in C#, and the .s suffix marks the short form, used when the constant fits into a single signed byte. For the small constants -1 through 8 there are even shorter dedicated opcodes that encode the value directly, like ldc.i4.7 
  5. stloc.0 pops the value from the stack into local variable 0 (which is the index of our variable x). stloc stands for store (pop) to local variable. So in order to assign a constant value to a local variable, we need two commands: push the constant value onto the stack and pop it from the stack into the local variable.
  6. Now we are ready to call our GetSquare method. We start by loading onto the stack the value of local variable 0 (which is x): ldloc.0 
  7. Then the GetSquare function is called:  call int32 HappyCoding.ValueTypes::GetSquare(int32)
    (we'll look at the execution of that call a bit later)
  8. The return value of the function call is then popped from the stack into the local variable 1 (which is square): stloc.1
  9. Finally we return from our Main method, without any return value since the return type is void: ret

Let us now see what happens in the GetSquare function:

.method private hidebysig static int32  GetSquare(int32 number) cil managed
{
  // Code size       9 (0x9)
  .maxstack  2
  .locals init ([0] int32 V_0)
//000018:         {
  IL_0000:  nop
//000019:             return number * number;
  IL_0001:  ldarg.0
  IL_0002:  ldarg.0
  IL_0003:  mul
  IL_0004:  stloc.0
  IL_0005:  br.s       IL_0007
//000020:         }
  IL_0007:  ldloc.0
  IL_0008:  ret
} // end of method ValueTypes::GetSquare

  1. We see the familiar directives: the max stack depth will be 2 and there is one local variable V_0. But we do not create any local variable in code!? We just return the product. So it looks like the compiler creates a local variable for us and calls it V_0!
  2. By repeating ldarg.0 two times, the program loads onto the stack the value of the first argument of our function twice. So now the stack contains two copies of the same value (which was passed to our function as its first (and only) argument).
  3. Next the multiplication instruction mul is executed, which multiplies the two topmost values on the stack, giving us the square of our argument. Internally the mul instruction pops the two values from the stack, multiplies them and pushes the result back onto the stack.
  4. stloc.0 pops the result from the stack into the local variable 0 (remember this variable is created for us by the compiler)
  5. br.s IL_0007 is the short form of the unconditional branch instruction; it transfers control to a target instruction, in our case to IL_0007 
  6. At this point ldloc.0 loads the value of our local variable 0 to the stack again
  7. And we return from our function: ret, with the return value already on the stack, to be picked up in the Main function.

To sum up, we saw that value types are stored and processed directly on the evaluation stack.
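
To connect the walkthrough back to running code, here is a small sketch of my own (not part of the original post) that hand-emits essentially the same instructions as GetSquare using System.Reflection.Emit and then invokes them:

using System;
using System.Reflection.Emit;

class EmitDemo
{
    static void Main()
    {
        // Build a dynamic method equivalent to GetSquare: int -> int
        var square = new DynamicMethod("Square", typeof(int), new[] { typeof(int) });
        ILGenerator il = square.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0); // push the argument onto the evaluation stack
        il.Emit(OpCodes.Ldarg_0); // push it a second time
        il.Emit(OpCodes.Mul);     // pop both values, push their product
        il.Emit(OpCodes.Ret);     // return the value on top of the stack

        var func = (Func<int, int>)square.CreateDelegate(typeof(Func<int, int>));
        Console.WriteLine(func(18)); // prints 324
    }
}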


Reference types are allocated on the heap, which is a different area of memory. When we declare an array of 5 elements like this:

int[] arr = new int[5];

the space for the 5 integers is allocated on the heap, and the variable arr holds a reference to that memory. When our array goes out of scope, this memory is not discarded immediately; the .NET garbage collector will eventually reclaim it, when it determines that the memory is no longer needed. Reference types involve greater overhead, but they have the advantage that they are accessible from other classes.
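
To make the contrast with value types concrete, here is a small sketch of my own (not from the post): assigning an array variable copies only the reference, so two variables can end up looking at the same heap memory.

using System;

class ReferenceDemo
{
    static void Main()
    {
        int[] arr = new int[5];    // the five integers live on the heap
        int[] alias = arr;         // copies only the reference, not the elements
        alias[0] = 42;
        Console.WriteLine(arr[0]); // prints 42: both variables refer to the same array
    }
}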

We shall look at reference types in more detail in a following post.

Friday, November 6, 2015

Generating an Array of Consecutive Integers in C#

Recently I had to generate an array of consecutive integers from 0 to n-1 for a given number n. I found three ways of doing this in C#, some of which are more elegant and succinct than others. Now I would like to find out which of them is the most efficient and why.

The three methods are:

  1. Using a conventional loop
  2. Using a clever variant of the LINQ Select query
  3. Using the Range method of the Enumerable class

1. The first method uses a plain loop in a straightforward way.

public static int[] PlainLoop(int n)
{
    //create an array of n integers
    var arr = new int[n];
    // in a loop set each element of the array to be equal to its index
    for (int i = 0; i < n; i++) arr[i] = i;
    //return results
    return arr;
}

We instantiate an array of n integers, all elements by default being 0. Then we loop through the array and set each element equal to its index in the array. So the element with index 0 is 0, the element with index 1 is 1, and so on until we come to the last element, which has index n-1. So basically the problem is reduced to returning the array of indices of an array of length n!

2. Then I thought, why not implement this idea as a one-liner, using LINQ (Language Integrated Query), and specifically the LINQ Select query with index:

public static int[] LinqSelect(int n)
{
    //create an array of n integers, 
    //then use Linq Select with Index, to get the indexes of the array, 
    //and create an array based on this select query
    return new int[n].Select((x, ind) => ind).ToArray();
}

So again we create an array of n integers, then project each element to its index and collect the result into a new array. This looks pretty nice and quite straightforward.
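
To make the index-taking overload of Select concrete, here is a tiny illustration of my own (not from the post):

using System;
using System.Linq;

class SelectIndexDemo
{
    static void Main()
    {
        // The second lambda parameter is the element's index,
        // so selecting just the index from a three-element array gives 0, 1, 2
        int[] indices = new int[3].Select((x, ind) => ind).ToArray();
        Console.WriteLine(string.Join(", ", indices)); // prints 0, 1, 2
    }
}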

3. Another method, which I think has been designed specifically for this purpose, is to use the static Range method of the System.Linq.Enumerable class. This gives us a perfect one-liner:

public static int[] EnumerableRange(int n)
{
    //use the Range method of the Enumerable class
    //and create an array from the resulting sequence
    return Enumerable.Range(0, n).ToArray();
}

The Range method generates a sequence of integers, whereby you can specify the number to start with (in our case 0) and how many numbers you want (in our case n).
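
As a quick illustration of my own (not from the post), the first argument is the starting value and the second is the count:

using System;
using System.Linq;

class RangeDemo
{
    static void Main()
    {
        // Enumerable.Range(start, count): five numbers starting at 10
        int[] tenToFourteen = Enumerable.Range(10, 5).ToArray();
        Console.WriteLine(string.Join(", ", tenToFourteen)); // prints 10, 11, 12, 13, 14
    }
}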

So far so good. Now let's look at the performance of these three methods, by executing them in a profiler. I created a unit test that exercises each of these methods with n equal to 1 million:

[TestMethod]
public void GenArrayOfConsInts_PerfTest()
{
    int n = 1000000;
    var arr1 = GenConsecInts.PlainLoop(n);
    var arr2 = GenConsecInts.LinqSelect(n);
    var arr3 = GenConsecInts.EnumerableRange(n);
}

In the Visual Studio Test Explorer, we can run this test through the performance profiler:


Let us examine the call tree trace of the profiler:


We see that the plain loop method is the most efficient one, taking only 2.67% of the total processing time. Then comes the Enumerable.Range method with 9.86% (almost 4 times slower), and then the LINQ Select method with a staggering 87.44% (about 30 times as slow as the plain loop). We also notice that the selector in this method is called 1,000,000 times, once for each array element, which takes quite some time to execute.
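
As a rough cross-check outside the profiler (a sketch of my own, not part of the original measurements), the three methods can also be timed with System.Diagnostics.Stopwatch; for trustworthy numbers one would add warm-up runs and repeat the measurements:

using System;
using System.Diagnostics;

class PerfSketch
{
    static void Main()
    {
        const int n = 1000000;

        Measure("PlainLoop",       () => GenConsecInts.PlainLoop(n));
        Measure("LinqSelect",      () => GenConsecInts.LinqSelect(n));
        Measure("EnumerableRange", () => GenConsecInts.EnumerableRange(n));
    }

    static void Measure(string name, Func<int[]> generate)
    {
        var sw = Stopwatch.StartNew();
        int[] result = generate();
        sw.Stop();
        Console.WriteLine("{0}: {1} ms ({2} elements)", name, sw.ElapsedMilliseconds, result.Length);
    }
}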

So the conclusion is pretty clear: using a plain loop in this way is very fast, Enumerable.Range is also OK (and in addition very elegant!), but LinqSelect is way too slow. The question is, of course: why is that?

I will update this post when I figure out the why!