Sunday, November 20, 2016

Refactoring to Patterns: About Patterns

In my previous post in this series about the book Refactoring to Patterns by Joshua Kerievsky, I talked about the concept of refactoring, which is about improving code without changing the software's behavior.

Today, let's take a closer look at design patterns.

What is a pattern?

The author quotes Christopher Alexander, an architect whose books A Timeless Way of Building and A Pattern Language have inspired the software pattern movement: 
Each pattern is a three-part rule, which expresses a relation between a context, a problem and a solution.

As an element in the world, each pattern is a relationship between a certain context, a certain system of forces which occurs repeatedly in that context, and a certain spatial configuration which allows these forces to resolve themselves.

As an element of language, a pattern is an instruction, which shows how this spatial configuration can be used, over and over again, to resolve the given system of forces, wherever the context makes it relevant.

The pattern is, in short, at the same time a thing, which happens in the world, and the rule which tells us how to create that thing, and when we must create it. It is both a process and a thing; both a description of a thing which is alive, and a description of the process which will generate that thing.
Software patterns appear as part of catalogs of individual patterns, and should not be viewed as stand-alone prescriptions, but rather in conjunction with other, alternative patterns.

Patterns Happy

"Patterns-happy" programmers tend to overuse patterns. They are so much in love with patterns that they apply them regardless of whether their use is justified, making code unnecessarily complex instead of simplifying it. They simply must use patterns in their code.

When learning patterns it is hard to avoid becoming patterns-happy. The true joy of patterns comes from using them wisely. Refactoring helps us do that by focusing our attention on removing duplication, simplifying code and making code communicate its intention. Evolving systems through refactoring makes over-engineering with patterns less likely.

There Are Many Ways to Implement a Pattern

The famous book Design Patterns by Erich Gamma et al. begins the discussion of each pattern with a structure diagram. It is important to realize that this diagram is just an example and that there are many possible ways to implement the pattern, depending on the need at hand. Alternative implementations are often discussed in the implementation notes. But all too often a programmer looks at the diagram and begins coding, assuming that the diagram is the way to implement the pattern.

Deviating from the standard implementation is inevitable and in fact desirable.

The evolutionary approach to software design often leads to minimalistic pattern implementations, which are simpler than classical pattern definitions, because they involve implementing only what is necessary. This is the approach used throughout this book.
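As a hypothetical illustration (mine, not the book's), a minimal Template Method in C# might expose only the single step that actually varies, rather than the full structure shown in the classic diagram:

// Hypothetical sketch of a minimal Template Method implementation.
public abstract class ReportGenerator
{
    // The template method fixes the overall algorithm...
    public string Generate()
    {
        var data = LoadData();
        return "REPORT: " + FormatBody(data);
    }

    protected virtual string LoadData()
    {
        return "raw data";
    }

    // ...and only this step, the one that actually varies, is deferred to subclasses.
    protected abstract string FormatBody(string data);
}

public class CsvReportGenerator : ReportGenerator
{
    protected override string FormatBody(string data)
    {
        return data.Replace(' ', ',');
    }
}

Nothing more is implemented until a concrete need arises.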

Refactoring to, towards and away from Patterns

Depending on the nature of a pattern, one can refactor to the pattern, towards it, and even away from it. For some patterns, like Composed Method or Template Method, one has to refactor all the way to the pattern; it does not make sense to refactor halfway towards it. For other patterns, it is sometimes sufficient to improve the design by refactoring towards them, even if you never go all the way.

If, after refactoring to a pattern, you feel that your design has not improved enough, you can decide to refactor away from that pattern towards another one.

The goal is to obtain a better design, not to implement patterns!

Do Patterns Make Code More Complex?

In general, pattern implementations ought to remove duplicated code, simplify logic, better communicate intention, and increase flexibility. Yet people's familiarity with patterns plays a major role in how they perceive refactorings to patterns. It's better that teams learn patterns than avoid using them because they view patterns as too complex.

On the other hand, some pattern implementations can make code unnecessarily complex; in that case, backtracking or further refactoring is needed.

Pattern Knowledge

Patterns capture wisdom. Reusing that wisdom is extremely useful.

Knowing patterns is not enough to evolve great software; you must also know how to use them intelligently. Yet if you don't study patterns, you'll lack access to important, even beautiful, design ideas.

A good way to learn patterns is to choose great pattern books and then study one pattern a week in a study group. Meeting to discuss important design ideas each week is a great way to become better software designers.

Advice: only read the great books!

Up-Front Design With Patterns

The author prefers to evolve a system, refactoring to, towards or away from patterns as necessary. Up-front design with patterns has some place in a designer's toolkit, but use it rarely and most judiciously.

Saturday, November 19, 2016

Refactoring to Patterns: About Refactoring

I am reading this pretty old (2004) book by Joshua Kerievsky called Refactoring to Patterns, which I would like to study more deeply and share what I learn with others.

The book, as suggested by its title, combines two key concepts in software development: Refactoring and Design Patterns. In fact, the book argues that great software designs are better understood and learnt not as stand-alone masterpieces of software craftsmanship, but by studying how they emerge through refactoring.

What is Refactoring?

Refactoring is behavior-preserving transformation, "a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior." [Martin Fowler]

Refactoring involves
  • removing duplication
  • simplifying complex logic
  • and clarifying unclear code.
To refactor safely and with courage, you need a set of automated tests that you can run quickly to confirm that your code still works.
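To make this concrete, here is a tiny hypothetical before/after sketch in C# (mine, not the book's): duplicated discount logic is extracted into a single method, and the observable behavior stays the same, so the existing tests keep passing.

// Before: the same discount rule is duplicated in two methods.
public class PricingBefore
{
    public decimal InvoiceTotal(decimal amount, bool isVip)
    {
        if (isVip) return amount - amount * 0.10m;
        return amount;
    }

    public decimal QuoteTotal(decimal amount, bool isVip)
    {
        if (isVip) return amount - amount * 0.10m;
        return amount;
    }
}

// After: the duplication is removed by extracting one method;
// observable behavior is unchanged, so the existing tests still pass.
public class PricingAfter
{
    public decimal InvoiceTotal(decimal amount, bool isVip) { return ApplyVipDiscount(amount, isVip); }

    public decimal QuoteTotal(decimal amount, bool isVip) { return ApplyVipDiscount(amount, isVip); }

    private decimal ApplyVipDiscount(decimal amount, bool isVip)
    {
        return isVip ? amount - amount * 0.10m : amount;
    }
}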

Refactoring should be done in small steps that take seconds or minutes.

It's better to refactor continuously, rather than in phases. If you see code that needs improvement, improve it. If you need to implement an important feature by tomorrow, first finish the feature and refactor later. Refactoring must co-exist harmoniously with business priorities.

Refactoring Motivations

  • Make it easier to add new code: when adding a new feature, we can use two approaches, none of which is right or wrong:
    • either code it quickly without regard to how well it fits with an existing design, refactor later;
    • or modify the existing design so it can easily and gracefully accommodate the new feature.
  • Improve the design of existing code: sniff constantly for code smells and remove smells immediately (or soon) after finding them. This is a great hygienic habit. It can also lead to greater job enjoyment.
  • Gain a better understanding of code: If some code is not clear, it's an odor that needs to be removed by refactoring, not by deodorizing the code with a comment.
  • Make coding less annoying: We often refactor simply to make code less annoying to work with.

Many Eyes

To get the best refactoring results, you'll want the help of many eyes, which is one of the reasons for the practices of pair programming and collective code ownership.

Human-Readable Code

Good code 
  • reads like spoken language
  • separates important code from distracting code
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." [Martin Fowler]

Keeping Code Clean

Refactoring is a lot like cleaning your room. The worse the mess becomes, the harder it is to clean and the less you want to clean it. One giant cleanup is not enough; you must practice continuous hygiene. To keep code clean, we must continuously:
  • remove duplication
  • simplify code
  • clarify code
Do not tolerate messes in code, and do not backslide into bad habits.

Clean code -> Better design -> Faster development -> Happy customers and programmers 

Small Steps

Take very small steps and keep the unit tests green! They should not stay red for more than a few minutes.

Design Debt

Design debt is a much better metaphor to communicate with management than the technical language of refactoring. It occurs when you do not consistently do 3 things:
  1. Remove duplication
  2. Simplify your code
  3. Clarify your code's intent
When do you pay back the debt? In financial terms, when you don't pay your debt, you incur late fees. If you don't pay your late fees, you incur higher late fees, and so on. Compound interest kicks in, and going out of debt becomes an impossible dream. So it is with design debt.

Evolving a New Architecture 

Evolutionary design suggests that you:
  • Form one team
  • Drive the framework from application needs
  • Continuously improve applications and the framework by refactoring

Composite and Test-Driven Refactorings

Composite refactorings are high-level refactorings composed of low-level refactorings. Between applying low-level refactorings you run unit tests.

Test-driven refactorings involve applying TDD to produce replacement code and then swapping out the old code for the new code (while retaining and rerunning the old code's tests).
When it's impossible to evolve a design through composite refactorings, test-driven refactorings can be used to produce a better design. Substitute Algorithm is a good example of a test-driven refactoring.
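As a hypothetical sketch (mine, not the book's), Substitute Algorithm keeps a method's interface and behavior but replaces its body with a clearer algorithm, driven by the existing tests:

// Before: a hand-rolled search loop.
public static string FoundPerson(string[] people)
{
    for (int i = 0; i < people.Length; i++)
    {
        if (people[i] == "Don") return "Don";
        if (people[i] == "John") return "John";
    }
    return "";
}

// After: the whole algorithm is substituted with a clearer one
// (requires using System.Linq); the tests for FoundPerson still pass.
public static string FoundPerson(string[] people)
{
    var candidates = new[] { "Don", "John" };
    return people.FirstOrDefault(p => candidates.Contains(p)) ?? "";
}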

Benefits of composite refactorings:
  • They describe an overall path for a refactoring sequence
  • They suggest non-obvious design directions
  • They provide insights into implementing patterns

Sunday, November 22, 2015

C# Value Types, Stack and .NET Intermediate Language

Most basic built-in types in C#, such as integers, doubles and other numeric types, and booleans (but notably not string), are so-called value types, as opposed to reference types, such as arrays and classes. The difference between the two is how they are stored in memory.

Value-type local variables are stored on the stack. So when we, for example, do an integer assignment like this:

int x = 18;

The value 18 is pushed to the stack. When this variable goes out of scope (like when the method where it is declared has finished executing), it is popped out of the stack and discarded. This is a very efficient mechanism, but it makes value types very short lived and hence less suitable for sharing between classes.

If we want to pass such a value to a different method, the value is pushed onto the stack. The other method picks it up, copies it, loads the copy onto the stack, performs its operations on the copy and, when done, discards it from the stack. Then we are back in our original method, which may perform other actions on the original value, but when done, it too discards its value from the stack.
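A small sketch of my own to illustrate this copy behavior: the called method only ever changes its own copy of the value.

using System;

class ValueCopyDemo
{
    static void Increment(int value)
    {
        // Changes only the copy that was pushed onto the stack for this call.
        value = value + 1;
    }

    static void Main()
    {
        int x = 18;
        Increment(x);
        Console.WriteLine(x);   // still prints 18: the original value was not changed
    }
}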



Let's see if we can see this by examining this process in the Intermediate Language Disassembler (ildasm.exe), which can be found in the .NET Software Development Kit (SDK). On my computer it is located in

"C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.6 Tools\ildasm.exe"

Intermediate Language (IL) code is produced when we compile our source code. At run time this code is translated into native machine instructions, which are then executed by the processor.

So let's see what Intermediate Language (IL) code is produced from this simple C# code:

public static void Main()
{
    int x = 18;
    int square = GetSquare(x);
}
 
private static int GetSquare(int number)
{
    return number * number;
}

We build this code and open the resulting dll or executable in ildasm (I also chose to show source code lines as comments). This is what our Main method looks like in IL:

.method public hidebysig static void  Main() cil managed
{
  // Code size       12 (0xc)
  .maxstack  1
  .locals init ([0] int32 x,
           [1] int32 square)
//000012:         {
  IL_0000:  nop
//000013:             int x = 18;
  IL_0001:  ldc.i4.s   18
  IL_0003:  stloc.0
//000014:             int square = GetSquare(x);
  IL_0004:  ldloc.0
  IL_0005:  call       int32 HappyCoding.ValueTypes::GetSquare(int32)
  IL_000a:  stloc.1
//000015:         }
  IL_000b:  ret
} // end of method ValueTypes::Main

It may be a bit difficult to read IL in the beginning; there is a good tutorial on that here: http://www.codeguru.com/csharp/.net/net_general/il/article.php/c4635/MSIL-Tutorial.htm

The IL syntax highlighting is provided by this useful Visual Studio extension: IL Support

So this is what's happening here:

  1. The .maxstack  1 directive indicates that the maximum stack depth used in our code is 1, meaning there won't be more than one value on the stack at any time during the execution of our code.
  2. The .locals init directive declares local variables accessible through an index, so the variable x will be known in further code as variable 0, while square will be known as 1. The init keyword requests that the variables be initialized to a default value before the method executes.
  3. nop just means: no operation (do nothing)
  4. ldc.i4.s 18 pushes the value 18 as a 32-bit (4-byte) integer onto the stack. So ldc stands for load constant onto the stack (push), i4 stands for a 4-byte integer (also known as int or int32 in C#), and the .s suffix means the short form is used, where the operand fits in a single byte. If the value of the constant were between 0 and 8, the command would encode the value directly in the opcode, as in: ldc.i4.7 
  5. stloc.0 pops the value from the stack into local variable 0 (which is the index of our variable x). stloc stands for store (pop) to local variable. So in order to assign a constant value to a local variable, we need two commands: push the constant value onto the stack and pop it from the stack into the local variable.
  6. Now we are ready to call our GetSquare method. We start by loading onto the stack the value of local variable 0 (which is x): ldloc.0 
  7. Then the GetSquare function is called:  call int32 HappyCoding.ValueTypes::GetSquare(int32)
    (we'll look at the execution of that call a bit later)
  8. The return value of the function call is then popped from the stack into the local variable 1 (which is square): stloc.1
  9. Finally we return from our Main method, but without any value, since the return type is void: ret

Let us now see what happens in the GetSquare function:

.method private hidebysig static int32  GetSquare(int32 number) cil managed
{
  // Code size       9 (0x9)
  .maxstack  2
  .locals init ([0] int32 V_0)
//000018:         {
  IL_0000:  nop
//000019:             return number * number;
  IL_0001:  ldarg.0
  IL_0002:  ldarg.0
  IL_0003:  mul
  IL_0004:  stloc.0
  IL_0005:  br.s       IL_0007
//000020:         }
  IL_0007:  ldloc.0
  IL_0008:  ret
} // end of method ValueTypes::GetSquare

  1. We see the familiar directives: the max stack depth will be 2 and there is one local variable V_0. But we do not create any local variable in the code!? We just return the product. So it looks like the compiler creates a local variable for us and calls it V_0!
  2. By repeating ldarg.0 two times, the program loads onto the stack the value of the first argument of our function twice. So now the stack contains two copies of the same value (which was passed to our function as its first and only argument).
  3. Next, the multiplication command mul is executed, which multiplies the two topmost values on the stack, giving us the square of our argument. Internally the mul command pops the two values from the stack, multiplies them and pushes the result back onto the stack. You can read more about it here.
  4. stloc.0 pops the result from the stack into the local variable 0 (remember this variable is created for us by the compiler)
  5. br.s IL_0007 stands for branch to target and transfers control to a target instruction, in our case to IL_0007 
  6. At this point ldloc.0 loads the value of our local variable 0 to the stack again
  7. And we return from our function: ret, with the return value already on the stack, to be picked up in the Main function.
To sum up, we saw that value types are stored and processed directly on the evaluation stack.


Reference types are allocated on the heap, which is a different area of memory. When we declare an array of 5 elements like this:

int[] arr = new int[5];

the space for the 5 integers is allocated on the heap. When our array goes out of scope, this memory is not discarded immediately. The .NET garbage collector will eventually reclaim it, when it determines that the memory is no longer needed. Reference types involve greater overhead, but they have the advantage that they are accessible from other classes.
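A small sketch of my own to show the difference in sharing behavior: assigning a reference-type variable copies the reference, not the object on the heap.

int[] arr = new int[5];       // the array object itself lives on the heap
int[] other = arr;            // copies only the reference, not the five elements
other[0] = 42;
Console.WriteLine(arr[0]);    // prints 42: both variables refer to the same array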

We shall look at reference types in more detail in my next post.

Friday, November 6, 2015

Generating an Array of Consecutive Integers in C#

Recently I had to generate an array of consecutive integers from 0 to n-1 for a given number n. I found three ways of doing this in C#, some of which are more elegant and succinct than others. Now I would like to find out which of them is the most efficient and why.

The three methods are:

  1. Using a conventional loop
  2. Using a clever variant of the LINQ Select query
  3. Using the Range method of the Enumerable class
1. The first method uses a plain loop in a straightforward way.

public static int[] PlainLoop(int n)
{
    //create an array of n integers
    var arr = new int[n];
    // in a loop set each element of the array to be equal to its index
    for (int i = 0; i < n; i++) arr[i] = i;
    //return results
    return arr;
}

We instantiate an array of n integers, all elements by default being 0. Then we loop through the array and set each element equal to its index. So the element with index 0 is 0, the element with index 1 is 1, and so on until we come to the last element, which has index n-1. So basically the problem is reduced to returning the array of indices of an array of length n!

2. Then I thought, why not implement this idea as a one-liner, using LINQ (Language Integrated Query), and specifically the LINQ Select query with index:

public static int[] LinqSelect(int n)
{
    //create an array of n integers, 
    //then use Linq Select with Index, to get the indexes of the array, 
    //and create an array based on this select query
    return new int[n].Select((x, ind) => ind).ToArray();
}

So again we create an array of n integers, then select the indices of the array into a separate array. This looks pretty nice and quite straightforward.

3. Another method, which I think has been designed specifically for this purpose, is to use the static Range method of the System.Linq.Enumerable class. This gives us a perfect one-liner:

public static int[] EnumerableRange(int n)
{
    //use the Range query of the Enumerable class
    //and create an array based on this select query 
    return Enumerable.Range(0, n).ToArray();
}

The Range method generates a sequence of integers, whereby you can specify the number to start with (in our case 0) and how many numbers you want (in our case n).

So far so good. Now let's look at the performance of these three methods, by executing them in a profiler. I created a unit test that exercises each of these methods with n equal to 1 million:

[TestMethod]
public void GenArrayOfConsInts_PerfTest()
{
    int n = 1000000;
    var arr1 = GenConsecInts.PlainLoop(n);
    var arr2 = GenConsecInts.LinqSelect(n);
    var arr3 = GenConsecInts.EnumerableRange(n);
}
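As a quick cross-check of the profiler numbers, one could also time the calls directly with System.Diagnostics.Stopwatch (a rough sketch of mine, not part of the original test; it reuses the method names and the variable n defined above):

var sw = System.Diagnostics.Stopwatch.StartNew();
var arr1 = GenConsecInts.PlainLoop(n);
Console.WriteLine("PlainLoop:       {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
var arr2 = GenConsecInts.LinqSelect(n);
Console.WriteLine("LinqSelect:      {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
var arr3 = GenConsecInts.EnumerableRange(n);
Console.WriteLine("EnumerableRange: {0} ms", sw.ElapsedMilliseconds);

Such wall-clock timings are noisy for a single run, so averaging several runs gives a fairer picture.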

In the Visual Studio Test Explorer, we can run this test through the performance profiler:


Let us examine the call tree trace of the profiler:


We see that the plain loop method is the most efficient one, taking only 2.67% of the total processing time. Then comes the Enumerable.Range method with 9.86% (almost 4 times slower), followed by the LINQ Select method with a staggering 87.44% (about 30 times slower than the plain loop). We also notice that in this method the select projection is called 1,000,000 times, once for each array element, which takes quite some time to execute.

So the conclusion is pretty clear: using a plain loop in this way is very fast, Enumerable.Range is also OK and in addition very elegant (!), but LinqSelect is way too slow. The question is, of course: why is that?

I will update this post when I figure out the why!

Tuesday, April 16, 2013

How to quickly open Visual Studio project's properties

I used to do this by right-clicking the project in the Solution Explorer and scrolling all the way down the context menu to select Properties. This is really cumbersome. It turns out there is a much quicker way to do this: Just double-click the Properties node under the project in the Solution Explorer! And of course, there is a shortcut for that too: Alt+Enter !!! It works both in Visual Studio 2010 and 2012.



Happy coding!

Thursday, March 21, 2013

Learning Web App Security with Google Gruyere

I've been developing web applications for almost 10 years now, but I was rarely concerned with application security issues. To be sure, I am a defensive programmer, meaning that I do all kinds of checks on the inputs received by my functions, leaving very little to chance. But although I heard much about cross site scripting (XSS) and SQL injection, I didn't have a very clear understanding of these concepts. I relied on best practices (like parametrized database queries) and inherent ASP.NET features like request validation to take care of malicious inputs, be it form variables or query string parameters.

Yet I always had a gnawing feeling that I should gain more understanding of web app vulnerabilities, attacks and defenses. So this month I started looking for ways to brush up on this important topic.

I found a couple of books, of which this one seems to collect a lot of praise from readers: The Web Application Hacker's Handbook (WAHH). I started to read the book, but I also wanted something more practical. I came across Open Web Application Security Project (OWASP) with its plenitude of resources and the famous OWASP Top 10 list of web app security flaws.

OWASP members also develop security software such as the security testing tool ZAP and the intercepting proxy WebScarab. Both are really handy and easy to use.

OWASP has also produced WebGoat, a fictitious web application full of vulnerabilities that you can run and test locally on your PC. WebGoat has a number of lessons to teach about various security flaws that you can try to discover on your own with the help of some hints. Although a great resource, I found WebGoat somewhat lacking in the quality of its materials.

And then I stumbled upon Google Gruyere, which is a very elaborate web security code lab from Google Code University. It is quite similar to WebGoat in that it gives you a sandbox in which to learn about and try to discover security flaws in a Python web application that you can run either locally or online. It does a great job of explaining various security concepts and provides challenges to explore them in practice, as well as guidance on how to guard against them. It is especially good at explaining the various flavors of XSS attacks, but it also provides a good foundation for understanding many other topics such as path traversal, denial of service and code execution. It touches upon but doesn't go into the details of SQL injection.

Learn how to make web apps more secure. Do the Gruyere codelab.

I've gone through the lab, thoroughly enjoyed the challenges and learned to use tools like WebScarab and ZAP. I'd recommend it to anyone interested in web application security!

In parallel I did some security testing of the web applications that I've been involved with for the last year or so. I found that ASP.NET does a great job protecting ASP.NET applications from certain types of attacks out of the box. I found some minor flaws that are mostly due to relying too much on client side validation and forgetting to validate user input again on the server. A quite trivial example of this is being able to intercept a request and change the house number to a negative value. But I also discovered a more serious exploit using the same technique, which I will not describe here :)

The main lesson that I've drawn so far is that we should never trust input coming into our applications, be it through a web browser or a web API. Most security flaws in software result from sloppy programming. Web developers should be well aware of these issues, write their code defensively, and test it thoroughly not only for functionality but also for security.
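A trivial hypothetical illustration of that lesson (method and parameter names are mine): even if the browser validates the house number, the server must repeat the check, because an intercepting proxy can change the posted value.

// Hypothetical server-side check: never rely on client-side validation alone.
public void SaveAddress(string street, int houseNumber)
{
    if (string.IsNullOrWhiteSpace(street))
        throw new ArgumentException("Street is required.", "street");

    if (houseNumber <= 0)
        throw new ArgumentOutOfRangeException("houseNumber", "House number must be positive.");

    // ... proceed with saving the validated address
}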

To sum up, these have been a very interesting and instructive few weeks. Web application security is a fascinating topic and I look forward to diving even deeper into it!

Happy coding!


Thursday, March 14, 2013

ASP.NET debug="false" and line numbers in error stack trace

It's a good practice to always log run-time errors to a database table or a log file or both. It is also nice to have line numbers appear in the stack trace, to be able to see where in your code the error occurred. For some types of errors, e.g. exceptions of type NullReferenceException, this is especially relevant, as there is no other way to determine what went wrong.
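For example, a typical logging block (a hypothetical sketch; ProcessOrder and Logger stand in for your own code and logging library) records the full exception text; file names and line numbers show up in the stack trace only when matching .pdb files are deployed, as discussed below.

try
{
    ProcessOrder(order);            // hypothetical application call
}
catch (Exception ex)
{
    // ex.ToString() contains the exception type, message and stack trace;
    // with .pdb files deployed, the stack trace includes file and line numbers.
    Logger.Error(ex.ToString());    // hypothetical logging call
    throw;
}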

In the past I had to deploy a debug build of the application and set debug="true" within the compilation section of web.config, in order to be able to see the line numbers in error logs. But this is not a good idea, as this adversely affects application performance (see this blog from Scott Guthrie for more details and this blog for even more details). So please, always set debug="false" in production environments. Or, even better, set deployment retail="true" in your production machine.config

But what do we do if we really want more detailed error information in our logs? There appears to be an easy way to do this. In the properties of your project, in the Build section, select your Release configuration, click Advanced and make sure that Debug Info is set to pdb-only. This is equivalent to setting the compiler option /debug:pdbonly in the compilerOptions attribute of the compiler element in web.config (for more details about this compiler option read this MSDN article and this blog article). This will emit your optimized release assemblies together with the corresponding .pdb files containing all the information needed to reference files and line numbers in exception stack traces. You can then deploy the build output into your production environment.

This will allow you to log error details like line numbers without significantly affecting the performance of your compiler-optimized release assemblies. Warning: if you do this, then please also make sure that you don't set customErrors mode="off", so that you don't expose detailed exceptions to remote users (which is a wise thing to do from a security standpoint!).

Happy coding!