Sunday, November 13, 2011

Model-Based Testing of Legacy Code – Part II – Risk Profiling

Last time I left you after describing my latest challenge: making sense of a large piece of legacy code. I alluded to the fact that we had started building a risk profile for the changelist that makes up the difference between a well-tested root branch and a poorly tested branch.
Okay, so what did we do to tackle this immense problem? Well, our approach was to compute a risk profile based on a model of the probability of each code line containing bugs. Without digging too deep into the mathematics, this is how the profile is built:

A)     All lines of code for files that differ are loaded into the profile
B)     Each line of code is assigned two risk weights: a test coverage weight and a difference weight
C)     For each file the weights are summed and normalized, and a third weight is introduced: the revision weight

The three weights are combined into a total weight for each code line and aggregated up to file level. The computation is a weighted average of the normalized weights, in which the test coverage and difference weights are multiplied together.
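To make the idea concrete, here is a minimal sketch of that computation in Python. The weight names, the normalization, and the blending factor `alpha` are my illustrative assumptions - the actual model uses its own calibrated parameters:

```python
# Sketch of the per-file risk computation described above (assumptions:
# coverage and difference weights are per-line values in [0, 1], the
# revision weight is a single value per file, and `alpha` is an assumed
# blending factor for the weighted average).

def file_risk(lines, revision_weight, alpha=0.7):
    """Combine per-line weights into a single file-level risk score.

    `lines` is a list of (coverage_weight, diff_weight) tuples, where a
    high coverage weight means *poorly* covered (i.e. risky) code.
    """
    if not lines:
        return 0.0
    # The test coverage and difference weights are multiplied per line...
    line_risks = [cov * diff for cov, diff in lines]
    # ...then summed and normalized up to file level.
    normalized = sum(line_risks) / len(line_risks)
    # Finally a weighted average blends in the file's revision weight.
    return alpha * normalized + (1 - alpha) * revision_weight


# A file with two poorly covered, changed lines and one safe line:
risk = file_risk([(0.9, 1.0), (0.8, 1.0), (0.1, 0.0)], revision_weight=0.5)
print(round(risk, 3))  # prints 0.547
```

The point of multiplying the first two weights is that a line only scores high if it is both changed and poorly covered; a heavily changed line that is already well tested contributes little risk.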

Wednesday, November 2, 2011

Model-Based Testing of Legacy Code – Part I

It has been a while since I have had time to post on this blog. I've been kept busy attending the Model-Based Testing User Conference - it had some really great presentations that I also want to find time to report on. I have a trip report ready, but I need to boil it down to the relevant parts before I post it here. Granted, my next post is not going to be on Model-Based Testing, but it will be the first in a series of posts that - with a bit of luck - will build up to some really interesting Model-Based Testing!

That being said, let's jump into it. Recently I have been faced with the challenge of taking a large set of legacy code changes and testing them in a sensible way. I am working with a colleague in a two-man team to solve this problem.
You can think of the changes as a source control branch. The root branch has good test coverage, but the new branch has no automated test coverage. The changes contain added product functionality as well as smaller tweaks to existing code, in the form of changes to existing functionality and regulatory changes.

The objective is of course to apply Model-Based Testing somehow, but the major challenges are:
A)     How do we make sense of this new branch?
B)     How do we approach automation of legacy functionality?
C)     How much can be obtained from modeling?

I will be writing a few blog posts over the coming weeks on the progress we make as we go along, but ultimately I would like to hear from my readers what ideas they have on how to tackle this problem.

So let’s start with the first challenge...

How do we make sense of this new branch?
We decided to analyze the changes and group them into two sets: features and integration points. A feature is a coherent set of changes that were developed in one go, whereas integration points cover the incoherent changes that are either smaller or were made over a longer period of time. Later we will see how this categorization affects the automation strategy we would like to apply.

Luckily we have documentation that names these features and links them to source files, so we do not have to make a detailed analysis to uncover what is coherent and what is not. Some documented features are small and can be seen more as integration points.

All integration points can be identified through a simple source code difference analysis. Note that this includes all features as well; how to identify the overlap is still an open challenge...
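A difference analysis of this kind can be sketched with Python's standard `difflib` module. The file contents below are hypothetical stand-ins for a root-branch and new-branch version of the same source file:

```python
import difflib

# Minimal sketch of a source code difference analysis: compare the
# root-branch and new-branch versions of a file and collect the lines
# that were added or changed in the new branch.
root_branch = ["def pay(amount):", "    return amount * 1.25", ""]
new_branch = ["def pay(amount, vat=0.25):", "    return amount * (1 + vat)", ""]

changed = [
    line[1:]                       # strip the leading "+" marker
    for line in difflib.unified_diff(root_branch, new_branch, lineterm="")
    if line.startswith("+") and not line.startswith("+++")
]
print(changed)
# prints ['def pay(amount, vat=0.25):', '    return amount * (1 + vat)']
```

Running this over every file pair in the branch yields exactly the set of changed lines that feed the difference weights in the risk profile.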

This was a brief introduction to the upcoming project from my side.

The next step we are working on is coming up with a structured approach to handling legacy code, in which we build a risk profile over the changes. More on this next week...