Usually, people want to improve their tests but do not have quality metrics to determine which version of their improvements is most beneficial to their projects. The presented assessment framework can help you to figure out which is the best possible enhancement that you need to introduce into your system tests and so make them more stable, reliable, and maintainable.
I am going to present to you eight criteria for system tests architecture design assessment. You can find some of them in different books and blog posts, but this list is unique. My teammates and I created it specifically for our system tests design improvements.
What Problems Did We Have?
We had over 1000 tests that ran for over six hours on a single machine. Sometimes all of them were green, but sometimes there were problems that we needed to troubleshoot repeatedly. The biggest problem for us was that we could not trust our UI tests. They verified a big part of the system but were brittle because they were not designed to be easily modified. Small changes in the main workflow usually caused regressions in a random group of tests. Our challenge was to find a better design so that we could refactor the tests and make them more maintainable, more readable, and always green.
Before we came up with the system, we tried to patch up the tests and find quick solutions, hoping that this way we could fix the regression problems and still be able to add new tests. However, our problem was that for quite some time we did not have the whole picture. As you will see, analyzing and comparing the different ideas leads to much better results.
Assessment System Criteria
Now, it is time to present to you the different levels of the system. Each level represents a characteristic of the tests. As you will see, they are listed in a numbered order, which means that they are ordered by importance. However, I think this order depends highly on the context of your team and the skill of its members. Therefore, you can reorder the criteria if you want.
1. Maintainability.
2. Readability.
3. Code complexity index.
4. Usability.
5. Flexibility.
6. Learning curve.
7. Principle of least knowledge.
8. Keep it Simple, Stupid (KISS).
Our team is responsible for a complex legacy licensing system. We need to have many regression tests and be able to extend and modify them easily, which is why maintainability holds the first spot. Since we have many tests, they need to be readable, because sometimes the tests are documentation, too. The third criterion is the code complexity index (CCI). It represents how complex our code is, and we want our code to be simple. It is also the only tool-calculated metric. We do not want to reinvent the wheel, so usability is important. The next one is flexibility. The sixth is the learning curve: how easy is it to learn to write tests? The seventh, the principle of least knowledge, is connected with maintainability. Our last resort when comparing is that the simplest design wins if all other criteria are equal. It is not a metric but a principle.
For every criterion, a rating is assigned. You can find the possible ratings below. Each rating also has a numeric representation.
- Very poor.
- Very good.
Steps to Apply
Here are some pragmatic steps to apply the system.
1. Create research and development branch.
2. Create separate projects.
3. Choose a small set of tests.
4. Implement the set for each design.
5. Present the designs to your team.
6. Every participant uses the system.
7. Create final triage meeting.
First, create a completely new research and development branch. Then, create separate projects to test your new ideas, with a different folder for each idea. Do not refactor your existing test framework's code before you are completely sure which idea is the best for your case. To evaluate and compare the ideas effectively, choose a small set of tests and implement the same set for each design. If the tests you create are different, how do you expect the assessment to be accurate? We keep the set small because refactoring all of our tests directly costs a lot of time that we usually do not have.
Anyway, the system will work for any number of tests. Present the designs to your team. Use the provided eight-level evaluation system to assess the different solutions. It is best if several people participate in the process because some of the points are subjective (like what a readable test is or which design is easier to learn). Hold a final triage meeting with your whole team and decide which idea to implement based on the results of the assessment.
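To make the triage meeting concrete, here is a minimal sketch of how the participants' ratings could be combined. The 1-5 numeric scale, the unweighted average, and the names `summarize_assessment`, `best_design`, `design_a`, and `design_b` are assumptions for illustration only, not part of the system itself.

```python
from statistics import mean

# Hypothetical input shape: {design: {criterion: [one score per participant]}}.
# The 1-5 numeric scale is an assumption made for this sketch.
def summarize_assessment(ratings):
    """Average participant scores per criterion, then per design."""
    summary = {}
    for design, by_criterion in ratings.items():
        per_criterion = {
            criterion: mean(scores) for criterion, scores in by_criterion.items()
        }
        summary[design] = {
            "per_criterion": per_criterion,
            # Unweighted average; weight the criteria if your team ranks them.
            "overall": mean(list(per_criterion.values())),
        }
    return summary


def best_design(summary):
    """Return the design with the highest overall average."""
    return max(summary, key=lambda design: summary[design]["overall"])


example_ratings = {
    "design_a": {"maintainability": [4, 5, 3], "readability": [4, 4, 4]},
    "design_b": {"maintainability": [2, 3, 2], "readability": [5, 4, 5]},
}
example_summary = summarize_assessment(example_ratings)
print(best_design(example_summary))  # design_a
```

If two designs end up with the same overall score, this sketch simply returns the first one; in our system, that is the point where the simplest design wins.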
Assessment Criteria Definitions
Before we proceed with examples of how we use the system, I am going to explain what every criterion means.
Maintainability
Wikipedia defines maintainability as:
"The ease with which a software system or component can be modified to correct faults, improve performance or other attributes, or adapt to a changed environment."
The keyword here is ease.
The most important part for me is the troubleshooting. How much time do you need to find out whether there is a bug in the functionality that the test is asserting or a problem with the test itself? When there is some issue in the code, you are looking into the logs. You are all sweaty, looking and looking, unable to locate it. You debug deeper and deeper to find the root cause. I am sure you have experienced it more than once. This is the maintainability that I mean.
Readability
Readable code is code that clearly communicates its intention to the reader. Code that is not readable takes longer to understand and increases the likelihood of defects. There is a tendency for some programmers to use comments as a substitute for readable code or to simply comment for the sake of commenting. I believe test readability means how easy it is to find out what the test does without the need for huge comments or large test descriptions. I am sure all of you have, at least once, seen a test name that is two rows long.
Code Complexity Index
The code complexity index is our custom-made metric. We created a formula for it. It contains four important parts that can be calculated with tools such as the Microsoft Visual Studio IDE. This is the only metric in the system that is tool-calculated. All others are based on the participants' opinions.
Code Complexity Index Rating = (Depth of Inheritance Rating + Class Coupling Rating + Maintainability Index Rating + Cyclomatic Complexity Rating)/4
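The formula above is a plain average of four ratings, which the following sketch spells out. Note that the inputs are the ratings assigned to each metric, not the raw metric values; the function name and the sample ratings are hypothetical.

```python
def code_complexity_index_rating(
    depth_of_inheritance_rating,
    class_coupling_rating,
    maintainability_index_rating,
    cyclomatic_complexity_rating,
):
    """Average the four per-metric ratings into one CCI rating."""
    total = (
        depth_of_inheritance_rating
        + class_coupling_rating
        + maintainability_index_rating
        + cyclomatic_complexity_rating
    )
    return total / 4


# Sample ratings chosen only for illustration.
print(code_complexity_index_rating(4, 3, 5, 4))  # 4.0
```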
First, depth of inheritance. The deeper the hierarchy, the more difficult it might be to understand where particular methods and fields are defined. For class coupling, good software design dictates that types and methods should have high cohesion and low coupling. High coupling indicates a design that is difficult to reuse and maintain because of its many interdependencies on other types. These metrics can be calculated in the developer editions of Visual Studio, even in the free one (the Community edition).
The maintainability index is a value between zero and 100 that represents the relative ease of maintaining the code. A high value means better maintainability. Most of the formulas used to calculate the metrics are not public. However, I found an unofficial one for the maintainability index. I am not going to decipher it. I only want to emphasize that real mathematics stands behind these metrics.
Maintainability Index = MAX(0,(171 – 5.2 * ln(Halstead Volume) – 0.23 * (Cyclomatic Complexity) – 16.2 * ln(Lines of Code))*100 / 171)
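For readers who prefer code to notation, the unofficial formula above can be transcribed directly. This is only a sketch of that formula, not the tool's actual implementation; the function and parameter names are my own, and it assumes the Halstead volume and the line count are both positive.

```python
import math


def maintainability_index(halstead_volume, cyclomatic_complexity, lines_of_code):
    """Transcribe the unofficial formula; the result is clamped at zero."""
    raw = (
        171
        - 5.2 * math.log(halstead_volume)   # natural logarithm
        - 0.23 * cyclomatic_complexity
        - 16.2 * math.log(lines_of_code)
    )
    return max(0.0, raw * 100 / 171)


# A longer, more complex method scores lower than a short, simple one.
short_method = maintainability_index(100, 1, 5)
long_method = maintainability_index(5000, 15, 200)
print(short_method > long_method)  # True
```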
Let's talk about cyclomatic complexity. It is based on the number of decisions in a program, and you can find its formula below. For example, if a control flow graph has seven nodes (shapes) and eight edges (lines), the formal formula gives a cyclomatic complexity of 8 - 7 + 2 = 3.
Cyclomatic Complexity (CC) = E - N + 2
E = the number of edges of the graph
N = the number of nodes of the graph
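The calculation can be reproduced with a few lines of code. The sketch below counts edges and nodes of a control flow graph given as a list of edges; the graph itself is a made-up example with seven nodes and eight edges, and the formula assumes a single connected graph with one entry and one exit.

```python
def cyclomatic_complexity(edges):
    """Apply CC = E - N + 2 to a control flow graph given as (from, to) pairs."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


# Hypothetical control flow: one if/else followed by one optional step.
example_edges = [
    ("start", "check"),
    ("check", "then"),
    ("check", "else"),
    ("then", "merge"),
    ("else", "merge"),
    ("merge", "optional"),
    ("merge", "end"),
    ("optional", "end"),
]
print(cyclomatic_complexity(example_edges))  # 3
```

The graph contains two decision points ("check" and "merge"), which matches the usual rule of thumb that the complexity equals the number of decisions plus one.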
I could not find any official values published by Microsoft for assessment of these criteria. So I did some research and read blog posts of Microsoft MVPs that suggested a sample assessment system. I modified it a little bit to fit our needs. You can observe the result in the presented table. We use the table to calculate the rating for the different parts of the formula.
Usability
By usability, I mean how easy it is to use the test framework API. How much effort is required to write a new common test that leverages the existing test API? How much code do you need to write a single simple test? If you use complex design patterns and many classes, your tests may become complex. Writing tests should be a straightforward process that brings joy and pleasure to the writer.
Flexibility
By flexibility, I mean how easy it is to add a new step to the existing workflow. If you have 100 tests that use one primary method and the whole process is described there, then to support 20 different use cases you need many conditions in your code. Conditions tend to make the code more complex and less maintainable. In addition, such a design does not follow the Open/Closed Principle, which states that software entities should be open for extension but closed for modification. Every change in this imaginary method can affect all of the tests that use it. The best test framework designs should allow you to add new steps quickly without affecting all other tests.
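As a rough illustration of what "open for extension, closed for modification" can look like in a test workflow, consider the sketch below. It is not one of the designs we evaluated; the step names and the `run_workflow` helper are hypothetical. Each step is a small function, and a new use case is built by composing steps rather than by adding conditions to a shared method.

```python
from typing import Callable, Dict, List

Context = Dict[str, List[str]]
Step = Callable[[Context], None]


# Hypothetical workflow steps; each one only records that it ran.
def open_shop(context: Context) -> None:
    context["log"].append("open shop")


def add_item_to_cart(context: Context) -> None:
    context["log"].append("add item to cart")


def pay(context: Context) -> None:
    context["log"].append("pay")


def apply_coupon(context: Context) -> None:
    context["log"].append("apply coupon")


def run_workflow(steps: List[Step]) -> Context:
    """Run the steps in order against a fresh context."""
    context: Context = {"log": []}
    for step in steps:
        step(context)
    return context


BASE_PURCHASE: List[Step] = [open_shop, add_item_to_cart, pay]

# The new use case extends the workflow without touching BASE_PURCHASE
# or any of the existing steps.
PURCHASE_WITH_COUPON: List[Step] = (
    BASE_PURCHASE[:2] + [apply_coupon] + BASE_PURCHASE[2:]
)

print(run_workflow(PURCHASE_WITH_COUPON)["log"])
```

Tests that use `BASE_PURCHASE` keep running exactly as before, because the coupon step lives only in the new composition.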
Learning Curve Test Framework API
If a new member joins your team and needs to read 100 pages of documentation before he or she is ready to write a test, or, even worse, if you do not have any documentation and need to spend countless hours teaching each new member how to start writing, this means you have a poor test framework API learning curve.
Principle of Least Knowledge (Law of Demeter)
When the assessment system was designed, most of our tests shared the data of the currently executed test through a static class. Most of the time, the different components of the design did not need all of that information, so we decided to include the principle in our list. For example, suppose you have a client that has a first name, last name, email, country, and so on, and you have a test for resetting a password. If the test step receives only the email, everything is OK, but if it requires the whole object, this is a problem.
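The password reset example can be sketched as follows. The `Client` class and both functions are hypothetical and exist only to contrast the two signatures; they do not come from our framework.

```python
from dataclasses import dataclass


@dataclass
class Client:
    first_name: str
    last_name: str
    email: str
    country: str


def reset_password(email: str) -> str:
    """Preferred: the step asks only for the one value it needs."""
    return f"reset link sent to {email}"


def reset_password_coupled(client: Client) -> str:
    """Discouraged: the step demands the whole object but uses one field."""
    return f"reset link sent to {client.email}"


client = Client("Jane", "Doe", "jane@example.com", "Bulgaria")
print(reset_password(client.email))
```

With the first signature, a test can call the step with nothing more than an email string, so changes to the other client fields cannot break it.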
Keep It Simple Stupid (KISS)
Keeping things simple is, ironically, not simple! It requires abstract thinking. Let me quote Martin Fowler:
“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.”
Think about it for a second. How much code have you seen that was easy to read, that was simple enough to understand? Probably not a lot.
Unlike the previous criteria, this is not a metric, and we do not assign a rating for it. We just apply this principle if all other criteria are equal, which is usually not necessary.
We can have lots of ideas and approaches, but we need to analyze them well and decide which one is the best. In the next articles from the series, I will show you how to use our system in practice. I will use some of the real designs that we evaluated in the past. I will briefly explain the specifics of each one and then assign ratings for each level described in our assessment system. Further, I will clarify the reasoning behind my rating decisions. You can watch my conference talk dedicated to the system here or download the whole slide deck here.