Refactoring In A Legacy Code Jungle
Yes, it’s challenging to write new tests on legacy code—but legacy code is often the area of a product most in need of testing.
Refactoring is a safe action when you have existing tests in place to make sure the working code isn’t broken in the process. However, many organizations accumulate legacy code without building or maintaining corresponding tests, and you can’t write proper tests until you’ve refactored the code. DZone’s 2015 Software Quality survey results report that 61% of respondents were limited in their ability to write automated tests because they had legacy code that needed to be rewritten. In this situation, there are two choices: forgo the adventure altogether or do the brave deed and modify the code.
Leaving the code as-is costs you more than today’s ongoing maintenance: the code slowly rots, and every future change becomes more expensive.
When the project can no longer afford to take on more technical debt, modifying the code is the only choice. The problem is that writing tests for legacy code is hard. Depending on your language of choice (or maybe the current language wasn’t even your choice), here are some of the problems you might encounter when trying to write new tests:
The tested object can’t be created.
The object can’t be separated from its dependencies.
There are singletons that are created once and impact different test scenarios.
There are algorithms that are not exposed through a public interface.
There are dependencies inherited by the tested code.
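To make two of these problems concrete, here is a minimal Python sketch of a class that can't be separated from its dependency, where that dependency is also a singleton. Every name in it (`Settings`, `InvoiceService`, `tax_percent`) is hypothetical and invented for illustration.

```python
class Settings:
    """Hypothetical singleton: one shared instance for the whole process."""

    _instance = None

    def __init__(self):
        self.tax_percent = 20

    @classmethod
    def instance(cls):
        # Created once, then reused by every caller (and every test).
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


class InvoiceService:
    """Hypothetical legacy class with a hard-wired dependency."""

    def total(self, net_cents):
        # The singleton is reached directly, so a test cannot pass in
        # a substitute without changing this method.
        percent = Settings.instance().tax_percent
        return net_cents * (100 + percent) // 100


# Test scenario 1 changes the shared state...
Settings.instance().tax_percent = 50
first = InvoiceService().total(1000)

# ...and scenario 2, which never touched Settings, still sees that change.
second = InvoiceService().total(1000)
```

Here `second` depends on whatever ran before it, not on anything the second scenario set up itself.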
Yes, it’s challenging to write new tests on legacy code—but this doesn’t change the fact that legacy code is often the area of a product most in need of testing.
In other words, you’re going into the jungle.
But before you do, you’d better tool up with enough refactoring techniques so that when you bump into trouble, you’ll know what to do. Some of these techniques are automatic, which can cut out tedium and human error. Others are manual, and therefore carry an unknown amount of risk. You need to match the tool to the task at hand.
Getting Acquainted with the Hostile Environment
Before you start moving code around, become familiar with your surroundings. The first step is to read the code, jump around from file to file, and think about how you might be able to organize the project a little better. Then start by reorganizing the source structure. Move co-located classes to separate files, move types into areas where you would expect to find them, and fix typos to increase readability and maneuverability working in the codebase. The structure is a model of the code, and if it’s hard to understand the model, it will be difficult to refactor confidently. If we feel comfortable, we’ll be more confident making further changes.
After things are in place, start renaming. Classes, functions, variables, files: anything that can improve readability. If programmers understand the code, the tests will be more effective (and there’s not much point in testing the code if it can’t be understood).
Be sure to change names that don’t describe what the code actually does. For example, suppose we have a function called “getValidCustomer” that returns a success code if a Customer object is updated from the database. It makes sense to rename it to “PopulateValidCustomer.” (While we’re at it, change it to a void method.) Now the name describes the function.
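The rename described above might look like the following before-and-after sketch in Python (so the names are in snake_case, and "void" becomes a `None` return). The `Customer` fields, the `CUSTOMER_ROWS` dictionary standing in for the database, and the status-code values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    name: str = ""


# Hypothetical in-memory stand-in for the database.
CUSTOMER_ROWS = {1: "Ada Lovelace"}


# Before: the name promises a customer, but the function fills in the
# object it was given and returns a status code (0 = success, 1 = failure).
def get_valid_customer(customer: Customer) -> int:
    name = CUSTOMER_ROWS.get(customer.customer_id)
    if name is None:
        return 1
    customer.name = name
    return 0


# After: the name says what happens, and nothing is returned.
# Raising LookupError on a missing row is an assumption of this sketch;
# the text above does not say how failure should be reported.
def populate_valid_customer(customer: Customer) -> None:
    name = CUSTOMER_ROWS.get(customer.customer_id)
    if name is None:
        raise LookupError(f"no customer with id {customer.customer_id}")
    customer.name = name
```

A caller reading `populate_valid_customer(customer)` no longer expects a return value, which removes one source of misreading before any tests are written.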
Choosing good names is not as simple as it seems. There’s an art to it, especially with giant catch-all classes. If we use more accurate names, we can mentally (and structurally) refine our model. On the other hand, using generic names hides functionality, which causes inappropriate functionality to gravitate into these classes like a giant black hole. (If you’ve ever written a “-Utils” class, you know what I mean. If you bump into these classes, try to separate them and rename them properly.)
Renaming is low-risk, as it’s mostly done automatically by the IDE. Usually, when doing pre-test renaming, it’s recommended that you concentrate more on method names and variables in the code. These are usually small enough to modify without making any larger, potentially damaging changes.
Penetrating the Foliage
In order to test code, you need to access it in different ways: probe the code and check the results; set up data and see how the code reacts; and replace dependencies with mock objects in order to control the tests. The more access points available, the easier writing tests will become. The setup and validation code will be shorter, less prone to error, and able to get better coverage.
We can change the code to introduce access points using these refactoring patterns:
Change accessibility: Change a method’s access modifier from private to public so that tests can call it directly.
Introduce field: In a long method that does many things, store tested data in accessible fields.
Add accessors: If data is too hidden, use “getters” and “setters” to probe and modify that data.
Introduce interfaces: If we want to mock a dependency, split its functionality into separate interfaces and mock the specific one you need. Then it can be used correctly once testing begins.
Virtualize: Enable overriding and redefining functions by making them virtual.
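Several of the patterns above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: `RateSource`, `PriceConverter`, and `FixedRates` are hypothetical names, and because Python has no `private` or `virtual` keywords, the sketch relies on the leading-underscore convention and on the fact that Python methods can always be overridden.

```python
from typing import Protocol


class RateSource(Protocol):
    """Introduce interface: only the one method the converter needs."""

    def rate_for(self, currency: str) -> int: ...


class PriceConverter:
    def __init__(self, rates: RateSource):
        self._rates = rates
        # Introduce field: keep an intermediate value where a test can see it.
        self._last_rate = None

    # Add accessor: read-only view of otherwise hidden data.
    @property
    def last_rate(self):
        return self._last_rate

    def convert(self, amount: int, currency: str) -> int:
        self._last_rate = self.lookup_rate(currency)
        return amount * self._last_rate

    # Change accessibility: no leading underscore, so tests may call it
    # or override it in a subclass (every Python method is overridable).
    def lookup_rate(self, currency: str) -> int:
        return self._rates.rate_for(currency)


class FixedRates:
    """Test double that satisfies RateSource without any real lookup."""

    def rate_for(self, currency: str) -> int:
        return 3


converter = PriceConverter(FixedRates())
result = converter.convert(100, "EUR")
```

With the fake in place, a test can check both the returned value and the rate that was looked up along the way, without touching a real rate service.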
Like renaming, these changes are also low-risk. However, you might encounter some resistance from peers who will say, “we shouldn’t expose that, it’s not proper design.” Reassure them these exposures and modifications are temporary for the purposes of test design, and that a more sophisticated approach can be taken once the code is refactored and the new tests are built.
Removing Obstacles to Make a Pathway
Once you can access the code and its dependencies, it’s time to actually move code around, and this time it’s not just for readability. By separating and removing dependencies, writing the tests becomes a simpler process.
There are many patterns available to remove dependencies from the code:
Move methods: Especially in large classes, there are private methods that clearly don’t belong in that class. When these methods also use dependencies, they can be moved to separate classes. I usually identify extractable bits of code in the complex methods, then extract to private methods in that class. You can also explore if these methods can be moved into other classes. From this, we get two benefits: the large class is reduced in size, and you can mock the new method instead of the direct dependency.
Extract classes: The methods mentioned above can also be extracted to entirely new classes. An additional benefit is that you can specialize the new class, give it a proper name, and make sure that it won’t become a black hole.
Introduce parameter: When methods use a dependency directly (and probably more than one dependency), this process should be modified so that dependencies are sent to methods from the calling code. This way you can set up the dependency from the test. For example, if a function calls a static method, you can introduce a parameter that will contain the result of that static method call. Not only does this weed out the dependencies and make the code testable, it moves the dependency call up the chain. By doing this, you can use Extract Class for the tested code and benefit from having a separation of concerns.
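As a rough sketch of Introduce Parameter, the Python below replaces a buried static call with a parameter that carries its result. The names `SystemClock`, `is_overdue`, and `check_invoice` are hypothetical, chosen only to illustrate the pattern.

```python
import datetime


class SystemClock:
    """Hypothetical legacy helper with a static method."""

    @staticmethod
    def today() -> datetime.date:
        return datetime.date.today()


# Before: the static call is buried inside the logic, so the result
# depends on the day the test happens to run.
def is_overdue_before(due: datetime.date) -> bool:
    return SystemClock.today() > due


# After: the parameter carries the result of the static call.
def is_overdue(due: datetime.date, today: datetime.date) -> bool:
    return today > due


# The dependency call has moved up the chain to the caller.
def check_invoice(due: datetime.date) -> bool:
    return is_overdue(due, SystemClock.today())
```

A test can now call `is_overdue` with any two dates it likes, while `check_invoice` stays a thin wrapper that is the only place still touching the clock.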
Moving code around in these steps does increase the risk of affecting system function. Slow, thoughtful modifications, executed in pairs, will help to avoid breakage. Luckily, some of these modifications can be done automatically by the IDE, reducing that risk.
The Adventure Begins
Our journey began in the jungle with the prospect of modifying legacy code for the sake of testability. The truth is, these techniques also apply to general refactoring: modifying the code to be simpler, more modular, and more readable.
Teams usually don’t stop here, though. After clearing a path, the real fun begins. You can separate more classes, extract logic from loops, invert conditionals, and make other higher-risk modifications. As the code is simplified, tests will become easier and more effective.
But the sun is setting, and you need to set up camp. You’ll have to continue this journey on your own. There is a wealth of resources out there dedicated to this subject, so don’t stop here.