Most legacy systems started with a clean design, because the problem they originally set out to solve was smaller and well-defined. Over time, however, business competition and organizational change continually demand new enhancements and features, often within a very short timeframe. To avoid changing existing code that works, these new features are typically implemented in a completely separate module, and any common functionality that the original system already implemented gets copied into the new module. This is a very common syndrome that I call "reuse via copy and paste".
Here is an earlier blog post about some common mindsets that lead to bad code.
Basically, the idea of "not touching existing working code" encourages a "copy and paste" culture which, over time, causes a lot of code duplication all over the place. Once a bug is found, you need to make sure the fix is applied to every copy. Whenever you enhance a feature, you need to find and change all of its copies. When there are 20 different places in the code doing the same thing (each slightly differently), you start to lose track of which one holds the ultimate responsibility for a given piece of logic. At that point the code is very hard to maintain because it is so hard to understand.
Because you cannot understand the code, you become more afraid of changing it (after all, it works). This further encourages you to put new features in a completely separate module, which makes the situation even worse.
Over time, the code becomes so unmaintainable that adding any new feature takes a long time and usually breaks existing code in many places. The development team no longer feels productive, and morale drops. This is typically the point where a consultant like me is brought in to help.
At a high level, here are the key steps:
1: Identify your target architecture
Define a "to-be" architecture that can serve the business objectives over the next 5 years. At this stage it is important to deliberately ignore the current legacy system; otherwise you won't be able to think "outside the box".
It is important to manage perceptions, because it is easy to give the impression that this exercise is boiling the ocean, or that it proposes throwing the existing system away and starting everything from scratch. Make it clear that the "to-be" architecture is primarily a thought exercise to define our target. We should clearly separate our "vision" from the "execution", which we shouldn't be worrying about at this stage.
The long-term architecture establishes a vision of where we ultimately want to be and serves as our long-term target. A core vs. non-core analysis is necessary to decide which components should be built and which should be bought.
It is also important to anticipate possible future changes and build enough flexibility into the architecture that it can adapt when those changes happen. Knowing what you don't know is very important.
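One common way to get that flexibility, and to keep build-vs-buy decisions reversible, is to have the architecture own a small interface and put each component, whether built or bought, behind it. The sketch below is a minimal illustration in Python under that assumption; `PaymentGateway`, `InHouseGateway`, `VendorGatewayAdapter`, `FakeVendorClient` and the vendor's `submit_payment` method are all hypothetical names, not a real product's API.

```python
from typing import Any, Protocol


class PaymentGateway(Protocol):
    """Interface owned by our architecture (hypothetical); implementations plug in behind it."""

    def charge(self, account_id: str, amount_cents: int) -> str: ...


class InHouseGateway:
    """A component we decided to build ourselves (hypothetical)."""

    def charge(self, account_id: str, amount_cents: int) -> str:
        return f"inhouse-{account_id}-{amount_cents}"


class FakeVendorClient:
    """Stand-in for a bought product whose API differs from ours (hypothetical)."""

    def submit_payment(self, customer: str, cents: int) -> str:
        return f"vendor:{customer}:{cents}"


class VendorGatewayAdapter:
    """Adapts the vendor's API to our interface, so the vendor can be swapped later."""

    def __init__(self, vendor_client: Any) -> None:
        self._client = vendor_client

    def charge(self, account_id: str, amount_cents: int) -> str:
        return self._client.submit_payment(customer=account_id, cents=amount_cents)


def checkout(gateway: PaymentGateway, account_id: str, amount_cents: int) -> str:
    # Callers depend only on the interface, never on a concrete implementation.
    return gateway.charge(account_id, amount_cents)
```

Swapping the in-house component for a bought one then means passing `VendorGatewayAdapter(FakeVendorClient())` instead of `InHouseGateway()`; `checkout` and its callers do not change.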
A top-down approach is typically used to design the green-field architecture. The level of detail is determined by how well the requirements are understood and how likely they are to change in future. Since the green-field architecture mainly serves as a guiding target, I usually don't go too deep into implementation details at this stage.
The next step is to get on the ground and understand where you are now.
2: Understand your existing system
To get up to speed quickly, my first step is to talk to people who understand parts of the current code base, as well as its pain points. I also skim through existing documents and presentations to get a basic idea of the existing architecture.
If people who know how the legacy system works are still available, a formal architecture review can be a very efficient way to start understanding the legacy system.
If these people have already left, a different process, reverse engineering, is needed.
3: Define your action plan
At this point, you have a clear picture of where you are and where you want to be. The next step is to figure out how to get from here to there. This is in fact the hardest part, because many factors need to be taken into consideration:
- Business priorities and important milestone dates
- Risk factors and opportunity costs
- Organizational skill-set distribution and culture
The next step is to construct an execution plan that maximizes business opportunities while minimizing cost and risk. Each risk needs an associated contingency plan (plan B). In my experience, the execution plan usually takes one of the following forms.
Parallel development of a green-field project
A small team of the best developers is formed to build the new architecture from scratch. The latest best-of-breed technologies are typically used, so that most infrastructure pieces can either be bought or provided by open-source technologies. The team focuses on rewriting just the core business logic. The green-field system is typically easier to understand and more efficient.
After the green-field system has been sufficiently tested, the existing legacy system is swapped out (or kept as a contingency backup). Careful planning of data migration and traffic migration is important to make sure the transition is smooth.
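One common way to make traffic migration gradual is to route a configurable percentage of requests to the new system and raise that percentage as confidence grows, keeping the legacy path available as the fallback. The sketch below is a minimal illustration under stated assumptions: `legacy_handler` and `new_handler` are hypothetical stand-ins for calls into the two systems, and requests are bucketed by a hash of a request ID so the same caller consistently lands on the same system.

```python
import hashlib
from typing import Callable


def legacy_handler(request_id: str) -> str:
    # Hypothetical stand-in for a call into the legacy system.
    return f"legacy:{request_id}"


def new_handler(request_id: str) -> str:
    # Hypothetical stand-in for a call into the green-field system.
    return f"new:{request_id}"


def bucket(request_id: str) -> int:
    """Map a request ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def make_router(percent_to_new: int,
                legacy: Callable[[str], str] = legacy_handler,
                new: Callable[[str], str] = new_handler) -> Callable[[str], str]:
    """Return a handler that sends roughly `percent_to_new`% of requests to the new system."""
    if not 0 <= percent_to_new <= 100:
        raise ValueError("percent_to_new must be between 0 and 100")

    def route(request_id: str) -> str:
        # The same ID always hashes to the same bucket, so routing is sticky, and
        # raising the percentage only moves more buckets to the new system.
        if bucket(request_id) < percent_to_new:
            return new(request_id)
        return legacy(request_id)

    return route
```

In a rollout you might start with `make_router(1)`, compare results and error rates between the two paths, then step up through 10, 50 and 100; dropping back to 0 sends everything to the legacy system again, which is one way to realize the "contingent backup" above.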
One problem with this approach is development cost: during the transition period you need to maintain two teams of developers working on two systems. New feature requirements may keep coming in, and you may need to implement the same thing twice, once in each system.
Another problem is the morale of the developers who maintain the legacy system. They know the system is going away and that they may need to find another job, so you may lose the people who know your legacy system even faster.
Incremental refactoring of the existing code base
Another approach is to refactor the current code base and incrementally bring it back into good shape. This involves repartitioning the responsibilities of existing components and breaking down complex components or long methods.
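As a small illustration of repartitioning responsibilities, the sketch below takes two hypothetical copy-and-pasted variants of the same discount calculation and extracts the shared logic into a single function, with the only real difference between the copies passed in as a parameter. The function names and discount rates are invented for the example.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")


# Before: two hypothetical copies of the same logic that drifted apart slightly.
def order_total_web(amount: Decimal) -> Decimal:
    discounted = amount * Decimal("0.90")
    return discounted.quantize(CENT, rounding=ROUND_HALF_UP)


def order_total_mobile(amount: Decimal) -> Decimal:
    discounted = amount * Decimal("0.85")
    return discounted.quantize(CENT, rounding=ROUND_HALF_UP)


# After: one function owns the logic, so a bug fix or enhancement happens in one place.
def order_total(amount: Decimal, discount_rate: Decimal) -> Decimal:
    """Apply a fractional discount and round half-up to cents."""
    if not Decimal("0") <= discount_rate < Decimal("1"):
        raise ValueError("discount_rate must be in [0, 1)")
    discounted = amount * (Decimal("1") - discount_rate)
    return discounted.quantize(CENT, rounding=ROUND_HALF_UP)


# Each caller now states only what differs for it.
def order_total_web_v2(amount: Decimal) -> Decimal:
    return order_total(amount, Decimal("0.10"))


def order_total_mobile_v2(amount: Decimal) -> Decimal:
    return order_total(amount, Decimal("0.15"))
```

Keeping the old copies around briefly lets you check that the consolidated version gives identical results before you delete them and repoint their callers.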
I almost always encounter parts of the code that I am not able to understand. These may be dead code that never gets exercised, or logic hidden behind many levels of indirection. What I typically do is add tracing to the code and rerun the system to see when the code is executed and who is calling it (by observing the stack trace). I also build a wrapper around the code I don't understand and shrink its perimeter until I can safely rewrite the component.
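The tracing step can be as simple as a decorator that records every call to the mystery code together with the call stack that led to it. The sketch below assumes the legacy code is Python; `legacy_calculate` and `report_job` are hypothetical names standing in for the code you don't understand and one of its callers. It covers only the tracing part; tallying direct callers in `call_sites` is one way to see how wide the perimeter currently is.

```python
import functools
import logging
import traceback
from collections import Counter
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])

logger = logging.getLogger("legacy.trace")
# (filename, line number, function name) of each direct caller -> number of calls
call_sites: Counter = Counter()


def trace_calls(func: F) -> F:
    """Log each call to `func` with the stack that led to it, and count direct callers."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        stack = traceback.extract_stack()[:-1]  # drop this wrapper's own frame
        caller = stack[-1]
        call_sites[(caller.filename, caller.lineno, caller.name)] += 1
        logger.info("%s called from %s:%d (%s)\n%s",
                    func.__qualname__, caller.filename, caller.lineno, caller.name,
                    "".join(traceback.format_list(stack)))
        return func(*args, **kwargs)

    return cast(F, wrapper)


@trace_calls
def legacy_calculate(x: int) -> int:
    # Hypothetical logic nobody on the team fully understands.
    return x * 2


def report_job() -> int:
    # Hypothetical caller we want to discover by tracing.
    return legacy_calculate(21)
```

With `logging.basicConfig(level=logging.INFO)` enabled, each call prints the full stack; after a representative run, any traced function that never appears in `call_sites` is a candidate for dead code.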
It is also quite common for a legacy system to lack unit tests, so a fair amount of effort goes into writing unit tests around the components.
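When a component has no tests, one widely used technique is a characterization test: pin down what the current code actually does for a range of inputs, then assert that the refactored version does exactly the same. The sketch below uses Python's `unittest`; `legacy_shipping_fee`, `refactored_shipping_fee` and the fee rules are hypothetical examples, and the input list deliberately includes boundary values such as exactly 1 kg.

```python
import unittest


def legacy_shipping_fee(weight_kg: float, express: bool) -> float:
    # Hypothetical legacy logic, kept exactly as found.
    fee = 5.0
    if weight_kg > 1:
        fee = fee + (weight_kg - 1) * 2.0
    if express:
        fee = fee * 1.5
    return round(fee, 2)


def refactored_shipping_fee(weight_kg: float, express: bool) -> float:
    # Candidate rewrite; it must behave identically to the legacy version.
    base = 5.0 + max(0.0, weight_kg - 1) * 2.0
    return round(base * (1.5 if express else 1.0), 2)


class ShippingFeeCharacterizationTest(unittest.TestCase):
    # Boundary and typical inputs; extend this list whenever you learn of a new case.
    INPUTS = [(w, e) for w in (0.0, 0.5, 1.0, 1.01, 2.0, 10.0, 99.9)
              for e in (False, True)]

    def test_refactored_matches_legacy(self) -> None:
        for weight, express in self.INPUTS:
            with self.subTest(weight=weight, express=express):
                self.assertEqual(refactored_shipping_fee(weight, express),
                                 legacy_shipping_fee(weight, express))


if __name__ == "__main__":
    unittest.main()
```

The point is not that the legacy behavior is correct, only that it is known; once the rewrite matches it, you can change behavior deliberately in a separate step.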
One problem with the refactoring approach is that it is not easy to get management buy-in, because they don't see any new features for the engineering effort you spend.