Reference Graphs or 'Feature Sketches' as Tools for Refactoring Legacy Code
Some of the code we work with at New Relic has been around for quite a while. Unfortunately, not all of it has stayed clean and well factored over time. If you’re not careful, even the cleanest code can succumb to entropy. And without an active effort to counteract this, you can end up with code that seems unmaintainable.
Luckily, there are some tools we can use to help us make clear, organized, testable changes to even the most complicated codebases. One of my favorites is the ‘Reference Graph’, a slightly expanded version of the ‘Feature Sketch’ described in Working Effectively with Legacy Code. The general idea is that you have a set of changes you want to make to your code, so you graph everything that holds a reference to the code you want to change.
Let’s take a look at the following
sample code, in which all the classes and even the methods on those
classes are a confusing mess of crossed references:
class A
  def report
    B.new.get_all_the_state
  end
end

class B
  def get_all_the_state
    [ munge_state, @state1, D.new.get_state ]
  end

  def munge_state
    get_state
  end

  def get_state
    @state0
  end
end

class C
  def report
    D.new.get_state
  end
end

class D
  def get_state
    @state2
  end
end

module E
  def self.report
    [ A.new.report, C.new.report ]
  end
end
Let’s say that we’ve decided that we want to get rid of the
method and split its responsibilities among the ‘A’ and ‘B’ classes. In
this contrived example, it’s easy to prop up a canned solution that we
should refactor toward. But in a more complicated project, it can be
challenging to know all the ramifications that one simple change
can have on your source code. By graphing the references between the
methods and instance variables around the code we want to change, we can
get a good idea of the scope of the changes we want to make. A visual
representation of the references between methods and instance
variables in your code can show you where your tightest
knots of code are hiding and guide your work to untangle them.
I use Graphviz to create the graphs. It renders images from dot files, in which you define nodes, style them, and define their associations. For the code above, this would be the dot file. Once you’ve installed Graphviz, you can create the graph below with this command:
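As a rough illustration of what such a dot file contains, the Ruby sketch below builds one from a hand-written map of the references in the sample classes. The `REFERENCES` and `COLORS` tables, the `to_dot` helper, and the choice of `B#get_state` as the method to change are all assumptions made for this example; this is not the original dot file.

```ruby
# Hand-written map of the sample code: each key refers to every name in its list.
# "Class#method" is an instance method, "Class@ivar" an instance variable.
REFERENCES = {
  "E.report"            => ["A#report", "C#report"],
  "A#report"            => ["B#get_all_the_state"],
  "B#get_all_the_state" => ["B#munge_state", "B@state1", "D#get_state"],
  "B#munge_state"       => ["B#get_state"],
  "B#get_state"         => ["B@state0"],
  "C#report"            => ["D#get_state"],
  "D#get_state"         => ["D@state2"]
}.freeze

# Hypothetical colouring: red for the part we plan to change,
# orange for the part that refers to it. Everything else stays black.
COLORS = {
  "B#get_state"   => "red",
  "B#munge_state" => "orange"
}.freeze

# Builds the text of a Graphviz digraph: one box per name, one arrow per reference.
def to_dot(references, colors = {})
  nodes = (references.keys + references.values.flatten).uniq
  lines = ["digraph references {", "  node [shape=box];"]
  nodes.each do |name|
    lines << %(  "#{name}" [color=#{colors.fetch(name, "black")}];)
  end
  references.each do |from, targets|
    targets.each { |to| lines << %(  "#{from}" -> "#{to}";) }
  end
  lines << "}"
  lines.join("\n")
end

puts to_dot(REFERENCES, COLORS)
```

Saving the output as `references.dot` (a hypothetical file name) and running `dot -Tpng references.dot -o references.png` should render it as a PNG.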
In the graph, the parts we want to change are in red, and all the parts that refer to
parts we want to change are in orange. This gives us an idea of the
scope of the changes we want, as well as a roadmap to getting those
changes done incrementally. We see that we can’t make changes to
B#get_state without also making changes to
as well. The tests for these methods will also have to change or be
removed entirely. The existing tests for the orange boxes should not
have to change, but new ones will likely need to be written to cover the
functionality of the old red boxes. Neither code nor test changes for
the black boxes should be necessary, since the orange boxes should
isolate all the changes within their interfaces and keep them hidden
from unrelated code.
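The colouring rule described above can also be derived mechanically once the references are written down. The sketch below is one way to do it, assuming a hypothetical `EDGES` table and `color_nodes` helper: the names passed in as targets become red, anything that refers directly to a target becomes orange, and the rest stays black. Picking `B#get_state` as the target is only for illustration.

```ruby
require "set"

# Hand-written map of the sample code: each key refers to every name in its list.
EDGES = {
  "E.report"            => ["A#report", "C#report"],
  "A#report"            => ["B#get_all_the_state"],
  "B#get_all_the_state" => ["B#munge_state", "B@state1", "D#get_state"],
  "B#munge_state"       => ["B#get_state"],
  "B#get_state"         => ["B@state0"],
  "C#report"            => ["D#get_state"],
  "D#get_state"         => ["D@state2"]
}.freeze

# Returns a hash of name => colour for every name that appears in the graph.
def color_nodes(edges, targets)
  targets = targets.to_set
  # A referrer is any name with at least one reference into the target set.
  referrers = edges.select { |_from, tos| tos.any? { |to| targets.include?(to) } }
                   .keys.to_set - targets
  names = (edges.keys + edges.values.flatten).uniq
  names.each_with_object({}) do |name, colors|
    colors[name] =
      if targets.include?(name) then "red"
      elsif referrers.include?(name) then "orange"
      else "black"
      end
  end
end

colors = color_nodes(EDGES, ["B#get_state"])
colors.each { |name, color| puts "#{name}: #{color}" }
```

Only direct referrers are marked orange here, which matches the idea that the orange boxes should contain the change behind their own interfaces.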
As I work my way through the refactor, I go back to the dot file and keep it up to date with my changes. This is a great way to keep track of my progress, and it keeps me from getting lost, which is a very real risk in large refactors of complex code. I like to check the Graphviz dot file into source control. That way, if I get badly stuck, I can always go back to my graph and see if it’s still current. It’s also nice to be able to go back and look at my progress. The feedback you get from seeing changes in the graph can help you find areas for further development. (This is similar to the feedback loop of TDD.) If the graph starts to get more twisted, you may need to rethink your code change.
Once we’re all done, our graph should look like this:
The newly added green boxes are either new entries or old entries that are completed. At this point, our code should look like this:
class A
  def report
    [ B.new.zeros, ones ]
  end

  def ones
    B.new.ones
  end
end

class B
  def zeros
    @state0
  end

  def ones
    @state1
  end
end

class C
  def report
    @state2
  end
end

module E
  def self.report
    [ A.new.report, C.new.report ]
  end
end
While contrived examples are nice for demonstrating a concept, they don’t do a good job of showing how well this technique works in practice. In my experience, it takes about two to three hours to construct the graph for a non-trivial stretch of code. I consider it time well spent, and I find that keeping it up to date as I work is surprisingly unobtrusive. In fact, I actually look forward to making my incremental changes and then seeing the graph clean up and straighten out as I work.
Recently, I overhauled the configuration system in the Ruby agent. Here’s a teaser:
If you’re curious, the dot file for these graphs can be found in the git history for the Ruby agent. You can browse through the history and recreate the graph at each point.
If you’d like to find out more about this subject, I encourage you to read Michael Feathers’ book, Working Effectively with Legacy Code, which covers this and other techniques for working with code that’s suffered the ravages of time and entropy.