What I’ve Learned from (Nearly) Failing to Refactor Hudson
we tried to refactor hudson.java, without success; only later was i able to refactor it successfully, thanks to the experience from the first attempt and more time. in any case, it was a great learning opportunity.
the two most important things we’ve learned are:
- never underestimate legacy code. it’s far more complex and intertwined than you expect and it has more nasty surprises up its sleeve than you can imagine.
- never underestimate legacy code.
and another important one: when you’re tired and depressed, have some fun reading the “best comments ever” at stackoverflow. seeing somebody else’s suffering makes your own seem smaller.
i’ve also started to think that the refactoring process must be more rigorous, to protect you from wandering too far from your original goal and from getting lost in the eternal cycle of fixing something <-> discovering new problems. people tend to make refactoring changes depth-first, which can easily lead them astray, far from where they actually need to go. it is important to stop periodically and look at where we are, where we are trying to get, and whether we are getting lost and should prune the current “branch” of refactorings, return to some earlier point and perhaps try a completely different solution. i guess that one of the key benefits of the mikado method is that it provides you with this global overview (which is easily lost when it exists only in your head) and with points to roll back to.
evils of legacy code
use a dependency injection framework, for god’s sake! singletons and their manual retrieval really complicate testing and affect the flexibility of the code.
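to make the testing problem concrete, here is a minimal sketch that contrasts a class fetching its collaborator from a singleton with one that receives it through its constructor. it uses plain constructor injection rather than any di framework, and every name in it (Workspace, GlobalWorkspace, the two reporters, the paths) is hypothetical, not taken from hudson.

```java
// All names here are hypothetical illustrations, not Hudson's actual code.
interface Workspace {
    String rootDir();
}

// A classic hand-rolled singleton.
final class GlobalWorkspace implements Workspace {
    private static final GlobalWorkspace INSTANCE = new GlobalWorkspace();

    private GlobalWorkspace() {
    }

    static GlobalWorkspace getInstance() {
        return INSTANCE;
    }

    @Override
    public String rootDir() {
        return "/var/lib/app";
    }
}

// Style 1: manual retrieval. The dependency is hidden inside the method,
// so a test cannot substitute a different Workspace.
final class SingletonBackedReporter {
    String describe() {
        return "root=" + GlobalWorkspace.getInstance().rootDir();
    }
}

// Style 2: the dependency is handed in, so any Workspace will do.
final class InjectedReporter {
    private final Workspace workspace;

    InjectedReporter(Workspace workspace) {
        this.workspace = workspace;
    }

    String describe() {
        return "root=" + workspace.rootDir();
    }
}

final class InjectionDemo {
    public static void main(String[] args) {
        System.out.println(new SingletonBackedReporter().describe());

        // A test can pass a fake without touching any global state.
        Workspace fake = () -> "/tmp/test";
        System.out.println(new InjectedReporter(fake).describe());
    }
}
```

a di framework would mainly automate the wiring that the demo does by hand; the testability gain comes from the constructor parameter itself.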
don’t use public fields. they make it really hard to replace a class with an interface.
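a small sketch of why this hurts: java interfaces cannot declare instance fields, so every caller that reads a public field is tied to the concrete class. the class and field names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Before: callers reach into the field directly, e.g. server.clouds.add("x").
// (Hypothetical class; it only illustrates the pattern.)
final class FieldBasedServer {
    public final List<String> clouds = new ArrayList<>();
}

// After "encapsulate field", callers go through a method, and that method
// can be declared on an interface, which a field never could be.
interface CloudHolder {
    List<String> getClouds();
}

final class AccessorBasedServer implements CloudHolder {
    private final List<String> clouds = new ArrayList<>();

    @Override
    public List<String> getClouds() {
        return clouds;
    }
}

final class FieldDemo {
    public static void main(String[] args) {
        FieldBasedServer legacy = new FieldBasedServer();
        legacy.clouds.add("ec2"); // compiles only against the concrete class

        CloudHolder holder = new AccessorBasedServer();
        holder.getClouds().add("ec2"); // works against any implementation

        // The interface also allows a trivial stand-in, e.g. for tests.
        CloudHolder stub = () -> Arrays.asList("stubbed-cloud");
        System.out.println(legacy.clouds + " " + holder.getClouds() + " " + stub.getClouds());
    }
}
```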
reflection and multithreading make it pretty difficult, if not impossible, to find out the dependencies of a particular piece of code and thus the impact of changing it. i had a hard time finding all the places where hudson.getinstance is invoked while its constructor is still running.
our way to failure and success
there is a lot of refactoring that could be done with hudson.java, for it is a typical god class which additionally spreads its tentacles through the whole code base via its evil singleton instance, used by just about anyone for many different purposes. gojko describes some of the problems worth removing.
we tried to start small and “normalize” the singleton initialization, which isn’t done in a factory method but in the constructor itself. i didn’t choose the goal very well, as it doesn’t bring much value. the idea was to make it possible to have other implementations of hudson as well (e.g. a mockhudson), but given the state of the code that wasn’t really feasible, and even if it had been, a simple hudson.setinstance would perhaps have sufficed. anyway, we tried to create a factory method and move the initialization of the singleton instance there, but in the end we got lost in concurrency issues: there were either multiple instances of hudson or the application deadlocked itself. we tried to move pieces of code around, but the dependencies wouldn’t let us do that.
while reflecting on our failure i came to the realization that the problem was that hudson.getinstance() is called (many times) during the execution of hudson’s constructor, both by the objects used there and by threads started from there. it is of course a hideous practice to access a half-baked instance before it is fully initialized. the solution is then simple: to be able to initialize the singleton field outside of the constructor, we must remove all calls to getinstance from its context.
the steps can be seen very well from the corresponding github commits. summary:
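the following sketch reproduces the problem in miniature, under the assumption that the constructor publishes the singleton before it has finished assigning its fields. the classes LeakyServer and EarlyCaller are hypothetical; they only mimic the pattern described above.

```java
// Hypothetical classes that mimic the pattern; this is not Hudson's code.
final class LeakyServer {
    private static LeakyServer instance;

    final EarlyCaller helper;
    final String rootDir;

    LeakyServer(String rootDir) {
        instance = this;                 // published before initialization is done
        this.helper = new EarlyCaller(); // calls getInstance() right away
        this.rootDir = rootDir;          // assigned too late for EarlyCaller
    }

    static LeakyServer getInstance() {
        return instance;
    }
}

final class EarlyCaller {
    final String seenRootDir;

    EarlyCaller() {
        // Reads a field of the half-constructed singleton: it still holds
        // the default value null, even though the field is final.
        seenRootDir = LeakyServer.getInstance().rootDir;
    }
}

final class LeakDemo {
    public static void main(String[] args) {
        LeakyServer server = new LeakyServer("/data");
        System.out.println("helper saw: " + server.helper.seenRootDir); // null
        System.out.println("actual:     " + server.rootDir);            // /data
    }
}
```

with threads started from the constructor, the same early read happens at unpredictable moments, which is what makes such code so hard to reorder safely.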
- i used the “introduce factory” refactoring on the constructor
- i modified proxyconfiguration not to use getinstance but to expect that the root directory will be set before its first use
- i moved the code that didn’t need to be run from the constructor out, to the new factory method – this resulted in some, hopefully insignificant, reordering of the code
- finally, i also moved the instance initialization to the factory method
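the end state of these steps might look roughly like the sketch below. it is a simplified illustration with hypothetical names (FactoryBuiltServer, ProxySettings, the proxy.xml file name), not the code from the commits: the constructor only assigns fields, the collaborator gets the root directory handed to it, and the factory method publishes the instance only after construction has finished.

```java
import java.io.File;

// Hypothetical stand-in for a collaborator such as ProxyConfiguration: it no
// longer asks the singleton for the root directory but expects it to be set.
final class ProxySettings {
    private File rootDir;

    void setRootDir(File rootDir) {
        this.rootDir = rootDir;
    }

    File configFile() {
        if (rootDir == null) {
            throw new IllegalStateException("root directory must be set before first use");
        }
        return new File(rootDir, "proxy.xml"); // file name is illustrative
    }
}

final class FactoryBuiltServer {
    private static FactoryBuiltServer instance;

    private final File rootDir;
    private final ProxySettings proxy = new ProxySettings();
    private boolean started;

    // The constructor only assigns fields; nothing here calls getInstance().
    private FactoryBuiltServer(File rootDir) {
        this.rootDir = rootDir;
        this.proxy.setRootDir(rootDir);
    }

    // "Introduce factory": construction, publication and start-up are
    // separate, ordered steps.
    static synchronized FactoryBuiltServer create(File rootDir) {
        if (instance != null) {
            throw new IllegalStateException("server already created");
        }
        FactoryBuiltServer server = new FactoryBuiltServer(rootDir);
        instance = server; // published only once fully constructed
        server.start();    // code that may call getInstance() runs afterwards
        return server;
    }

    static synchronized FactoryBuiltServer getInstance() {
        return instance;
    }

    private void start() {
        started = true; // placeholder for work moved out of the constructor
    }

    boolean isStarted() {
        return started;
    }

    File getRootDir() {
        return rootDir;
    }

    ProxySettings proxy() {
        return proxy;
    }
}

final class FactoryDemo {
    public static void main(String[] args) {
        FactoryBuiltServer server = FactoryBuiltServer.create(new File("hudson-home"));
        System.out.println(server.proxy().configFile());
        System.out.println(FactoryBuiltServer.getInstance() == server);
    }
}
```

making both static methods synchronized is one simple way to avoid ending up with two instances; whether it is sufficient for the real code base is a separate question.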
i can’t be 100% sure that the resulting code has the same semantics as far as it matters, for i had to make a few changes outside of the safe automated refactorings and there are no useful tests except for trying to run the application (and, as is common with legacy applications, it wasn’t feasible to create them beforehand).
the refactored code doesn’t provide much added value yet, but it is a good start for further refactorings (which i won’t have the time to try): it got rid of the offending use of an instance while it is being created, and the constructor code is simpler and better. the exercise took me about four pomodoros, i.e. a little less than two hours.
if i had the time, i’d continue with extracting an interface from hudson and moving its unrelated responsibilities to classes of their own (perhaps keeping the methods in hudson for backwards compatibility and delegating to those objects), and i might even use some aop magic to get cleaner code while preserving binary compatibility (as hudson/jenkins actually already does).
try it for yourself!
get the code
get the code as .zip or via git:
firstname.lastname@example.org:iterate/coding-dojo.git # 50mb => takes a while
git checkout -b mybranch initial
compile the code
as described in the dojo’s readme:
cd maven-plugin; mvn install; cd .. # a necessary dependency
cd hudson/war; mvn hudson-dev:run
and browse to http://localhost:8080/ (jetty should pick up changes to class files automatically).
if you’re the adventurous type, you can try to improve the code more by splitting out the individual responsibilities of the god class. i’d proceed like this:
- extract an interface from hudson and use it wherever possible
- move related methods and fields into (nested) classes of their own; the original hudson methods then just delegate to them (the move method refactoring should be useful); for example:
  - management of extensions and descriptors
  - authentication & authorization
  - cluster management
  - application-level functionality (control methods such as restart, updates of configurations, management of socket listeners)
  - ui controller (factoring this out would require re-configuration of stapler)
- convert the nested classes into top-level ones
- provide a way to get instances of the classes without hudson, e.g. as singletons
- use the individual classes instead of hudson wherever possible so that other classes depend only on the functionality they actually need instead of on the whole of hudson
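as a rough illustration of where these steps lead, the sketch below pulls one responsibility (a toy extension registry) out of a god class, keeps the old methods as thin delegates for backwards compatibility, and lets a client depend only on the narrow interface. all names (ExtensionRegistry, MonolithServer, PluginLoader) are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The extracted responsibility, expressed as a narrow interface.
interface ExtensionRegistry {
    void register(String name);

    List<String> names();
}

final class DefaultExtensionRegistry implements ExtensionRegistry {
    private final List<String> names = new ArrayList<>();

    @Override
    public void register(String name) {
        names.add(name);
    }

    @Override
    public List<String> names() {
        return Collections.unmodifiableList(names);
    }
}

// Hypothetical god class after the extraction: its old methods remain, so
// existing callers keep working, but they only delegate.
final class MonolithServer {
    private final ExtensionRegistry extensions = new DefaultExtensionRegistry();

    ExtensionRegistry extensions() {
        return extensions;
    }

    void registerExtension(String name) {
        extensions.register(name);
    }

    List<String> getExtensionNames() {
        return extensions.names();
    }
}

// A client that depends only on the functionality it actually needs.
final class PluginLoader {
    private final ExtensionRegistry registry;

    PluginLoader(ExtensionRegistry registry) {
        this.registry = registry;
    }

    void load(String pluginName) {
        registry.register(pluginName);
    }
}

final class ExtractionDemo {
    public static void main(String[] args) {
        MonolithServer server = new MonolithServer();
        server.registerExtension("legacy-caller");          // old entry point
        new PluginLoader(server.extensions()).load("git");  // new, narrow dependency
        System.out.println(server.getExtensionNames());
    }
}
```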
learning about jenkins/hudson
if you want to understand more about what hudson does and how it works, you may check:
- hudson’s architecture and optionally proceed with
- building hudson
- introduction to the ui framework stapler (its key feature is that it cleverly maps urls to object hierarchies [and view files and action methods]); perhaps check also stapler’s reference
sidenote: hudson vs. jenkins
once upon a time there was a continuous integration server called hudson, but after its patron sun died, it ended up in the hands of a man called oracle. he wasn’t very good at communication and nobody really knew what he was up to, so when he started to behave a little weird (or at least so the friends of hudson perceived it), those worried about hudson’s future (including most people originally working on the project) made its clone and named it jenkins, which is another popular name for butlers. so now we have hudson, backed by oracle and the maven guys from sonatype, and jenkins, supported by a vibrant community. this exercise is based on the source code of jenkins, but to keep the confusion level low i often refer to it as hudson, for that is how the package and main class are named.
refactoring legacy code always turns out to be more complicated and time-consuming than you expect. it’s important to follow some method, e.g. the mikado method, that helps you keep a global overview of where you want to go and where you are, and to regularly reconsider what you’re doing and why, so that you don’t get lost in a series of “fix a problem, discover new problems” steps. it’s important to realize when to give up and try a different approach. it’s also very hard or impossible to write tests for the changes, so you must be very careful (using safe, automated refactorings as much as possible and proceeding in small steps), but fear shouldn’t stop you from trying to save the code from decay.