Nine Steps of Learning by Refactoring
If you're looking to refactor your code, especially in an object-oriented environment, then you'll want to look through these nine steps.
I was asked on Twitter recently how it is possible to refactor code if one doesn't understand how it works. I replied that it is "learning by refactoring." Then I tried to Google it and found nothing. I was surprised. To me, refactoring seems to be the most effective and obvious way to study source code. Here is how I usually do it, in nine object-oriented steps.
According to Wikipedia, code refactoring is "the process of restructuring existing computer code (changing the factoring) without changing its external behavior." The goal of refactoring is to make code more readable and easier to modify.
Martin Fowler, in his famous book Refactoring: Improving the Design of Existing Code, suggested a number of refactoring techniques that help make code simpler, more abstract, more readable, etc. Some of them are rather questionable from an object-oriented standpoint (Encapsulate Field, for example), but most of them are valid.
Here is what I usually do when I don't know the code but need to modify it. The techniques are sorted by complexity, starting with the easiest one.
Remove IDE Red Spots
When I open the source code of
in IntelliJ IDEA, using my custom
, I see something like this:
When I open the source code of, say,
, I see something like this (it's
randomly picked out of a thousand other classes that look very similar):
See the difference?
The first thing I do when I see someone else's code is make it "red-spot-free" for my IDE. Most of those red spots are easy to remove, while others take some time to refactor. While doing that, I learn a lot about the program I have to deal with.
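To make "red spots" concrete, here is a small hypothetical class with two warnings IntelliJ IDEA commonly reports (a raw use of the parameterized type `List`, and a field that may be `final`), followed by a cleaned-up version with the same behavior. The class names are invented for illustration, and whether a given inspection shows up, and in what color, depends on the inspection profile in use.

```java
import java.util.ArrayList;
import java.util.List;

// "Before": hypothetical code with typical IDE warnings.
class TagsBefore {
    // Raw use of the parameterized type List: the IDE flags this line.
    private List items = new ArrayList();

    void add(String tag) {
        // Unchecked call on a raw type: another warning.
        items.add(tag);
    }

    int size() {
        return items.size();
    }
}

// "After": same behavior, no warnings. The type parameter is explicit,
// and the field (never reassigned) is declared final.
final class Tags {
    private final List<String> items = new ArrayList<>();

    void add(String tag) {
        this.items.add(tag);
    }

    int size() {
        return this.items.size();
    }
}
```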
Remove Empty Lines
I wrote some time ago that empty lines inside method bodies are bad. They are obvious indicators of redundant complexity: programmers tend to add them to their methods in an attempt to simplify things.
This is a method from the
code base (class
picked at random, but almost all other classes are formatted the same way):
Aside from being "all red," their code is full of empty lines. Removing them makes the code more readable and also helps me understand how it works. Bigger methods will need refactoring, since without empty lines they become almost completely unreadable. Hence, I compress them, understand them, and make them smaller, mostly by breaking them down into smaller methods.
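A minimal sketch of this step, using a made-up method: in the "before" version, empty lines separate hidden steps; in the "after" version, the empty lines are gone and each block became a small, named method. Both classes are hypothetical and not taken from any of the projects mentioned here.

```java
// "Before": a hypothetical method whose empty lines separate hidden steps.
final class ReportBefore {
    String render(String csv) {
        String[] parts = csv.split(",");
        int total = 0;

        for (String part : parts) {
            total += Integer.parseInt(part.trim());
        }

        return "Total: " + total;
    }
}

// "After": no empty lines; each former block is a small method with a name.
final class Report {
    String render(String csv) {
        return this.label(this.sum(csv.split(",")));
    }

    private int sum(String[] parts) {
        int total = 0;
        for (String part : parts) {
            total += Integer.parseInt(part.trim());
        }
        return total;
    }

    private String label(int total) {
        return "Total: " + total;
    }
}
```

Both versions return `"Total: 6"` for the input `"1, 2,3"`; the refactoring changes the structure, not the behavior.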
Make Names Shorter
I'm generally in favor of short, one-noun names for variables and one-verb names for methods. I believe that longer "compound" names are an indicator of unnecessary code complexity.
For example, I found this method
(69 characters!) in the
class in Spring Boot. I wonder why the author skipped the
prefix and the
Jokes aside, such long method names clearly demonstrate that the code is too complex and can't be explained with a simple
. It seems that there are many different containers, initializers, servlets, and other creatures that need to be registered somehow. When I join a project and see a method with a name like this, I get ready for big trouble.
Making names shorter is a mandatory refactoring step I take when starting to work with foreign or legacy code.
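Here is a hypothetical illustration of how a compound name usually hides several concepts. The long method name mentions items, a price, and shipping; once those become objects, each name can shrink to a single word. All class and method names here are invented for the example.

```java
import java.util.List;

// "Before": a hypothetical compound name that tries to describe everything at once.
final class OrderServiceBefore {
    int calculateTotalPriceOfItemsIncludingShipping(List<Integer> itemPrices, int shippingCost) {
        int total = shippingCost;
        for (int price : itemPrices) {
            total += price;
        }
        return total;
    }
}

// "After": each concept from the long name became an object,
// so a single word is enough for every name.
final class Items {
    private final List<Integer> prices;

    Items(List<Integer> prices) {
        this.prices = prices;
    }

    int price() {
        int sum = 0;
        for (int price : this.prices) {
            sum += price;
        }
        return sum;
    }
}

final class Order {
    private final Items items;
    private final int shipping;

    Order(Items items, int shipping) {
        this.items = items;
        this.shipping = shipping;
    }

    int price() {
        return this.items.price() + this.shipping;
    }
}
```

For item prices 10 and 20 with a shipping cost of 5, both versions return 35.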
Add Unit Tests
Most classes (and methods) come without any documentation, especially in closed-source commercial code. We are lucky if the classes have more or less descriptive names and are small and cohesive.
However, instead of documentation, I prefer to deal with unit tests. They explain the code much better and prove that it works. When I don't understand how a class works, I try to write a unit test for it. In most cases, for many reasons, that's not possible. In such a case, I try to apply everything I learned from Working Effectively with Legacy Code by Michael Feathers and Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce. Both books focus on exactly this problem: what to do when you don't know what to do, testing-wise.
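One technique from Feathers' book is the characterization test: a test that records what the code currently does, rather than what you think it should do, so you can refactor without changing behavior by accident. Below is a minimal, framework-free sketch, assuming a hypothetical legacy `Slug` class; in a real project you would write the same checks with JUnit or a similar framework.

```java
// A hypothetical legacy class we do not fully understand yet.
final class Slug {
    private final String title;

    Slug(String title) {
        this.title = title;
    }

    String value() {
        return this.title.trim().toLowerCase().replaceAll("[^a-z0-9]+", "-");
    }
}

// A characterization test, written without a framework to stay self-contained.
// It pins down what the code does today, including surprising behavior.
final class SlugTest {
    void run() {
        this.check("Hello World", "hello-world");
        // Surprise: trailing punctuation turns into a trailing dash.
        this.check("Hello, World!", "hello-world-");
    }

    private void check(String input, String expected) {
        final String actual = new Slug(input).value();
        if (!actual.equals(expected)) {
            throw new AssertionError(
                "for \"" + input + "\" expected \"" + expected + "\" but got \"" + actual + "\""
            );
        }
    }
}
```

Once such tests pass, any refactoring of `Slug` that keeps them green preserves the behavior they pin down.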
Remove Multiple Returns
The presence of multiple return statements in a single method is not something object-oriented programming should encourage. Instead, a method must always have a single exit point, just like functions in functional programming.
Look at this method from the
class in Spring Boot (there are many similar examples there; I picked this one randomly):
There are five return statements in such a small method. For object-oriented code, that's too much. It's OK for procedural code, which I also write sometimes. For example, this Groovy script of ours has five return statements too, but this is Groovy, and it's not a class. It's just a procedure, a script.
Refactoring and removing multiple return statements definitely helps make code cleaner, mostly because without them it's necessary to use deeper nesting of if statements, and then the code starts to look ugly unless you break it down into smaller pieces.
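A before/after sketch of this refactoring on a hypothetical method (the shipping rules are made up): the first version has three exit points, the second has one. Note the `if`/`else` chain that the single exit point requires; as it grows, that is the point where the method should be broken into smaller pieces.

```java
// "Before": three exit points.
final class ShippingBefore {
    int cost(int weight, boolean express) {
        if (weight <= 0) {
            return 0;
        }
        if (express) {
            return 20 + weight * 2;
        }
        return 5 + weight;
    }
}

// "After": a single exit point; each branch assigns the one result.
final class Shipping {
    int cost(int weight, boolean express) {
        final int result;
        if (weight <= 0) {
            result = 0;
        } else if (express) {
            result = 20 + weight * 2;
        } else {
            result = 5 + weight;
        }
        return result;
    }
}
```

Declaring `result` as `final` lets the compiler check that every branch assigns it exactly once.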
Get Rid of Nulls
That nulls are bad is a well-known fact. However, they are still everywhere. For example, there are 4,100 Java files in Spring Boot v2.0.0.RELEASE and 243K LOC, which include the null keyword 7,055 times. This means approximately one null for every 35 lines.
In contrast, Takes, which I founded a few years ago, has 771 Java files, 154K LOC, and 58 null keywords. That is roughly one null per 2,700 lines. See the difference?
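The article doesn't say how these numbers were collected. As a rough way to produce similar statistics for your own code base, here is a naive sketch, assuming a regular expression is good enough: it counts occurrences of a keyword and lines of code across all `.java` files under a directory. It also counts matches in comments and string literals and treats blank lines as lines, so its output will not match the figures above exactly. The class names `Density` and `JavaSources` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Rough keyword statistics for Java sources. Deliberately naive: it also counts
// matches inside comments and string literals, and counts blank lines as lines.
final class Density {
    private final Pattern pattern;

    Density(String keyword) {
        // Word boundaries, so that identifiers like "nullable" are not counted.
        this.pattern = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b");
    }

    long hits(List<String> sources) {
        long total = 0;
        for (String source : sources) {
            final Matcher matcher = this.pattern.matcher(source);
            while (matcher.find()) {
                total++;
            }
        }
        return total;
    }

    long lines(List<String> sources) {
        long total = 0;
        for (String source : sources) {
            total += source.lines().count();
        }
        return total;
    }
}

// Loads every *.java file under a directory, e.g. a cloned repository.
final class JavaSources {
    private final Path root;

    JavaSources(Path root) {
        this.root = root;
    }

    List<String> read() throws IOException {
        final List<Path> files;
        try (Stream<Path> paths = Files.walk(this.root)) {
            files = paths
                .filter(Files::isRegularFile)
                .filter(path -> path.toString().endsWith(".java"))
                .collect(Collectors.toList());
        }
        final List<String> sources = new ArrayList<>();
        for (Path file : files) {
            sources.add(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        }
        return sources;
    }
}
```

For instance, `new Density("null")` applied to the list returned by `new JavaSources(root).read()` gives the number of `null` keywords and the number of lines; their ratio is the "one per N lines" figure, and passing `"static"` gives the same statistic for `static`. `String.lines()` needs Java 11 or later.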
The code gets cleaner when you remove nulls, but it's not so easy to do. Sometimes it's even impossible. That's why we still have those 58 cases of null in Takes. We simply can't remove them, because they come from the JDK.
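One widely used way to stop returning null is the Null Object pattern; throwing an exception is another. Here is a minimal sketch with hypothetical `User` and `Users` types, where the lookup returns a harmless stand-in instead of null. It also shows where JDK nulls come from: `Map.get()` returns null for a missing key, and in this particular case the JDK happens to offer `getOrDefault()` as a way around it. Not every JDK API has such an alternative.

```java
import java.util.Map;

interface User {
    String name();
}

final class RealUser implements User {
    private final String name;

    RealUser(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return this.name;
    }
}

// Null Object: a harmless stand-in, so callers never have to check for null.
final class Anonymous implements User {
    @Override
    public String name() {
        return "anonymous";
    }
}

final class Users {
    private final Map<String, User> map;

    Users(Map<String, User> map) {
        this.map = map;
    }

    User user(String login) {
        // Map.get() would return null for an unknown login; getOrDefault()
        // returns the Null Object instead, so no null escapes this method.
        return this.map.getOrDefault(login, new Anonymous());
    }
}
```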
Make Objects Immutable
As I demonstrated some time ago, immutability helps keep objects smaller. Most classes that I see in the foreign code I deal with are mutable. And large.
If you look at any artifact analyzed by jPeek, you will see that in most of them approximately 80% of classes are mutable. Moving from mutability to immutability is a big challenge in object-oriented programming, and one that, if solved, leads to better code.
This refactoring step of making things immutable is pure gain.
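A minimal before/after sketch with a hypothetical `Email` class: the mutable version exposes setters and leaks its internal list; the immutable one makes every field `final`, copies the incoming list defensively, and returns a new object instead of modifying itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// "Before": mutable; any caller can change the state at any time.
final class EmailBefore {
    private String to;
    private final List<String> cc = new ArrayList<>();

    void setTo(String to) {
        this.to = to;
    }

    void addCc(String address) {
        this.cc.add(address);
    }

    List<String> cc() {
        return this.cc; // leaks the internal list, too
    }
}

// "After": immutable; fields are final and set once in the constructor,
// and a "modification" returns a new object.
final class Email {
    private final String to;
    private final List<String> cc;

    Email(String to) {
        this(to, Collections.emptyList());
    }

    Email(String to, List<String> cc) {
        this.to = to;
        // Defensive copy, so later changes to the caller's list can't leak in.
        this.cc = Collections.unmodifiableList(new ArrayList<>(cc));
    }

    Email withCc(String address) {
        final List<String> copy = new ArrayList<>(this.cc);
        copy.add(address);
        return new Email(this.to, copy);
    }

    String to() {
        return this.to;
    }

    List<String> cc() {
        return this.cc;
    }
}
```

Calling `withCc()` leaves the original `Email` untouched, and the list returned by `cc()` rejects modification with an `UnsupportedOperationException`.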
Get Rid of Static
Static methods and attributes are convenient if you are a procedural programmer. If your code is object-oriented, they must go away. In Spring Boot, there are 7,482 static keywords, which means one for every 32 lines of code. In contrast, in Takes we have 310 statics, which is one for every 496 lines.
Compare these numbers with the statistics about null, and you will see that getting rid of static is a more complex task.
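A hypothetical example of this step: a static utility method, and the same logic moved into a small decorator object that implements a `Text` interface. The interface and class names are invented for the sketch and don't refer to any specific library.

```java
// "Before": a hypothetical static utility, called as TextUtils.capitalize("...").
final class TextUtils {
    private TextUtils() {
    }

    static String capitalize(String text) {
        final String result;
        if (text.isEmpty()) {
            result = text;
        } else {
            result = Character.toUpperCase(text.charAt(0)) + text.substring(1);
        }
        return result;
    }
}

// "After": the same logic as objects behind a hypothetical Text interface.
interface Text {
    String asString();
}

final class PlainText implements Text {
    private final String origin;

    PlainText(String origin) {
        this.origin = origin;
    }

    @Override
    public String asString() {
        return this.origin;
    }
}

// A decorator: it wraps any Text and capitalizes it when asked.
final class Capitalized implements Text {
    private final Text origin;

    Capitalized(Text origin) {
        this.origin = origin;
    }

    @Override
    public String asString() {
        final String text = this.origin.asString();
        final String result;
        if (text.isEmpty()) {
            result = text;
        } else {
            result = Character.toUpperCase(text.charAt(0)) + text.substring(1);
        }
        return result;
    }
}
```

The object version is longer, but decorators compose (`new Capitalized(new PlainText("hello"))`), and any `Text` can be replaced with a fake in tests, which is much harder with a hard-wired static call.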
Apply Static Analysis
This is the final step, and the most complex one. It's complex because I configure static analyzers to their maximum potential, or even beyond. I'm using Qulice, which is an aggregator of Checkstyle, PMD, and FindBugs. Those tools are strong by themselves, but Qulice makes them even stronger, adding a few dozen custom-made checks.
The principle I use for static analysis is 0/100. This means that either the entire code base is clean and there are no Qulice complaints, or it's dirty. There is nothing in between. This is not a very typical way of looking at static analysis: most programmers use those tools just to collect "opinions" about their code, while I use them as guides for refactoring.
Check out this video, which demonstrates the number of complaints Qulice gives for the
sub-module in Spring Boot (the video has no end, since I lost my patience waiting):
When Qulice says that everything is clean, I consider the code base fully ready for maintenance and modification. At this point, the refactoring is done.
Published at DZone with permission of Yegor Bugayenko. See the original article here.