Why Refactor Java Code? Here Are 44 Reasons (Based on 748 GitHub Projects)
Because of changing requirements more than code smells. But what about each kind of refactoring?
(Raw totals from our last poll: lousy naming choices and how often they come up. More analysis to follow.)
Your code won't stay the same forever. Sometimes it changes because requirements have changed; sometimes because external dependencies and interfaces change; sometimes because it wasn't written well in the first place, and so on.
To understand how and why developers refactor, map the flow in two directions:
Change needed -> make change.
Change made <- change needed?
Neither direction is super informative on its own. To the first: many changes are needed that do not result in changes made. So direction 1 captures the need to refactor, but not the refactoring done. To the second: diffs don't always clearly indicate why the change was made. Reasons to refactor are often dirty, anyway: turns out you need to change X_a,b,c, but while you're at it you might as well change X_d,e,f,g (since you're already touching X, getting into its headspace, and unit testing it). So direction 2 does capture start- and end-points of the refactor, but doesn't get to the reasoning behind the change.
Reasoning From Refactoring Type to Reason for Refactor: the Patterns Approach...
The type of refactoring — the nature of the change — does sometimes suggest reasoning, in the way that patterns often do. We've published a Refcard on refactoring patterns, with typical reasoning and Java examples of each. But while patterns repositories like this help you think about how to refactor without harmful side effects, they don't tell you empirically how and why developers are actually refactoring real code.
...vs. the Data Mining Approach
But now we do have some empirical data on how and why developers refactor real Java code. Earlier this year three researchers monitored 748 GitHub projects for changes, inferred refactoring types from the diffs (using their own tool, RefactoringMiner), and will be presenting their results at the Foundations of Software Engineering conference in Seattle this November (FSE 2016). A pre-print recently went live on arXiv.
RefactoringMiner can autoextract nine different refactoring types:
Extract Method
Inline Method
Move Method/Attribute
Pull Up Method/Attribute
Push Down Method/Attribute
Extract Superclass/Interface
Move Class
Rename Class
Rename Method
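To make one of these types concrete, here is a minimal Java sketch of Extract Method. The class and method names (OrderReport, summaryBefore, summaryAfter, total) are hypothetical and do not come from the study; the point is only to show the shape of the change that a diff-based tool would have to recognize.

```java
import java.util.List;

// Hypothetical example class; none of these names come from the study.
class OrderReport {

    // Before: one method mixes the total calculation with formatting.
    static String summaryBefore(List<Double> prices) {
        double total = 0.0;
        for (double price : prices) {
            total += price;
        }
        return "Items: " + prices.size() + ", total: " + total;
    }

    // After Extract Method: the loop has moved into its own named method,
    // and the summary only handles formatting.
    static String summaryAfter(List<Double> prices) {
        return "Items: " + prices.size() + ", total: " + total(prices);
    }

    // The extracted method, now reusable by other callers.
    static double total(List<Double> prices) {
        double total = 0.0;
        for (double price : prices) {
            total += price;
        }
        return total;
    }
}
```

Inline Method is the reverse move: folding the body of total back into its caller and deleting the helper. Both versions of the summary return the same string, which is what makes the change a refactoring and not a behavior change.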
Each of these refactoring types is recognized with varying accuracy (section 3.2.1 of the paper, summarized in Table 1), with an overall recall of 0.93 and precision of 0.98.
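For readers who don't work with these metrics every day, the sketch below shows how precision and recall are conventionally computed from detection counts. The class DetectionAccuracy is a hypothetical helper, not part of RefactoringMiner, and nothing here reproduces the authors' evaluation procedure.

```java
// Hypothetical helper for illustration; not part of RefactoringMiner.
class DetectionAccuracy {

    // Precision: of the refactorings the tool reported, the share that were real.
    static double precision(int truePositives, int falsePositives) {
        return (double) truePositives / (truePositives + falsePositives);
    }

    // Recall: of the refactorings that really happened, the share the tool reported.
    static double recall(int truePositives, int falseNegatives) {
        return (double) truePositives / (truePositives + falseNegatives);
    }
}
```

With made-up counts of 90 correctly reported refactorings, 10 false alarms, and 30 missed refactorings, precision(90, 10) is 0.9 and recall(90, 30) is 0.75. These numbers are for illustration only and are not taken from the paper.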
For two months, Silva, Tsantalis, and Valente checked their autoextracted refactorings for false positives, then emailed the committers asking why they refactored as they did, what automatic refactoring tools they used (if any), and what IDEs they used while refactoring.
So why do developers perform each of these refactoring types, and how do they do it?
Some results are summarized in the paper (see esp. Tables 3-6). The full dataset is available here. Reasons vary by refactoring type, of course. The simplest takeaway: requirements changes account for more refactorings than code smells. Or so the committers interviewed say...
(Stay tuned for more inspired by that paper, including where automatic refactoring tools seem to help [and where they don't] and which IDEs are most commonly used to semi-automate refactoring [measured as the percentage of users of IDE i who use i to automate refactoring]. Okay, obviously IntelliJ is #1, but why is NetBeans [a distant] #2, ahead of Eclipse?)
Table 3 of the paper summarizes reasons given for extracting methods, and Table 4 summarizes reasons given for other refactorings.
But Why Do We Refactor?
That study goes from refactorings to reasons given — valuable, semi-objective data. But it doesn't capture how developers begin their refactoring process. Presumably any developer's thoughts about how to go about refactoring — once you know that some change is needed — depend on some combination of the specific signal that change is needed, the set of all previous refactorings, and contextual information that allows the developer to filter previous refactorings into probably relevant/probably irrelevant given signal and context (e.g. tech stack, familiarity with this particular codebase, general knowledge of the developer community involved).
So we'd like to learn more about developers' general reasoning behind specific refactoring types — but abstracting from the particular refactoring stimulus.
For example, when my browser running a Ruby on Rails app waits some amount of time that 'feels too long', given the kind of database access I imagine the read would require and given the commonality or general maturity of the technical and business domains (e.g. I expect a price lookup in a shopping cart to run faster than a >2-hop social graph traversal, making assumptions about server power given a guessed userbase), I knee-jerk toward: inefficient use of ActiveRecord, because ORMs are too easy to use without optimizing for less-common problem domains; poor caching, because Rails programmers are pretty far from the metal; and maybe you're still using SQLite? Probably most of these guesses aren't very likely to be true, but the prospective refactorer needs to start somewhere.
So we're curious about your thoughts broadly speaking: why do you refactor? What are you trying to accomplish when you rename a class or inline a method or push an attribute down the class hierarchy?
Let us know and we'll share the results here.