Few programmers explicitly intend to write poorly structured source code.
They don't sit down, whip out their Bad Code Design Patterns book, and wreak meticulous spaghettipocalypse. Rather, poorly structured code is what happens when programmers don't know what they're doing.
Figure 1: Two Java package structures: one well-designed, the other, not so much.
So, why is this difficult?
Source code has many properties — and of different kinds.
One property, for instance, is the number of public methods in a program. Programmers easily control this property: making a private method public increases the number by 1. And that's it. In a sense, this is a "Linear" property, in that small changes produce small effects.
Structure is also a source code property, but calling method b() from method a() does not affect those two methods alone. New transitive dependencies form from every method that depends on a() to every method that b() depends on.
Furthermore, Java has three structural levels (method, class, and package), and a method connection need not affect the method level alone. If the owning classes were not already connected, then new transitive dependencies pop into being at the class level, too. And similarly at the package level. Structure is thus a "Non-linear" property, in that small changes may trigger large consequences.
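To make the non-linearity concrete, here is a minimal sketch that counts dependency pairs in a toy call graph before and after adding the single call from a() to b(). The class `TransitiveDemo`, the extra methods p(), q(), r(), and s(), and the pair-counting measure are all hypothetical, chosen for illustration only; they are not the analyzer or the metrics used in this post.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy call graph (hypothetical): method name -> methods it calls directly. */
final class TransitiveDemo {
    private final Map<String, Set<String>> calls = new HashMap<>();

    void addCall(String from, String to) {
        calls.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        calls.computeIfAbsent(to, k -> new HashSet<>());
    }

    /** Every method reachable from start by following one or more calls. */
    Set<String> reachableFrom(String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo =
                new ArrayDeque<>(calls.getOrDefault(start, Collections.emptySet()));
        while (!todo.isEmpty()) {
            String method = todo.pop();
            if (seen.add(method)) {
                todo.addAll(calls.getOrDefault(method, Collections.emptySet()));
            }
        }
        return seen;
    }

    /** Number of (x, y) pairs where x depends on y, directly or transitively. */
    int dependencyPairs() {
        int total = 0;
        for (String method : calls.keySet()) {
            total += reachableFrom(method).size();
        }
        return total;
    }

    /** p() and q() call a(); b() calls r() and s(); a() and b() are not yet connected. */
    static TransitiveDemo example() {
        TransitiveDemo graph = new TransitiveDemo();
        graph.addCall("p", "a");
        graph.addCall("q", "a");
        graph.addCall("b", "r");
        graph.addCall("b", "s");
        return graph;
    }

    public static void main(String[] args) {
        TransitiveDemo graph = example();
        System.out.println(graph.dependencyPairs()); // 4
        graph.addCall("a", "b");                     // one new call...
        System.out.println(graph.dependencyPairs()); // ...and now 13 pairs
    }
}
```

In this toy graph, one added call creates nine new dependency pairs, because p() and q() now reach everything that b() reaches.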
This non-linearity makes writing well-structured programs hard.
It would be helpful if we could forget about this grand, over-arching structure and focus instead on small, linear properties that somehow magically lead to well-structured code.
Alas, no such linear properties exist.
But there are hidden clues.
Because source code properties are objective, we can measure them. We can certainly count the number of public methods in a program, and hosts of other linear properties besides. We can also measure the "Messiness" of a program via the structural disorder, a percentage which rises as source code structure decays. If we measure over a large number of programs, we can then calculate the mathematical correlation between structural disorder and all those other properties.
A negligible correlation would imply no connection between a particular linear property and overall program structure. A large correlation, however, suggests that careful management of that linear property may contribute towards overall well-structured awesomeness.
For example, if structural disorder correlated 100% with the number of public methods, then we might suggest minimizing the number of public methods in order to minimize structural disorder, thereby using a simple, linear property to control a difficult non-linear one.
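As a rough sketch of what such a calculation might look like, the snippet below computes a Pearson correlation coefficient between two series. This post does not say which correlation measure the analysis used, so Pearson is an assumption here, and the class name `CorrelationSketch` and the sample numbers are made up purely for illustration.

```java
/** Hypothetical helper: Pearson correlation between two equally long series. */
final class CorrelationSketch {

    /** Returns a value in [-1, 1]; the result is NaN if either series is constant. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Invented numbers, one entry per imaginary program: disorder rises
        // in lockstep with the public-method count, i.e. the "100%" case.
        double[] publicMethods = {1, 2, 3, 4};
        double[] disorder = {2, 4, 6, 8};
        System.out.println(pearson(publicMethods, disorder)); // 1.0
    }
}
```

A value near 0 would correspond to the negligible case, and a value near 1 to the case where managing the linear property might help.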
Let's give it a whirl.
Let's blitz 4 million lines of code from 38 Java systems[1] through a code analyzer and get correlatin' over dozens of their structural properties. Table 1 shows the strongest structural disorder correlations discovered[2] (full matrices: method, class, and package).
|Property|Method|Class|Package|
|---|---|---|---|
|Average circular dependencies|0.62|0.21|0.58|
|Average impact set|0.59|0.35|0.42|
|Average impacted set|0.56|0.35|0.42|
|Average transitive dependencies|0.56|0.24|0.35|
|Average transitive dependency length|0.64|0.24|0.42|
Table 1: Structural disorder correlations with other properties.
Only one property correlates strongly with structural disorder over all three levels: depth.
The depth of a method (or class, or package) is just its position in a transitive dependency. In figure 1, on the left, the depth of method a() is 0, the depth of b() is 1, c() is 2, etc. These depths sum to 21. On the right, however, a() still has a depth of 0, but every other method, being called directly from a(), has a depth of 1, making the total depth of the right structure just 6.
Figure 1: A deep transitive dependency on the left, and shallow dependencies on the right.
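The two totals can be checked with a small sketch. It assumes seven methods named a() through g() (the text names only the first three), takes a method's depth to be the length of the longest call chain reaching it from a(), and assumes the call graph has no cycles. The class `DepthSketch` is hypothetical, not part of any analyzer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sums method depths in an acyclic call graph. */
final class DepthSketch {

    /** Total depth of every method reachable from root (root itself has depth 0). */
    static int totalDepth(Map<String, List<String>> calls, String root) {
        Map<String, Integer> depth = new HashMap<>();
        assignDepth(calls, root, 0, depth);
        int total = 0;
        for (int d : depth.values()) {
            total += d;
        }
        return total;
    }

    // Records the deepest position seen for each method. Assumes no cycles:
    // a circular call chain would make this recursion run forever.
    private static void assignDepth(Map<String, List<String>> calls, String method,
                                    int d, Map<String, Integer> depth) {
        Integer known = depth.get(method);
        if (known != null && known >= d) {
            return; // already placed at least this deep
        }
        depth.put(method, d);
        for (String callee : calls.getOrDefault(method, Collections.<String>emptyList())) {
            assignDepth(calls, callee, d + 1, depth);
        }
    }

    /** Left-hand structure: a() calls b(), b() calls c(), and so on down to g(). */
    static Map<String, List<String>> chain() {
        Map<String, List<String>> calls = new HashMap<>();
        String[] names = {"a", "b", "c", "d", "e", "f", "g"};
        for (int i = 0; i + 1 < names.length; i++) {
            calls.put(names[i], Arrays.asList(names[i + 1]));
        }
        return calls;
    }

    /** Right-hand structure: a() calls every other method directly. */
    static Map<String, List<String>> star() {
        Map<String, List<String>> calls = new HashMap<>();
        calls.put("a", Arrays.asList("b", "c", "d", "e", "f", "g"));
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(totalDepth(chain(), "a")); // 21
        System.out.println(totalDepth(star(), "a"));  // 6
    }
}
```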
It is this depth total that correlates with structural disorder. The deeper your code, the more disordered it'll probably be. If you want to manage your program's structural disorder, avoid deep dependencies[3].
One way to do this is to use a coordinator (sunburst) method, which calls other methods to do the heavy lifting, with the coordinator reduced to a sequencing role. Then repeat this pattern at the class and package levels, where possible.
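Here is a minimal sketch of the idea, using an invented three-step text pipeline. Both classes (`DeepReport` and `ShallowReport`) and their helper methods are hypothetical. They produce the same result, but in the first each step calls the next, while in the second a coordinator calls each step in turn.

```java
/** Deep version: run() -> load() -> clean() -> format(). Depths 0+1+2+3 = 6. */
final class DeepReport {
    static String run(String raw) {
        return load(raw);
    }

    private static String load(String raw) {
        return clean(raw.trim());
    }

    private static String clean(String text) {
        return format(text.replaceAll("\\s+", " "));
    }

    private static String format(String text) {
        return "[" + text + "]";
    }
}

/** Coordinator version: run() only sequences the steps. Depths 0+1+1+1 = 3. */
final class ShallowReport {
    static String run(String raw) {
        String loaded = load(raw);
        String cleaned = clean(loaded);
        return format(cleaned);
    }

    private static String load(String raw) {
        return raw.trim();
    }

    private static String clean(String text) {
        return text.replaceAll("\\s+", " ");
    }

    private static String format(String text) {
        return "[" + text + "]";
    }
}
```

In the coordinator version, none of the helpers knows about the others, so changing or replacing one step touches only that step and run().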
A previous post introduced four evidence-based principles for code structure, where the justifying correlations were weak but non-negligible.
This post adds a fifth principle, "Manage depth," but with a much stronger correlation, making the list of evidence-based structural principles now:
- Manage Size.
- Manage method Impact set.
- Manage absolute Potential coupling.
- Manage the number of Transitive dependencies.
- Manage Depth.
(Admittedly, not the catchiest acronym.)