The Blighttown Corollary
The Blighttown Corollary
Join the DZone community and get the full member experience.Join For Free
Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.
Can an image capture an entire system's structural integrity? Can we tell from a graphic whether a system is well-structured?
Figure 1 shows a spoiklin diagram with method a() calling method b().
Figure 1: Two methods.
A trivial system, but is it well-structured? If update-cost indicates structural quality - that is, if the cost of updates rises as structure degrades - and if ripple effect determines update-cost predictability, then figure 1 must indeed be considered syntactically well-structured because of the ease with which potential ripple-effects may be traced. Basically, programmers can see how changing any part of this system might potentially change any other. By comparison, figure 2 presents these same two methods but reconfigured.
Figure 2: Two methods badly structured.
In figure 2, with straight lines showing dependencies down the page and curved lines showing dependencies up the page, the two methods have become mutually dependent: an update to one could lead to an update to the other, and so ad infinitum.
If we ask whether a graphic can capture a system's structural integrity, then for such tiny systems as those of figures 1 and 2, we can answer: yes. But what of larger systems? Figure 3 shows a more realistic case, with many more interconnected methods.
Figure 3: Exhibit A.
Examined closely, the figure shows that despite the increased number of methods, structural integrity remains intact.
For example, let us look at the impact set of various methods. The impact set contains all the methods that depend - directly or transitively - on a particular method, and thus reflects the potential ripple-effects of a method's change. Figure 4 shows the impact set of the Predicate() method (actually a constructor), deep down in the structure where problems - should they exist - most likely fester. Though this line of dependencies streaks all the way to the top, it nevertheless remains easily traceable and such traceability, as mentioned, is the very hallmark of good structure.
Figure 4: Easily traceable impacts.
Figure 5 shows the impact set of the select() method. Despite having two ripple-effect paths up towards the root method, the paths taken appear again easily traceable.
Figure 5: Further sunbursts.
Now, however, let us examine another system, see figure 6, one comprised of an almost identical numbers of methods as that of figure 3.
Figure 6: Exhibit B.
Take a moment to compare this with the previous system of figure 3 (reproduced below for convenience). Which has the better structure? And why?
Figure 3: Exhibit A.
Figure 6 contains some curved lines, dependencies that run up the page. No mere aesthetic, these sometimes represent circular dependencies, a type of dependency which renders the exercise of structural reasoning - precisely what we did with figures 4 and 5, in which we tried to plot the course of hypothetical ripple-effects - far more difficult. Figure 6 does not have many curved dependencies - perhaps only 4 - yet because ripple-effects gush through not just dependencies but transitive dependencies, these few dependencies bend the entire system into fearful intricacy.
Take the getLinkedFixtureWithArgs() method highlighted in green in figure 7. Lying below the tangle, we might fear a rather complicated path to reach the top of the diagram, and as the diagram shows, these fears seem well-founded.
Figure 7: Unpredictable paths.
Not only might ripple-effects from getLinkedFixtureWithArgs() traverse the entire tangle, they might pop up at multiple root methods at the top of figure 7. (Of course, having multiple client methods represents duplication reduction and not necessarily poor structure; the benefit of duplication reduction offsets but does not neutralize the cost of ripple-effects.)
As another example, consider the green at() method identified in figure 8. True, this lies in the center of the tangle, but we might hope at least that its dependencies filter up the page. The diagram shows the unfortunate extent of its surprising influence.
Figure 8: A truly expensive method to change.
Different graphical algorithms will display the system differently, but the mutual dependencies of this system, and hence the difficulty with which dependencies may be traced, are properties of the system itself, not of the representing images. Nevertheless the hypothesis still holds: graphical representation can identify structural quality, good and bad. But what of even larger systems?
Consider the system in figure 9, with almost 150 methods compared to 25 of the previous example (the methods names have been removed to reduce clutter).
Figure 9: A system with 150 methods.
Is this system well-structured? Most programmers would struggle to answer this question on the basis of the diagram alone. Yes, we can delve into details, for example examining the impact sets of sample methods, as in figure 10. But our initial question asked whether we can derive an entire system's structural integrity from a single graphic and here any such insight seems buried under sheer weight of method.
Figure 10: A specific method's impact set.
Perhaps we are examining the system at too low a level? In Java, methods reside inside classes, so perhaps we should examine this system on class level rather than at method level to try to ascertain its structure. Figure 11 shows the system's class level diagram.
Figure 11: The same system at class level.
A far cleaner structure has emerged. In figure 11, every circle represents a class (rather than a method, as in all previous diagrams) and each straight line represents a dependency from class above to class below. Now we see that, at class level, the system is well structured, its dominant feature a classic sunburst formation. No circular dependencies or unruly tangles blot the vista. Tracing dependencies between classes becomes once more an easy task, as can be seen from analyzing even the deepest class, see figure 12.
Figure 12: Class-level impact set of the deepest class.
So does good structure on class level imply that the structure on method level in fact was also good if only we could have seen it? What is the relationship between class-level structure and method-level structure? And which scale defines the overall system's structural quality?
How methods map to classes.
The relationship between classes and methods is that of containment. If we consider methods to be the lower level and classes to be the higher, then in mapping methods to classes - we shall call this the, "Level-up mapping" - all methods that belong to the same class collapse to a single node on the higher level. Similarly, all dependencies between methods within the same class vanish entirely; on class level only dependencies between methods of different classes remain. Lastly, a dependency from class A to class B is formed by any number of methods of A that depend on any number of methods of B.
Astute programmers will observe that the fundamental nature of a level-up mapping is one of reduction. As methods coagulate within classes, the number of nodes falls. The higher level can never harbour more nodes than the lower nor can it harbour more dependencies.
This allows the derivation of two interesting heuristics. Consider the four methods of figure 13, arranged in three transitive dependencies.
Figure 13: Four methods waiting to be class-ified.
Now suppose we group these four methods into two classes, ClassX and ClassY, such that a(), b(), c() and d() go into ClassX and a2() goes into ClassY, as shown in figure 14.
Figure 14: Four methods captured in two classes.
How does this system look when viewed up at class level? Figure 15 reveals the great un-surprise.
Figure 15: Class-level view of figure 14.
Because all the class-internal methods and dependencies vanish, the system when viewed at class level appears much simpler.
A little thought shows that the number of transitive dependencies of system's higher level is almost always lower than the number of transitive dependencies of its lower level. And the length of these transitive dependencies is also almost always less on the higher level. Transitive dependencies being the paths along which ripple-effects do their damage and update costs being the prime indicator of structure, we can conclude that the structure of higher levels is almost always better than the structure of lower levels. Structure tends to improve with height.
Of course this does not always hold. A programmer reviewing figure 13 might decide to put method b(), instead of method a2(), into ClassY yielding the alternative method allocation shown in figure 16.
Figure 16: An alternative class allocation.
If so then the class-level structure would look as in figure 17.
Figure 17: A higher-level circular dependency.
That is, ClassX.a() calls ClassY.b() which in turn calls ClassX.c(), hence creating a circular dependency at class level that does not exist at method level. Such cases must always remain a caveat to our analysis - no graphical analysis can be conclusive - though a trawl of real-world systems reveals that few systems contain significant quantities of these beasts (and graphical tools can easily identify them where the turn up). That structure tends to improve with height remains statistically valid.
Furthermore, that methods vanish on class level does not remove them from reality. No change of perspective reduces update costs. But when a class bulges with horribly interconnected methods then at least those methods are confined to that class; programmers can bring proven cost-reducing tools (such as information hiding) to bear on them. This goes some way to justifying the structural improvements generally seen on class level.
Thus if structure tends to improve with height, then the structural quality of the higher level can be seen as the upper limit of the structural quality of the lower. This leads to the moderate pessimism of the Blighttown corollary: whatever the structural quality, it's probably worse underneath.
Applying this Blighttown corollary to real-world systems takes us the final few steps to the answers we seek.
Very large systems.
Figure 18 shows the method-level spoiklin diagrams of two real-world systems.
Figure 18: Method-level diagrams of two real-world systems.
Although a surgical analysis of particular methods would undoubtedly prove useful, these figures do not give the programmer an overall impression of structural quality. Both systems built from thousands upon thousand of methods, the diagrams contain simply too much information.
This presents a clear instance of a graphical representation's not being powerful enough to convey useful structural information.
So might it help then to go up a level? Figure 19 shows both systems again, but on class level, where each circle represents a class (or interface).
Figure 19: Class-level diagrams of two real-world systems.
Once again, the amount of information in both diagrams renders impossible an overall evaluation of either system's structural quality. It simply is not obvious whether these systems are susceptible to ripple-effects and costly updates. Both systems may be well-structured and cheap to update, but this information is lost amid graphical noise.
Perhaps going up yet another level might help? Figure 20 shows both systems again, but on package level, with each circle representing a package and each line a dependency between packages.
Figure 20: Package-level diagrams of two real-world systems.
Suddenly, useful structure crystallizes from chaos.
Now the programmer can clearly see that tracing the dependencies running through the system on the left remains difficult: this is a poorly designed package structure. The system on the right, however, allows easy tracing of dependencies and potential ripple-effects: with all lines running straight, it is clear, for example, than any changes to the line of packages towards the top (by far the largest proportion of the system) have a low probability of impacting the rest of the system. This is a good package structure.
But does a good package structure imply a good system structure? We must assume that by, "System structure," we mean structure on all levels: method, class and package. Yet for large systems, graphical representation can only suggest overall structural quality on package level, the lower levels offering sensory overload. Has scale finally defeated interpretation? Well, the Blighttown corollary applies not just to method and class levels but to all levels. Crucially, it allows us to relate levels to one another.
The Blighttown corollary says that given the good package structure on the right, the class and method structures underneath are probably worse, but this really tells us little. It is not especially useful to know that something is worse than good (though knowing that good structure exists on package level informs us that a good system structure is at least possible).
Given the poor package-structure on the left, however, coupled with the Blighttown corollary's predicting that class and method level structures are probably worse, we can say that such an image bodes ill for this system. If the package structure is bad and the lower levels are worse, then they must be bad indeed. This system's entire structure must be suspect.
So, can an image capture an entire system's structural integrity? Well, no, not conclusively. But the image can place probabilistic constraints on the structure's quality; and if that image contains the highest-level structures of the system, then the image can certainly suggest an expected degree of quality for the whole system.
The Blighttown corollary highlights the importance of a good package structure, as this structure will probably constrain the quality of the entire system's structure. Little can save the method and class levels when the package level degrades. Indeed, a Java software shop without a clear strategy for managing package dependencies may find updates puzzlingly expensive no matter how many refactoring cycles they endure. Sometimes the roof determines the foundations.
If you want a well-structured system, look up.
Opinions expressed by DZone contributors are their own.