A return to the wellspring.
To W.P. Stevens, G.J. Myers and L.L. Constantine, programmers owe much gratitude. Authors of the inexhaustibly minable 1974 paper Structured design, they revealed the fundamental mechanism of ripple effect, thereby pouring welcome concrete into the rather shallow foundations of software engineering. The paper also introduced to the world the blushing concepts of coupling and cohesion, concepts lauded ever afterwards. Where the French revolutionaries had, "Liberté, Égalité, Fraternité," programmers would rush into scrum meetings roaring, "Loose coupling and high cohesion!" But there is a problem. Taken individually, both concepts appear sound; taken together, however, they confess worrying inconsistency. A previous post having already examined coupling, this post looks at cohesion before undertaking a side-by-side evaluation.
Two pairs of consecutive lines from the great work present the concept:
"Coupling is reduced when the relationships among elements not in the same module are minimized. There are two ways of achieving this - minimizing the relationships among modules and maximizing relationships among elements of the same module."
"Binding is the measure of the cohesiveness of a module. The objective here is to reduce coupling by striving for high binding."
(The paper avoids the term, "Cohesion," preferring, "Cohesiveness," but this post uses the more commonplace term.)
In fact, though the authors' later works would, the paper fails to advance an actual definition of cohesion. The above is as close as it gets. Whatever form cohesion takes, binding - another nebulous term - is its measure. This vagueness might not strike the reader as critical given the wealth of subsequent verbiage birthed by the paper, yet a darker idea lurks within the excerpts above: for what can it mean to, "Maximize relationships among elements in the same module?"
As mentioned before, any review of ancient texts must tread a cautious path between monkish reverence and interpretative flaccidity. The, "Module," of the paper, for example, reads as though limited to the notion of a method or function: "The term module is used to refer to a set of one or more contiguous program statements having a name by which other parts of the system can invoke it." Nevertheless, license allows that, considering both the orders of magnitude by which software systems have inflated in the intervening years and the consequent structural innovations necessitated, programmers can justify expanding this, "Module," term to include its modern up-gunned equivalents, with Java, for instance, offering a hierarchical organization of program statements into methods, methods into classes and classes into packages (not to mention jars and bundles).
Presuming, then, that cohesion applies to classes, packages, etc., two questions arise. Which cohesion? And in what sense?
For the paper famously separates binding - and by implication, cohesion - into its six different strains, from the weakest, "Coincidental," to the strongest, "Functional." A tour of this spectrum lies beyond the scope of this post but because functional binding stands as the ultimate classification and its corresponding cohesion the goal towards which structured design claims to strive, only functional binding is considered here. The paper defines this via, "In a functionally bound module, all of the elements are related to the performance of a single function." (The single responsibility principle was alive, well and probably hitting the disco twice weekly in the early seventies.)
In which sense, then, are these elements, " ... related to the performance of a single function?" What is the nature of this relationship? Source code being textual, this relationship may be either semantic or syntactic. The source code for a flight control system, for example, might gather in a single package all classes related to the raising and lowering of the landing gear. This package would then appear semantically functionally bound and hence of high cohesion. The paper suggests, however, the insufficiency of semantic relationship by discussing what it calls, "Logical binding," its term for binding based on semantic relatedness. The paper defines this binding as that which, " ... implies some logical relationship between the elements of a module. Examples are a module that performs all input and output operations for the program or a module that edits all data," with the paper reaching the slightly alarmist conclusion, "In short, logical binding usually results in tricky or shared code, which is difficult to modify, and in the passing of unnecessary parameters." The case is not strong, but this hints at the need for a thicker glue, that of syntax, whereby the classes of a package must syntactically depend on one another before being considered functionally bound.
Consider figure 1, showing a package of classes from the recently reviewed FitNesse.
Figure 1: Package
This package displays strong syntactic functional binding and thus high cohesion. The class ConverterRegistry clearly depends on almost all others and so it would be hoped that this class in some sense coordinates the others in the service of the package's singular purpose (whatever that may be). The two straddling classes, one of them ConverterRegistryTest$CustomerConverter, hardly divert from this goal. This seems, in isolation at least, a well-structured package.
instructions offers another good example, see figure 2.
Figure 2: Package
In figure 2, two classes hold court, one of them Instruction$1, both presiding over clear dependencies engaging the rest of the classes in the package. Again, with no mutinous sub-divisions seceding from the greater mass, this package enjoys functional cohesion. On this issue of mutiny, programmers can usually spot a package that lacks functional cohesion precisely because such packages host independent constellations of classes. Take figure 3, which shows the small and apparently well-structured fitnesse.http package.
Figure 3: Package fitnesse.http
Nevertheless, this package lacks functional cohesion because several of its classes have sloughed off from the main group, making possible a significant decomposition without breaking any dependencies; see figure 4.
Figure 4: Package fitnesse.http split into two independent parts.
Figure 4 shows the same classes and dependencies as figure 3 but graphically re-arranged to show the largest connected group to the right, leaving behind those disconnected classes of which the main group makes no use. Here then a package has been caught performing more than one function, the penalty for which should be a brutal refactoring just to keep other packages in line. This all smacks, however, of the graph concept of connectedness, a perfectly good concept in its own right for which no new term - such as cohesion - would seem required. This digression, though, concerns packages of low cohesion and these are not our primary quarry. We must return to packages of high cohesion.
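The decomposition described above is the plain graph notion of connected components. The following is a minimal sketch, assuming a package's classes and their internal dependencies are already available as strings; the class name PackageComponents, the method components and the single-letter class names are hypothetical and not taken from FitNesse.

```java
import java.util.*;

/** Sketch: split a package's classes into groups that share no dependencies. */
final class PackageComponents {

    /**
     * @param classes      every class in the package
     * @param dependencies each entry is {from, to}, both inside the package
     * @return groups of classes with no dependency linking one group to another
     */
    static List<Set<String>> components(Collection<String> classes, String[][] dependencies) {
        // Direction does not matter when asking "can these be split apart?",
        // so treat every dependency as an undirected link.
        Map<String, Set<String>> neighbours = new LinkedHashMap<>();
        for (String c : classes) {
            neighbours.computeIfAbsent(c, k -> new LinkedHashSet<>());
        }
        for (String[] d : dependencies) {
            neighbours.computeIfAbsent(d[0], k -> new LinkedHashSet<>()).add(d[1]);
            neighbours.computeIfAbsent(d[1], k -> new LinkedHashSet<>()).add(d[0]);
        }

        List<Set<String>> groups = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String start : neighbours.keySet()) {
            if (!visited.add(start)) {
                continue; // already placed in an earlier group
            }
            Set<String> group = new TreeSet<>();
            Deque<String> pending = new ArrayDeque<>();
            pending.push(start);
            while (!pending.isEmpty()) {
                String current = pending.pop();
                group.add(current);
                for (String next : neighbours.get(current)) {
                    if (visited.add(next)) {
                        pending.push(next);
                    }
                }
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical package: B and C depend on A; X depends on Y.
        List<String> classes = List.of("A", "B", "C", "X", "Y");
        String[][] deps = { {"B", "A"}, {"C", "A"}, {"X", "Y"} };
        System.out.println(components(classes, deps)); // [[A, B, C], [X, Y]]
    }
}
```

More than one group in the result is the situation of figure 4: the package can be divided without breaking a single dependency.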
The maximization problem.
So, packages can certainly lack cohesion but can they have too much cohesion? Recall that the paper does not advise reducing coupling merely by clarifying the relationships among elements of the same module; it speaks of, "Maximizing," the relationships among elements in the same module. What might this mean? It cannot mean the arbitrary sharing of superfluous dependencies between all classes in a package; this would be silly. Might it instead mean that an increase in the number of essential dependencies between classes within a package indicates better cohesion? Consider figure 5, showing another FitNesse package, but this one poorly structured.
Figure 5: Package
Figure 5 shows the package tables, one in obvious distress. The classes of this package, unlike those of the previous packages, hang snared like gasping fish in a great trawl of dependencies. About this confused tangle of relationships few clear statements can be made. Few statements, that is, save one: certainly more dependencies stretch between classes here than between those of the previous packages. So does this mean that the tables package has a higher cohesion?
Most programmers, surely, would think not. Or if they did so, they might say that, yes, perhaps this package has more cohesion but then cohesion itself has its limits, beyond which it delivers no structural benefit and may even drag a system into structural debt. This latter opinion might find itself rejected, however, on the basis of the sheer unpopularity of the battle cry, "Loose coupling and sufficient cohesion."
The distinctness problem.
If figure 5 does not have a higher cohesion than previous examples, programmers must ask why. Why is maximizing the relationships among elements in the same module - in this case, classes in the same package - undesirable? The answer lies in ripple effects, " ... where changes in one part cause errors in another, necessitating additional changes elsewhere, giving rise to new errors, etc." If the elements within a module are too tightly inter-connected, then predicting the cost of change becomes difficult because a change detonated in any element may explode, consuming many others.
This, however, presents another problem, for this desire to minimize potential ripple effects has surfaced before, it being the sole motivation for coupling. Yes, coupling is defined as the strength of association between - rather than within - modules but this does not restrict ripple effects to the realm of the inter-modular. Ripple effects respect no boundary, ploughing into and through packages as much as between them. Given a transitive dependency of three classes ending in C, a change to C may potentially impact both of the other classes, B among them, irrespective of whether the three classes reside in one package or three. Imagining transitive dependencies dissected into those parts that fall within a package and those that fall without may offer solace to over-worked programmers groping to understand a small slice of a system but does not curtail the transitive dependencies themselves.
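The indifference of ripple effects to package boundaries can be sketched in a few lines. The example below is an illustration only, assuming three hypothetical classes A, B and C in which A depends on B and B depends on C; the class RippleImpact, the method potentiallyImpacted and the package prefixes p1, p2 and p3 are likewise invented for the purpose.

```java
import java.util.*;

/** Sketch: which classes might a change reach, following dependencies backwards? */
final class RippleImpact {

    /**
     * @param dependsOn map from a class to the classes it depends on directly
     * @param changed   the class being modified
     * @return every class that depends on {@code changed}, directly or transitively
     */
    static Set<String> potentiallyImpacted(Map<String, Set<String>> dependsOn, String changed) {
        // Invert the edges: a change flows from a class towards its dependents.
        Map<String, Set<String>> dependents = new HashMap<>();
        dependsOn.forEach((from, targets) -> {
            for (String to : targets) {
                dependents.computeIfAbsent(to, k -> new HashSet<>()).add(from);
            }
        });

        Set<String> impacted = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(List.of(changed));
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String dependent : dependents.getOrDefault(current, Set.of())) {
                // Package names are never consulted; only the dependencies matter.
                if (!dependent.equals(changed) && impacted.add(dependent)) {
                    pending.push(dependent);
                }
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        // Hypothetical layout: the three classes live in three different packages.
        Map<String, Set<String>> dependsOn = Map.of(
            "p1.A", Set.of("p2.B"),
            "p2.B", Set.of("p3.C"));
        System.out.println(potentiallyImpacted(dependsOn, "p3.C")); // [p1.A, p2.B]
    }
}
```

Moving all three classes into a single package changes the names in the result but not its size: the traversal has no notion of a boundary to stop at.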
And it is in terms of ripple effects that figure 5 differs from its forebears: tracing potential ripple effects in figure 5 has become far more difficult - and hence far more potentially costly - than in previous figures. So here we have both coupling and cohesion serving the same master and towards the same ends: ripple-effect minimization. Why, then, have two distinct concepts instead of one?
The perspective problem.
A further complication arises, that of perspective. Coupling and cohesion have absolute meaning only in systems that admit just two individual levels, that of containing module and that of contained element. If programs only had, "Contiguous program statements," (elements) within methods (modules) then coupling would exclusively apply to method-to-method dependencies and cohesion exclusively to those executive dependencies between statements within methods. But that world, if ever it existed, has gone forever. Java's hierarchical containment structures, as mentioned, tower far above the mere method. Looking again at figure 5, we see that it portrays the cohesion of the tables package because, considering a package to be a module of elements (in this case, classes), and according to the concept of cohesion, the figure presents the relationships among elements within the tables module. Yet if a programmer considers a class to be a module of elements (in this case, methods) then figure 5 simultaneously shows a coupling diagram where, according to the concept of coupling, we can see the associations established by connections from one module to another.
Whether figure 5 portrays coupling or cohesion depends not on inherent properties of the packages or classes displayed but on viewer perspective. So it makes no sense to apply different rules for evaluating different levels within a containment hierarchy when a module at one level is also an element of the level above, merely seen from a different perspective. This, alas, appears to be the goal to which the mantra, "Loose coupling and high cohesion," leads, offering no guidance on the levels to which each part applies and instead implying application to all. Yet how do we achieve, " ... minimizing the relationships among modules and maximizing relationships among elements of the same module," when those relationship-maximized elements are to be the relationship-minimized modules of the level below?
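The dependence on perspective can be made concrete with a small sketch. It assumes a handful of hypothetical class-to-class dependencies inside one hypothetical package named pkg, and counts how many of them cross a module boundary under two different answers to the question, "What is a module?" The names Perspective and crossingModules are invented for illustration.

```java
import java.util.*;
import java.util.function.Function;

/** Sketch: the same dependencies, classified under two definitions of "module". */
final class Perspective {

    /** Counts dependencies whose two ends fall in different modules under the given mapping. */
    static long crossingModules(String[][] dependencies, Function<String, String> moduleOf) {
        return Arrays.stream(dependencies)
            .filter(d -> !moduleOf.apply(d[0]).equals(moduleOf.apply(d[1])))
            .count();
    }

    public static void main(String[] args) {
        // Hypothetical class-to-class dependencies, all inside the package "pkg".
        String[][] deps = { {"pkg.A", "pkg.B"}, {"pkg.B", "pkg.C"}, {"pkg.A", "pkg.C"} };

        // Perspective 1: a package is the module, so classes are its elements.
        Function<String, String> packageAsModule = name -> name.substring(0, name.lastIndexOf('.'));
        // Perspective 2: a class is the module, so each class is its own module.
        Function<String, String> classAsModule = name -> name;

        System.out.println(crossingModules(deps, packageAsModule)); // 0: every edge counts as cohesion
        System.out.println(crossingModules(deps, classAsModule));   // 3: the same edges count as coupling
    }
}
```

Nothing about the dependencies changes between the two calls; only the mapping from element to module does.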
Ripple effect the Leveller.
Such problems undermine the validity of coupling and cohesion but do not in any way call into disrepute hierarchical containment per se, such containment enabling the indispensable property of encapsulation. Encapsulation does not and cannot limit actual transitive dependencies; instead, by hiding elements within boundaries, it reduces the targets to which transitive dependencies can spread, thereby limiting future potential dependencies. Encapsulation is dependency management done at the appropriate time: well in advance. Nor does the underlying threat of ripple effects diminish whatsoever: unpredictability of change cost remains the twenty-first century's most ferocious digital gorgon. If anything, abandoning the distracting concepts of coupling and cohesion only emphasizes the dread importance of ripple effects.
For ripple effects, there is only structure. Given that a structure is a set of elements and their inter-relationships, then Java source code boasts many levels of structure: method-level, class-level, package-level, etc., but over such taxonomic detail ripple effects stand aloof, leaping from method-to-method just as virulently as from package-to-package. This holds out the promise of unification, of a single level-agnostic concept that applies universally, whose jurisdiction encompasses entire programs rather than isolated districts. If ripple effects be the enemy and transitive dependencies the vector along which they propagate, then reducing transitive dependency length becomes one sensible strategy (and there are many) in this ceaseless war on costs. Programmers cannot, of course, whip out a measuring tape for each of the countless transitive dependencies charging through a system but, defining depth as the average length of all a structure's transitive dependencies, those programmers can avail themselves of the principle of depth, which states merely that structures be kept shallow.
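Depth, so defined, lends itself to a rough calculation. The sketch below rests on assumptions the post does not spell out: that the dependency graph is acyclic, that a transitive dependency is a path running from an element on which nothing depends to an element that depends on nothing, and that its length is the number of elements along it. The names Depth, depth and walk are hypothetical.

```java
import java.util.*;

/** Sketch: depth as the average length of a structure's transitive dependencies. */
final class Depth {

    /**
     * Average number of elements per root-to-leaf dependency path.
     * Assumption: the graph is acyclic; a cycle would make the walk recurse forever.
     */
    static double depth(Map<String, List<String>> dependsOn) {
        // Collect every element, whether it appears as a source or only as a target.
        Set<String> all = new TreeSet<>(dependsOn.keySet());
        Set<String> hasDependents = new HashSet<>();
        for (List<String> targets : dependsOn.values()) {
            all.addAll(targets);
            hasDependents.addAll(targets);
        }

        long[] totals = new long[2]; // totals[0] = number of paths, totals[1] = summed lengths
        for (String element : all) {
            if (!hasDependents.contains(element)) { // a root: nothing depends on it
                walk(element, 1, dependsOn, totals);
            }
        }
        return totals[0] == 0 ? 0.0 : (double) totals[1] / totals[0];
    }

    private static void walk(String element, int length,
                             Map<String, List<String>> dependsOn, long[] totals) {
        List<String> targets = dependsOn.getOrDefault(element, List.of());
        if (targets.isEmpty()) { // a leaf: the path ends here
            totals[0]++;
            totals[1] += length;
            return;
        }
        for (String next : targets) {
            walk(next, length + 1, dependsOn, totals);
        }
    }

    public static void main(String[] args) {
        // Three hypothetical elements arranged as a chain, then as a fan.
        Map<String, List<String>> chain = Map.of("A", List.of("B"), "B", List.of("C"));
        Map<String, List<String>> fan = Map.of("A", List.of("B", "C"));
        System.out.println(depth(chain)); // 3.0
        System.out.println(depth(fan));   // 2.0
    }
}
```

Both structures hold the same three elements and two dependencies, yet the fan is shallower than the chain, which is the distinction the principle of depth draws.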
Software systems are big.
Fortunately, examining a small part of a system in isolation can be immensely useful. Combing over the classes in a single package, for instance, while ignoring all dependencies entering and leaving that package, can provide invaluable insight. But it is a convenient myth and a myth which serves analytical ends only. As soon as this pretense becomes a design factor - as soon as programmers design classes within a package as though they had no connection with the rest of the system when, in fact, they do - then the myth darkens and design crumbles. One cohesive package does not make a cohesive system. No number of cohesive packages makes a cohesive system. No number of loosely coupled packages makes a loosely coupled system. With the pieces not adding up to the whole, programmers scratch their heads as yet another, "Loosely coupled and highly cohesive," system dies not in the field but in the finance department, exit wounds on its chest.
The pity of coupling and cohesion is that they are localized analytical measures misused as globalized design ideals. Forged as weapons with which to combat ripple effects, both suffer from a fatal impurity: they build on the flawed premise that ripple effects bang against magical impervious borders capable of perfectly compartmentalizing our source code. Ripple effects, neutrinos of the software world, know no such borders. What can stop them? A thousand light-years of lead will. Short transitive dependencies might. Myths won't.