The human brain is very good at handling pictures – graphs and diagrams are a quick way of grasping structure. Java source code is full of structure that we programmers have to understand, but the leading visual language we have is UML, which was built for hand-drawn diagrams and produces cluttered, unreadable diagrams when reverse engineered. What would you change about UML to make the pictures do a better job of helping you to understand the code?
In answer to that question, here are some reasons why UML diagrams do a poor job of visualizing Java code, along with some suggestions as to how code visualization could be made to better serve the Java programming community.
1. Language neutrality
When UML was conceived there was a lot in common between the emerging OO programming languages – and there still is. But the reason we have different programming languages, rather than one standard language, is that ideas shift over time as to what makes a language efficient and expressive. For the same reason, ideas about how to visualize the structure in a language should be allowed to shift with that language. In other words, the modeling language should serve each programming language as needed, and we should stop worrying about trying to unify across different programming languages.
2. Based on hand drawing
If a software development methodology prescribes up-front design with the project stakeholders gathered in one room, then hand-drawn diagrams are indeed a rapid and expressive way to communicate and document design. However, if the methodology favors working code over documentation, then the design is expressed directly as code by the programmers. Advances in IDEs have helped to make this possible through wizards and refactoring capabilities. So while UML serves its purpose as a visual language for hand drawing, this in no way qualifies it as a suitable visual language for reverse engineering.
3. Visibility symbols
The visibility symbols of UML are plus (+) for public, minus (-) for private, and hash (#) for protected. If this were an intuitive notation, IDE makers would have adopted it. However, the reason for this choice of symbols in UML was, again, ease of hand drawing. Yet it fails on an aesthetic level when you glance down a list of class members, especially if the symbols are displayed in the same font as the member names. Furthermore, IDEs use richer symbols which convey more than just class and member accessibility.
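To make the loss of information concrete, here is a minimal sketch that maps Java modifiers onto UML visibility symbols using reflection. The `UmlVisibility` helper and the `Account` class are invented for illustration. The tilde (~) is UML's symbol for package visibility, which corresponds to Java's default access.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper, for illustration only.
class UmlVisibility {

    // Collapse a Java modifier bit set into a single UML visibility symbol.
    static char symbolFor(int modifiers) {
        if (Modifier.isPublic(modifiers)) return '+';
        if (Modifier.isPrivate(modifiers)) return '-';
        if (Modifier.isProtected(modifiers)) return '#';
        return '~'; // package visibility (Java's default access)
    }

    // Example class with a mix of modifiers (invented for this sketch).
    static class Account {
        public static final int MAX_PIN = 9999;
        public String owner;
        protected long balance;
        private int pin;
        private transient volatile int cache;
        boolean frozen;
    }

    public static void main(String[] args) {
        for (Field f : Account.class.getDeclaredFields()) {
            // Left: all the visibility symbol says. Right: what Java actually declares.
            System.out.println(symbolFor(f.getModifiers()) + " " + f.getName()
                    + "    <- " + Modifier.toString(f.getModifiers()));
        }
    }
}
```

In this sketch `pin` and `cache` both come out as a bare minus sign, although one of them is also transient and volatile. That is the kind of detail a richer IDE icon has room to convey.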
4. Associations between classes
To Java programmers, used to making references between classes using only one mechanism (field members), the notions of composition, aggregation and association take a little more thinking about. Most can quickly appreciate the differences between these three once they realize the roles that object ownership and object lifecycle play in determining the type of association. However, composition is really only a concern in the absence of automated garbage collection, where destruction of the owner mandates destruction of its parts. Clues to aggregation and composition in Java are expressed through things like naming conventions, inner classes, the package structure and design patterns.
Because the programming language makes no distinction between association types, it is difficult for reverse engineering tools to make the distinction and determine whether to put a diamond on the tail of the association line or not, and whether to color it in or not.
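The following sketch illustrates the problem from a tool's point of view. The `Car` example and the `AssociationKinds` helper are hypothetical. The comments record the designer's intent, but reflection sees only an owner, a field name and a target type.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example, for illustration only.
class AssociationKinds {

    static class Engine {}
    static class Driver {}
    static class Garage {}

    static class Car {
        private final Engine engine = new Engine(); // intended as composition: created and owned by the car
        private Driver driver;                      // intended as aggregation: the driver exists independently
        private Garage lastServicedAt;              // intended as a plain association
    }

    // Everything reflection can report about a reference: owner, field name, target type.
    static String describe(Field f) {
        return f.getDeclaringClass().getSimpleName()
                + " --" + f.getName() + "--> "
                + f.getType().getSimpleName();
    }

    // Lists the reference-typed fields of a class, sorted for stable output.
    static List<String> associationsOf(Class<?> type) {
        List<String> lines = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!f.isSynthetic() && !f.getType().isPrimitive()) {
                lines.add(describe(f));
            }
        }
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) {
        associationsOf(Car.class).forEach(System.out::println);
    }
}
```

All three fields are reported in exactly the same form, so any diamond on the diagram would have to come from heuristics such as the `final` modifier or where the object is constructed, not from the language itself.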
5. Cardinality
From the point of view of static structure analysis, we care whether a reference in the code is a reference to a single object or to multiple objects. Java represents this as a field that either points to a single instance of a given type or holds an array of instances of a given type. In other words, relationships are either one-to-one or one-to-many, with no other cardinalities built into the language. Enforcement of other cardinalities normally happens dynamically, through checking for null in the case of single objects and through range checking in the case of multiple objects. From a reverse engineering perspective it is a headache to try to determine cardinality ranges, and most likely not worth the effort given that cardinality ranges of, say, 3..17 are rarely hard coded anyway.
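A reverse engineering tool could make this one-or-many decision from the declared field type alone. The sketch below is one possible approach, not a description of any existing tool. The class names are invented, and treating `Collection` and `Map` fields as "many" is an assumption that goes slightly beyond plain arrays.

```java
import java.lang.reflect.Field;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class Cardinality {

    static class Customer {}
    static class LineItem {}

    static class Order {
        Customer customer;               // single reference
        LineItem[] items;                // array
        List<LineItem> backorderedItems; // collection (assumed to mean "many")
    }

    // Classify a field purely from its declared type.
    static String classify(Field f) {
        Class<?> t = f.getType();
        boolean many = t.isArray()
                || Collection.class.isAssignableFrom(t)
                || Map.class.isAssignableFrom(t);
        return many ? "one-to-many" : "one-to-one";
    }

    // Convenience overload that looks the field up by name.
    static String classify(Class<?> owner, String fieldName) {
        try {
            return classify(owner.getDeclaredField(fieldName));
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        for (Field f : Order.class.getDeclaredFields()) {
            System.out.println(f.getName() + ": " + classify(f));
        }
    }
}
```

Nothing in the declared types hints at a range such as 3..17. Any such limit would live in validation code, which is much harder to analyze statically.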
6. Java conventions
Java programmers care about a range of conventions which help to make code easy to read and the structure easy to understand, but these are simply absent from UML diagrams. For example:
- The Bean Properties naming convention – the presence of get and set methods amounts to an association with the type being set or got.
- Checked exceptions – love them or hate them, they are part of the language and it is useful to be able to see them as they are part of the method signature and occupy a dimension of their own.
- Serialization – ask anyone who has serialized a graph of Java objects and they will tell you how important it is to maintain an overview of the boundaries of what is included in the serialization operation.
- Synchronization – not center stage most of the time except for when deadlocks occur and then the need arises to trace through the model looking for possible causes.
- Collections – the cornerstone of object modeling in Java representing one-to-many relationships most of the time yet not inherently viewed as such by UML.
Java programmers find little value in visual representations of their code when the diagrams are ignorant of the constructs and conventions of the programming language.
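As a rough sketch of how a Java-aware visualizer might pick up two of these conventions, the code below derives associations from get/set pairs and collects the checked exceptions declared on each method. All names are hypothetical, and the getter matching is a deliberate simplification of the bean convention (it ignores `is` prefixes for booleans, for instance).

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
class ConventionScan {

    public static class Address {}

    public static class Person {
        private Address home;
        public Address getHome() { return home; }
        public void setHome(Address home) { this.home = home; }
        public void save() throws IOException { /* persistence omitted */ }
    }

    // A getX()/setX(T) pair is treated as an association named x with type T.
    static Map<String, String> beanAssociations(Class<?> type) {
        Map<String, String> result = new TreeMap<>();
        for (Method getter : type.getDeclaredMethods()) {
            String name = getter.getName();
            if (!name.startsWith("get") || name.length() == 3 || getter.getParameterCount() != 0) {
                continue;
            }
            try {
                // The matching setter must accept exactly the getter's return type.
                type.getDeclaredMethod("set" + name.substring(3), getter.getReturnType());
            } catch (NoSuchMethodException e) {
                continue; // read-only property: not counted in this sketch
            }
            String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
            result.put(property, getter.getReturnType().getSimpleName());
        }
        return result;
    }

    // Maps each method name to the checked exceptions in its throws clause.
    static Map<String, List<String>> checkedExceptions(Class<?> type) {
        Map<String, List<String>> result = new TreeMap<>();
        for (Method m : type.getDeclaredMethods()) {
            List<String> names = new ArrayList<>();
            for (Class<?> ex : m.getExceptionTypes()) {
                boolean unchecked = RuntimeException.class.isAssignableFrom(ex)
                        || Error.class.isAssignableFrom(ex);
                if (!unchecked) {
                    names.add(ex.getSimpleName());
                }
            }
            if (!names.isEmpty()) {
                result.put(m.getName(), names);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(beanAssociations(Person.class));
        System.out.println(checkedExceptions(Person.class));
    }
}
```

Serialization boundaries and synchronization could be surfaced in a similar way, for example from `transient` and `synchronized` modifiers, though that is beyond the scope of this sketch.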
7. Naming and labeling of associations
UML supports labeling of associations on the canvas next to the association line. On a freeform hand-drawn diagram you might well label lines by adding free text parallel to the line, but at the same time you would limit the amount of text you add, write only in empty space, and position the text so that it is clear which line it refers to. Automated class diagram generation exercises no such restraint and tends to leave a smattering of text around the lines. Java does not need labels on associations, as the field name is effectively the name of the association.
8. The assumption of paper
Finally, and linking back again to the theme of hand drawing, UML diagrams contain no notion of tooltips, folding, filtering, hyperlinking or any other degree of interactivity which is now the norm for electronic documents. Instead, reverse engineered UML diagrams remain constrained as if paper based. Online maps allow the switching on and off of place names, traffic information and aerial photographs, and contain links to local services and points of interest. UML in particular, and code structure visualization in general, need to start presenting the wealth of structural information interactively, filtering out most of the information to make the diagram easy to navigate and revealing more detail about one element at a time as the user shows an interest in it.