Dependency Analysis and the Modularisation of Java Programs

Everybody agrees that modularization is good, but how do we go about transforming a big-ball-of-mud architecture into something like OSGi?




Most experienced developers have encountered some form of system rot: the quality of a program deteriorates over time, and it becomes more and more expensive to update and maintain. This is often caused by poorly managed dependencies. Dependencies between software artifacts (classes, packages, functions, ...) are created when one artifact references another, for instance, when a method of one class invokes a method defined in another class. These dependencies then propagate to larger units of code: a package depends on another package if a class in the first package depends on a class in the second, and the same holds for other containers such as jars. Dependencies become problematic when they are created to solve short-term problems but bypass rules defined as part of the system architecture, such as “the persistence layer should not depend on the presentation layer”. Such dependencies become technical debt that starts piling up.
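The lifting rule described above (a package depends on another package if one of its classes depends on a class in the other package) can be sketched in Java as follows. This is a minimal sketch, assuming the class-level dependency graph has already been extracted, for example by a bytecode analysis tool, and is given as a map from fully qualified class names to the class names they reference; DependencyLifting and its methods are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Derives package-level dependencies from class-level dependencies (illustrative sketch). */
final class DependencyLifting {

    /** Returns the package part of a fully qualified class name, e.g. "a.b.C" gives "a.b". */
    static String packageOf(String className) {
        int dot = className.lastIndexOf('.');
        return dot < 0 ? "" : className.substring(0, dot);
    }

    /**
     * classDeps maps each class to the classes it references.
     * Package P depends on package Q (P != Q) if some class in P references some class in Q.
     * Dependencies between classes of the same package do not create package dependencies.
     */
    static Map<String, Set<String>> liftToPackages(Map<String, Set<String>> classDeps) {
        Map<String, Set<String>> packageDeps = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : classDeps.entrySet()) {
            String sourcePackage = packageOf(entry.getKey());
            for (String target : entry.getValue()) {
                String targetPackage = packageOf(target);
                if (!sourcePackage.equals(targetPackage)) {
                    packageDeps.computeIfAbsent(sourcePackage, k -> new TreeSet<>()).add(targetPackage);
                }
            }
        }
        return packageDeps;
    }
}
```

The same function lifts package dependencies to jar dependencies if packageOf is replaced by a mapping from each class to the jar that contains it.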


And eventually, this debt will become a problem. Remember the late 90s, when everybody wanted to port their applications to the web? That was easy for applications with a clear separation between the user interface and the logic layer, and difficult to impossible for applications where the logic depended on a particular user interface (usually a desktop UI). Java programmers now face a similar situation. Many use cases require modularity: creating plugin ecosystems around products, building product lines and making incremental updates, to name a few. And there are several great platforms for modularity available, in particular OSGi and its extensions (Eclipse, declarative services, Spring Dynamic Modules). But all of these platforms have strict requirements when it comes to dependencies. A common theme is that these frameworks use containers to manage dependencies automatically. This requires that programmers adhere to the following two principles:


package separability - dependencies between different packages should be minimised so that packages can be deployed in different modules. In particular, there should be no circular dependencies between packages.


interface separability - dependencies between abstract types (abstract classes and interfaces) and the concrete types implementing them should be minimised, so that abstract types and implementation types can be part of different modules. This makes it easier to provide compatible alternative implementations and to replace a particular implementation within an application.


The question is how existing applications can be refactored into modular designs based on one of these platforms.


The State of Affairs


To answer this question, we investigated a large set of open-source Java programs (the Qualitas Corpus) to find out how many of them suffer from dependency-related problems. The short answer is: almost all of them. In this experiment, we checked the dependency graph extracted from each program for instances of the following antipatterns, which compromise package and interface separability:


  1. strong circular dependencies between packages (CD): dependency chains starting in a package A, traversing some other packages and then returning into A. This is a strong form of circular dependency, because all the package dependencies involved are created by a single chain of class references. In particular, this pattern cannot be broken by splitting packages.
  2. strong circular dependencies between jars (CDC): dependency chains starting in a jar A, traversing some other jars and then returning into A.
  3. subtype knowledge (STK): supertypes (classes or interfaces) (indirectly) referencing their own subtypes.
  4. abstraction without decoupling (AWD): classes referencing both abstract types and their implementation types.
  5. degenerated inheritance (DEGINH): multiple paths from subtypes to supertypes (in Java, this is possible because interfaces support multiple inheritance).
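As an illustration, the following hypothetical Java classes (Shape, Circle and ShapeRegistry are invented for this sketch and do not come from the analysed programs) contain one instance each of subtype knowledge (STK) and abstraction without decoupling (AWD).

```java
import java.util.ArrayList;
import java.util.List;

// STK (subtype knowledge): the supertype Shape references its own subtype Circle.
abstract class Shape {
    abstract double area();

    // This factory method makes Shape depend on Circle, so the abstract type
    // and its implementation can no longer be placed in separate modules.
    static Shape unitCircle() {
        return new Circle(1.0);
    }
}

final class Circle extends Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    double area() {
        return Math.PI * radius * radius;
    }
}

// AWD (abstraction without decoupling): ShapeRegistry declares the abstract type List
// but also instantiates the concrete type ArrayList, so it depends on both.
final class ShapeRegistry {
    private final List<Shape> shapes = new ArrayList<>();

    void register(Shape shape) {
        shapes.add(shape);
    }

    double totalArea() {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }
}
```

Moving the unitCircle factory out of Shape, and letting ShapeRegistry receive its List from outside, would remove both instances.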


Surprisingly, almost all programs analysed were rife with instances of these patterns.


Here are some examples. In tomcat-7.0.2, there is the following circular dependency between jars (CDC):


  1. org.apache.catalina.ha.context.ReplicatedContext in tomcat-catalina-ha.jar
  2. depends on org.apache.catalina.core.ApplicationContext in tomcat-catalina.jar
  3. depends on org.apache.catalina.Service in tomcat-catalina.jar
  4. depends on org.apache.catalina.startup.Catalina in tomcat-catalina.jar
  5. depends on org.apache.catalina.ha.ClusterRuleSet in tomcat-catalina-ha.jar


According to the Tomcat documentation, catalina is the servlet container, and the ha package/jar contains cluster functionality. This means that Tomcat, even when used without clustering, depends on cluster functionality being available.


Tomcat jars and their relationships


Dependency chains traversing several packages are even more common. For instance, the OpenJDK (both versions 6 and 7) contains such a chain linking AWT (java.awt) and Swing (javax.swing). The critical edge is a reference to javax.swing.JComponent in java.awt.Component. This tightly couples the two alternative toolkits and makes it impossible to deploy them separately: an application that only uses the older AWT also needs Swing to run! This reminds me of the late 90s. Swing was added to the JDK (version 1.2, in 1998), while Internet Explorer, which had become the dominant browser, only supported Java 1.1, so users suddenly had to install a rather large browser plugin to run applets. I think it is safe to say that this killed Java in the browser and almost killed Java as a language, before it made a strong comeback on the server side. A particular problem was the large size of the plugin (remember that the internet was slow back then and many people were still on dial-up). But given the dependencies between the toolkits, even if you only used AWT, you had to download the much larger Swing as well!


if (Component.isInstanceOf(this, "javax.swing.JComponent")) {
   if (((javax.swing.JComponent) this).isOpaque()) {
      states.add(AccessibleState.OPAQUE);
   }
}
Reference to javax.swing.JComponent in java.awt.Component


A common objection is that there might be another reason, not known to outsiders, for having this dependency; in other words, that a behaviour-preserving refactoring that breaks this dependency is just not possible. But this is definitely not the case here: in Apache Harmony, an alternative implementation of the JDK class library, this dependency is absent.


The following tables show the size of some programs in terms of their dependency graphs (nodes are classes, edges are relationships between them) and the number of antipattern instances found.


System                     Classes (nodes)   Dependencies (edges)
OpenJDK JRE1.6.0_05-b13              16877                 170140
azureus-                              6444                  35392
jruby-1.0.1.jar                       2093                  11016
hibernate-3.3.1.jar                   1700                  10093


System                 AWD instances   CD instances   STK instances
azureus-                        9415            335             290
jruby-1.0.1.jar                 2508             32              87
hibernate-3.3.1.jar             2680             74             224


Many programs also contain a large number of class and package tangles. In mathematical terms, tangles are strongly connected components of the dependency graph: every artifact in a tangle (directly or indirectly) depends on every other artifact within the tangle. Software engineers often refer to tangles as “big balls of mud”. An extreme example is azureus-, with a package tangle consisting of 373 packages and a class tangle consisting of 2698 classes!


Packages inside the large (373) package cluster in azureus-




Detecting Dependency-Related Problems


There are a number of tools that can extract, display and analyse the dependency graph. There are two approaches to detecting critical dependencies: metrics and patterns.


Metrics associate artifacts and their relationships with numerical values expressing quality. The classic example is distance from the main sequence (D), a metric for packages. It captures the idea that abstract packages should have relatively many incoming dependencies (high responsibility), while concrete (implementation) packages should have relatively few incoming dependencies but will depend on other packages (high instability and low responsibility). The classic tool to compute the D metric is JDepend. JDepend computes several package dependency metrics and can easily be embedded into IDEs and build scripts. It also supports the detection of circular dependencies between packages.
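The following sketch shows how the D metric is typically computed, using Robert C. Martin's definitions as documented for JDepend: abstractness A is the ratio of abstract classes and interfaces to all classes in a package, instability I = Ce / (Ca + Ce), where Ca counts incoming (afferent) and Ce outgoing (efferent) couplings, and D = |A + I - 1|. Returning 0 for empty packages or packages without couplings is an assumption of this sketch; PackageMetrics is a hypothetical class name.

```java
/** Package metrics following Martin's definitions (illustrative sketch, not JDepend's code). */
final class PackageMetrics {

    /** Abstractness A = abstract types / all types; 0 for an empty package (assumption). */
    static double abstractness(int abstractTypes, int totalTypes) {
        return totalTypes == 0 ? 0.0 : (double) abstractTypes / totalTypes;
    }

    /**
     * Instability I = Ce / (Ca + Ce), where afferent (Ca) counts incoming and
     * efferent (Ce) counts outgoing couplings; 0 if there are none (assumption).
     */
    static double instability(int afferent, int efferent) {
        int total = afferent + efferent;
        return total == 0 ? 0.0 : (double) efferent / total;
    }

    /** Distance from the main sequence D = |A + I - 1|: 0 is ideal, 1 is worst. */
    static double distance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }
}
```

For example, a package in which 8 of 10 types are abstract, with 8 incoming and 2 outgoing dependencies, lies on the main sequence (A = 0.8, I = 0.2, D = 0), whereas a fully concrete package with 9 incoming and 1 outgoing dependency has D = 0.9: many packages rely on it, yet it offers no abstractions.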


Some general antipatterns were described above. Some tools also allow users to define project-specific patterns, often relating to application tiers and their dependencies. For instance, if the persistence layer must not depend on the presentation layer, dependencies from classes in the persistence layer to classes in the presentation layer become antipatterns.
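A project-specific rule such as “the persistence layer must not depend on the presentation layer” can be checked with a few lines of Java. This is a minimal sketch, assuming layers can be identified by package-name prefixes and that class-level dependencies are available as a map from class names to referenced class names; LayerRuleChecker and the com.example package names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Checks a project-specific layering rule against class-level dependencies (sketch). */
final class LayerRuleChecker {

    /**
     * Returns every dependency "source -> target" where source lies in the layer that must
     * not depend on the other one, and target lies in the forbidden layer. Layers are given
     * as package-name prefixes; including the trailing dot (e.g. "com.example.persistence.")
     * avoids accidentally matching packages such as "com.example.persistenceutil".
     */
    static List<String> violations(Map<String, Set<String>> classDeps,
                                   String fromLayerPrefix, String toLayerPrefix) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : classDeps.entrySet()) {
            if (!entry.getKey().startsWith(fromLayerPrefix)) {
                continue; // the rule only constrains classes in the "from" layer
            }
            for (String target : entry.getValue()) {
                if (target.startsWith(toLayerPrefix)) {
                    result.add(entry.getKey() + " -> " + target);
                }
            }
        }
        return result;
    }
}
```

Calling violations(deps, "com.example.persistence.", "com.example.presentation.") lists each offending class-level dependency, which could then be reported or used to fail a build.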


Systematic and scalable pattern analysis is not well supported by any existing tool, and this was the reason we started to develop our own set of tools. They are based on the GUERY graph query library, which we have developed and open-sourced. The Massey Architecture Explorer is an HTML5-based front end using this library.


There are well-known scalable algorithms for detecting tangles (such as Tarjan’s algorithm), and many commercial tools support tangle detection. Free tools that support tangle detection include Google’s CodePro AnalytiX (originally developed by Instantiations) and the Massey Architecture Explorer.
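Tangle detection with Tarjan's algorithm is short enough to sketch here. The version below works on a dependency graph given as a map from each artifact (class, package or jar name) to the artifacts it depends on, and reports every strongly connected component with more than one member as a tangle; TangleFinder is a hypothetical name. It is recursive, so very deep dependency chains may require a larger thread stack or an iterative variant.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Finds tangles (strongly connected components with more than one node) via Tarjan's algorithm. */
final class TangleFinder {
    private final Map<String, Set<String>> graph;
    private final Map<String, Integer> index = new HashMap<>();   // discovery order of each node
    private final Map<String, Integer> lowLink = new HashMap<>(); // smallest index reachable on the stack
    private final Deque<String> stack = new ArrayDeque<>();
    private final Set<String> onStack = new HashSet<>();
    private final List<Set<String>> tangles = new ArrayList<>();
    private int counter = 0;

    private TangleFinder(Map<String, Set<String>> graph) {
        this.graph = graph;
    }

    static List<Set<String>> findTangles(Map<String, Set<String>> graph) {
        TangleFinder finder = new TangleFinder(graph);
        for (String node : graph.keySet()) {
            if (!finder.index.containsKey(node)) {
                finder.visit(node);
            }
        }
        return finder.tangles;
    }

    private void visit(String v) {
        index.put(v, counter);
        lowLink.put(v, counter);
        counter++;
        stack.push(v);
        onStack.add(v);
        for (String w : graph.getOrDefault(v, Set.of())) {
            if (!index.containsKey(w)) {
                visit(w); // tree edge: explore w first
                lowLink.put(v, Math.min(lowLink.get(v), lowLink.get(w)));
            } else if (onStack.contains(w)) {
                lowLink.put(v, Math.min(lowLink.get(v), index.get(w))); // edge back into the current component
            }
        }
        if (lowLink.get(v).equals(index.get(v))) { // v is the root of a component
            Set<String> component = new TreeSet<>();
            String w;
            do {
                w = stack.pop();
                onStack.remove(w);
                component.add(w);
            } while (!w.equals(v));
            if (component.size() > 1) {
                tangles.add(component);
            }
        }
    }
}
```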


Another good free tool that extracts and visualises the dependency graph using a UML-like notation is Programmer's Friend Class Dependency Analyzer (CDA). Commercial programs that can be used to analyse and visualise dependencies include Lattix, Sonargraph for Java and Structure101.


Refactoring Dependencies


While there is some research into automated architectural refactoring, it is not clear whether this will result in robust tools anytime soon. For now, reorganising dependencies has to be done manually. Refactoring has to address two problems:


  1. Which dependencies should be removed?
  2. How can these dependencies be removed?


Patterns and metrics provide some guidance in locating the dependencies to be removed: they help to identify important and critical dependencies. To measure the importance of a dependency, standard network analysis metrics such as edge betweenness can be used. However, important dependencies are not necessarily critical dependencies. A possible approach to finding critical edges is to measure in how many antipattern instances a dependency participates. To illustrate this approach, consider a small example program whose classes are spread across three packages (1, 2 and 3).



There are several antipattern instances in this design: two circular dependencies (one between packages 1, 2 and 3, and one between packages 2 and 3) and a subtype knowledge instance (B indirectly uses its subtype A). The dependency “B uses A” is part of all three antipattern instances and therefore has an antipattern participation score (apsc) of 3. Removing this dependency makes all antipattern instances disappear. We have performed some experiments showing that this approach is promising: by removing a small number of dependencies, most antipattern instances disappear, reflecting a much better modular design of the refactored system. The Massey Architecture Explorer computes betweenness and antipattern participation scores for all dependencies.
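Once antipattern instances have been detected, the participation score itself is simple to compute: each instance is treated as the set of dependencies it consists of, and the score of a dependency is the number of instances containing it. The sketch below assumes the instances are already available (detecting them is the hard part) and identifies dependencies by string labels; AntipatternScore is a hypothetical name, and apart from “B uses A” the edge labels in any example input would be invented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Antipattern participation score (apsc): number of antipattern instances each dependency occurs in. */
final class AntipatternScore {

    /** Each antipattern instance is given as the set of dependency labels it consists of. */
    static Map<String, Integer> scores(List<Set<String>> instances) {
        Map<String, Integer> apsc = new HashMap<>();
        for (Set<String> instance : instances) {
            for (String dependency : instance) {
                apsc.merge(dependency, 1, Integer::sum); // count one more instance for this dependency
            }
        }
        return apsc;
    }

    /** The dependency with the highest score, i.e. the first candidate for removal. */
    static String mostCritical(Map<String, Integer> apsc) {
        // Throws NoSuchElementException if no antipattern instances were found.
        return Collections.max(apsc.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}
```

Applied to three instances that all contain “B uses A”, that dependency scores 3 and is returned by mostCritical as the first candidate for removal.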


The second problem, how to break dependencies, is harder. There are several refactoring patterns that can be applied:


  1. Type abstraction. For instance, a method parameter type java.util.ArrayList can often be changed to java.util.List or even java.util.Collection without breaking the code. This may remove the dependency on an implementation type (java.util.ArrayList in this case). This refactoring requires that only those members of the subtype that are also defined in the supertype are referenced. Type abstraction is potentially recursive (for instance, if references to the parameter are passed on to other methods), and verifying pre- and postconditions can be tricky.
  2. Use dependency injection (DI) or a service locator (aka service registry). For instance, consider the following code snippet: java.util.List list = new java.util.ArrayList(). Using dependency injection, the value of list is set by a DI container at runtime, and the class no longer depends on java.util.ArrayList. A service registry works similarly: the class asks the service registry for an instance of java.util.List, avoiding a direct reference to java.util.ArrayList. Various DI frameworks are available, such as Spring and Guice. Examples of service registries include the Eclipse extension registry and the java.util.ServiceLoader utility that is part of the JDK. Many Java APIs have custom built-in service registries to minimise dependencies on particular implementations; examples include the JDBC driver manager, the JAXP pluggability layer and the JNDI service provider interface.
  3. Relocating classes and packages. Sometimes, patterns such as circular dependencies are caused by classes being in the wrong package or packages being in the wrong jar. In this case, a straightforward “move class” refactoring (supported by many IDEs) can be used.
  4. Inlining. Finally, inlining can be used to move or copy parts (members) of a class that is the target of a dependency into the class that depends on these parts. This only works if those parts are not coupled to other parts of the respective artifact (class or package). Copying creates redundancy and should be used with care.
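The first two refactorings can be sketched in plain Java. In this hypothetical example, countLongNames shows type abstraction (a parameter that only needs iteration is declared as Collection instead of ArrayList), NameStore shows constructor injection written by hand (what a DI container such as Spring or Guice would automate), and lookup shows a service-locator call through java.util.ServiceLoader with a fallback. RefactoringExamples, NameStore and NameFormatter are invented names; a real provider would be registered in a META-INF/services file named after the service interface.

```java
import java.util.Collection;
import java.util.List;
import java.util.ServiceLoader;
import java.util.function.Supplier;

final class RefactoringExamples {

    /** Hypothetical service interface used to illustrate a service-locator lookup. */
    interface NameFormatter {
        String format(String name);
    }

    // Type abstraction: suppose the parameter was originally declared as java.util.ArrayList.
    // The method only iterates, which Collection supports, so the ArrayList dependency disappears.
    static int countLongNames(Collection<String> names, int minLength) {
        int count = 0;
        for (String name : names) {
            if (name.length() >= minLength) {
                count++;
            }
        }
        return count;
    }

    // Constructor injection: instead of calling "new ArrayList<>()" internally, the class
    // receives its List from the caller (or from a DI container).
    static final class NameStore {
        private final List<String> names;

        NameStore(List<String> names) {
            this.names = names;
        }

        void add(String name) {
            names.add(name);
        }

        int countLong(int minLength) {
            return countLongNames(names, minLength);
        }
    }

    // Service locator via java.util.ServiceLoader (findFirst requires Java 9 or later):
    // returns the first registered provider, or the fallback if none is registered.
    static <S> S lookup(Class<S> service, Supplier<S> fallback) {
        return ServiceLoader.load(service).findFirst().orElseGet(fallback);
    }
}
```

Because NameStore only refers to java.util.List, passing a LinkedList instead of an ArrayList requires no change to the class itself.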


In general, architectural refactoring is complex and requires great care. It is sometimes not straightforward to verify the preconditions that should be satisfied before a refactoring is performed. This is particularly the case if dynamic techniques such as reflection, multiple classloaders, dependency injection or aspect-oriented programming are used.


After each refactoring, postconditions should be checked. This includes the following steps:


  1. Check whether the program can still be compiled and built.
  2. Check whether the refactoring was behaviour-preserving. Usually this is verified by running tests, which is easier if the program has high test coverage.
  3. Check whether the architecture has actually improved by reassessing it using the patterns and metrics described above.


Once the dependencies have been refactored, modules can be built. This also requires defining the modules in build scripts and defining or generating module metadata. Some tool support is emerging in this area, such as Spring Bundlor and BND.

Opinions expressed by DZone contributors are their own.
