Closing the code-architecture gap in legacy systems
Simon recently talked about the gap between Software Architecture and Code and how to close this with architecturally-evident coding. He's also creating tools to allow Software Architecture to be expressed as code.
If you're working on a greenfield project then including annotations to help with navigation is a great solution but what if you've inherited a large system with a model-code gap? Or if you only realise, sometime into a project, that you lack a model to help you understand its growing complexity? Well, Simon also had some thoughts on scanning Spring annotations to provide this data. This works quite well and it got me thinking about other artifacts in code that can be extracted for these diagrams.
(In more formal terms - for those of you that like to quote ISO42010 - we are trying to extract architectural elements from code that can be displayed within architectural views. Of course the elements may be from a variety of differing abstractions, levels and granularity and therefore need to be placed within differing views.)
So what can we extract from a current/legacy system to give us a view into it? Some suggestions include:
AnnotationsAs already suggested, dependency injections systems such as Spring provide some annotations that can be extracted to give a basic, high level model. Annotations are also present in Java EE applications and other enterprise frameworks.
XML DI Configuration filesMany (legacy) Spring projects use xml configuration files to define beans. Having scanned a few examples this seems to create a relatively low level model which would need some manual tweaking after generation. With sensible naming convention for beans you can produce models for a desired abstraction. The bean properties indicate the connections between these elements.
Module Bundling SystemsModular systems such as OSGi define bundles of components and services including lifecycle and service registry. The deployment information should provide a high level overview.
PackagesIf you have used 'package-by-component' then your packages will relate one-to-one with your components. The links between components should be identifiable by the links between the classes within them (the has-a relationships). If you have package-by-layer then this is much harder or impossible to use. Experience tells me that most real-world systems are actually a combination of the two so you should have some useful information.
Class NamesIt's very common for class names to contain a strong indication of their role e.g. XyzService, XyzConnector, XyzDao, XyzFacade etc. Scanning for known patterns should identify the element names and roles.
Interfaces and class hierarchiesIf you implement interfaces (or extend base classes) then the interfaces used may show the abstraction level and type e.g. implementing Service, DAO, Connector, Repository etc
Delegation or Library DependencyShared delegates used by a set of classes/functions may indicate their purpose. e.g. components delegating to a database utility might indicate a DAO component or using a CORBA utility might indicate a service. This is likely to be time consuming as you need to identify and scan for each delegate you are interested in.
Comments/Javadoc/JSDoc/NDoc/DocletsComments and javadoc style API comments can provide a large amount of meta-information about a class or package. In fact, many UML modeling tools enrich code using custom comments and tags. This has the advantage of not affecting the compiled code or introducing library dependencies but may not be consistently used.
TestsTest can provide a lot of meta-data about your system. Unit tests tend to be concentrated around important classes and often construct entire components to test. Simply extracting the names of classes that are directly tested will produce a useful list of components. The higher level systems tests will reveal the important services.
Build SystemsBuild systems such as ant, maven, NuBuild etc all have hooks into the code base for building and deployment. A simple extraction of the build targets will give you the deployment modules (which is a very helpful view for operation teams). This may give you the required information for a Containers view.
What do you think?Of course all of the above is very dependent on your codebase but if none of the above works then you have to question the quality and structure of the code! The data extracted may need filtering and manual correction as it won't give you exactly what you want. You might consider creating structurizr annotations using an initial scan and then maintaining them. One of my tasks for the next few weeks is to try this out on some legacy codebases I maintain.
What other ways of identifying architectural elements can you think of?