Being Joe the Software Plumber
Infrastructure is to software what plumbing is to a house. In this article, I'll discuss how to design and evolve infrastructure within a software organization. Just as you shouldn't feel, hear or see the pipes in your house, the best infrastructure is unfelt, unheard and unseen.
Good infrastructure is also odourless
Every software organization should have a group of people who focus on the infrastructure of its applications. These are the plumbers of the software engineering world. Although they are spared the smells that real-world plumbers endure (which is debatable in some cases), their concerns are similar. This job is not for everyone; it comes with a unique set of challenges. Incumbents must possess equal measures of technical and political prowess to get the job done. Most importantly, though, they must be willing to toil in anonymity on things that end-users never see. It's not a glamorous job, but if you're like me, you believe that being Joe the software plumber is the most exciting job in software.
What is infrastructural software, anyway?
Infrastructural software is one of those elastic, abstract and meaningless terms tossed around in our industry, like real-time, intuitive and simple. My definition of infrastructure is this: it is what a software layer uses to get its job done, the environment in which the business logic lives. But one man's infrastructure is another man's business logic. To a server-side Java developer, Spring is infrastructure. To Rod Johnson, Linux is infrastructure. And to Linus Torvalds, the firmware in the Ethernet card is infrastructure. It's all relative. For the purposes of this article, I'll constrain my definition to software running atop a managed environment (and by managed environment I mean Spring, JBoss and the like).
But wait a minute... why do we need more infrastructure if we are already using a managed environment? If the open-source community has an endless supply of infrastructural software, why do organizations need to build their own? The answer is this: any organization building a sizable software product must invest in its own infrastructure, not to compete with a proven managed environment but to build atop it and extend it. Every application has its own domain-specific issues, and generic open-source products cannot properly address concerns that are this narrow and specific. Hence the need for organization-specific infrastructure that simplifies development. Without it, applications quickly degenerate into an unmanageable mess.
Let’s discuss what organizational infrastructure is all about.
Building without a plan
A trait unique to the world of software engineering is that applications are constantly rebuilt, refactored and redesigned. This is partly due to the fast-paced nature of the software business. Unless you are building apps for the government, your development cycles are heavily influenced by time-to-market pressures. It is not unusual to build new applications without a clear idea of how everything is going to fit in and evolve over time.
Similar to the Capability Maturity Model (CMM), which rates the maturity of an organization's processes, I like to think there is an equivalent infrastructure capability model for organizations and their applications. It measures how much infrastructure exists within an application, on a scale from non-existent to ubiquitous. At the low end of the spectrum, business logic is built ad hoc and no architecture dictates where things should go or how they should interact. This is the kind of code you would end up with if you were given one afternoon to turn an idea into a prototype: little regard for layers, abstractions or re-use of classes. Sadly, some applications evolve for years on top of exactly that kind of prototype. They never mature from the proof-of-concept stage into an architecture that can evolve and grow in scale. Obviously, such an application has a very limited ability to grow beyond a prototype. Yet some of them live on...
The second level of infrastructure capability offers more hope. Layers and boundaries between the different areas are formalized. Some abstraction layers exist and ideas are repeated consistently throughout the code. For example, there is a common way of writing a persistence layer, and the use of logging is standardized. Still, every component has free rein over the invocation thread. This means that even though components share code and ideas, they can still do whatever they want: invoke an inappropriate component, spawn a parallel thread, open an HTTPS connection and make a withdrawal from your bank account. While this level promises a brighter future than the previous one, it still has limited feature scalability. There is still too much duplicated code, which makes it increasingly difficult to add or maintain crosscutting functionality across all components. The crucial missing piece is an end-to-end framework.
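The second level can be pictured with a minimal Java sketch. All class names below are hypothetical: both components share a common persistence helper and logging convention, yet nothing stops one of them from going around those conventions and spawning its own thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical shared persistence helper: a common convention, but only a passive library.
class OrderStore {
    private final List<String> rows = new ArrayList<>();

    void save(String order) {
        rows.add(order);
    }

    int count() {
        return rows.size();
    }
}

// Follows the conventions: uses the shared store and the standard logger.
class WellBehavedComponent {
    private static final Logger LOG = Logger.getLogger(WellBehavedComponent.class.getName());

    void placeOrder(OrderStore store, String order) {
        LOG.info("placing " + order);
        store.save(order);
    }
}

// Uses the same helper, but has free rein over the invocation thread:
// nothing prevents it from starting its own thread behind everyone's back.
class RogueComponent {
    Thread placeOrderAsync(OrderStore store, String order) {
        // Bypasses any shared threading, tracing or error-handling policy.
        Thread t = new Thread(() -> store.save(order));
        t.start();
        return t;
    }
}

class LevelTwoDemo {
    public static void main(String[] args) throws InterruptedException {
        OrderStore store = new OrderStore();
        new WellBehavedComponent().placeOrder(store, "book");
        new RogueComponent().placeOrderAsync(store, "lamp").join();
        System.out.println(store.count());
    }
}
```

The code looks consistent because both components use `OrderStore`, but only something that owns the invocation thread end-to-end could stop `RogueComponent` from doing its own thing.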
The ubiquitous framework
Frameworks differ from passive libraries in that they impose a certain way of doing things. A passive library can be called at any time by any component; a framework dictates how things are done, owning the flow of control and calling into application code. Frameworks become necessary once an application reaches a certain scale: to keep evolving, the application needs a unified, all-encompassing framework.
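The difference between a library and a framework comes down to who owns the flow of control. Here is a small Java sketch; `TextUtils` and `RequestPipeline` are illustrative names, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Passive library: the caller decides when, whether and in what order to call it.
final class TextUtils {
    private TextUtils() {}

    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}

// Hypothetical mini-framework: it owns the control flow and calls user code back.
final class RequestPipeline {
    private final List<String> audit = new ArrayList<>();

    // The framework, not the caller, fixes the sequence: validate, call back, record.
    String handle(String request, Function<String, String> businessLogic) {
        if (request == null || request.trim().isEmpty()) {
            throw new IllegalArgumentException("empty request");
        }
        String result = businessLogic.apply(request); // callback into user code
        audit.add(request.trim() + " -> " + result);
        return result;
    }

    List<String> audit() {
        return audit;
    }
}

class LibraryVsFrameworkDemo {
    public static void main(String[] args) {
        // Library style: our code drives and calls the helper directly.
        String direct = TextUtils.normalize("  Hello ");

        // Framework style: the pipeline drives and calls our function back.
        RequestPipeline pipeline = new RequestPipeline();
        String viaFramework = pipeline.handle("  Hello ", TextUtils::normalize);

        System.out.println(direct + " " + viaFramework + " " + pipeline.audit());
    }
}
```

The business function is identical in both cases; what changes is that validation and auditing happen every time the pipeline is used, whether or not the caller remembers them.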
Frameworks offer many advantages. Not only do they introduce common concepts and vocabulary among developers, but they also standardize a way of thinking and coding. They maximize code re-use by owning common, repeated logic. They have full control of the call flow and call back business logic only when necessary. Sure, a rogue developer can still write rogue code within a framework, but good frameworks encourage and facilitate doing things right and make it harder to do otherwise. More importantly, frameworks can transparently implement low-level concerns that are orthogonal to their primary functionality. For example, a user-tracing feature, in which every component visited along the call flow is recorded on behalf of a given end-user, becomes trivial when handled by a framework that controls the thread end-to-end.
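The user-tracing example can be sketched as follows. `Component` and `TracingDispatcher` are hypothetical names; the point is that tracing lives entirely in the dispatcher, which owns the loop over components, so no business component contains any tracing code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical component contract: business code only implements process().
interface Component {
    String name();

    String process(String payload);

    // Convenience factory so components can be written as lambdas.
    static Component of(String label, UnaryOperator<String> logic) {
        return new Component() {
            @Override public String name() { return label; }
            @Override public String process(String payload) { return logic.apply(payload); }
        };
    }
}

// Hypothetical framework dispatcher. Because it drives the request from start to end,
// it can record every component visited on behalf of a user.
final class TracingDispatcher {
    private final List<Component> chain;
    private final List<String> traceSink; // stands in for a real trace log

    TracingDispatcher(List<Component> chain, List<String> traceSink) {
        this.chain = chain;
        this.traceSink = traceSink;
    }

    String dispatch(String userId, String payload, boolean traceThisUser) {
        String current = payload;
        for (Component c : chain) {
            if (traceThisUser) {
                traceSink.add(userId + " -> " + c.name());
            }
            current = c.process(current);
        }
        return current;
    }
}

class TracingDemo {
    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        TracingDispatcher dispatcher = new TracingDispatcher(Arrays.asList(
                Component.of("validator", String::trim),
                Component.of("pricer", s -> s + ":priced")), trace);

        System.out.println(dispatcher.dispatch("alice", " order-1 ", true)); // order-1:priced
        System.out.println(trace); // [alice -> validator, alice -> pricer]
        dispatcher.dispatch("bob", "order-2", false); // untraced user: nothing recorded
    }
}
```

Turning tracing on for one user is a flag in the dispatcher; the `validator` and `pricer` components never change.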
There are countless examples of such frameworks in the Java world. Consider the servlet container, which handles most transport-level concerns and calls back a specific servlet to perform the business logic.
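The container-calls-servlet relationship can be illustrated with a toy stand-in. This is deliberately not the real Servlet API: `Handler` and `ToyContainer` are hypothetical, and request handling is reduced to a single request line.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback contract, loosely modelled on a servlet's service method.
// This is NOT the javax.servlet / jakarta.servlet API.
interface Handler {
    String handle(String method, String path);
}

// Toy container: it owns the transport-level work (parsing the request line,
// routing, producing status lines) and calls back a handler only for business logic.
final class ToyContainer {
    private final Map<String, Handler> routes = new HashMap<>();

    void register(String path, Handler handler) {
        routes.put(path, handler);
    }

    // Greatly simplified: accepts only a request line such as "GET /hello HTTP/1.1".
    String serve(String requestLine) {
        String[] parts = requestLine.trim().split(" ");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            return "HTTP/1.1 400 Bad Request";
        }
        Handler handler = routes.get(parts[1]);
        if (handler == null) {
            return "HTTP/1.1 404 Not Found";
        }
        return "HTTP/1.1 200 OK\r\n\r\n" + handler.handle(parts[0], parts[1]);
    }
}

class ToyContainerDemo {
    public static void main(String[] args) {
        ToyContainer container = new ToyContainer();
        // The "servlet": pure business logic, no sockets or parsing.
        container.register("/hello", (method, path) -> "Hello via " + method + " " + path);
        System.out.println(container.serve("GET /hello HTTP/1.1"));
        System.out.println(container.serve("GET /missing HTTP/1.1"));
    }
}
```

Malformed requests and unknown paths never reach user code; the container answers them itself.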
Now that I’ve discussed the various capability levels of infrastructure, let’s see how one would go about improving the infrastructure level in an organization.
Hopefully, you now believe in the merits of frameworks. You might even think that every software development project should be preceded by framework development. But the reality is counter-intuitive: trying to write a framework before having a deep understanding of the problem is a recipe for disaster.
Think of how cities evolve. At the very beginning, houses spring up in undeveloped areas. Asphalt roads, telephone cables and sewage pipes are only built once enough houses exist to justify the expense. At that point, the infrastructure practically designs itself, as the pattern of houses dictates where things should go. Frameworks work best when approached the same way. There must be enough code for repeated patterns to emerge, and those patterns dictate how the framework should behave. This is not to say that architecture must be avoided. Rather, a framework can only be designed well once the problem it addresses is fully understood. A great way to reach that understanding is to reverse-engineer a design from an existing, non-framework implementation.
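Reverse-engineering a framework from existing code might look like the following sketch, built on hypothetical job classes. Two hand-written jobs reveal a repeated sequence; extracting that sequence into a base class that owns the control flow (the Template Method pattern, one of several ways to do this) yields a tiny framework.

```java
import java.util.ArrayList;
import java.util.List;

// BEFORE: two hypothetical jobs written independently. Side by side, the repeated
// pattern is obvious: log start, do the work, log failure, always log end.
class InvoiceJobBefore {
    String run(String input, List<String> log) {
        log.add("start invoice");
        try {
            return "invoice:" + Integer.parseInt(input.trim());
        } catch (RuntimeException e) {
            log.add("failed invoice");
            throw e;
        } finally {
            log.add("end invoice");
        }
    }
}

class ShippingJobBefore {
    String run(String input, List<String> log) {
        log.add("start shipping");
        try {
            return "ship:" + input.trim();
        } catch (RuntimeException e) {
            log.add("failed shipping");
            throw e;
        } finally {
            log.add("end shipping");
        }
    }
}

// AFTER: the repeated sequence moves into a base class that owns the control flow;
// subclasses supply only the part that actually differed.
abstract class Job {
    private final String name;

    protected Job(String name) {
        this.name = name;
    }

    final String run(String input, List<String> log) {
        log.add("start " + name);
        try {
            return doWork(input);
        } catch (RuntimeException e) {
            log.add("failed " + name);
            throw e;
        } finally {
            log.add("end " + name);
        }
    }

    protected abstract String doWork(String input);
}

class InvoiceJob extends Job {
    InvoiceJob() { super("invoice"); }

    @Override
    protected String doWork(String input) {
        return "invoice:" + Integer.parseInt(input.trim());
    }
}

class ShippingJob extends Job {
    ShippingJob() { super("shipping"); }

    @Override
    protected String doWork(String input) {
        return "ship:" + input.trim();
    }
}

class ExtractionDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(new InvoiceJob().run("42", log));
        System.out.println(new ShippingJob().run("oslo", log));
        System.out.println(log);
    }
}
```

The base class was not designed up front; its shape was read off the two existing jobs, which is why it fits them.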
Widely used frameworks are created in much the same way. Many developers were implementing their own HTTP servers before the servlet framework existed. The servlet framework was only created once enough of them had felt the pain of implementing their own HTTP parsing and transport layers on top of sockets. (And the Struts framework, in turn, was only created once enough users had felt the pain of hand-writing their own servlets. And Tapestry, Wicket and JSF were only created...)
Is it an art or a science?
Framework design can be tricky even after the problem is well understood. The process is as much an art as a science. There is a constant tug of war between doing as much as possible and not forcing users into dead-ends. If a framework does too little, it delegates too much work to its users, which reduces its value proposition. On the other hand, if a framework does too much, it becomes overbearing and constraining, and users must contort the API to get what they need. In the end, underwhelming and overwhelming frameworks risk the same fate: they end up unused and useless. (Is anyone thinking about entity beans?)
Framework design must also consider future-proofing. Once an API has been published, it cannot change freely, and the larger the user base grows, the harder change becomes. This complicates the life of the framework designer: changing a method signature breaks existing callers, so superseded methods must linger as deprecated long after they are replaced. Think of the difficulty Sun has had with the JDK libraries: some classes have kept their deprecated methods around for ten years!
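In Java, the usual way to evolve a published API without breaking callers is to add the new signature and keep the old one, marked `@Deprecated` and delegating to the new one. The `Converter` class and its fixed rate below are hypothetical; the `since` element of `@Deprecated` requires Java 9 or later.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;

// Hypothetical published framework class. Version 1.0 shipped convert(double, String);
// version 2.0 adds a safer overload but cannot remove the old one without
// breaking every caller that already compiles against it.
class Converter {
    private final BigDecimal rate; // hypothetical fixed exchange rate, for illustration only

    Converter(BigDecimal rate) {
        this.rate = rate;
    }

    /** Since 2.0: exact decimal arithmetic and a typed currency. */
    BigDecimal convert(BigDecimal amount, Currency target) {
        return amount.multiply(rate)
                .setScale(target.getDefaultFractionDigits(), RoundingMode.HALF_EVEN);
    }

    /**
     * The 1.0 signature, kept so existing callers still compile.
     *
     * @deprecated use {@link #convert(BigDecimal, Currency)}; {@code double} loses precision.
     */
    @Deprecated(since = "2.0")
    double convert(double amount, String currencyCode) {
        // Delegate to the new method so the two code paths cannot drift apart.
        return convert(BigDecimal.valueOf(amount), Currency.getInstance(currencyCode)).doubleValue();
    }
}

class DeprecationDemo {
    public static void main(String[] args) {
        Converter c = new Converter(new BigDecimal("1.10"));
        System.out.println(c.convert(new BigDecimal("10.00"), Currency.getInstance("USD"))); // 11.00
        System.out.println(c.convert(10.0, "USD")); // 11.0 (deprecated overload)
    }
}
```

Delegating keeps the old method cheap to maintain, but it still has to be carried, documented and tested for as long as anyone calls it, which is exactly the burden described above.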
Framework design is also an exercise in politics. The first task is to convince others within the organization that a) there is a need for a framework, b) you have a solid framework idea, and c) the API is one that users like enough to buy into. Even method names can become a point of contention between framework designers and users during design reviews.
Once the framework has been developed, the evangelizing phase can begin. Users need to be convinced that the framework will simplify their lives. If code existed before the new framework, stakeholders need to be sold on the idea of porting it. Porting code is the acid test for a new framework: only with real code can we know whether it covers every use case effectively. Sometimes, features turn out to be missing from the framework. Can it be modified to accommodate them?
I started this article with a teaser headline stating that the best infrastructure is invisible, a sort of infrastructure nirvana. In reality, this does not exist; it is merely an ideal to strive for. In the end, abstraction leaks compromise the infrastructure design and burden the business layer with concepts it shouldn't know about. In the same way, we aren't aware of the pipes in our house until a broken one floods the basement and shatters the illusion. Having to clean up the basement is an ironic example of abstraction leakage.
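Here is a sketch of such a leak, with hypothetical names throughout. A repository is supposed to hide where balances are stored, but because storage can fail, a storage-specific exception escapes into the business layer, which must then decide what an outage means for a withdrawal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical infrastructure exception: the "broken pipe" that escapes the abstraction.
class StorageUnavailableException extends Exception {
    StorageUnavailableException(String message) {
        super(message);
    }
}

// The repository is meant to hide where balances live...
interface AccountRepository {
    // ...but because storage can fail, the failure mode leaks into the signature.
    long balanceInCents(String accountId) throws StorageUnavailableException;
}

// Business logic that should only care about balances now has to know about
// storage outages and decide what to do about them.
class OverdraftPolicy {
    private final AccountRepository repo;

    OverdraftPolicy(AccountRepository repo) {
        this.repo = repo;
    }

    boolean canWithdraw(String accountId, long amountInCents) {
        try {
            return repo.balanceInCents(accountId) >= amountInCents;
        } catch (StorageUnavailableException e) {
            return false; // leaked infrastructure concern: fail closed while storage is down
        }
    }
}

// In-memory stand-in with a switch to simulate the outage.
class FlakyRepository implements AccountRepository {
    private final Map<String, Long> balances = new HashMap<>();
    boolean down;

    FlakyRepository put(String id, long cents) {
        balances.put(id, cents);
        return this;
    }

    @Override
    public long balanceInCents(String accountId) throws StorageUnavailableException {
        if (down) {
            throw new StorageUnavailableException("storage offline");
        }
        return balances.getOrDefault(accountId, 0L);
    }
}

class LeakDemo {
    public static void main(String[] args) {
        FlakyRepository repo = new FlakyRepository().put("acc-1", 5_000L);
        OverdraftPolicy policy = new OverdraftPolicy(repo);
        System.out.println(policy.canWithdraw("acc-1", 2_000L));
        repo.down = true; // the basement floods
        System.out.println(policy.canWithdraw("acc-1", 2_000L));
    }
}
```

Failing closed is only one possible choice; the point is that `OverdraftPolicy` has to make it at all.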
In most cases, though, we can achieve much more with infrastructure, floods and all, than we would otherwise. And you can thank Joe the software plumber for that!
This article and more can be found at www.deepheap.com.