Is caching an 'Architectural Smell'?
Kent Beck introduced the concept of "Code Smells" while working on Martin Fowler's famous Refactoring book, and I think that most people would agree with many of the stinks he identified. Many of us probably also use tools such as Checkstyle to automatically identify such things as excessively long methods, dead code, etc. If you are not familiar with the concept it is worth a quick read, but the basic premise is that:
A code smell is a surface indication that usually corresponds to a deeper problem in the system.
Though we have to remember that just because some code has a 'smell' doesn't mean it's bad, just that it's worth investigation and justification.
We can take the concept to the next layer of abstraction and identify a number of "Architectural Smells". A recent blog article touched upon one of mine - the (over) use of Caches.
I've had terrible trouble with caches in the past. They can introduce bugs that are difficult to reproduce because they only become visible under particular operation timings. They are similar to the bugs you find in concurrent systems, where the issue only occurs every few thousand operations and isn't present when you attach a debugger or add logging. Like all performance tuning, a cache should be introduced AFTER you have determined that there is a problem. However, they can be added so easily that developers throw them in whenever they can. Of course, if your cache hit rate is low then your performance can actually degrade after adding a cache.
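The hit-rate point can be put in numbers. The following is a minimal sketch, assuming a simple look-aside cache where every read pays for the cache lookup and a miss additionally pays for the read from the source. The function names and the millisecond figures are made up for illustration, and the cost of writing the entry back into the cache on a miss is ignored.

```python
def expected_latency_ms(hit_rate, cache_ms, source_ms):
    """Average cost of one read through a look-aside cache.

    Every read pays the cache lookup; a miss also pays the source read.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_ms + (1.0 - hit_rate) * source_ms


def break_even_hit_rate(cache_ms, source_ms):
    """Hit rate below which the cached read is slower on average than going direct."""
    return cache_ms / source_ms


# Assumed figures: a 2 ms lookup in a remote cache in front of a 10 ms source.
slow = expected_latency_ms(hit_rate=0.1, cache_ms=2.0, source_ms=10.0)
fast = expected_latency_ms(hit_rate=0.9, cache_ms=2.0, source_ms=10.0)
threshold = break_even_hit_rate(cache_ms=2.0, source_ms=10.0)
```

Under these assumptions the cache only pays for itself once the hit rate exceeds the ratio of lookup cost to source cost, so a cache with a poor hit rate makes the average read slower than having no cache at all.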
Maybe you agree or not with the above (and I know I'll be flamed for saying it) but why do I consider caches to be an Architectural Smell?
In a perfect system the business logic will always have access to the data it needs. The access (local or remote) will fit comfortably into the non-functional requirements and the data it uses will be from the primary source/system of record and not be stale.
Back in the real world, the system is not used in the way it was originally designed for, it is used by many more users than anticipated, and they can't wait for anything.
The temptation is to introduce a cache at each layer where there is an issue. They can be very easy to introduce (Spring will allow you to do this with a couple of lines of configuration for your data access components) and the user's perception of responsiveness can improve dramatically. Is it a free lunch? If you look closely at the options available with caching systems you'll see all sorts that you might associate with databases, which is not surprising as a cache is really a mini database. Have you considered data staleness, dirty reads, dirty writes, update schedules? Will all clients of the data see the same data at the same time? Can updates be missed? Does it listen for updates or poll? Is data coalesced, grouped or skipped? Depending on the use of the data you might answer these questions and decide that caching is an effective and accurate solution - great! If it's not, then the cache will introduce the kind of bugs I described.
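The staleness question is easy to demonstrate. Below is a minimal sketch of a time-to-live cache, not any particular library's implementation: the `TtlCache` class, the `source` dictionary standing in for the system of record, and the hand-advanced clock are all hypothetical names chosen for the example.

```python
class TtlCache:
    """Tiny read-through cache with a fixed time-to-live (illustrative only)."""

    def __init__(self, load, ttl_seconds, clock):
        self._load = load          # function that reads the system of record
        self._ttl = ttl_seconds
        self._clock = clock        # injected so the example is deterministic
        self._entries = {}         # key -> (value, expiry_time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]        # may be stale: the source is not consulted
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value


# Hypothetical system of record and a fake clock we can advance by hand.
source = {"price": 100}
now = [0.0]
cache = TtlCache(load=source.__getitem__, ttl_seconds=30, clock=lambda: now[0])

first = cache.get("price")   # miss: loads 100 from the source
source["price"] = 120        # the source changes behind the cache's back
stale = cache.get("price")   # hit: the cache still answers 100
now[0] = 31.0                # the entry's time-to-live runs out
fresh = cache.get("price")   # miss: reloads 120 from the source
```

For up to 30 seconds a client reading through this cache sees a price that the system of record no longer holds. Whether that window is acceptable depends entirely on how the data is used, which is why it needs to be an explicit decision.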
Either way it is still an Architectural Smell. Perhaps the best solution is to re-examine how data is distributed and accessed throughout the system. For example:
- Maybe a monolithic database sitting at the center of the system isn't the best solution and perhaps you need multiple databases with different responsibilities? (Issues with monolithic, remote databases are a common reason for needing caches).
- Maybe an asynchronous messaging system with multiple messages being processed would work better than a single request/response system?
- Perhaps data associated with a request should be sent through the system with the request itself (enriched request).
- Should some data (e.g. static) be explicitly kept locally rather than requested and cached?
- Should some data have its encoding changed? Converting to and from XML is very time consuming.
- Can data be requested in larger or smaller blocks to reduce overheads? Calling a database in a loop is a common problem.
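The last point in the list, calling a database in a loop, can be sketched as follows. This is an illustration only, using Python's built-in `sqlite3` module with an in-memory database; the `customer` table, its rows, and the two function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edsger")],
)

wanted_ids = [1, 3]


def names_one_by_one(conn, ids):
    """One query per id: the 'database call in a loop' pattern."""
    names = {}
    for customer_id in ids:
        row = conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is not None:
            names[customer_id] = row[0]
    return names


def names_in_one_block(conn, ids):
    """A single query for the whole block of ids."""
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM customer WHERE id IN ({placeholders})", list(ids)
    ).fetchall()
    return dict(rows)


looped = names_one_by_one(conn, wanted_ids)
batched = names_in_one_block(conn, wanted_ids)
```

Both functions return the same mapping, but against a remote database the first pays one network round trip per id while the second pays one in total. A very long id list would need to be split into chunks, because databases limit the number of bound parameters in a statement.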
I appreciate that this will involve a lot more work than a few lines of configuration, but it may help architectures to evolve logically rather than become a series of hacks and bolt-ons. Introducing a cache is an architectural decision and not a coding one.
What are your favourite Architectural Smells we should all look for? I've already mentioned another of mine - "XML everywhere".
Published at DZone with permission of Robert Annett, DZone MVB. See the original article here.