The Importance of Distributed Caching
Over the past year we've seen more and more articles here on JavaLobby about distributed caching. Terracotta, in particular, has been at the forefront of many exciting announcements in this domain. I was lucky enough to have the chance to talk to Alex Miller, tech lead over at Terracotta, about the importance of distributed caching.
James Sugrue: Hi Alex. First, can you give us an insight into your time as a developer to date?
Alex Miller: After college, I wrote web apps in C++ and jumped into Java in the early JDK 1.1 days. I worked for several companies doing corporate Java apps. In 1999 I got hooked into a startup called MetaMatrix as part of the initial development team and eventually became the Chief Architect. I ended up writing a good chunk of a relational federated query engine there, which seems fairly insane looking back at it. MetaMatrix was ultimately acquired by Red Hat and has since been spun out as the open source Teiid project.
After that I worked for a while at BEA on their Aqualogic products. I currently work as a tech lead for Terracotta in addition to doing the occasional writing, speaking, and of course organizing the Strange Loop conference.
James: Could you tell us a bit about Strange Loop? Was it a success? Will you do it again?
Alex: Strange Loop is a new developer conference that I put together in St. Louis this year. While it was a ton of work, I had a great time and other people seemed to like it as well. I've written a lot about it already so probably better just to read my take elsewhere. Very soon we should start putting up the session videos on DZone, which nicely arranged for the video recording. I will definitely be doing it again next year. I'm already brimming with new crazy ideas and looking forward to taking it up another level.
James: I presume your job at Terracotta has a lot to do with distributed caching. What's a typical day in your life @ Terracotta?
Alex: At Terracotta, I'm a tech lead for a distributed team that works primarily on integration with open source libraries outside the core product. That includes some really intrinsic stuff like the JDK collection libraries, but also a Hibernate Second Level Cache implementation, Ehcache (recently acquired by Terracotta), and other important open source frameworks like Spring, Quartz, and Lucene.
A typical day for me usually involves staying in touch with the developers on my team, who are spread across the US, Belgium, and India. Most of the time I'm directing work, doing design and project management, and occasionally a bit of coding when I'm lucky.
James: Do you miss full time coding? Do you have any secret spare time coding projects?
Alex: I do wish I could code more, but at some point you learn that by building a great team you get way more leverage to effect change and create value by proxy. I wish I had some spare time! Right now it goes into the conference, writing, or speaking more so than coding (although I manage to do some coding for all of those too).
James: Terracotta has had a really big year, the acquisition of Ehcache possibly being a highlight. Do you see Terracotta continuing this rise?
Alex: Hey, there's still a few months left in the year - we're not done yet! :) Terracotta has been a lot of fun this year. We've had a great platform for a long time and it's very satisfying to have found ways to build products on top of that platform that can solve more immediate, tangible problems. Internally, we're actually working hard on both the platform and the products around Hibernate and Ehcache.
In the platform area, we've recently rebuilt our distributed lock manager from the bottom up and it is now much faster, more concurrent, and uses significantly less memory. That sort of core engineering makes things faster across the board so it has a big impact. On some of our internal benchmarks (Hibernate read/write caching for example) we're seeing throughput that is about 4x the throughput of our last 3.1.1 release. All the geeks at Terracotta get excited about stuff like that.
In the products area, we have a new Ehcache monitoring console and REST management API still scheduled before the end of the year. That console is really focused on Ehcache core users who aren't using Terracotta at all so it's a bit of a departure for us. And we still have a few more things in the works that I can't talk about just yet.
James: So, why is caching so important? And why has it become so much more of a prominent topic now?
Alex: Caching is one of those basic techniques that we've been using in programming to make things faster for decades. We keep using it because it works. :)
One reason it works is that not all data access is equal. The closer you can keep data to the CPU, the faster access you will see. The differences to fetch data from a CPU register vs cache vs RAM vs disk vs network cover many orders of magnitude. It makes perfect sense that keeping the most frequently used data in the closest location reduces latency.
Another factor is that our data access patterns tend to exhibit "locality". We often access the same data many times in a short period of time (for instance a user's information while they are logged in) or access a set of related information together (a list of popular products). The combination of non-uniform access with drastic speed differences makes caching a no-brainer for decreasing latency.
I think it's achieving more prominence because so many apps now operate at web scale instead of corporate scale. We used to build applications available to hundreds or thousands of users but now routinely build applications intended for millions of users. That has driven us to seek new tools for surviving scale.
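To make the locality point concrete, here is a minimal sketch of a least-recently-used (LRU) cache in plain Java. It is an illustration only, not something described in the interview: the class name `LruCache` and the capacity are assumptions, and it relies only on the standard `LinkedHashMap` access-order mode.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: a tiny LRU cache built on LinkedHashMap.
// Recently used entries stay; the least recently used one is evicted first.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the eldest entry
        return size() > capacity;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently. Note that `LinkedHashMap` is not thread-safe, so a real cache would need synchronization or a purpose-built library.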
James: Could you explain the concepts behind distributed caching, and Terracotta in general?
Alex: Terracotta started as a core platform technology called "Distributed Shared Objects". Simply put, it lets you take Java objects in your JVM and make those objects appear in a set of JVMs while maintaining all the object identity and Java memory model constraints a Java application developer expects.
These days Terracotta is really focused on one simple goal: making scalability simple. Any time someone has a Java application they are trying to scale, we want to solve the problems they encounter. To that end, we've been focusing on building products this year instead of the platform. We've released a Hibernate second level cache provider and recently acquired the open source Ehcache product. Ehcache is a widely deployed open source cache and we've already released a tightly integrated Terracotta-Ehcache distributed cache. We're also pleased to have Greg Luck, the founder of Ehcache, join our team.
There are several reasons to extend your local cache into a distributed cache. The first is that loading a cache on N nodes will cost N loads on your data store. If you can utilize a distributed cache, your cost is likely just a single load which can significantly reduce the database read load as you scale out. Another big reason is that all of your application nodes can see a consistent view of both your cache and your database which can simplify the application logic.
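The "N loads versus one load" argument can be sketched with a small simulation. Everything here is hypothetical: `CacheLoadDemo`, `loadFromStore`, and the key name are made up, and a single in-process map merely stands in for a distributed cache shared by all nodes. It does not use Terracotta or Ehcache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class CacheLoadDemo {
    // Stand-in for the database; counts how many reads actually reach it
    static final AtomicInteger storeReads = new AtomicInteger();

    static String loadFromStore(String key) {
        storeReads.incrementAndGet();
        return "value-for-" + key;
    }

    // Each simulated node reads the same key once through its cache.
    // shared = false: every node has a private cache (N loads).
    // shared = true: all nodes use one cache (a single load).
    static int readsForNodes(int nodes, boolean shared) {
        storeReads.set(0);
        Map<String, String> sharedCache = new ConcurrentHashMap<>();
        for (int i = 0; i < nodes; i++) {
            Map<String, String> cache = shared ? sharedCache : new HashMap<>();
            cache.computeIfAbsent("user:42", CacheLoadDemo::loadFromStore);
        }
        return storeReads.get();
    }

    public static void main(String[] args) {
        System.out.println("local caches: " + readsForNodes(4, false) + " store reads");
        System.out.println("shared cache: " + readsForNodes(4, true) + " store reads");
    }
}
```

With four simulated nodes, the private caches cause four store reads and the shared cache causes one. A real distributed cache adds network hops and consistency concerns that this sketch ignores.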
James: Are there any myths about caching that you'd like to put to rest?
Alex: I think probably the biggest myth is that adding a cache always helps. Caching is a great technique for all the reasons above, but it's easy to be blind to some obvious pitfalls. For example, if the items you're caching are not the bottleneck, then caching them won't help. Sometimes a cache will require you to copy data in and/or out of the cache and the time to copy can be non-trivial. Or a poorly designed cache can introduce a concurrency bottleneck.
Because of all these factors, it's possible to introduce a cache and make your application slower on the first try. You should test with a performance test that really reflects your application's concurrency and scale characteristics and be aware of these kinds of issues.
James: What are the signs that I should look for in my application that show that I need to add some scalability?
Alex: Generally the two biggest reasons we see people looking at caching are to improve performance or to reduce database load. The database is often an expensive and hard to scale part of an architecture. Many people investigate caching purely to allow greater app scalability without needing to increase the size of their database.
Other people are more interested in reducing latency by avoiding slow database or web service lookups or increasing throughput by getting greater concurrency on their cache than they can get from their database.
James: Can you give us some examples of well-thought-out uses of caching in the enterprise? Maybe even those who use Terracotta?
Alex: Sure, we work with a number of customers with high-visibility web sites in reservation systems, publishing, social networking, and gaming. Without getting into details of their systems, they generally see between 30% and 70% database offload which saves millions of dollars per year in database and hardware costs.
James: How would you recommend getting started with caching?
Alex: Adding a cache to your application is usually pretty easy. The basic interface for a cache like Ehcache is approximately the same as a standard Java Map - put, get, remove, clear, etc. Generally developers find it very easy to get started with Ehcache or other open source caches. At Terracotta we are working hard to make this simple interface the starting point for a wide range of possible cache configurations that all use that same basic API. This lets you learn the API once and continue leveraging it as your application grows.
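As a rough sketch of what a Map-like cache interface looks like in use, the class below offers put, get, remove, and clear plus a simple read-through helper. It is a hypothetical stand-in written for this illustration, not the Ehcache API: `SimpleCache` and `getOrLoad` are invented names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical Map-like cache; NOT the Ehcache API
class SimpleCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    void put(K key, V value) { entries.put(key, value); }
    V get(K key) { return entries.get(key); } // null means a cache miss
    void remove(K key) { entries.remove(key); }
    void clear() { entries.clear(); }

    // Cache-aside read: try the cache first, fall back to the loader on a miss.
    // Not atomic: two threads missing at the same time may both call the loader.
    V getOrLoad(K key, Function<K, V> loader) {
        V cached = get(key);
        if (cached != null) {
            return cached;
        }
        V loaded = loader.apply(key);
        if (loaded != null) {
            put(key, loaded); // ConcurrentHashMap rejects null values
        }
        return loaded;
    }
}
```

A caller would write something like `cache.getOrLoad(userId, id -> userDao.find(id))`, where `userDao` is again a made-up name for whatever slow lookup is being avoided.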
James: If I decide to use a caching solution, is it going to be difficult to debug?
Alex: Actually, I think caching can be easier to debug than many parts of your architecture. Conceptually it has a pretty basic API that allows you to treat it as a monitorable black box. First, you need to know the size of your cache. With caching you are consuming a portion of your memory for caching. It's important to balance that cache memory usage with the needs of the rest of your application and to balance memory usage between different caches.
Second, you need to look at the cache hit ratio, which tells you how efficient the cache is. Generally the hit ratio needs to be fairly high for the cache to be worth the built-in costs of copying, concurrency, etc. For more detail, you need to look at the latency difference between a hit and a miss on the cache. Combining that information with the hit ratio can tell you the overall latency benefit the cache is providing.
At Terracotta we regularly do internal performance tuning competitions where we take an application and work in teams to optimize it. Usually we end up doing some mix of application changes, configuration tuning, and product improvements. We use these tests as an inspiration and proving ground for the monitoring tools that we've been building for Hibernate and Ehcache. That way, we can be sure we're building tools that answer real questions.
With distributed caching, things definitely get more complicated, but again probably simpler than most distributed architecture issues. Generally, you want to look at data replication, partitioning, and issues like that so that you can minimize data flow while still providing high availability and (possibly) consistency.
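One way to combine the hit ratio with the hit and miss latencies mentioned in this answer is a simple weighted average. The sketch below assumes that model; the class name, method name, and all latency figures are made up for illustration and are not measurements from Terracotta or Ehcache.

```java
class CacheLatency {
    // Expected latency per lookup, assuming a simple weighted average:
    // hitRatio * hitMs + (1 - hitRatio) * missMs
    static double expectedLatencyMs(double hitRatio, double hitMs, double missMs) {
        return hitRatio * hitMs + (1.0 - hitRatio) * missMs;
    }

    public static void main(String[] args) {
        // Made-up numbers: 1 ms on a hit, 50 ms on a miss (cache check plus database read)
        for (double ratio : new double[] {0.5, 0.9, 0.99}) {
            System.out.printf("hit ratio %.2f -> %.2f ms%n",
                    ratio, expectedLatencyMs(ratio, 1.0, 50.0));
        }
    }
}
```

With these assumed figures, a 90% hit ratio gives an expected 5.9 ms per lookup against 50 ms with no cache hits at all, which shows why the hit ratio needs to be fairly high before the cache pays for itself.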
James: What direction is caching going? Will it become even more important in the future?
Alex: I think the caching landscape is growing more sophisticated and more important. We've seen a sudden upsurge in new storage solutions (under the perhaps unfortunate "NoSQL" moniker) and they are expanding both the palette of storage options and the dimensions we use to discuss them. There is room for all kinds of caches at all levels of architecture and developers will grow more interested in the nuances of what kinds of caches plug in to different points in the architecture.
James: You've been doing a great job of keeping an eye on Java 7 developments. How do you think it's going? Are you impressed with what's coming down the line, or are you content with Java 6?
Alex: Yeah, I've been following Java 7 since the beginning of 2007. It's hard to believe it's still in the works almost three years later. Weirdly, I'm most excited about things that aren't very visible like the new G1 garbage collector and the invokedynamic stuff, since I'm kind of a language enthusiast and that opens some exciting possibilities.
The language changes are interesting and will clean up some boilerplate but aren't going to change the way we write code. In the JDK, there are a lot of interesting little things coming in like the java.util.Objects class, and some new concurrency classes but I'm really glad to have JSR 203 and the File API cleanup. I'm really disappointed that JSR 310 (the date and time overhaul) didn't make it in - guess we'll have to wait for Java 8 for that one.
No one is talking about it, but I think people will be surprised at how much faster it's going to be. There has been performance work done in a bunch of different parts of the JDK (jar files, collections, string handling, etc) as well as in stuff like escape analysis and G1 and I suspect that's going to add up to some surprising numbers at the end of the day.