Following up on my previous response to Antonio Goncalves' blog post, I have submitted a JSR to the JCP on a data grid standard, titled "Java Data Grids". It has yet to be assigned a number by the JCP, but I thought I'd talk about it a little here anyway.
Here is the description of the JSR that I have submitted:
This specification proposes to provide an API for accessing, storing, and managing data in a distributed data grid.Data grids are gaining prominence and importance in enterprise Java, particularly as cloud-style deployments gain popularity:
The primary API will build upon and extend JSR-107 (JCACHE) API. In addition to it’s genericized Map-like API to access a Cache, JSR-107 defines SPIs for spooling in-memory data to persistent storage, an API for obtaining a named Cache from a CacheManager and an API to register event listeners.
Above and beyond JSR-107, this JSR will define characteristics and expectations from eviction, replication and distribution, and transactions (via the JTA specification). Further, it would define an asynchronous, non-blocking API as an alternative to JSR-107’s primary API, as non-blocking access to data becomes a concern when an implementation needs to perform remote calls, as in the case of a data grid.
This specification builds upon JSR-107, which is not yet complete. We intend to work with the JSR-107 EG to ensure that their schedule is compatible with the schedule for this JSR. If JSR-107 is unable to complete, we propose merging the last available draft into this specification.
- Characteristics such as high availability, along with removal of single points of failure become increasingly important, since cloud infrastructure is inherently unreliable and can be re-provisioned with minimal notice; applications deployed on cloud need to be resilient to this.
- Further, one of the major benefits of cloud-style deployments is elasticity. The ability to scale out (and back in) quickly and easily. Again, data grids have a role to play here.
- Finally, with scalable middleware comes additional stress on the data tier (traditionally an RDBMS), as middleware nodes scale out to cope with load. Data grids - used as a distributed cache - can help with mitigating database bottlenecks.
With one of Java EE 7's stated goals being "cloud-friendliness", the above are powerful arguments for the inclusion of a distributed data grid standard in Java EE 7.
What about JSR-107? JSR-107 - the temporary caching API proposed in 2001 - certainly has a role to play in Java EE too. Temporary caches are an important part of enterprise middleware, but yet a standard has been sadly missing from a Java EE umbrella specification for far too long. Spring, having identified the need as well, has a temporary caching abstraction in their current development versions. Several other non-Java frameworks define temporary caching APIs too (Ruby on Rails, Django for Python, .NET). There is no denying JSR-107 is necessary, and necessary as a part of Java EE.
But JSR-107 isn't a data grid. JSR-107 falls short as a standard for data grids, specifically as it doesn't take into account characteristics of distribution and replication of data, and doesn't define a contract that implementations would have to adhere to when it comes to moving data around a cluster. Crucial things for a data grid that, if not baked into a specification, will hinder portability and render the standard itself useless and impotent.
Further, with remote capabilities in mind, a data grid should also expose a non-blocking API, since network calls can be a limiting factor. Invoking methods that involve remote calls should be able to be done in an asynchronous fashion. Stuff that is irrelevant to a temporary caching API like JSR-107.
So with all that in mind, I'd love to hear your thoughts on the data grid JSR. In addition to Red Hat, the JSR is currently backed by a major Java EE and data grid vendor which cannot be named at this stage, along with independent JCP members with relevant interest and background.