Introducing the Infinispan Data Grid Platform
This two-part series aims to introduce the reader to Infinispan, a new open source, LGPL licensed data grid platform. The first part will focus on an overview of the scope and capabilities of Infinispan, along with usage examples and a brief tour of the APIs. An upcoming second part will take a deep-dive into the architecture, more advanced APIs and extending Infinispan.
What is Infinispan?
Infinispan is a peer-to-peer, in-memory data grid (IMDG) platform, written for the Java virtual machine (JVM). I first publicly announced the project in April 2009, and a string of alpha and beta releases quickly followed. The project has grown out of experiences gained on JBoss Cache, a popular clustered caching library. JBoss Cache was used as a “living prototype” for Infinispan - a place to harvest ideas, designs, usage patterns and the wants of the community - but Infinispan's codebase is in no way related to or dependent on JBoss Cache.
While there are several differences between JBoss Cache and Infinispan, the most significant difference is scope. JBoss Cache focused on being a clustered caching library while Infinispan is a data grid platform, complete with GUI management tooling, and the potential to scale to thousands of nodes. At the same time, Infinispan also fulfills the requirements of a clustered caching library, and even performs exceptionally as a standalone, non-clustered cache. To help existing JBoss Cache users, Infinispan provides for an easy migration path.
Features
Infinispan’s upcoming release is 4.0 - wondering why our first release is numbered such? Have a look at this FAQ.
Current release: 4.0
This release offers the following features:
APIs
- Familiar Map-like API (org.infinispan.Cache extends java.util.concurrent.ConcurrentMap)
- Alternative JBoss Cache-compatible tree-like API
- RESTful API for remote access
- Support for non-JVM clients via RESTful API
Operational modes
- Clustered and non-clustered (standalone) operation modes
- Clustered operational modes include invalidation, replication and distribution, making use of either synchronous or asynchronous network communications
- Distribution with optional L1 caching (“near-caching”) reduces network latency for frequent lookups, while maintaining the scalability of distribution
JTA transaction support
- Infinispan is an XA resource that is compatible with any JTA-compliant transaction manager
Write-through caching to pluggable CacheStores
- Can be configured to provide “warm starts”
- Ships with high-performance disk, JDBC and cloud storage based CacheStore implementations
- Support for custom implementations
- Also useful as an overflow space if memory is scarce
Eviction and expiration
- FIFO and LRU based eviction policies
- Expiration of data based on lifespan and idle time
Management
- JMX statistics and monitoring
- JOPR-based GUI management
High-performance custom marshaling layer to provide fast, low-overhead serialization and deserialization
Future releases
Please visit Infinispan’s project roadmap page to view the project’s roadmap and to check out what is in store for future releases, including the JPA-like API, a client/server module, querying capabilities, and distributed code execution.
Getting started with Infinispan
Demos
A good place to start is to download the Infinispan distribution. The distribution includes a number of demo applications including a GUI-based one to visualize state moving around a grid.
1. Download an Infinispan distribution. E.g.,
$ wget -O infinispan-4.0.0.CR2-bin.zip http://sourceforge.net/projects/infinispan/files/infinispan/4.0.0.CR2/infinispan-4.0.0.CR2-bin.zip/download
2. Unzip the archive
$ unzip infinispan-4.0.0.CR2-bin.zip
3. Run the GUI demo by invoking the runGuiDemo.sh script provided with the distribution (or runGuiDemo.bat on Windows platforms)
$ infinispan-4.0.0.CR2/bin/runGuiDemo.sh
4. The GUI console should load and show a frame for a single cache instance.
Clicking on the START CACHE button will start the cache instance wrapped up in the frame.
5. Naturally, one instance is of limited use - things are more fun when you have several cache instances in a cluster. Invoking the runGuiDemo.sh script a few more times will create more GUI frames, and you can start them all. For example, starting two more GUIs, we can see that the caches discover each other and form a cluster.
You can start as many cache instances as you wish.
6. You can now use one of the instances to generate data.
7. You can see that this entry has been added to the cache.
8. You will be able to retrieve this key from any of the frames in the cluster.
Using Infinispan in your project
The easiest way to do this is to use Maven, and add dependencies to Infinispan modules in your project’s POM. Alternatively you could download the distribution and get the necessary jar files there, for inclusion in your project.
To use Infinispan with Maven, just add the following to your project’s pom.xml:
<dependencies>
  <dependency>
    <groupId>org.infinispan</groupId>
    <artifactId>infinispan-core</artifactId>
    <version>4.0.0.CR2</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>repository.jboss.org</id>
    <url>http://repository.jboss.org/maven2</url>
  </repository>
</repositories>
From that point on, you will be able to create instances of the cache and use it:
import org.infinispan.Cache;
import org.infinispan.config.Configuration;
import org.infinispan.manager.CacheManager;
import org.infinispan.manager.DefaultCacheManager;

// 1. Create a configuration
Configuration cfg = new Configuration();
// ... Customize your configuration as you wish.
// ... Defaults to LOCAL mode.
CacheManager mgr = new DefaultCacheManager(cfg);
// 2. Let's create a stock price cache, keyed on String
Cache<String, Float> stockPriceCache = mgr.getCache("stockPriceCache");
// 3. Let's check if we have the price of IBM, and if not,
//    retrieve and cache it
String ticker = "IBM";
Float value;
if (stockPriceCache.containsKey(ticker)) {
    value = stockPriceCache.get(ticker);
} else {
    // getStockPriceFromTheInternet is a placeholder for your own lookup code
    value = getStockPriceFromTheInternet(ticker);
    stockPriceCache.put(ticker, value);
}
System.out.printf("Got the price of %s as %s%n", ticker, value);
We maintain an online, interactive tutorial to walk you through the basic steps of creating a cache and using it. We recommend it as a good starting point for more examples, and it is a good idea to keep the Infinispan API documentation handy while you work through it. Exploring the API in this manner is a great way to get up to speed with Infinispan quickly.
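Because org.infinispan.Cache extends java.util.concurrent.ConcurrentMap, the check-then-put logic in the example above can also be written with ConcurrentMap's atomic putIfAbsent, so that two threads racing on the same ticker end up agreeing on a single cached value. The following is a minimal sketch rather than Infinispan-specific code: a plain ConcurrentHashMap stands in for the cache so that it runs without Infinispan on the classpath, and StockPriceLookup, fetchPrice and the made-up price formula are hypothetical names introduced purely for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class StockPriceLookup {
    // Stand-in for a Cache<String, Float>; any ConcurrentMap behaves the same here.
    private final ConcurrentMap<String, Float> prices = new ConcurrentHashMap<String, Float>();
    private final AtomicInteger remoteFetches = new AtomicInteger();

    // Hypothetical placeholder for an expensive remote lookup.
    private Float fetchPrice(String ticker) {
        remoteFetches.incrementAndGet();
        return 100.0f + ticker.length();
    }

    Float priceOf(String ticker) {
        Float cached = prices.get(ticker);
        if (cached != null) {
            return cached;
        }
        Float fetched = fetchPrice(ticker);
        // putIfAbsent returns the value that was already present, or null if ours was stored.
        Float existing = prices.putIfAbsent(ticker, fetched);
        return existing != null ? existing : fetched;
    }

    int remoteFetches() {
        return remoteFetches.get();
    }

    public static void main(String[] args) {
        StockPriceLookup lookup = new StockPriceLookup();
        System.out.println(lookup.priceOf("IBM")); // first call goes to fetchPrice
        System.out.println(lookup.priceOf("IBM")); // second call is served from the map
        System.out.println("remote fetches: " + lookup.remoteFetches());
    }
}
```

Under contention more than one thread may still call fetchPrice for the same ticker, but only the first stored value is kept and returned to every caller.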
Configuring your Cache
The online configuration guide is your friend here. Infinispan’s XML configuration file is straightforward to use, especially with a good XML editor referencing the XSD schema. All elements in this file are optional with sensible defaults selected for any element omitted. Please refer to the configuration reference for details.
Migrating configurations
The Infinispan distribution ships with tools to migrate cache configuration files from JBoss Cache, EHCache and Oracle Coherence to Infinispan configuration files. This can be a useful starting point if you are considering Infinispan as a replacement for one of these cache systems. Information on these tools can be found here.
Using the REST API
Feel like connecting to an Infinispan backend using the REST API? Download the REST server for Infinispan! This is in the form of a WAR file which can be deployed in most Servlet containers.
1. Download the Infinispan REST server
$ wget -O infinispan-4.0.0.CR2-server-rest.zip http://sourceforge.net/projects/infinispan/files/infinispan/4.0.0.CR2/infinispan-4.0.0.0.CR2-server-rest.zip/download
2. Unzip the archive
$ unzip infinispan-4.0.0.CR2-server-rest.zip
3. Deploy the webapp in your favorite Servlet container or Java EE server
$ cp infinispan-4.0.0.CR2/webapp/infinispan.war $JBOSS_HOME/server/default/deploy/
4. Start your Servlet container or Java EE server
The Infinispan REST server should now be listening on the host name and port that you have used to configure your Servlet container or Java EE server. E.g., on http://localhost:8080/infinispan
5. Connecting to your REST server is easy - here is an example using Python:
import http.client  # Python 3; this module was named httplib in Python 2

hostname = "localhost:8080"

# putting data in
conn = http.client.HTTPConnection(hostname)
data = "SOME DATA HERE !"  # could be a string, or a file ...
conn.request("POST", "/infinispan/rest/Bucket/0", data, {"Content-Type": "text/plain"})
response = conn.getresponse()
print(response.status)

# getting data out
conn = http.client.HTTPConnection(hostname)
conn.request("GET", "/infinispan/rest/Bucket/0")
response = conn.getresponse()
print(response.status)
print(response.read().decode())
Or using Ruby:
require 'net/http'

http = Net::HTTP.new('localhost', 8080)

# create new entry
http.post('/infinispan/rest/MyData/MyKey', 'DATA HERE', {"Content-Type" => "text/plain"})

# get it back
puts http.get('/infinispan/rest/MyData/MyKey').body

# use PUT to overwrite
http.put('/infinispan/rest/MyData/MyKey', 'MORE DATA', {"Content-Type" => "text/plain"})

# and remove ...
http.delete('/infinispan/rest/MyData/MyKey')
We have a more detailed guide on using the REST server. The long-term goal with this API is to converge with the REST-* effort and standardize the API for distributed caches via REST.
Managing your caches using JOPR
We have a detailed guide on using the JOPR GUI tool to manage your cache instances as well. The current version allows you to visually monitor the health of your data grid and plots graphs of various metrics over time. As this tool evolves, we hope to add features such as automatic provisioning of nodes based on rules, where the grid’s true elastic nature can be realized.
Querying the grid
While querying is only really scheduled for 4.1, our next release, we do have a tech preview of querying in the current, 4.0 release. Keep in mind that, as a tech preview, the querying interface and API are subject to change, but it does give you a feel for what to expect. Details about the tech preview of the Query API, along with instructions on usage and sample code, can be found here.
Operational modes
Here we discuss the different operational modes in more detail.
Standalone operation
As a simple, standalone cache to store data that is expensive to retrieve - for example from a database or a mainframe - or recalculate, Infinispan’s highly concurrent core container performs exceptionally well with minimal overhead, and is highly tuned for multi-core/multi-CPU environments. Synchronization and locking are kept to a minimum while delivering concurrent transaction isolation, and offering all of the other features of the platform, including a write-through CacheStore, eviction and JTA transaction compatibility.
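To make the FIFO and LRU eviction policies from the feature list concrete, here is a minimal, single-threaded sketch built on the JDK's LinkedHashMap. It only illustrates what the two policies mean and is not how Infinispan's container implements them. EvictingMapSketch and its constructor parameters are names assumed for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded map that evicts one entry whenever it grows past maxEntries.
// lru = true  -> evict the least recently accessed entry (LRU)
// lru = false -> evict the oldest inserted entry (FIFO)
class EvictingMapSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    EvictingMapSketch(int maxEntries, boolean lru) {
        // The third argument is LinkedHashMap's accessOrder flag.
        super(16, 0.75f, lru);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        EvictingMapSketch<String, Integer> lru = new EvictingMapSketch<String, Integer>(2, true);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.get("a");    // "a" becomes the most recently used entry
        lru.put("c", 3); // evicts "b"
        System.out.println(lru.keySet());

        EvictingMapSketch<String, Integer> fifo = new EvictingMapSketch<String, Integer>(2, false);
        fifo.put("a", 1);
        fifo.put("b", 2);
        fifo.get("a");    // reads do not affect FIFO order
        fifo.put("c", 3); // evicts "a"
        System.out.println(fifo.keySet());
    }
}
```

LinkedHashMap is not thread-safe, so this sketch says nothing about the concurrency characteristics described above.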
Clustered operation
In addition to standalone operation, Infinispan can operate as a cluster, where nodes are aware of each other's presence and are able to interact and maintain coherence of state. The following clustered modes are supported:
Invalidated data grid
A clustered, invalidated data grid is essentially a set of local, standalone caches which are aware of each other. When an entry is changed in any cache in the grid, the entire grid is made aware of the fact. Other nodes that happen to have cached the same entry learn that their copy is now out of date and invalidate it. This low-cost invalidation message involves a multicast of the modified key(s), and prompts remote caches to remove corresponding entries from their caches. This mode is commonly used in “read-mostly” scenarios where there is a data store elsewhere which can be consulted for data if it is invalidated from parts of the data grid.
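The invalidation flow described above can be simulated in a single process. This is a sketch under stated assumptions, not Infinispan code: the "network" is a plain loop over in-memory nodes, the shared data store is a HashMap, and the writer is assumed to update that store itself. InvalidationGridSketch, Node and holdsLocally are names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InvalidationGridSketch {
    // Authoritative data store that every node can consult (e.g. a database).
    private final Map<String, String> dataStore = new HashMap<>();
    private final List<Node> nodes = new ArrayList<>();

    Node addNode() {
        Node node = new Node();
        nodes.add(node);
        return node;
    }

    final class Node {
        private final Map<String, String> local = new HashMap<>();

        String get(String key) {
            String value = local.get(key);
            if (value == null) {
                // Local miss: consult the data store and cache the result locally.
                value = dataStore.get(key);
                if (value != null) {
                    local.put(key, value);
                }
            }
            return value;
        }

        void put(String key, String value) {
            dataStore.put(key, value); // assumption: the writer updates the store
            local.put(key, value);
            // The invalidation message carries only the key, never the new value.
            for (Node other : nodes) {
                if (other != this) {
                    other.local.remove(key);
                }
            }
        }

        boolean holdsLocally(String key) {
            return local.containsKey(key);
        }
    }

    public static void main(String[] args) {
        InvalidationGridSketch grid = new InvalidationGridSketch();
        Node a = grid.addNode();
        Node b = grid.addNode();
        a.put("k", "v1");
        System.out.println(b.get("k"));          // loaded from the data store
        a.put("k", "v2");                        // invalidates b's local copy
        System.out.println(b.holdsLocally("k")); // b no longer caches "k"
        System.out.println(b.get("k"));          // reloaded from the data store
    }
}
```

Since only keys travel between nodes, the cost of a write does not depend on the size of the value, which is why the message is described as low-cost.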
Replicated data grid
A replicated data grid is where each instance contains a replica of its neighbors. As such, any change made to any instance is replicated across the entire cluster. This is useful if the cluster size is small and the entire cluster can benefit from having all of the state local and in-memory. However, this operational mode does not scale in terms of memory, since adding more cluster nodes does not give you access to more addressable memory, and you are theoretically limited to the heap of a single JVM, discounting overhead.
Distributed data grid
This is the default clustered operation mode in Infinispan, and makes use of a consistent hash algorithm to determine where keys should be located in the cluster. Consistent hashing allows for cheap, fast and above all, deterministic location of keys with no need for further metadata or network traffic. The goal of distribution is to maintain enough copies of state in the cluster so as to be durable, but not so many copies as to limit scalability. As such, the number of copies of each entry maintained in the grid - the numOwners configuration attribute - is a configurable parameter that can be tuned, and represents the tradeoff between performance and scalability on one hand and durability of data on the other. Regardless of how large the cluster is though, the number of copies is fixed. This means that such a setup scales linearly as nodes are added to the cluster. Further, capacity added is capacity realized, since adding more nodes means more usable memory can be addressed. For example, discounting for overhead, 200 JVMs in such a cluster, with a heap size of 1GB each and numOwners set to 2, would give you 100GB of addressable memory in the entire system!
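As a rough illustration of how a consistent hash can locate the owners of a key deterministically and without any network traffic, here is a small ring-based sketch. It is not Infinispan's actual algorithm or hash function: the bit-spreading in position(), the clockwise walk and the class name ConsistentHashSketch are assumptions made for the example, and it assumes that no two node names land on the same ring position. The addressableGb helper reproduces the capacity arithmetic from the paragraph above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class ConsistentHashSketch {
    // Ring position -> node name.
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int numOwners;

    ConsistentHashSketch(List<String> nodes, int numOwners) {
        this.numOwners = numOwners;
        for (String node : nodes) {
            ring.put(position(node), node);
        }
    }

    // Illustrative bit-spreading of hashCode(), kept non-negative.
    static int position(Object o) {
        int h = o.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return (h ^ (h >>> 7) ^ (h >>> 4)) & 0x7fffffff;
    }

    // Owners are the first numOwners nodes met when walking clockwise from the key.
    List<String> locate(Object key) {
        int wanted = Math.min(numOwners, ring.size());
        List<String> owners = new ArrayList<>(wanted);
        int p = position(key);
        for (String node : ring.tailMap(p, true).values()) {
            if (owners.size() < wanted) owners.add(node);
        }
        for (String node : ring.headMap(p, false).values()) {
            if (owners.size() < wanted) owners.add(node); // wrap around the ring
        }
        return owners;
    }

    // Usable memory, discounting overhead: total heap divided by copies per entry.
    static long addressableGb(int jvms, long heapGbPerJvm, int numOwners) {
        return jvms * heapGbPerJvm / numOwners;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-A", "node-B", "node-C", "node-D");
        ConsistentHashSketch hash = new ConsistentHashSketch(nodes, 2);
        System.out.println("owners of IBM: " + hash.locate("IBM"));
        System.out.println(addressableGb(200, 1, 2) + " GB"); // 200 x 1GB / 2 = 100 GB
    }
}
```

Every node that builds the ring from the same member list computes the same owners for a key, which is what removes the need for a lookup table or an extra network round trip.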
L1 caching (“near caching”)
With a distributed data grid, there is no guarantee that the instance you speak to locally holds the entry you are looking for. The system may have to make a remote call to another cache node to retrieve the requested entry. While this remote lookup happens transparently, it has a cost associated with it. To minimize this cost in the event of repeated lookups on the same key, L1 caching can be enabled. L1 caching causes the requesting node to cache the retrieved entry locally and listen for changes to the key on the wire. L1-cached entries are given an internal expiry to control memory usage. Enabling L1 will improve performance for repeated reads of non-local keys, but will increase memory consumption to some degree. It offers a nice tradeoff between the “read-mostly” performance of an invalidated data grid and the scalability of a distributed one.
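The behavior described above (cache a remotely fetched entry locally, drop it when the key changes, and expire it after a while) can be sketched as follows. This is a simplified, single-threaded illustration and not Infinispan's implementation: the remote owner is modeled as a plain function, the clock is injected so that expiry can be demonstrated without sleeping, and L1CacheSketch, lifespanMillis and invalidate are names assumed for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

class L1CacheSketch<K, V> {
    private static final class L1Entry<T> {
        final T value;
        final long expiresAt;

        L1Entry(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, L1Entry<V>> l1 = new HashMap<>();
    private final Function<K, V> remoteLookup; // stands in for the call to the owning node
    private final long lifespanMillis;         // internal expiry for L1 entries
    private final LongSupplier clock;          // injected time source, in milliseconds
    private int remoteCalls;

    L1CacheSketch(Function<K, V> remoteLookup, long lifespanMillis, LongSupplier clock) {
        this.remoteLookup = remoteLookup;
        this.lifespanMillis = lifespanMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        L1Entry<V> entry = l1.get(key);
        if (entry != null && now < entry.expiresAt) {
            return entry.value; // served locally, no remote call
        }
        remoteCalls++;
        V value = remoteLookup.apply(key);
        l1.put(key, new L1Entry<>(value, now + lifespanMillis));
        return value;
    }

    // Invoked when a change to the key is observed on the wire.
    void invalidate(K key) {
        l1.remove(key);
    }

    int remoteCalls() {
        return remoteCalls;
    }

    public static void main(String[] args) {
        Map<String, String> owner = new HashMap<>();
        owner.put("k", "v1");
        L1CacheSketch<String, String> node =
                new L1CacheSketch<>(owner::get, 1000L, System::currentTimeMillis);
        node.get("k");
        node.get("k");
        System.out.println("remote calls: " + node.remoteCalls());
    }
}
```

The lifespan bounds how long an unused L1 entry occupies memory, which is the memory-for-latency tradeoff mentioned above.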
Use cases
Infinispan can be used for a number of purposes.
Standalone data cache
Traditional cache usage - to front databases or other expensive, non-scalable data stores - is one. Such usage helps “read-mostly” setups to relieve their data store from congestion, and provide quick, low-latency access to data being read.
Clustering toolkit
Use as a toolkit to cluster a container, framework or server by distributing on-the-fly state and allowing for failover is another common use case. Such usage allows framework or server developers to create clustered offerings where state management is delegated to Infinispan and clients connected to such backends can gracefully fail over to another instance if one were to experience a failure.
Data store
Increasingly, though, use as a primary data store in itself is gaining popularity, especially for unstructured or semi-structured data. Due to the low-latency, high-concurrency and highly scalable nature of in-memory data grids, they have become popular in many applications that require the ability to scale on-demand, or to have fast, low-latency access to data. Infinispan fits well with the NoSQL movement, which is gaining momentum, as well as cloud-deployments where traditional data stores are problematic. Upcoming features such as indexing and querying of state as well as distributed execution (“move the process to the data, not data to the process”) make this an interesting space to watch.
Integrating with other products and frameworks
We know of several open source and proprietary products considering Infinispan as a part of their offering, and here are some that have reached a certain degree of maturity that may be of interest.
Hibernate 2nd Level Cache Provider
As of version 3.5, Hibernate ships with an Infinispan cache provider for 2nd level caching. This setup typically uses Infinispan as a clustered, invalidated data grid and helps improve performance on “read-mostly” entities. More details on this cache provider can be found here.
Lucene Directory Provider
Contributed to Infinispan’s codebase, this module allows you to use Infinispan as a distributed, in-memory store for Lucene indexes. More details can be found here.
Next steps
Want more?
A formal user guide is in the process of being written. Expect this to be available soon, but in the meantime the wiki should be your primary source for information. The wiki serves as a launchpad for more information on Infinispan, from design documents to FAQs, API docs to configuration references, tutorials to tips on contributing to the project. Can’t find the information you need on a specific subject? Visit the Users’ Forum to ask about it. We use JIRA as a project issue tracker. And of course you should follow the project on Twitter! Like our logo? Check out these cool desktop wallpapers the good people at JBoss.org designed for us!
Feel like getting involved?
Have a look at this page, which details the resources available to anyone interested in participating in the project, along with information on how to get in touch with the development team.