DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workkloads.

Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

  1. DZone
  2. Refcards
  3. Introduction to Hazelcast IMDG
refcard cover
Refcard #272

Introduction to Hazelcast IMDG

Hazelcast IMDG is a clustered, in-memory data-grid that uses sharding for data distribution and supports monitoring. In this Refcard, learn about what an in-memory data grid can be used for, how to do simple query operations with Hazelcast IMDG, what sharding means with Hazelcast, and more.

Download Refcard
Free PDF for Easy Reference
refcard cover

Written By

author avatar Tom OConnell
SSA, Hazelcast
Table of Contents
► What’s an IMDG For? ► Before You Start ► A Simple Spring Boot Cluster Member ► Simple Query Operations ► Conclusion
Section 1

What’s an IMDG For?

Hazelcast is a clustered, in-memory data-grid that uses sharding for data distribution and supports monitoring.

Clustering refers to how some network-centric software remains resilient and highly available. You can just start processes, either members or clients, as you need, and they find each other to form a consistent whole. Members join and the load spreads out; members terminate and the load is absorbed by others.

In-memory storage is an ideal use case for Hazelcast. You can scale the storage in a number of ways. For IMDG open-source, you pick your JVM size based on your own testing and tuning. There are no good portable recommendations for that; you just have to test. You pick your backups. More than one backup is probably not necessary. Consider adding persistence if multiple backups are needed.

The term “data grid” means a lot of different things and marketing terminology can obscure this, but basically, it’s data held in caches that are available for retrieval by members or clients, processing (either in-place or in different processes) that can support events, triggers, transformations — basically, anything you can think up and code.

Hazelcast IMDG provides a robust, wide array of distributed processing possibilities. There are Maps, Queues, Lists, Sets — all extending the collections classes. There are also RingBuffers and Multimaps. Alongside Queues, there are Topics and Reliable Topics for more messaging options. The available concurrency utilities include Locks, Semaphores, Atomic-longs, Atomic-references, the ID-generator, and a Countdown-latch. CRDTs (conflict-free replicated data types) are being added, starting with the PN-counter.

Sharding, which Hazelcast calls it partitioning, is a means of horizontally partitioning the data across multiple member processes. You can think of the Hazelcast shards (partitions) as the hash buckets of a distributed hash Map. Each cluster uses a configured number of partitions — the default is 271 (that often doesn’t need to be changed). A single-member deployment would get all the partitions, without backups. As the first member joins, roughly half the partitions are transferred to that new member, and backups are created at that time. As each subsequent member joins, a basically equal fraction of the partitions are transferred to that member — both primary data and backups. When members leave, the backups become the primary data partitions and new backups are created on the remaining members.

Monitoring in distributed systems is critical. The lack of monitoring is the first step toward failure. Headless systems are often not well understood and are sometimes ignored. Hazelcast supports JMX and a management console, so easy monitoring is available; you see issues coming in advance of major problems and you can set alerting thresholds that will allow the system to call for help.

It’s for almost any programming task — broadly speaking, the three major areas are caching, distributed processing, and distributed messaging. The primary benefits to applications are big and fast data. Big data is good; big, fast data is awesome. Start small and grow enormously.

Section 2

Before You Start

There are only a few things you need to get going:

  • Java 8 is probably the most widely used JDK/JRE right now and is the preferred one to start with.
  • IDEs Eclipse, IntelliJ, and NetBeans all work well with Hazelcast/Maven. The sample code in the GitHub repo was tested in Eclipse, as well as from the command line, so it’s an easy import.
  • Maven is one of the popular build tools used with Hazelcast. There is also the option of using Gradle.

##Maven Dependencies All the dependencies for Hazelcast (any edition) are available on public Maven repositories, in addition to the Hazelcast download page.

  • Server: The server will be in one of two forms — hazelcast or hazelcast-all — which also includes client dependencies.
  • Client: The Hazelcast client is generally included from hazelcast-client, and that’s the only addition to your client app build.

Programming Models

Hazelcast is a toolkit. There are common patterns you can employ, but really, it’s just Java. You can design your own infrastructure to meet your needs in any way you see fit. Here are some common deployment models:

Embedded Member

Embedded members are really the easiest way to get started and for some things, they may be all that you need. An embedded client is a Java program where you create a HazelcastInstance. That will load the launch and the framework, and form a cluster with any compatibly configured members they find on the network (depending on your network and discovery configuration).

Dedicated Member

A dedicated member is a Hazelcast process dedicated to storage and a few other things. It won’t run your code, except for a few server-side-specific instances — entry processors, Executor Tasks (Callables and Runnables), event code (Listeners and Interceptors), and persistence code (MapLoader and MapStore).

The advantage of this approach over the embedded model is that scalability will always become more important than simplicity. With this, you can scale your storage fleet separately from your client fleet. If your storage demands soar but the processing doesn’t, you just scale these members. If you introduce new processing demands for the same, or similar, data loads, you just add clients.

Lite Member

Lite members are interesting — they join the cluster, unlike clients that just make a client-specific TCP connection. They do not, however, host data. They are for a small number of advanced things — they may be used for class-loading or they may be used as high-performance compute clients. You could, for example, direct runnable and callable tasks to Lite members if they require data that’s spread across the cluster for some kind of computation or processing.

Clients

Clients are Java programs that include the client (i.e. Hazelcast-client) JAR in their build, read, or create config that helps them find a cluster and perform the widest scope, typically, of client requests. These will be in your web-clients, your command-line tools, or anywhere you need to interface your systems with Hazelcast. Don’t think, though, that because they’re clients, you’re going to be doing all your processing there. Well written clients will use server-side constructs — particularly entry processors, aggregation, and executor tasks to delegate processing requests from single-threaded clients onto a massively scalable clustered storage and processing environment.

Not everything will be delegated to the back-end, of course. Many, many clients simply require extremely low latency access to fast, big data that isn’t changed too often and isn’t changed (ideally) by separate clients (i.e. sticky sessions are good). For these, near-caches are extremely effective. Each member can host (within its process space) potentially large subsets of data that are being actively managed by the cluster. We’re talking mostly about the open-source version here, but it’s worth noting that IMDG Enterprise HD will allow off-heap near-caches, giving you low latency access to potentially many gigabytes of near-cache data in each client. This has a broad range of applications across industries; real-time inventory for e-commerce and fraud detection for credit card processors are two. Note that in neither of these is the data static — that’s not a requirement. But the data is read much more often than it’s changed, making both of these ideal cases for near-caching.

Section 3

A Simple Spring Boot Cluster Member

So, finally a little more code. All of this is on GitHub, so you can look at the POM (pom.xml) file there. It’s basic — you just need the spring-boot parent entry and the Hazelcast dependency (from above). Make your main class a Spring Boot application with the @SpringBootApplication annotation. I’m adding the @Configuration annotation so I can have one class that serves up the beans and executes them.

Configuring Hazelcast

So, you have already run code — why talk about configuration now? Because the simple examples use all the defaults. While they’re interesting to run, you wouldn’t really go much past “hello world” with that.

The main things you’ll want to change are the network configuration — particularly the join configuration — and the definition of Maps and caches.

The join configuration dictates how members find each other. It starts with multicast, which probably won’t fly in most production environments, as multicast traffic is generally frowned upon by network administrators. TCP discovery is great, if you know the addresses and they don’t change. A data center with dedicated hardware works well. In the cloud, however, this falls apart — you don’t know the IPs up front and they’re pretty much guaranteed to change. There are cloud-based cluster discovery plugins that will make this work well and easily, but that’s too much for this intro.

Maps (IMAP) are the workhorse of distributed storage data-structures, and caches (ICache) are basically the same thing but for the JCache (JSR107) API. By default, you can create a Map using default configuration and you get a container with no limitations on it. The data is not bounded, it doesn’t expire, and it isn’t evicted — the problems with that should be obvious. The system will store and retain data until it runs out of memory and that’s ugly. For any IMDG, you want to be careful about managing memory, and Hazelcast has lots of options, e.g. using a number of objects in the cache, an absolute size of memory that can be used, or thresholds on the heap utilization that will trigger eviction. In addition to evicting on space, you can set an expiration interval on your data — you decide up front.

A Slightly More Robust Server

We can do better on the server code.

​x
1
package com.hazelcast.tao.gettingstarted;
2
​
3
import org.slf4j.logger;
4
import org.slf4j.loggerFactory;
5
import org.springramework.boot.SpringApplication;
6
import org.springframework.boot.autoconfigure.SpringBootApplication;
7
​
8
@SpringBootApplication
9
public class ServerV1
10
        {
11
​
12
                private final static logger L = LoggerFactory.getLogger(ServerV1.class);
13
​
14
                public static void main(String[] args)
15
                        {
16
                                L.info("Calling the Spring Application 'run' method");
17
                                SpringApplication.run(ServerV1.class, args);
18
                        }
19
        }

Here’s a concise server. Everything that it’s going to do is going to be injected by Spring Boot. It relies on some configuration code:

30
1
import com.hazelcast.config.Config;
2
import com.hazelcast.core.Hazelcast;
3
import com.hazelcast.core.HazelcastInstance;
4
​
5
@Configuration
6
public class ServerConfig
7
        {
8
                    @Bean
9
                    public Config config(int instanceId)
10
                            {
11
                                    Config config = new Config();
12
​
13
                                    config.getGroupConfig()
14
                                                    .setName("dev");
15
                                    config.getGroupConfig()
16
                                                    .setPassword("dev-pass");
17
                                    return config;
18
                            }
19
​
20
                    @Bean
21
                    public HazelcastInstance hazelcastInstance(Config config)
22
                            {
23
                                    HazelcastInstance instance =
24
                                            Hazelcast.newHazelcastInstance(config);
25
​
26
                                    return instance;
27
​
28
                            }
29
​
30
        }

Spring Boot will create a default instance of Hazelcast — which may not give you what you want. Having a bean for the config and one for an instance can be useful. Here’s the commandLineRunner that makes the Spring app work:

48
1
package com.hazelcast.tao.gettingstarted;
2
​
3
import java.util.Date;
4
​
5
import org.slf4j.Logger;
6
import org.slf4j.LoggerFactory;
7
import org.springframework.boot.CommandLineRunner;
8
import org.springframework.context.ApplicationContext;
9
import org.springframework.context.ApplicationContextAware;
10
import org.springframework.stereotype.Component;
11
​
12
import com.hazelcast.core.HazelcastInstance;
13
import com.hazelcast.core.IMap;
14
​
15
@Component("commandLineRunner")
16
public class ServerRunner implements CommandLineRunner, ApplicationContextAware
17
            {
18
​
19
                    private final static Logger    |        = LoggerFactory.getLogger(ServerRunner.class);
20
​
21
                    private ApplicationContext    applicationContext;
22
​
23
                    @Override
24
                    public void run(String... args)
25
                            {
26
​
27
                                        Object bean = applicationContext.getBean("hazelcastInstance");
28
                                        HazelcastInstance member = (HazelcastInstance) bean;
29
                                        System.out.printIn("this was all that was needed to start a member");
30
                                        IMap<String, String> map = member.getMap("foo");
31
                                        for (int i = 0; i < 10; i++)
32
                                                    {
33
                                                                map.put("key:" + i, "value: " + i + ":: "
34
                                                                         + new Date().toString());
35
                                                    }
36
​
37
                                                    l.debug("at startup, map size: {}", map.size())
38
​
39
                            }
40
​
41
                    publicApplicationContext getApplicationContext()
42
                            {
43
                                        return applicationContext;
44
                            }
45
​
46
                    @Override
47
                    public void setApplicationContext(ApplicationContext applicationContext)
48
            }

Simple Map Access

This part is easy — a Hazelcast IMap is a java.util.map, so you can take existing code for the Java Collections API and just repurpose it. Here’s a little code showing how easy that can be:

44
1
package com.hazelcast.tao.gettingstarted.port;
2
​
3
import java.util.Map;
4
import java.util.concurrent.ConcurrentHashMap;
5
​
6
import org.slf4j.Logger;
7
import org.slf4j.LoggerFactory;
8
​
9
import com.hazelcast.core.HazelcastInstance;
10
import com.hazelcast.core.HazelcastInstanceAware;
11
​
12
public class MapClientDemo implements HazelcastInstanceAware
13
        {
14
​
15
            private HazelcastInstance hazelcastInstance;
16
​
17
            private final static Logger    L    =
18
                    LoggerFactory.getLogger(MapClientDemo.class);
19
​
20
            public void oldMethod()
21
                    {
22
                        Map<String, String> myMap = new ConcurrentHashMap<>();
23
                        String key = "SomeKey";
24
                        String value = "Just a random string";
25
                        myMap.put(key, value);
26
​
27
                        L.info("getting key {} yields {}", key, myMap.get(key))
28
                    }
29
​
30
            public void hzMethod()
31
                    {
32
                        //wait-shouldn't this be an 'IMap'?
33
                        Map<String, String> myMap = 
34
                                hazelcastInstance.getMap("myMap");
35
                        String key = "SomeKey";
36
                        String value = "Just a random string";
37
                        myMap.put(key, value);
38
​
39
                        L.info("getting key {} yields {}", key, myMap.get(key));
40
                    }
41
​
42
            // getters, setters, random noise follow.
43
​
44
        }

In that bit, there’s a method that creates a Map and uses it. In the second method (hzMethod), the only change was to use the injected Hazelcast instance (injected via annotation) to get a reference to a distributed map in the IMDG. There’s no magic — Hazelcast is designed so that you can swap it in that easily, using the familiar Collections API. But back to that comment for a second… shouldn’t the declaration have been IMap, not Map? It depends: It could be, but it doesn’t need to be. Hazelcast Maps implement the java.util implements, so that’s valid, but maybe not useful. In a minute, we’re going to use some Hazelcast-specific methods on the Map, and to make those visible, you want to change the declaration. If you’re just doing put, get, size, remove, and all of those, then no. One interesting note on that: It’s easy to forget that “put” returns the old mapping, as it inserts the new. Think about that in a network environment: When you do a “put”, Hazelcast (conforming to the contract) returns the old mapping over the network, incurring serialization for no reason because nobody ever looks at it. Hazelcast has added a “set” method that works like “put”, save that it doesn’t return the value. This may seem like small stuff, but think about a heavily utilized production environment getting a surge of requests; you’re busy and half of that flavor of network traffic is stuff you’re never going to look at. Change two letters in your code and the network traffic drops — possibly by lots.

Keep in mind, however, that there are differences. It’s a distributed Map — aside from security configuration, other clients/threads can use the same Map. If you test the size of a new in-process Map that you create in your thread, the size will be “0”. When you get a reference to a distributed collection from the IMDG, it will create it (if required), or return a reference to an existing collection if it’s already been created. This can be a very powerful feature — you can pre-populate a collection from a persistent store or any other data-source. Your client code will be smaller and simpler because you can make assumptions about it. If you’re using a Map for a scratchpad cache; however, keep in mind that you may want to create unique map instances or manage data so that your thread doesn’t collide with other clients.

Near Cache Access

Hazelcast supports second-level, or edge, caching in client processes and refers to it as near caching. Near caches are almost transparent to your code — there are things you need to be aware of, though. Each mapping in the near cache is fundamentally managed by the IMDG member that owns the “master” copy of the data. It may be cached on multiple clients in multiple apps, and each caching client app may define their own policy for managing updates. The data in your near cache may be stale — you probably shouldn’t cache things for overly long times (you set the expiry interval in config). You should be careful about using near caches for things that are updated frequently and very careful about using them for things that are updated from multiple points. For a web application with sticky sessions, you should be able to count on certain objects being in only one client process — that’s a good scenario.

Section 4

Simple Query Operations

SQL Queries

Hazelcast is not an SQL database or a SQL query tool, but it provides a workable, robust subset of SQL query functionality. It’s accessible for developers. If you have an SQL background, this is nothing; if you don’t, it’s still pretty intuitive. The SqlPrecicate encapsulates the where clause of a query. Since you’re dealing with purely in-memory data, this is going to be very fast.

36
1
public void sqlQueryDemo()
2
        {
3
            IMap<Integer, Employee> employees =
4
                hazelcastInstance.getMap("employees");
5
            Employee emp = new Employee();
6
            emp.setId(Integer.valueOf(1));
7
            emp.setFirstName("John");
8
            emp.setLastName("Doe");
9
            emp.setAge(new Random(System.currentTimeMillis()).nextInt(99));
10
            emp.setDeptId(Integer.valueOf(13));
11
​
12
            // put the dummy employee in the map using employee-id as the map key
13
            employees.set(emp.getId(), emp);
14
​
15
            Predicate sqlPredicate =
16
                new SqlPredicate(String.format("lastName = '%s'",
17
                        emp.getLastName()));
18
            Collection<Employee> matching = employees.values(sqlPredicate);
19
​
20
            // wildcards are supported, too - look for last names starting
21
            // with 'D' using the same 'values' call.
22
            sqlPredicate = new SqlPredicate("lastName like 'D%'");
23
            matching = employees.values(sqlPredicate);
24
​
25
            // compound predicates work, as well
26
            sqlPredicate =
27
                new SqlPredicate("lastName like 'D%' and "
28
                    + "age between 21 and 30");
29
            matching = employees.values(sqlPredicate);
30
​
31
            // this could go on, but it's a pretty robust subset of sql
32
            // functionality.
33
            sqlPredicate = new SqlPredicate("deptId not in (13, 23, 33)");
34
            matching = employees.values(sqlPredicate);
35
​
36
        }

This shows how easy it is to bridge an SQL background with the IMDG SQL-like query. The caveat here is that out-of-the-box, IMDG is not a good tool for joins because of the nature of the data. We split it up because it’s big data, and because it’s big data, joining it back together is more complex. I mentioned Jet earlier; this would a useful tool for that.

Predicate Queries: Criteria API

For Java developers who really never liked SQL, there’s also a pure Java approach to querying the IMDG: the Criteria API.

20
1
public void criteriaAPIDemo()
2
        {
3
            IMap<Integer, Employee> employees = 
4
                hazelcastInstance.getMap("employees");
5
​
6
            <snip> same employee as before
7
​
8
            // create a couples predicate and then, the and of them
9
            Predicate<Integer, Employee> lastName = equal("lastName", "Doe");
10
            Predicate<Integer, Employee> age = greaterThan("age",
11
                    Integer.valueOf(99));
12
            Predicate<Integer, Employee> dept = equal("deptId",
13
                    Integer.valueOf(13));
14
            // and is a variadic method, so you can just keep
15
            // adding predicates and get the logical 'and' of all of them
16
            Predicate<Integer, Employee> ageEtAl = and(age,
17
                lastName, dept);
18
​
19
            Collection<Employee> matching = employees.value(ageEtAl);
20
        }

One thing that, looking at this, really needs to be mentioned, if only briefly, is memory in the query client. Hazelcast query operations can quickly retrieve very large volumes of data. If you had, say, 20 storage members and queried the cluster bringing back 1GB or so of data from each, you’d be looking at 20GB of data from one query. You may or may not have that much memory available in your client. The fix for that is paging predicates. These predicates wrap (logically subsume the functions of) other predicates so that you get the same logical comparisons — the same filtering — wrapped in a container that lets you bring results back in batch sizes that you specify. It’s as though you’re reading a printed book and seeing one page at a time.

16
1
Predicate<Integer, Employee> lastName = equal("lastName", "Doe");
2
Predicate<Integer, Employee> age = greaterThan("age", Interger.valueOf(99));
3
Predicate<Integer, Employee> dept = equal("deptId", Integer.valueOf(13));
4
​
5
Predicate<Integer, Employee> ageEtAl = and(age, lastName, dept);
6
​
7
// construct a paging predicate to get max of 10 objects per
8
// page, based on
9
// the previously constructed predicate.
10
PagingPredicate<Integer, Employee> pagedResults =
11
      new PagingPredicate<>(ageEtAl, 10);
12
while (true)
13
       {
14
             Collection<Employee> oldPeople =
15
                    employees.values(pagedResults);
16
       }

This was only taking the previously used predicate and returning the results in paged sets, with the page size set to 10 in this example.

Aggregations

Aggregations — or, as they’re now called, “fast aggregations” — allow data query and transformation to be dispatched to the cluster. It can be extremely effective. Keeping with the Employee class and the “employees” Map from the other examples, let’s do a quick and dirty aggregation. You could just do the aggregation across the entire entry set of the Map, but using a predicate to filter, or map, the objects before the aggregator does the reduction on them will prove more effective. Department 13 is used to represent group-W. You know those people; they’re everywhere.

This is going to create an anonymous class, which is quick and easy and highly effective for this.

52
1
public void simpleAverageAgeAggregation(IMap<Integer, Employee> employees)
2
       {
3
​
4
            Predicate<Integer, Employee> deptPredicate =
5
                  equal("deptId", Integer.valueOf(13));
6
            Aggregator<Map.Entry<Integer, Employee>, Double> ageAggregator
7
                  = new Aggregator<Map.Entry<Integer, Employee>, Double>()
8
​
9
                  {
10
                        <snip> -     serialVersionUID: don't forget this is
11
                                     going over the wire in serialized 
12
                                     format.
13
​
14
                        protected long                       sum = 01;
15
                        protected long                       count = 01;
16
​
17
                        @Override
18
                        public void accumulate(Map.Entry<Integer,
19
                                                  Employee> entry)
20
                               {
21
                                     count++;
22
                                     sum += entry.getValue()
23
                                                  .getAge();
24
                               }
25
​
26
                        @Override
27
                        public void combine(Aggregator aggregator)
28
                               {
29
                                     this.sum += this.getClass()
30
                                                  .cast(aggregator).sum;
31
                                     this.count += this.getClass()
32
                                                  .cast(aggregator).count;
33
                               }
34
​
35
                        @Override
36
                        public Double aggregate()
37
                               {
38
                                       if (count == 0)
39
                                             {
40
                                                    return null;
41
                                             }
42
                                       double dsum = (double) sum;
43
                                       return Double.valueOf(dsum / count);
44
                               }
45
                        };
46
                  // find the average age of employees in department 13
47
                  Double avgAge =
48
                         employees.aggregate(ageAggregator, deptPredicate);
49
​
50
                  L.info("average age: {}", avgAge);
51
            }
52
       }

The code is pretty self-explanatory. The predicate will ensure that only matching elements from the distributed map are included in the calculation. The aggregator will “accumulate” data — examining the matching subset and adding the age into the sum — but where does that happen? The accumulate call is called on each storage member (i.e. not the clients and not Lite members); it’s passed by each filtered (by deptPredicate) matching entry and it accumulates the raw values. Note that these run in parallel on each member involved. Because the data is partitioned across members and only a filtered subset is processed, it’s going to be very fast. In the second phase, each of these aggregator instances are returned to the caller for processing — the instance of the anonymous aggregator class examines each returned aggregator (instances of the same anonymous class) and combines all the raw results. In that part of the code, because this wasn’t a concrete class, it’s necessary to call the class.cast() method, to allow access to the count and the sum. The final step, also in the calling member, the aggregate method is invoked and performs the simple calculation of average age. This is really not much code for that kind of power.

There is also a very complete set of built-in aggregators for things like min, max, count, and avg of different numeric types, distinct for any comparable type. They can be used with very little setup, like this:

3
1
// get a list of distinct last name
2
Set<String> lastNames = employees
3
      .aggregate(Aggregators.<Map.Entry<Integer, Employee>, String>)

Implicit there was a static import of distinct.

Entry Processors

Entry processors are pretty cool. You can do highly efficient in-place processing with a minimum of locking. Consider what people often end up doing to work with remote objects: lock a key, fetch the value, mutate the value, put it back (in a finally block), and unlock the key. That’s four network calls to start with — three if you’re only looking at the data and not updating the central source. Your objects may be large and incur significant cost in terms of CPU and network for serialization and transport.

Entry processors allow you to dispatch a key-based “task” object across the LAN — directly to the member owning a key, where it is executed in a lock-free, thread-safe fashion. Hazelcast has an interesting threading model that allows this to happen.

Here’s a brain-dead simple entry processor example — but it’s still a really useful approach:

19
1
@Component
2
@Profile("client")
3
public class EntryProcessorRunner implements CommandLineRunner, ApplicationCon
4
  {
5
    private ApplicationContext applicationContext;
6
    @Override
7
    public void run(String... args) throws Exception
8
      {
9
        HazelcastInstance instance = (HazelcastInstance) applicationContext.ge
10
        IMap<String, String> demo = instance.getMap("demo");
11
        String key = "someKey";
12
        demo.set(key, "Just a String value...");
13
        demo.executeOnKey(key, new DemoEntryProcessor());
14
        EntryProcessor<String, String> asyncProcessor = new DemoOffloadableEnt
15
        demo.submitToKey(key, asyncProcessor);
16
        ExecutionCallback<String> callback = new AsynchCallbackDemo();
17
        demo.submitToKey(key, asyncProcessor, callback);
18
      }
19
<snip> }

Here’s the entry processor that is called in this:

14
1
public class DemoEntryProcessor extends AbstractEntryProcessor<String, String>
2
  {
3
    <snip>
4
    @Override
5
    public Object process(Entry<String, String> entry)
6
      {
7
        String key = entry.getKey();
8
        String value = entry.getValue();
9
        l.info("in-place processing called for {}::{}", key, value);
10
        SimpleDateFormat sdf = new SimpleDateFormat(dtFormat);
11
        entry.setValue(String.format("This value was modified at %s -- %s", sd
12
        return null;
13
} <snip>
14
}

Here, we describe the simplest case, but as simple as it is, this is a very powerful tool to have. From your client, you can dispatch computation to your servers — no locking, no contention, efficient use of the LAN. But you’re waiting for the result to be returned, so it may not seem like a really big deal. Hazelcast also has a number of options to run synchronously, asynchronously, and on multiple entries.

7
1
executeOnKey(Object, EntryProcessor);                       //synch
2
submitToKey(Object, EntryProcessor);                        //asynch
3
submitToKey(Object, EntryProcessor, ExecutionCallback);     //asynch
4
​
5
executeOnEntries(EntryProcessor);                           //synch
6
executeOnEntries(EntryProcessor, Predicate);                //synch
7
executeOnKeys(Set, EntryProcessor);                         //synch

This is a really useful set of calls — the first one, executeOnKey, does exactly that; it makes one direct call to the key owner and executes synchronously on that entry. The next two execute asynchronously — your client code doesn’t need to wait for long running operations. A word of warning, though — long-running entry processors can be truly evil. Use off-loadable in your code, by annotation, to tell Hazelcast to move the operation off of the default threading structure.

Here’s an async call, similar in function to the first.

20
1
public class DemoOffloadableEntryProcessor extends AbstractEntryProcessor<Stri
2
  {
3
    <snip>
4
    public final static String  OFFLOADABLE_EXECUTOR  = "default";
5
    @Override
6
    public Object process(Entry<String, String> entry)
7
      {
8
        String key = entry.getKey();
9
        String value = entry.getValue();
10
        l.info("in-place processing called for {}::{}", key, value);
11
      SimpleDateFormat sdf = new SimpleDateFormat(dtFormat);
12
      String newValue = String.format("This value was modified at %s -- %s",
13
      entry.setValue(newValue);
14
      return newValue;
15
    }
16
  @Override
17
  public String getExecutorName()
18
    {
19
      return OFFLOADABLE_EXECUTOR;
20
} 

We recommend submitting the processing, tagging it with a callback, and going on your way. This could be used if you have a stream of data requests that need to be initiated without the caller needing to see a result. If you’re going to do this, be sure to read ahead to the section on back pressure.

The execution callback executes in the caller’s process space, so it’s your notification that the invocation is complete. This one just logs, like this:

19
1
package com.hazelcast.tao.gettingstarted.executors;
2
​
3
<snip> - imports
4
​
5
public class SimpleExecutionCallback<V> implements ExecutionCallback<V>
6
      {
7
             @Override
8
             public void onFailure(Throwable throwable)
9
                    {
10
                           L.error("execution failed - {}"),
11
                                  throwable.getMessage());
12
                    }
13
​
14
             @Override
15
             public void onResponse(V value)
16
                    {
17
                           L.info("processing complete - {}", value)
18
                    }
19
      }

Because nothing is free, there’s a bit of server-side configuration to go with this. Back pressure, both in the client and in server members, is important when using asynchrony. Configuring it can be done with system properties like this:

14
1
# the default is false, you should enable it for async stuff
2
hazelcast.backpressure.enabled=true
3
​
4
# that's the default, look at your hardware resources before increasing it.
5
hazelcast.backpressure.max.concurrent.invocations.per.partition=100
6
​
7
# this one really just relates to making async backups, not async data operations,
8
# but I thought I'd put it here for completeness. If the system cannot keep up with
9
# async backup requests, they will periodically turn into sync backups based on
10
# this window. (millis, of course)
11
hazelcast.backpressure.syncwindow=1000
12
​
13
# configure how long hz will wait for invocation resources
14
hazelcast.backpressure.backoff.timeout.millis=60000

Back pressure is a topic worth some consideration. Hazelcast is using its threading model to execute these, so there’s a limit to how many invocations can be in flight at any one point in time. The absolute number doesn’t matter, as that would depend upon the size of your cluster and the number of CPUs/cores/physical threads. What’s likely to be interesting is how many can be queued up for one partition — by default, mutating entry processors operate on a partition thread. In configuring your cluster, you know how many physical threads you have, so you can configure the partition thread count to be a somewhat sensible number. Too few and you’ll have contention; too many (one-to-one sounds ideal, though it rarely is) and you won’t get good resource utilization.

Runnable Tasks

These are simply Java runnable objects that are dispatched to one or more members. Keep in mind that the salient part of the signature is public void run() — i.e. nothing is returned. The way that they’re dispatched to the members is very flexible; it can be one member, all members, a member that owns a key, or members selected by an attribute you have set on them.

Here’s an example of running something on the member that owns a key:

13
1
@Component("runnableDemo")
2
@Profile("client")
3
public class RunnableDemoCaller implements CommandLineRunner, ApplicationConte
4
  {
5
    private ApplicationContext applicationContext;
6
    @Override
7
    public void run(String... args) throws Exception
8
      {
9
        HazelcastInstance client = (HazelcastInstance) applicationContext.getB
10
        IExecutorService executorService = client.getExecutorService("default"
11
        executorService.executeOnAllMembers(new LoggingRunnable());
12
} <snip>
13
}

The Runnable object is pretty ordinary, with the caveat that it needs to be serializable. This runnable is also HazelcastInstanceAware so that when it’s set up on the target node, the Hazelcast framework will inject the correct instance, so that it can communicate with the cluster.

13
1
@Component("loggingRunnable")
2
public class LoggingRunnable implements Runnable, HazelcastInstanceAware
3
  {
4
    <snip>
5
    private HazelcastInstance  hazelcastInstance;
6
    @Override
7
    public void run()
8
      {
9
        l.info("into run, cluster size: {}", getHazelcastInstance().getCluster
10
      }
11
​
12
    <snip>
13
  }

This code wasn’t particularly profound, but there’s one cool aspect to it — you can direct processing to a member that owns a key (or other members) and process that key and or other keys in multiple maps. So, complex manipulation may be performed outside your client, eliminating multiple network round-trips. They data need not come all from one member, either — there are no restrictions on that. It can be a significant performance boost to design your data so that related items are all within one node — then, this kind of task will tend not to make network calls but will not be restricted from doing so.

Callable Tasks

As with runnable tasks, callable tasks are dispatched to one or more members, but offer more options for things like bringing back data. Here’s a really simple callable that will be dispatched to a member, log some noise to show it ran, and return the partition count. There are better ways to monitor or manage partitions, but this should just show how you get a value — easily — from a member.

38
1
package com.hazelcast.tao.gettingstarted.port;
2
​
3
<snip> - imports. They should be just what you'd expect
4
​
5
public class PartitionReporter implements Callable<Integer>, HazelcastInstanceAware
6
       {
7
             <snip> - local data, ctor and such
8
​
9
             @Override
10
             public Integer call()
11
                    {
12
                           Member member = hazelcastInstance.getCluster()
13
                                        .getLocalMember();
14
                            String fmt = "member listening on %s:%d";
15
                            String whoAmI = String.format(fmt,
16
                                         member.getSocketAddress()
17
                                               .getHostName(),
18
                                         member.getSocketAddress()
19
                                                      .getPort());
20
                            PartitionService service = 
21
                                  hazelcastInstance.getPartitionService();
22
                            boolean memberIsSafe = service.isLocalMemberSafe();
23
                            String memberSafety = memberIsSafe? "": "not ";
24
                            boolean clusterIsSafe = service.isClusterSafe();
25
                            String clusterSafety = clusterIsSafe? "": "not ";
26
​
27
                            int partitionCount = service.getPartitions()
28
                                         .size();
29
                            fmt = "executing in member @ {} - cluster is {}safe, " +
30
                                  "member is {}safe, hosting {} partitions";
31
                            L.debug(fmt, whoAmI, clusterSafety, memberSafety,
32
                                   partitionCount);
33
​
34
                            return Integer.valueOf(partitionCount);          
35
                    }
36
​
37
                <snip> - getters, setters
38
       }

That’s just looking at the member, which you might want to do. Importantly, if you wanted to get/set/remove data, run a query, or perform any other Hazelcast operation, you can do that from that code. The call is easy. As with the collections, it draws heavily from the Java Executors API:

13
1
public void callableTaskDemoMembers(Set<Member> members)
2
       throws InterruptedException, ExecutionException
3
       {
4
              Callable<Integer> partitionReporter = new PartitionReporter();
5
​
6
              IExecutorService executorService =
7
                    hazelcastInstance.getExecutorService("default");
8
​
9
              Map<Member, Future<Integer>> futures =
10
                    executorService.submitToMembers(partitionReporter,
11
                          members);
12
              // process these
13
       }

Events

There are lots of events, but for just right now, let’s stick to the data events: listeners and interceptors. This is still a fairly big topic, so let’s talk about a workable subset of it. Within data-data events, there are what are called map events and entry events. Map events are called for map-level changes, specifically clear() or evictAll(). Entry events are called after changes to map-entries and there’s an interesting set of those changes — there are events for entries being added, removed, updated, and evicted. This isn’t a listener for get, however. The entry events are only for changes. In that context, it makes sense that there’s an interceptor for “get”. Events may be added in a number of ways — they can be added in configuration or programmatically. In addition to that, they can be done from a member or a client. Within each member, you get local events — events triggered by (after) data-mutating events within that JVM. Here’s a simple example of listening for entry:

15
1
public static class My EntryAddedListener<K, V> implements EntryAddedListener<K, V>
2
       {
3
              Override
4
              public void entryAdded(EntryEvent<K, V> event)
5
                     {
6
                           String whoami = event.getMember()
7
                                        .getSocketAddress()
8
                                        .getHostName() + ":"
9
                                        + event.getMember()
10
                                                     .getSocketAddress()
11
                                                     .getPort();
12
                            L.trace("member: {} - added - key: {}, value: {}", whoami,
13
                                  event.getKey(),event.getValue());
14
                     }
15
       }

Adding the entry-added listener could be done in config, but here’s how to use the Java API to do it:

4
1
public void addEntryAddedListener(IMap<String, String> myMap)
2
       {
3
             myMap.addEntryListener(new MyEntryAddedListener<>(), true);
4
       }

This code will add the entry listener — listening only for entries being added. The boolean parameter tells Hazelcast that the value should be available (i.e. getValue()) in the entry event that’s going to be delivered to the listener.

Clients may add these listeners, also — in addition to client lifecycle events, cluster membership events and distributed object creation/deletion. So, they may be notified of their own client lifecycle: starting, started, shutting down, and shutdown; they may be notified of membership changes or storage members joining and leaving, and they may be notified of distributed object creation or destruction — Maps, caches, queues, and all. A word of caution, though: High-volume activity will create high volumes of events. Look carefully at your resources, like client CPU/RAM and especially network. Think in a distributed perspective and put the listener where it needs to be, not simply where it seems convenient.

Section 5

Conclusion

This is just a little of what you can do with Hazelcast. Hazelcast has been doing distributed systems for some time now; it is deliberately designed to deliver performance and simplicity. You can be up-and-running in minutes and rolling out production-quality code that looks an awful lot like your Java collections code. It’s a fun environment for programmers. A little Java gets can be all you need on the server side, then you can cut loose with Java, .NET, C++, Node.js, Python, Go, or Scala — and that list is going to grow as new languages emerge.

Like This Refcard? Read More From DZone

related article thumbnail

DZone Article

Distributed Locks Are Dead, Long Live Distributed Locks
related article thumbnail

DZone Article

How to Convert XLS to XLSX in Java
related article thumbnail

DZone Article

Automatic Code Transformation With OpenRewrite
related article thumbnail

DZone Article

Accelerating AI Inference With TensorRT
related refcard thumbnail

Free DZone Refcard

Getting Started With Vector Databases
related refcard thumbnail

Free DZone Refcard

MongoDB Essentials
related refcard thumbnail

Free DZone Refcard

PostgreSQL Essentials
related refcard thumbnail

Free DZone Refcard

NoSQL Migration Essentials

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: