In-Memory Data Grids With Hazelcast, Hibernate, and Spring Boot

If your priority is the performance of queries on large amounts of data and you have a lot of RAM, an in-memory data grid is the right solution for you!

Piotr Mińkowski

May. 12, 17 · Tutorial

Likes (12)

Comment

Save

17.1K Views

In my previous article about JPA caching with Hazelcast, Hibernate, and Spring Boot, I described an example illustrating Hazelcast usage as a solution for the Hibernate second-level cache. One big disadvantage of that example was an ability by caching entities only by primary key. The opportunity to cache JPA queries by some other indices helped, but that did not solve the problem completely because the query could use already-cached entities even if they matched the criteria.

In this article, I’m going to show you a smart solution of that problem based on Hazelcast distributed queries.

Image title

Spring Boot has a built-in auto configuration for Hazelcast if such a library is available under the application classpath and @BeanConfig is declared:

@Bean
Config config() {
 Config c = new Config();
 c.setInstanceName("cache-1");
 c.getGroupConfig().setName("dev").setPassword("dev-pass");
 ManagementCenterConfig mcc = new ManagementCenterConfig().setUrl("http://192.168.99.100:38080/mancenter").setEnabled(true);
 c.setManagementCenterConfig(mcc);
 SerializerConfig sc = new SerializerConfig().setTypeClass(Employee.class).setClass(EmployeeSerializer.class);
 c.getSerializationConfig().addSerializerConfig(sc);
 return c;
}

In the code fragment above, we declared cluster names and password credentials, connection parameters to the Hazelcast Management Center, and entity serialization configuration. The entity is pretty simple; it has @Id and two fields for searching personId and company.

@Entity
public class Employee implements Serializable {

 private static final long serialVersionUID = 3214253910554454648 L;

 @Id
 @GeneratedValue
 private Integer id;
 private Integer personId;
 private String company;

 public Integer getId() {
  return id;
 }

 public void setId(Integer id) {
  this.id = id;
 }

 public Integer getPersonId() {
  return personId;
 }

 public void setPersonId(Integer personId) {
  this.personId = personId;
 }

 public String getCompany() {
  return company;
 }

 public void setCompany(String company) {
  this.company = company;
 }

}

Every entity needs to have the serializer declared if it is to be inserted and selected from the cache. There are same default serializers available inside the Hazelcast library, but I implemented the custom one for our sample. It is based on StreamSerializer and ObjectDataInput.

public class EmployeeSerializer implements StreamSerializer < Employee > {

 @Override
 public int getTypeId() {
  return 1;
 }

 @Override
 public void write(ObjectDataOutput out, Employee employee) throws IOException {
  out.writeInt(employee.getId());
  out.writeInt(employee.getPersonId());
  out.writeUTF(employee.getCompany());
 }

 @Override
 public Employee read(ObjectDataInput in ) throws IOException {
  Employee e = new Employee();
  e.setId( in .readInt());
  e.setPersonId( in .readInt());
  e.setCompany( in .readUTF());
  return e;
 }

 @Override
 public void destroy() {}

}

There is also a DAO interface for interacting with the database. It has two searching methods and extends Spring data CrudRepository.

public interface EmployeeRepository extends CrudRepository<Employee, Integer> {

    public Employee findByPersonId(Integer personId);
    public List<Employee> findByCompany(String company);

}

In this sample, the Hazelcast instance is embedded into the application. When starting the Spring Boot application, we have to provide VM argument -DPORT , which is used for exposing the service REST API. Hazelcast automatically detects other running member instances and its port will be incremented out of the box. Here’s the REST @Controller class with exposed API:

@RestController
public class EmployeeController {

 private Logger logger = Logger.getLogger(EmployeeController.class.getName());

 @Autowired
 EmployeeService service;

 @GetMapping("/employees/person/{id}")
 public Employee findByPersonId(@PathVariable("id") Integer personId) {
  logger.info(String.format("findByPersonId(%d)", personId));
  return service.findByPersonId(personId);
 }

 @GetMapping("/employees/company/{company}")
 public List < Employee > findByCompany(@PathVariable("company") String company) {
  logger.info(String.format("findByCompany(%s)", company));
  return service.findByCompany(company);
 }

 @GetMapping("/employees/{id}")
 public Employee findById(@PathVariable("id") Integer id) {
  logger.info(String.format("findById(%d)", id));
  return service.findById(id);
 }

 @PostMapping("/employees")
 public Employee add(@RequestBody Employee emp) {
  logger.info(String.format("add(%s)", emp));
  return service.add(emp);
 }

}

@Service is injected into the EmployeeController. Inside EmployeeService, there is a simple implementation of switching between the Hazelcast cache instance and Spring Data DAO @Repository. In every find method, we are trying to find data in the cache, and in case it’s not there, we are searching for it in the database and then putting the found entity into the cache.

@Service
public class EmployeeService {

 private Logger logger = Logger.getLogger(EmployeeService.class.getName());

 @Autowired
 EmployeeRepository repository;
 @Autowired
 HazelcastInstance instance;

 IMap < Integer, Employee > map;

 @PostConstruct
 public void init() {
  map = instance.getMap("employee");
  map.addIndex("company", true);
  logger.info("Employees cache: " + map.size());
 }

 @SuppressWarnings("rawtypes")
 public Employee findByPersonId(Integer personId) {
  Predicate predicate = Predicates.equal("personId", personId);
  logger.info("Employee cache find");
  Collection < Employee > ps = map.values(predicate);
  logger.info("Employee cached: " + ps);
  Optional < Employee > e = ps.stream().findFirst();
  if (e.isPresent())
   return e.get();
  logger.info("Employee cache find");
  Employee emp = repository.findByPersonId(personId);
  logger.info("Employee: " + emp);
  map.put(emp.getId(), emp);
  return emp;
 }

 @SuppressWarnings("rawtypes")
 public List < Employee > findByCompany(String company) {
  Predicate predicate = Predicates.equal("company", company);
  logger.info("Employees cache find");
  Collection < Employee > ps = map.values(predicate);
  logger.info("Employees cache size: " + ps.size());
  if (ps.size() > 0) {
   return ps.stream().collect(Collectors.toList());
  }
  logger.info("Employees find");
  List < Employee > e = repository.findByCompany(company);
  logger.info("Employees size: " + e.size());
  e.parallelStream().forEach(it -> {
   map.putIfAbsent(it.getId(), it);
  });
  return e;
 }

 public Employee findById(Integer id) {
  Employee e = map.get(id);
  if (e != null)
   return e;
  e = repository.findOne(id);
  map.put(id, e);
  return e;
 }

 public Employee add(Employee e) {
  e = repository.save(e);
  map.put(e.getId(), e);
  return e;
 }

}

If you are interested in running a sample application, you can clone my repository on GitHub. In the person-service module, there is an example of my previous article about Hibernate second cache with Hazelcast. In employee-module , there is an example for this article.

Testing

Let’s start three instances of employee service on different ports using the VM argument -DPORT. In the first figure visible in the beginning of the article, these ports are 2222, 3333, and 4444. When starting the last third service’s instance, you should see the fragment visible below in the application logs. It means that a Hazelcast cluster of three members has been set up.

2017 - 05 - 09 23: 01: 48.127 INFO 16432-- - [ration.thread - 0] c.h.internal.cluster.ClusterService: [192.168 .1 .101]: 5703[dev][3.7 .7]

Members[3] {
 Member[192.168 .1 .101]: 5701 - 7 a8dbf3d - a488 - 4813 - a312 - 569 f0b9dc2ca
 Member[192.168 .1 .101]: 5702 - 494 fd1ac - 341 b - 451 c - b585 - 1 ad58a280fac
 Member[192.168 .1 .101]: 5703 - 9750 bd3c - 9 cf8 - 48 b8 - a01f - b14c915937c3 this
}

Here is a picture from the Hazelcast Management Center for two running members (only two members are available in the freeware version of Hazelcast Management Center).

Image title

Then, run Docker containers with MySQL and Hazelcast Management Center.

docker run -d --name mysql -p 33306:3306 mysql
docker run -d --name hazelcast-mgmt -p 38080:8080 hazelcast/management-center:latest

Now, you could try to call endpoint http://localhost:/employees/company/{company} on all of your services. You should see that data is cached in the cluster and even if you call the endpoint on a different service, it find entities put into the cache by different services. After several attempts, my service instances put about 100k entities into the cache. Distribution between two Hazelcast members is 50% to 50%.

Image title

Final Words

We could probably implement a smarter solution, but I just wanted to show you the idea. I tried to use Spring Data Hazelcast for that, but I’ve got a problem with running it on Spring Boot application. It has a HazelcastRepository interface, which is something similar to Spring data CrudRepository but is based on cached entities in Hazelcast grid and also uses the Spring data KeyValue module. The project is not well-documented and like I said before, it didn’t work with Spring Boot, so I decided to implement my simple solution!

In my local environment, which was visualized in the beginning of the article, queries on the cache were about 10 times faster than similar queries on the database. I inserted 2M records into the employee table. Hazelcast data grid could not only be a second-level cache but even a middleware between your application and database. If your priority is a performance of queries on large amounts of data and you have a lot of RAM, an in-memory data grid is the right solution for you!

Spring Framework Hazelcast Spring Boot Data (computing) Database Hibernate Spring Data Cache (computing) application

Published at DZone with permission of Piotr Mińkowski, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending

In-Memory Data Grids With Hazelcast, Hibernate, and Spring Boot

If your priority is the performance of queries on large amounts of data and you have a lot of RAM, an in-memory data grid is the right solution for you!

Testing

Final Words

Related

Partner Resources