
Perform Bulk Inserts With Elasticsearch's REST High-Level Client

Generating data sets and inserting/ingesting them into databases is a key role of any data scientist. Learn how to do it with Elasticsearch!


When playing with databases, or simply throwing some data at an application, we often want to generate random test data. Faker is very useful for this purpose. It generates data for various domain objects that you might want to model in your application: for instance, a person's first or last name, or book titles along with their authors and publishers. The entire list of "fakers" (domain objects) is provided in the Faker GitHub README. Another useful feature is that it can also generate locale-specific data.

The Faker GitHub repository can be found here: Faker GitHub

In this tutorial, we will create a sample Spring Boot application, use the Faker dependency to generate some data, and then use that data to populate Elasticsearch. You could use any other database and any other Java-based application, depending on your needs. This will also serve as an example of using Elasticsearch's REST High-Level Client.

1. Let us create a simple Spring Boot application and test the Faker service.

FakerAndESApp.java

package techgabs;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class FakerAndESApp {
    public static void main(String[] args){
        SpringApplication.run(FakerAndESApp.class, args);
    }
}

2. Create a test class that displays the output of the Faker service.

TestFaker.java

package techgabs;
import com.github.javafaker.Faker;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

import java.util.Locale;

@Component
public class TestFaker {

    @EventListener
    public void test(ApplicationReadyEvent event){

        Faker faker = new Faker(new Locale("en-IND"));
        System.out.println(faker.name().firstName());
        System.out.println(faker.name().lastName());

        System.out.println(faker.name().firstName());
        System.out.println(faker.name().lastName());
    }
}

Sample output:


Chandira
Iyer
Varalakshmi
Naik

Note that the locale was set to “en-IND.” The entire list of supported locales can be found in the Faker GitHub README.
Another thing to note: every call to faker.name().firstName() returns a new value, even though the same Faker instance is used.

3. Now that we know how Faker works, let us try to generate some book data and insert them into ES.

Let us first set up the Gradle dependencies required for the project -> build.gradle

plugins {
    id 'java'
}

group 'techgabs.faker.es'
version '1.0-SNAPSHOT'

sourceCompatibility = 1.8

repositories {
    mavenCentral()
}
dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.12'

    // https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-web
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-web', version: '2.1.0.RELEASE'

    compile 'com.github.javafaker:javafaker:0.16'

    // https://mvnrepository.com/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client
    compile 'org.elasticsearch.client:elasticsearch-rest-high-level-client:6.4.2'
}

4. Create a Book model to hold Faker generated data.

Book.java

package techgabs.model;
public class Book {

    private String id;
    private String author;
    private String genre;
    private String publisher;
    private String title;

    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public String getAuthor() {
        return author;
    }
    public void setAuthor(String author) {
        this.author = author;
    }
    public String getGenre() {
        return genre;
    }
    public void setGenre(String genre) {
        this.genre = genre;
    }
    public String getPublisher() {
        return publisher;
    }
    public void setPublisher(String publisher) {
        this.publisher = publisher;
    }
    public String getTitle() {
        return title;
    }
    public void setTitle(String title) {
        this.title = title;
    }
}

Prerequisites for Elasticsearch:

Make sure Elasticsearch is up and running.

On Mac, I found it easiest to install Elasticsearch via Homebrew.

brew update
brew install elasticsearch

On Windows, you can download the MSI from here -> ElasticSearch MSI For Windows

A better approach in both cases is to use Docker to pull an Elasticsearch image and run it.
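As a minimal sketch of that approach, assuming Docker is installed, you can run a single-node container matching the 6.4.2 client version used in this tutorial:

```shell
# pull and run a single-node Elasticsearch 6.4.2 container,
# exposing the REST API on the default port 9200
docker pull docker.elastic.co/elasticsearch/elasticsearch:6.4.2
docker run -d -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:6.4.2
```

Once the container is up, `curl http://localhost:9200` should return the cluster info JSON.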

5. Create a service to generate fake data.

package techgabs.service;


import com.github.javafaker.Faker;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import techgabs.dao.BookDao;
import techgabs.model.Book;

import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.UUID;

@Service
public class BulkService {

    private Faker faker = new Faker(new Locale("en-IND"));

    @Autowired
    private BookDao bookDao;

    public void fakeBulkInsert(int count){

        bookDao.bulkInsert(getFakeBookList(count));
    }

    private List<Book> getFakeBookList(int count) {
        List<Book> bookList = new ArrayList<>();

        for (int i = 0; i < count; i++) {
            Book book = new Book();
            book.setId(UUID.randomUUID().toString());
            book.setAuthor(faker.book().author());
            book.setGenre(faker.book().genre());
            book.setPublisher(faker.book().publisher());
            book.setTitle(faker.book().title());
            bookList.add(book);
        }
        return bookList;
    }
}

We will now use the RestHighLevelClient to perform bulk inserts of the data generated in the previous step. Below is the configuration class that creates the RestHighLevelClient. Note that it is important to close the client explicitly after use; internally, it uses a low-level RestClient. Please check the Elasticsearch REST High-Level Client docs for more information.

6. Config Class for Rest High-Level Client for ES.

ESConfig.java 
package techgabs.config;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.config.AbstractFactoryBean;
import org.springframework.context.annotation.Configuration;

import java.io.IOException;

@Configuration
public class ESConfig extends AbstractFactoryBean<RestHighLevelClient> {

    private RestHighLevelClient restHighLevelClient;

    @Override
    public Class<RestHighLevelClient> getObjectType() {
        return RestHighLevelClient.class;
    }

    @Override
    protected RestHighLevelClient createInstance() {
        // the builder does not connect eagerly, so there is no need to swallow
        // exceptions here; a swallowed failure would only surface as an NPE later
        restHighLevelClient = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http"),
                        new HttpHost("localhost", 9201, "http")
                )
        );
        return restHighLevelClient;
    }

    @Override
    public void destroy(){

        try {
            restHighLevelClient.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

7. Create a DAO layer to perform bulk inserts.

package techgabs.dao;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import techgabs.model.Book;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@Component
public class BookDao {

    private static final String INDEX = "book_index";

    private static final String TYPE = "book_type";

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Autowired
    private ObjectMapper objectMapper;

    public void bulkInsert(List<Book> bookList) {

        BulkRequest bulkRequest = new BulkRequest();

        bookList.forEach(book -> {
            // convert each Book to a Map so the client can serialize it as the document source
            IndexRequest indexRequest = new IndexRequest(INDEX, TYPE, book.getId())
                    .source(objectMapper.convertValue(book, Map.class));

            bulkRequest.add(indexRequest);
        });

        try {
            BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
            // a bulk request can partially fail, so always check the response
            if (response.hasFailures()) {
                System.out.println(response.buildFailureMessage());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
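For context, the BulkRequest built above ultimately serializes to the newline-delimited JSON body that Elasticsearch's _bulk endpoint expects: one action/metadata line followed by one source line per document, with the body terminated by a trailing newline. A rough sketch of that wire format (the document values are illustrative; in practice, the high-level client assembles this for you):

```java
// Sketch of the newline-delimited JSON body behind a bulk index request.
// The values below are illustrative, mirroring the book_index/book_type names used above.
public class BulkPayloadSketch {

    // one action/metadata line per document: which index, type, and id to write to
    static String actionLine(String index, String type, String id) {
        return "{\"index\":{\"_index\":\"" + index + "\",\"_type\":\"" + type + "\",\"_id\":\"" + id + "\"}}";
    }

    public static String buildPayload() {
        StringBuilder payload = new StringBuilder();
        // each document contributes an action line plus a source line,
        // and the whole body must end with a newline
        payload.append(actionLine("book_index", "book_type", "1")).append('\n');
        payload.append("{\"title\":\"As I Lay Dying\",\"genre\":\"Suspense/Thriller\"}").append('\n');
        payload.append(actionLine("book_index", "book_type", "2")).append('\n');
        payload.append("{\"title\":\"The Moving Toyshop\",\"genre\":\"Speech\"}").append('\n');
        return payload.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildPayload());
    }
}
```

This is why a bulk request of N documents produces 2N lines on the wire, and why the client batches them into a single HTTP call instead of N separate index requests.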

8. We will now create a controller from which we will invoke the bulk insert service method.

package techgabs.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;
import techgabs.service.BulkService;

@RestController
public class Controller {

    @Autowired
    private BulkService bulkService;

    @PostMapping("/faker/bulk/{count}")
    public void bulkInsertWithFakeData(@PathVariable("count") int count){
        bulkService.fakeBulkInsert(count);
    }
}

9. Insert data via a REST client, like Postman or curl.

POST http://localhost:8080/faker/bulk/2

Verify that the data was correctly inserted into Elasticsearch.

POST http://localhost:9200/book_index/_search

Output:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
            {
                "_index": "book_index",
                "_type": "book_type",
                "_id": "3f2bcae1-4314-4a05-b9f5-86782320e9da",
                "_score": 1,
                "_source": {
                    "id": "3f2bcae1-4314-4a05-b9f5-86782320e9da",
                    "author": "Deenabandhu Banerjee",
                    "genre": "Suspense/Thriller",
                    "publisher": "André Deutsch",
                    "title": "As I Lay Dying"
                }
            },
            {
                "_index": "book_index",
                "_type": "book_type",
                "_id": "1ea8da99-7df7-407f-a059-a2c31ae95138",
                "_score": 1,
                "_source": {
                    "id": "1ea8da99-7df7-407f-a059-a2c31ae95138",
                    "author": "Baalaaditya Banerjee",
                    "genre": "Speech",
                    "publisher": "Signet Books",
                    "title": "The Moving Toyshop"
                }
            }
        ]
    }
}

Summary

We created a sample application to demonstrate how Faker generates sample data, and then inserted that data into Elasticsearch. In the process, we also saw how to set up Elasticsearch and use the RestHighLevelClient to index documents in bulk. Finally, we verified the results using Elasticsearch's search REST endpoint.


Topics:
spring boot ,data generation ,elasticsearch tutorial for beginners ,big data

Opinions expressed by DZone contributors are their own.
