Perform Bulk Inserts With Elasticsearch's REST High-Level Client

Generating data sets and inserting/ingesting them into databases is a key role of any data scientist. Learn how to do it with Elasticsearch!

by Sujith Menon · Jan. 07, 19 · Tutorial

We often want to generate random data when experimenting with databases, or when we simply need to throw some data at an application. Faker is very useful for this. It generates data for many kinds of domain objects you might model in an application: for instance, a person's first or last name, or book titles along with their authors and publishers. The full list of "fakers" (domain objects) is given in the Faker GitHub README. Another useful feature is that Faker can generate locale-specific data.

The Faker GitHub repository can be found here: Faker GitHub

In this tutorial, we will create a sample Spring Boot application, use the Faker dependency to generate some data, and then use that data to populate Elasticsearch. You could use any other database and any other Java-based application, depending on your needs. This will also serve as an example of how to use Elasticsearch's REST High-Level Client.

1. Let us create a simple Spring Boot application and test the Faker service.

FakerAndESApp.java

package techgabs;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class FakerAndESApp {
    public static void main(String[] args){
        SpringApplication.run(FakerAndESApp.class, args);
    }
}

2. Create a test class that prints some output from the Faker service.

TestFaker.java

package techgabs;
import com.github.javafaker.Faker;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

import java.util.Locale;

@Component
public class TestFaker {

    @EventListener
    public void test(ApplicationReadyEvent event){

        Faker faker = new Faker(new Locale("en-IND"));
        System.out.println(faker.name().firstName());
        System.out.println(faker.name().lastName());

        System.out.println(faker.name().firstName());
        System.out.println(faker.name().lastName());
    }
}

Sample output:


Chandira
Iyer
Varalakshmi
Naik

Note that the locale was set to "en-IND". The full list of supported locales can be found here: Faker GitHub

Also note that each call to faker.name().firstName() returns a newly generated value, even though the same Faker object is reused.
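To make the "new value on every call" behavior concrete, here is a minimal sketch of a Faker-style generator built only on java.util.Random. The class NamePicker and its name list are hypothetical stand-ins for illustration, not javafaker's actual implementation. It also shows that a fixed seed makes the sequence reproducible, which can be handy in tests.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a Faker-style generator; not javafaker's real code.
class NamePicker {
    private static final List<String> FIRST_NAMES =
            Arrays.asList("Chandira", "Varalakshmi", "Deenabandhu", "Baalaaditya");

    private final Random random;

    // The seed is an assumption of this sketch: the same seed gives the same sequence.
    NamePicker(long seed) {
        this.random = new Random(seed);
    }

    // Each call advances the shared Random, so consecutive calls can return different names.
    String firstName() {
        return FIRST_NAMES.get(random.nextInt(FIRST_NAMES.size()));
    }
}
```

Two NamePicker instances created with the same seed return the same names in the same order, while repeated calls on one instance keep drawing fresh values.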

3. Now that we know how Faker works, let us generate some book data and insert it into Elasticsearch (ES).

First, declare the Gradle dependencies that the project needs in build.gradle:

plugins {
    id 'java'
}

group 'techgabs.faker.es'
version '1.0-SNAPSHOT'

sourceCompatibility = 1.8

repositories {
    mavenCentral()
}
dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.12'

    // https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-web
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-web', version: '2.1.0.RELEASE'

    compile 'com.github.javafaker:javafaker:0.16'

    // https://mvnrepository.com/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client
    compile 'org.elasticsearch.client:elasticsearch-rest-high-level-client:6.4.2'
}

4. Create a Book model to hold the Faker-generated data.

Book.java

package techgabs.model;
public class Book {
    public String getAuthor() {
        return author;
    }
    public void setAuthor(String author) {
        this.author = author;
    }
    public String getGenre() {
        return genre;
    }
    public void setGenre(String genre) {
        this.genre = genre;
    }
    public String getPublisher() {
        return publisher;
    }
    public void setPublisher(String publisher) {
        this.publisher = publisher;
    }
    public String getTitle() {
        return title;
    }
    public void setTitle(String title) {
        this.title = title;
    }
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    private String id;
    private String author;
    private String genre;
    private String publisher;
    private String title;
}

Prerequisites for Elasticsearch:

Please make sure Elasticsearch is up and running.

On Mac, I found it easier to install ES via brew.

brew update
brew install elasticsearch

On Windows, you can download the MSI installer here: Elasticsearch MSI for Windows

A better approach in both cases would be to use Docker to pull an ES image and run it.

5. Create a service to generate fake data.

package techgabs.service;


import com.github.javafaker.Faker;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import techgabs.dao.BookDao;
import techgabs.model.Book;

import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.UUID;

@Service
public class BulkService {

    private Faker faker =new Faker(new Locale("en-IND"));

    @Autowired
    private BookDao bookDao;

    public void fakeBulkInsert(int count){

        bookDao.bulkInsert(getFakeBookList(count));
    }

    private List<Book> getFakeBookList(int count) {
        List<Book> bookList = new ArrayList<>();

        for(int i=0;i < count;i++) {
            Book book = new Book();
            book.setId(UUID.randomUUID().toString());
            book.setAuthor(faker.book().author());
            book.setGenre(faker.book().genre());
            book.setPublisher(faker.book().publisher());
            book.setTitle(faker.book().title());
            bookList.add(book);
        }
        return bookList;
    }
}
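The service above builds the whole list in memory and hands it to the DAO in one call. For large counts, one option is to split the list into fixed-size batches and send one bulk request per batch. The following is a minimal sketch of such a splitter; the class Batches and its partition method are hypothetical helpers, not part of the original application or of any library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the tutorial's application.
class Batches {

    // Splits items into consecutive sublists of at most batchSize elements each.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With a helper like this, fakeBulkInsert could loop over Batches.partition(getFakeBookList(count), 1000) and call bookDao.bulkInsert once per batch. The batch size of 1000 is an arbitrary assumption; a suitable value depends on document size and cluster capacity.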

We will now use Elasticsearch's RestHighLevelClient to bulk-insert the data generated in the previous step. Below is the config class that creates the RestHighLevelClient. Note that it is important to close the client explicitly once it is no longer needed, because it wraps a low-level RestClient whose resources are only released on close. See the Elasticsearch REST High-Level Client docs for more information.

6. Config Class for Rest High-Level Client for ES.

ESConfig.java

package techgabs.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.config.AbstractFactoryBean;
import org.springframework.context.annotation.Configuration;

import java.io.IOException;

@Configuration
public class ESConfig extends AbstractFactoryBean<RestHighLevelClient> {

    private RestHighLevelClient restHighLevelClient;

    @Override
    public Class<RestHighLevelClient> getObjectType() {
        return RestHighLevelClient.class;
    }

    @Override
    protected RestHighLevelClient createInstance() throws Exception {

        try {
            restHighLevelClient = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http"),
                            new HttpHost("localhost", 9201, "http")
                    )
            );

        }
        catch (Exception ex){
            System.out.println(ex.getMessage());
        }
        return restHighLevelClient;
    }

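    @Override
    public void destroy(){
        // Close the client on shutdown; it may be null if creation failed.
        try {
            if (restHighLevelClient != null) {
                restHighLevelClient.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }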
}

7. Create a DAO layer to perform bulk inserts.

package techgabs.dao;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import techgabs.model.Book;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@Component
public class BookDao {

    private static final String INDEX = "book_index";

    private static final String TYPE = "book_type";

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Autowired
    private ObjectMapper objectMapper;

    public void bulkInsert(List<Book> bookList) {

        BulkRequest bulkRequest = new BulkRequest();

        // Add one index request per book, using the book's id as the document id.
        bookList.forEach(book -> {
            IndexRequest indexRequest = new IndexRequest(INDEX, TYPE, book.getId())
                    .source(objectMapper.convertValue(book, Map.class));

            bulkRequest.add(indexRequest);
        });

        try {
            BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
            // A bulk call can succeed overall while individual items fail, so check the response.
            if (bulkResponse.hasFailures()) {
                System.err.println(bulkResponse.buildFailureMessage());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
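It can help to see roughly what a bulk request looks like on the wire. Elasticsearch's _bulk REST endpoint takes newline-delimited JSON: an action line followed by the document source, each ending with a newline. The sketch below builds such a body by hand for flat string fields. BulkBodySketch and its methods are hypothetical names used only for illustration. The client handles this serialization itself, so application code would not normally do this.

```java
import java.util.Map;

// Hypothetical illustration of the _bulk body format; the real client builds this for you.
class BulkBodySketch {

    // Escapes backslashes and double quotes only; a real serializer such as Jackson covers all cases.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders a flat map of string fields as a JSON object, in the map's iteration order.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append(quote(e.getKey())).append(":").append(quote(e.getValue()));
            first = false;
        }
        return sb.append("}").toString();
    }

    // Builds one action line plus one source line, each terminated by a newline.
    static String indexAction(String index, String type, String id, Map<String, String> source) {
        String action = "{\"index\":{\"_index\":" + quote(index)
                + ",\"_type\":" + quote(type)
                + ",\"_id\":" + quote(id) + "}}";
        return action + "\n" + toJson(source) + "\n";
    }
}
```

A bulk body for several books is simply the concatenation of one indexAction result per book, which is why a single HTTP round trip can carry many documents.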

8. We will now create a controller from which we will invoke the bulk insert service method.

package techgabs.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;
import techgabs.service.BulkService;

@RestController
public class Controller {

    @Autowired
    private BulkService bulkService;

    @PostMapping("/faker/bulk/{count}")
    public void bulkInsertWithFakeData(@PathVariable("count") int count){
        bulkService.fakeBulkInsert(count);
    }
}

9. Insert data via a REST client, like Postman or curl.

POST http://localhost:8080/faker/bulk/2

Verify that the data is correctly inserted in ES.

POST http://localhost:9200/book_index/_search

Output:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
            {
                "_index": "book_index",
                "_type": "book_type",
                "_id": "3f2bcae1-4314-4a05-b9f5-86782320e9da",
                "_score": 1,
                "_source": {
                    "id": "3f2bcae1-4314-4a05-b9f5-86782320e9da",
                    "author": "Deenabandhu Banerjee",
                    "genre": "Suspense/Thriller",
                    "publisher": "André Deutsch",
                    "title": "As I Lay Dying"
                }
            },
            {
                "_index": "book_index",
                "_type": "book_type",
                "_id": "1ea8da99-7df7-407f-a059-a2c31ae95138",
                "_score": 1,
                "_source": {
                    "id": "1ea8da99-7df7-407f-a059-a2c31ae95138",
                    "author": "Baalaaditya Banerjee",
                    "genre": "Speech",
                    "publisher": "Signet Books",
                    "title": "The Moving Toyshop"
                }
            }
        ]
    }
}

Summary

We created a sample application that uses Faker to generate sample data and then inserted that data into Elasticsearch. In the process, we saw how to configure the RestHighLevelClient and use it to index documents in bulk. We also verified the results using Elasticsearch's search REST endpoint.
