

Compress Your Data Within Elasticsearch

Learn how to compress your data within Elasticsearch to reduce network latency.

By Hakan Altındağ · Jun. 22, 2020 · Tutorial

Compression is awesome: making something smaller than its original size sounds like magic, but it is possible. We know it from tools such as WinRAR and 7-Zip. Elasticsearch also has properties to compress the data that is exchanged between the nodes and the clients, which can be very useful for reducing network latency when handling huge responses from Elasticsearch. In this article we will cover the following topics:

  1. Enable HTTP/TCP compression
  2. Handling compressed responses
    • Elasticsearch 7.7 and below
    • Elasticsearch 7.8 and upwards
    • Future Elasticsearch releases 7.9 and 8.0

Most of us are already familiar with Elasticsearch from Elastic when working with application logs, but a lot of people have never heard of it. Below is a short summary:

What is Elasticsearch?

Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of open-source tools for data ingestion, enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana), the Elastic Stack now includes a rich collection of lightweight shipping agents known as Beats for sending data to Elasticsearch.

Enable HTTP/TCP Compression

Elastic has made it really easy to enable HTTP compression on their nodes. Just providing the following properties within the elasticsearch.yml file will do the trick:

YAML

http.compression: true
http.compression_level: 1

Or the following property for TCP compression:

YAML

transport.compress: true


Handling Compressed Responses

With the changes from the previous section, we enabled compression. When enabling TCP compression, you don't need to do anything to handle the compressed data: Elasticsearch uses the TCP communication protocol between the different Elasticsearch nodes and is able to decompress the data by itself. But when you have enabled HTTP compression, your client (terminal, Postman, Java client) needs to know how to decompress it, or else you will get non-human-readable data. For this article, we will focus on the Java client.

On the 18th of June 2020, Elastic released Elasticsearch 7.8 together with a Java library that makes handling compressed data easier; see the release notes here: Elasticsearch 7.8 release notes

Even though you have enabled Elasticsearch to send you compressed data, Elasticsearch will only compress it when the client requests it. The Java client can request it by sending additional request options within the HTTP request; see below for an example:

Java

RequestOptions.Builder requestOptions = RequestOptions.DEFAULT.toBuilder()
    .addHeader("Accept-Encoding", "gzip")
    .addHeader("Content-type", "application/json");


 

Handling Compressed Responses Elasticsearch 7.7 and Below

The Java library of Elastic provides two clients: the Rest High Level Client and the Low Level Rest Client. The high-level client doesn't support handling compressed responses; it will throw a runtime exception when it receives a compressed response. The low-level client will provide you the raw response sent by Elasticsearch, and therefore it is possible to decompress it yourself. There are multiple ways to do so, but we will cover two of them in this article:

Java

RequestOptions.Builder requestOptions = RequestOptions.DEFAULT.toBuilder()
    .addHeader("Accept-Encoding", "gzip")
    .addHeader("Content-type", "application/json");

Request request = new Request("GET", "test/_search");
request.setOptions(requestOptions);

Response response = client.getLowLevelClient().performRequest(request);
byte[] entity = EntityUtils.toByteArray(response.getEntity());

// Decompress the gzip-encoded response body manually
String decompressedResponse = "";
try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(entity);
     GZIPInputStream gzipInputStream = new GZIPInputStream(byteArrayInputStream);
     BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8))) {

    decompressedResponse = bufferedReader.lines()
            .collect(Collectors.joining());
}

System.out.println(decompressedResponse);


It can also be decompressed with the following snippet:

Java

RequestOptions.Builder requestOptions = RequestOptions.DEFAULT.toBuilder()
    .addHeader("Accept-Encoding", "gzip")
    .addHeader("Content-type", "application/json");

Request request = new Request("GET", "test/_search");
request.setOptions(requestOptions);

Response response = client.getLowLevelClient().performRequest(request);
// GzipDecompressingEntity comes from Apache HttpClient and unwraps the gzip encoding
String decompressedResponse = EntityUtils.toString(new GzipDecompressingEntity(response.getEntity()));

System.out.println(decompressedResponse);



Handling Compressed Responses Elasticsearch 7.8

With release 7.8, the Rest High Level Client now has the ability to automatically decompress compressed data. The example above can be rewritten as:

Java

RequestOptions.Builder requestOptions = RequestOptions.DEFAULT.toBuilder()
    .addHeader("Accept-Encoding", "gzip")
    .addHeader("Content-type", "application/json");

SearchRequest searchRequest = new SearchRequest("twitter");
SearchResponse searchResponse = client.search(searchRequest, requestOptions.build());

System.out.println(searchResponse);
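
As a usage sketch (the index name twitter is taken from the example above), the hits can then be read directly from the parsed response, without touching any compression details:

Java

import org.elasticsearch.search.SearchHit;

// Iterate over the matched documents and print their JSON source
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsString());
}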



So you, as a developer, don't need to write additional logic to handle the response. If you are using the Low Level Rest Client, you still need to write your own custom decompression logic, as seen in the previous examples above.

Handling Compressed Responses with Future Elasticsearch Releases 7.9 and 8.0

With the upcoming releases, the low-level client will also have the built-in decompression feature; see the details in this pull request: 55413. This will make the developer experience of Java library users much better, as it doesn't require any custom decompression logic within your code base. The code example for the low-level client will be:

Java

RequestOptions.Builder requestOptions = RequestOptions.DEFAULT.toBuilder()
    .addHeader("Accept-Encoding", "gzip")
    .addHeader("Content-type", "application/json");

Request request = new Request("GET", "test/_search");
request.setOptions(requestOptions);

Response response = client.getLowLevelClient().performRequest(request);
// From 7.9 onwards the client decompresses the body automatically
String responseBody = EntityUtils.toString(response.getEntity());

System.out.println(responseBody);


This change within the low-level client will also be a breaking change for its end users. If you already have your own decompression logic, it will probably throw a runtime exception, as it will try to decompress data that is not compressed (because it has already been decompressed by the client). The Rest High Level Client remains untouched and still has the same ability to decompress compressed data out of the box.
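
If you want custom decompression logic that keeps working across both behaviors, one option is to decompress only when the response entity still advertises a gzip encoding. Below is a minimal sketch of such a hypothetical helper, under the assumption that a client which has already decompressed the body no longer exposes a gzip Content-Encoding header; verify this against your client version:

Java

import java.io.IOException;
import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.client.entity.GzipDecompressingEntity;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;

// Hypothetical helper: only decompress when the entity still claims gzip encoding
static String readBody(Response response) throws IOException {
    HttpEntity entity = response.getEntity();
    Header encoding = entity.getContentEncoding();
    if (encoding != null && "gzip".equalsIgnoreCase(encoding.getValue())) {
        return EntityUtils.toString(new GzipDecompressingEntity(entity));
    }
    return EntityUtils.toString(entity);
}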

I hope you enjoyed reading about this small, yet significant change within the Java API across the different versions of Elasticsearch!
