Why I'll Never Go Back to GZIP After Trying ZSTD

Faster speeds. Better compression. Facebook's ZSTD is rapidly becoming the new standard for data compression in modern applications.

By Aditya Karnam Gururaj Rao · Dec. 10, 24 · Opinion

Data processing speed and efficiency matter most with big datasets. GZIP and ZLIB have been the default compression choices for years, but ZSTD now often performs much better.

Let us investigate a compression experiment comparing ZSTD, GZIP, and ZLIB regarding speed, compression ratio, and decompression efficiency. By the end, you’ll see why ZSTD should be your go-to choice when compressing large data.

The Experiment Setup

For this experiment, I used a dataset consisting of JSON dictionaries that are commonly used in data transmission and API interactions. The payload size was varied to represent different scales of data compression needs.

Number of Payloads and Their Structure

I used 100,000 JSON payloads, each representing a dictionary with several nested elements. Here’s an example of a single JSON dictionary:

JSON
 
{
  "id": "12345",
  "name": "Sample Entry",
  "attributes": {
    "category": "example",
    "tags": ["test", "sample"],
    "nested": {
      "level1": "data",
      "level2": "more_data"
    }
  },
  "timestamp": "2024-10-05T12:34:56Z"
}


The number of payloads and the size of the JSON structure can be easily modified in the code to suit your dataset requirements. Here’s a snippet from the script used to generate the dataset:

Python
 
import json
import random
from datetime import datetime, timezone

# Generate 100,000 random JSON dictionaries for testing
payloads = []
for i in range(100000):
    payload = {
        "id": str(i),
        "name": f"Sample Entry {i}",
        "attributes": {
            "category": random.choice(["example", "test", "sample"]),
            "tags": ["tag1", "tag2"],
            "nested": {
                "level1": "value1",
                "level2": "value2"
            }
        },
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
        "timestamp": datetime.now(timezone.utc).isoformat()
    }
    # Store each payload as a serialized JSON string
    payloads.append(json.dumps(payload))

# This payload list is later used for compression testing


You can adjust the number of payloads by changing the loop’s range, which lets you test compression across various data sizes.

Compression Methods

We compared the following compression methods:

  • GZIP: A well-established compression algorithm that’s often used in web applications and file compression.
  • ZLIB: Another commonly used compression library that powers formats like PNG.
  • ZSTD (Zstandard): A compression algorithm developed at Facebook and open-sourced in 2016. It aims for smaller compressed sizes and faster speeds than older methods such as GZIP and ZLIB.

The metrics gathered include compression time, compression ratio (i.e., how much the data was reduced in size), and decompression time.
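The three metrics above could be collected with a small harness like the one below. This is a minimal sketch, not the script used for the article: the `benchmark` function, the `codecs` table, and the small `sample_payloads` list are hypothetical names introduced for illustration, and compressing each payload individually is an assumption. ZSTD is included only if the third-party `zstandard` package is installed.

```python
import gzip
import json
import time
import zlib

try:
    import zstandard as zstd  # third-party package; optional in this sketch
except ImportError:
    zstd = None


def benchmark(compress, decompress, payloads):
    """Return (compression time, compression ratio, decompression time)."""
    raw = [p.encode("utf-8") for p in payloads]

    start = time.perf_counter()
    compressed = [compress(b) for b in raw]
    compress_time = time.perf_counter() - start

    start = time.perf_counter()
    restored = [decompress(c) for c in compressed]
    decompress_time = time.perf_counter() - start

    # The round trip must be lossless, otherwise the timings are meaningless
    assert restored == raw
    # Ratio = total original bytes / total compressed bytes
    ratio = sum(map(len, raw)) / sum(map(len, compressed))
    return compress_time, ratio, decompress_time


# Hypothetical codec table: name -> (compress function, decompress function)
codecs = {
    "GZIP": (gzip.compress, gzip.decompress),
    "ZLIB": (zlib.compress, zlib.decompress),
}
if zstd is not None:
    cctx, dctx = zstd.ZstdCompressor(), zstd.ZstdDecompressor()
    codecs["ZSTD"] = (cctx.compress, dctx.decompress)

# Small stand-in dataset; swap in the real payload list for a full run
sample_payloads = [
    json.dumps({"id": str(i), "name": f"Sample Entry {i}"}) for i in range(1000)
]

results = {
    name: benchmark(comp, decomp, sample_payloads)
    for name, (comp, decomp) in codecs.items()
}
for name, (c_time, ratio, d_time) in results.items():
    print(f"{name}: compress {c_time:.4f} s, ratio {ratio:.4f}, decompress {d_time:.4f} s")
```

Absolute timings depend heavily on hardware, payload size, and compression level, so numbers from a harness like this are only comparable within a single run.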

Results and Analysis

Below is a comparison of the results from each method. This will give you a clear picture of why ZSTD outperforms the other algorithms.

Compression Metrics

GZIP: Compression Time: 21.6559 s, Compression Ratio: 1.3825, Decompression Time: 5.2070 s

Standard ZSTD: Compression Time: 2.8654 s, Compression Ratio: 1.3924, Decompression Time: 2.2661 s

ZSTD Streaming: Compression Time: 3.8126 s, Decompression Time: 1.6418 s

ZSTD Dictionary: Compression Time: 2.6642 s, Decompression Time: 1.8132 s

ZLIB: Compression Time: 18.8420 s, Compression Ratio: 1.3824, Decompression Time: 3.1688 s
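To make the timings above easier to compare, the relative speedups can be computed directly from them. The sketch below only divides the reported numbers; the dictionary names and the `speedup` helper are hypothetical and not part of the original script.

```python
# Timings in seconds, copied from the results listed above
compress_s = {
    "GZIP": 21.6559,
    "ZLIB": 18.8420,
    "Standard ZSTD": 2.8654,
    "ZSTD Streaming": 3.8126,
    "ZSTD Dictionary": 2.6642,
}
decompress_s = {
    "GZIP": 5.2070,
    "ZLIB": 3.1688,
    "Standard ZSTD": 2.2661,
    "ZSTD Streaming": 1.6418,
    "ZSTD Dictionary": 1.8132,
}


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


for mode in ("Standard ZSTD", "ZSTD Streaming", "ZSTD Dictionary"):
    for baseline in ("GZIP", "ZLIB"):
        c = speedup(compress_s[baseline], compress_s[mode])
        d = speedup(decompress_s[baseline], decompress_s[mode])
        print(f"{mode} vs {baseline}: compress {c:.1f}x, decompress {d:.1f}x")
```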


Why ZSTD Is the Superior Choice

From these metrics, it’s clear that ZSTD outshines both GZIP and ZLIB in nearly every aspect:

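1. ZSTD compresses data much faster. In this test, standard ZSTD compressed about 7.6x faster than GZIP (2.87 s versus 21.66 s) and about 6.6x faster than ZLIB (18.84 s). It also decompressed about 2.3x faster than GZIP and about 1.4x faster than ZLIB. This saves time when storing and retrieving data.

2. ZSTD shrinks data slightly better. Its compression ratio of 1.3924 is about 0.7% higher than GZIP (1.3825) and ZLIB (1.3824), so on this dataset the size advantage is small and the main gain is speed.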

3. ZSTD has different modes:

  • Standard
  • Streaming
  • Dictionary-based

You can choose the best one for your needs. It works well for both static datasets and streaming data.
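The streaming and dictionary-based modes listed above might be used roughly as follows with the `zstandard` Python package. This is a sketch under stated assumptions, not the code behind the benchmark: the function names, the 4096-byte dictionary size, and the sample data are illustrative choices, and the import is guarded so the block still loads when the package is missing.

```python
import json

try:
    import zstandard as zstd  # third-party package; optional in this sketch
except ImportError:
    zstd = None


def stream_roundtrip(chunks):
    """Compress chunks incrementally into one frame, then decompress it."""
    cobj = zstd.ZstdCompressor().compressobj()
    # compress() may return an empty bytes object until data is flushed
    compressed = b"".join(cobj.compress(chunk) for chunk in chunks) + cobj.flush()
    dobj = zstd.ZstdDecompressor().decompressobj()
    return dobj.decompress(compressed)


def dictionary_roundtrip(samples, payload, dict_size=4096):
    """Train a dictionary on sample payloads and use it for one payload."""
    # Training needs many representative samples; too few raises an error
    dict_data = zstd.train_dictionary(dict_size, samples)
    compressed = zstd.ZstdCompressor(dict_data=dict_data).compress(payload)
    # The same dictionary must be supplied again to decompress
    return zstd.ZstdDecompressor(dict_data=dict_data).decompress(compressed)


if zstd is not None:
    # Illustrative sample data shaped like the article's payloads
    samples = [
        json.dumps(
            {"id": str(i), "name": f"Sample Entry {i}", "attributes": {"category": "test"}}
        ).encode("utf-8")
        for i in range(2000)
    ]
    print(stream_roundtrip(samples[:100]) == b"".join(samples[:100]))
    print(dictionary_roundtrip(samples, samples[0]) == samples[0])
```

Streaming suits data that arrives in pieces or is too large to hold in memory, while a trained dictionary mainly helps with many small, similar payloads, since the shared structure no longer has to be relearned for each one.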

Example Code: Compressing Data Using ZSTD

Here’s a snippet showing how you can use ZSTD to compress and decompress data in Python:

Python
 
import zstandard as zstd

# Compressing data using ZSTD
def compress_data(data):
    compressor = zstd.ZstdCompressor()
    return compressor.compress(data.encode('utf-8'))

# Decompressing data using ZSTD
def decompress_data(compressed_data):
    decompressor = zstd.ZstdDecompressor()
    return decompressor.decompress(compressed_data).decode('utf-8')

# Sample usage
json_data = '{"id": "1", "name": "Sample Entry", "attributes": {"category": "test"}}'
compressed = compress_data(json_data)
decompressed = decompress_data(compressed)


Final Graph

Below is a visual representation of the compression times, ratios, and decompression speeds, showcasing the performance of each algorithm:

Compression times, ratios, and decompression speeds of ZSTD, GZIP and ZLIB


This graph highlights how ZSTD consistently outperforms both GZIP and ZLIB in all key metrics.

Conclusion

If you use GZIP or ZLIB to compress data, try ZSTD instead. ZSTD will speed up compression for large stored datasets. It also optimizes data transfer rates. ZSTD saves time, storage space, and compute resources. The performance gains are significant. Test ZSTD in your next project and see the benefits yourself!


Published at DZone with permission of Aditya Karnam Gururaj Rao. See the original article here.

Opinions expressed by DZone contributors are their own.
