DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
View Events Video Library
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Integrating PostgreSQL Databases with ANF: Join this workshop to learn how to create a PostgreSQL server using Instaclustr’s managed service

Mobile Database Essentials: Assess data needs, storage requirements, and more when leveraging databases for cloud and edge applications.

Monitoring and Observability for LLMs: Datadog and Google Cloud discuss how to achieve optimal AI model performance.

Automated Testing: The latest on architecture, TDD, and the benefits of AI and low-code tools.

Related

  • Data Annotation's Essential Role in Machine Learning Success
  • Auditing Spring Boot Using JPA, Hibernate, and Spring Data JPA
  • OneStream Fast Data Extracts APIs
  • Artificial Intelligence (AI) Utilizing Deep Learning Techniques to Enhance ADAS

Trending

  • Agile Metrics and KPIs in Action
  • Build a Serverless App Fast With Zipper: Write TypeScript, Offload Everything Else
  • How To Validate Archives and Identify Invalid Documents in Java
  • How To Verify Database Connection From a Spring Boot Application
  1. DZone
  2. Data Engineering
  3. Data
  4. Some thoughts about compression & storage

Some thoughts about compression & storage

Oren Eini user avatar by
Oren Eini
·
Jun. 24, 13 · Interview
Like (0)
Save
Tweet
Share
1.95K Views

Join the DZone community and get the full member experience.

Join For Free

One of the advantages that keeps showing up with levelDB is the notion that it compresses the data on disk by default. Since reading data from disk is way more expensive than the CPU cost of compression & decompression, that is a net benefit.

Or is it? In the managed implementation we are currently working on, we chose to avoid this for now. For a very simple reason. By storing the compressed data on disk, it means that you cannot just give a user the memory mapped buffer and be done with it, you actually have to decompress the data yourself, then hand the user the buffer to the decompressed memory. In other words, instead of having a single read only buffer that the OS will manage / page / optimize for you, you are going to have to allocate memory over & over again, and you’ll pay the cost of decompressing again and again.

I think that it would be better to have the client make that decision. They can send us data that is already compressed, so we won’t need to do anything else, and we would still be able to just hand them a buffer of data. Sure, it sounds like we are just moving the cost around, isn’t it? But what actually happens is that you have a better chance to do optimizations. For example, if I am storing the data compressing via gzip. And I’m exposing the data over the wire, I can just stream the results from the storage directly to the HTTP stream, without having to do anything about it. It can be decompressed on the client.

On the other hand, if I have storage level decompression, I am paying for the cost of reading the compressed data from disk, then allocating new buffer, decompressing the data, then going right ahead and compressing it again for sending over the wire.

Data (computing)

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Data Annotation's Essential Role in Machine Learning Success
  • Auditing Spring Boot Using JPA, Hibernate, and Spring Data JPA
  • OneStream Fast Data Extracts APIs
  • Artificial Intelligence (AI) Utilizing Deep Learning Techniques to Enhance ADAS

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: