DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • How to Build a Full-Stack App With Next.js, Prisma, Postgres, and Fastify
  • Java Memory Management
  • Lessons Learned Moving From On-Prem to Cloud Native
  • All You Need To Know About Garbage Collection in Java

Trending

  • Unit Testing Large Codebases: Principles, Practices, and C++ Examples
  • Integrating Model Context Protocol (MCP) With Microsoft Copilot Studio AI Agents
  • The Cypress Edge: Next-Level Testing Strategies for React Developers
  • How AI Agents Are Transforming Enterprise Automation Architecture
  1. DZone
  2. Data Engineering
  3. Databases
  4. How Java Apps Litter Beyond the Heap

How Java Apps Litter Beyond the Heap

A look at the garbage Java apps generate, demonstrated with some help from Postgres and SSDs.

By 
Denis Magda user avatar
Denis Magda
DZone Core CORE ·
Jun. 05, 22 · Analysis
Likes (13)
Comment
Save
Tweet
Share
9.3K Views

Join the DZone community and get the full member experience.

Join For Free

As Java developers, we’re no strangers to the concept of garbage collection. Our apps generate garbage all the time, and that garbage is meticulously cleaned out by CMS, G1, Azul C4, and other types of collectors. Basically, our apps are born to bring value to this world, but, nothing is perfect—including our apps that leave litter in the Java heap.

However, the story doesn’t end with the Java heap. In fact, it only starts there. Let’s take the example of a basic Java application that uses a relational database such as PostgreSQL and solid-state drives (SSDs) as a storage device. From here, we’ll explore how our applications generate garbage beyond the boundaries of the Java runtime.

Filling Up PostgreSQL With Dead Tuples

When your Java application executes a DELETE or UPDATE statement against a PostgreSQL database, a deleted record is not removed immediately nor is an existing record updated in its place. Instead, the deleted record is marked as a dead tuple and will remain in storage. The updated record is, in fact, a brand new record that PostgreSQL inserts by copying the previous version of the record and updating requested columns. The previous version of that updated record is considered deleted and, as with the DELETE operation, marked as a dead tuple.

There is a good reason why the database engine keeps old versions of the deleted and updated records in its storage. For starters, your application can run a bunch of transactions against PostgreSQL in parallel. Some of those transactions do start earlier than others. But if a transaction deletes a record that still might be of interest to a few transactions started earlier, then the record needs to be kept in the database (at least until the point in time when all earlier started transactions finish). This is how PostgreSQL implements MVCC (multi-version concurrency protocol).

It’s clear that PostgreSQL can’t and doesn’t want to keep the dead tuples forever. This is why the database has its own garbage collection process called vacuuming. There are two types of VACUUM — the plain one and the full one. The plain VACUUM works in parallel with your application workloads and doesn’t block your queries. This type of vacuuming marks the space occupied by dead tuples as free, making it available for new data that your app will add to the same table later. The plain VACUUM doesn’t return the space to the operating system so that it can be reused by other tables or 3rd party applications (except in some corner cases when a page includes only dead tuples and the page is in the end of a table). 

An example of (concurrent) VACUUM in PostgreSQL

An example of (concurrent) VACUUM

By contrast, the full VACUUM does reclaim the free space to the operating system, but it blocks application workloads. You can think of it as Java’s “stop-the-world” garbage collection pause. It’s only in PostgreSQL that such a pause can last for hours (or days). Thus, database admins try their best to prevent the full VACUUM from happening at all.

Let me stop here and move down to the next level — SSDs. Check out this demo-driven article if you’d like to develop a much deeper understanding of vacuuming.

Generating Stale Data in SSDs

If you thought garbage collection is just for software then… surprise, surprise! Some hardware devices also need to perform garbage collection routines. SSDs do garbage collection all the time!

Whenever your Java application deletes or updates any data on disk - through PostgreSQL as discussed above or directly via the Java File API - then the app generates garbage on SSDs.

An SSD stores data in pages (usually between 4KB and 16KB in size) and the latter are grouped in blocks. While your data can be written or read at the page level, the stale (deleted) data can be erased only at the block level. The erasure requires more voltage than for reading/writing operations, and it’s hard to target that voltage at the page level without impacting the adjacent cells.

So, if your Java app updates a file, then, in fact, an updated segment will be written to an empty page potentially in a different block. The segment with the old data will be marked as stale and garbage collected later. First, a garbage collector in SSDs traverses blocks of pages with stale data and moves good data to other blocks (similar to the compaction phase in  Java’s G1 collector). Second, the collector erases blocks that have only stale data left and makes those blocks available to future data.

An example of garbage collection in SSDs

Curious how SSD manufacturers prevent or minimize the number of “stop-the-world” pauses? There is a concept of SSD over-provisioning, when each device comes with an extra space that is unavailable to your apps. That space is a sort of a safe buffer that allows apps to continue writing or modifying data while the garbage collector erases stale data concurrently. Read more about the over-provisioning here. 

Summary

So, next time someone asks you to explain the internals of Java garbage collection, go ahead and surprise them by expanding the topic to include databases and hardware. 

On a serious note, garbage collection is a widespread technique that is used far beyond the Java ecosystem. If implemented properly, garbage collection can simplify the architecture of software and hardware without performance impact. Java, PostgreSQL, and SSDs are all good examples of products that successfully take advantage of garbage collection and still remain among the top products in their categories.

Database engine Relational database app application Blocks Data (computing) Garbage (computer science) garbage collection Java (programming language) PostgreSQL

Opinions expressed by DZone contributors are their own.

Related

  • How to Build a Full-Stack App With Next.js, Prisma, Postgres, and Fastify
  • Java Memory Management
  • Lessons Learned Moving From On-Prem to Cloud Native
  • All You Need To Know About Garbage Collection in Java

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!