How Java Apps Litter Beyond the Heap

A look at the garbage Java apps generate, demonstrated with some help from Postgres and SSDs.

By Denis Magda · Jun. 05, 22 · Analysis

As Java developers, we’re no strangers to the concept of garbage collection. Our apps generate garbage all the time, and that garbage is meticulously cleaned out by CMS, G1, Azul C4, and other types of collectors. Our apps are born to bring value to this world, but nothing is perfect, including our apps, which leave litter in the Java heap.

However, the story doesn’t end with the Java heap. In fact, it only starts there. Let’s take the example of a basic Java application that uses a relational database such as PostgreSQL and solid-state drives (SSDs) as a storage device. From here, we’ll explore how our applications generate garbage beyond the boundaries of the Java runtime.

Filling Up PostgreSQL With Dead Tuples

When your Java application executes a DELETE or UPDATE statement against a PostgreSQL database, a deleted record is not removed immediately nor is an existing record updated in its place. Instead, the deleted record is marked as a dead tuple and will remain in storage. The updated record is, in fact, a brand new record that PostgreSQL inserts by copying the previous version of the record and updating requested columns. The previous version of that updated record is considered deleted and, as with the DELETE operation, marked as a dead tuple.
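
To make the bookkeeping concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not PostgreSQL's real storage format: the `ToyHeapTable` class and all of its methods are hypothetical names made up for illustration. It only shows that a DELETE flags a row version as dead and that an UPDATE is a "flag the old version, append a new one" operation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a table heap; an illustration only, not PostgreSQL's storage engine. */
final class ToyHeapTable {
    /** One stored row version (a "tuple"). */
    static final class Tuple {
        final int id;
        final String value;
        boolean dead; // true once a DELETE or UPDATE has superseded this version

        Tuple(int id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    private final List<Tuple> heap = new ArrayList<>();

    void insert(int id, String value) {
        heap.add(new Tuple(id, value));
    }

    /** DELETE: the live version stays in storage and is only flagged as dead. */
    boolean delete(int id) {
        for (Tuple t : heap) {
            if (t.id == id && !t.dead) {
                t.dead = true;
                return true;
            }
        }
        return false;
    }

    /** UPDATE: flag the old version as dead, then append a brand new version. */
    boolean update(int id, String newValue) {
        if (!delete(id)) {
            return false;
        }
        insert(id, newValue);
        return true;
    }

    long liveTuples() { return heap.stream().filter(t -> !t.dead).count(); }
    long deadTuples() { return heap.stream().filter(t -> t.dead).count(); }
    int storedTuples() { return heap.size(); }

    public static void main(String[] args) {
        ToyHeapTable table = new ToyHeapTable();
        table.insert(1, "alice");
        table.insert(2, "bob");
        table.update(1, "alice-v2"); // leaves one dead tuple behind
        table.delete(2);             // leaves another dead tuple behind
        // prints: stored=3 live=1 dead=2
        System.out.printf("stored=%d live=%d dead=%d%n",
                table.storedTuples(), table.liveTuples(), table.deadTuples());
    }
}
```

Two inserts, one update, and one delete leave a single live row, yet three row versions still occupy storage.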

There is a good reason why the database engine keeps old versions of the deleted and updated records in its storage. For starters, your application can run a bunch of transactions against PostgreSQL in parallel. Some of those transactions start earlier than others. If a transaction deletes a record that might still be of interest to transactions that started earlier, then the record needs to be kept in the database (at least until all of those earlier transactions finish). This is how PostgreSQL implements MVCC (multi-version concurrency control).
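
The visibility idea can be sketched in a few lines of Java. The field names `xmin` and `xmax` are borrowed from PostgreSQL's system columns, but everything else here is a deliberate simplification: the `ToyMvcc` class is hypothetical, every transaction is assumed to commit, and a reader's snapshot is reduced to a single transaction id. Real snapshots also track in-progress and aborted transactions.

```java
import java.util.List;

/** Simplified MVCC visibility rules; assumes all transactions commit and ids grow over time. */
final class ToyMvcc {
    static final long NOT_DELETED = 0;

    /** A row version stamped with the ids of the transactions that created and deleted it. */
    static final class Version {
        final String value;
        final long xmin; // id of the transaction that created this version
        final long xmax; // id of the transaction that deleted it, or NOT_DELETED

        Version(String value, long xmin, long xmax) {
            this.value = value;
            this.xmin = xmin;
            this.xmax = xmax;
        }
    }

    /** A reader sees a version created at or before its snapshot and not deleted as of it. */
    static boolean isVisible(Version v, long snapshotXid) {
        boolean created = v.xmin <= snapshotXid;
        boolean deleted = v.xmax != NOT_DELETED && v.xmax <= snapshotXid;
        return created && !deleted;
    }

    /** A deleted version can be cleaned up only when no active reader can still see it. */
    static boolean isRemovable(Version v, List<Long> activeSnapshots) {
        if (v.xmax == NOT_DELETED) {
            return false;
        }
        return activeSnapshots.stream().noneMatch(s -> isVisible(v, s));
    }

    public static void main(String[] args) {
        // Created by transaction 100, deleted by transaction 105.
        Version row = new Version("order-42", 100, 105);

        System.out.println(isVisible(row, 103)); // true: reader 103 started before the delete
        System.out.println(isVisible(row, 107)); // false: reader 107 started after the delete

        System.out.println(isRemovable(row, List.of(103L, 107L))); // false: 103 still needs it
        System.out.println(isRemovable(row, List.of(107L)));       // true: only later readers remain
    }
}
```

The second pair of calls is the important one: the dead tuple has to stay until the last reader that could still see it has finished.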

It’s clear that PostgreSQL can’t and doesn’t want to keep the dead tuples forever. This is why the database has its own garbage collection process called vacuuming. There are two types of VACUUM: the plain one and the full one. The plain VACUUM works in parallel with your application workloads and doesn’t block your queries. This type of vacuuming marks the space occupied by dead tuples as free, making it available for new data that your app will add to the same table later. The plain VACUUM doesn’t return the space to the operating system, so the space can’t be reused by other tables or third-party applications (except in some corner cases when a page includes only dead tuples and the page is at the end of a table).

An example of (concurrent) VACUUM in PostgreSQL

By contrast, the full VACUUM does return the free space to the operating system, but it locks the table and blocks application workloads while it runs. You can think of it as Java’s “stop-the-world” garbage collection pause, except that in PostgreSQL such a pause can last for hours (or days). Thus, database admins try their best to avoid running the full VACUUM at all.
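
The difference between the two modes can be sketched with a toy page model in Java. This is not PostgreSQL's actual algorithm: the `ToyVacuum` class, the two-slots-per-page layout, and the method names are all assumptions made for illustration. The number of pages stands in for the disk space the table holds on to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy page-level model of plain vs. full vacuuming; an illustration only. */
final class ToyVacuum {
    enum Slot { LIVE, DEAD, FREE }

    static final int SLOTS_PER_PAGE = 2;

    /** Plain vacuum: dead slots become free for reuse; only trailing all-free pages are released. */
    static List<Slot[]> plainVacuum(List<Slot[]> pages) {
        List<Slot[]> result = new ArrayList<>();
        for (Slot[] page : pages) {
            Slot[] copy = page.clone();
            for (int i = 0; i < copy.length; i++) {
                if (copy[i] == Slot.DEAD) {
                    copy[i] = Slot.FREE;
                }
            }
            result.add(copy);
        }
        // The corner case: all-free pages at the very end of the table can be given back.
        while (!result.isEmpty() && isAllFree(result.get(result.size() - 1))) {
            result.remove(result.size() - 1);
        }
        return result;
    }

    /** Full vacuum: rewrite only the live tuples into a fresh, densely packed set of pages. */
    static List<Slot[]> fullVacuum(List<Slot[]> pages) {
        int live = 0;
        for (Slot[] page : pages) {
            for (Slot slot : page) {
                if (slot == Slot.LIVE) {
                    live++;
                }
            }
        }
        List<Slot[]> result = new ArrayList<>();
        for (int written = 0; written < live; written += SLOTS_PER_PAGE) {
            Slot[] page = new Slot[SLOTS_PER_PAGE];
            for (int i = 0; i < SLOTS_PER_PAGE; i++) {
                page[i] = (written + i < live) ? Slot.LIVE : Slot.FREE;
            }
            result.add(page);
        }
        return result;
    }

    static boolean isAllFree(Slot[] page) {
        return Arrays.stream(page).allMatch(s -> s == Slot.FREE);
    }

    public static void main(String[] args) {
        List<Slot[]> table = List.of(
                new Slot[] {Slot.LIVE, Slot.DEAD},
                new Slot[] {Slot.DEAD, Slot.DEAD},
                new Slot[] {Slot.DEAD, Slot.LIVE},
                new Slot[] {Slot.DEAD, Slot.DEAD});
        System.out.println("pages before:       " + table.size());              // 4
        System.out.println("after plain VACUUM: " + plainVacuum(table).size()); // 3
        System.out.println("after full VACUUM:  " + fullVacuum(table).size());  // 1
    }
}
```

In this sketch the plain pass releases only the last page, because the all-free page in the middle of the table has to stay where it is. The full pass shrinks the table to a single page, at the cost of rewriting everything.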

Let me stop here and move down to the next level: SSDs. Check out this demo-driven article if you’d like to develop a much deeper understanding of vacuuming.

Generating Stale Data in SSDs

If you thought garbage collection is just for software then… surprise, surprise! Some hardware devices also need to perform garbage collection routines. SSDs do garbage collection all the time!

Whenever your Java application deletes or updates any data on disk, whether through PostgreSQL as discussed above or directly via the Java File API, the app generates garbage on SSDs.

An SSD stores data in pages (usually between 4KB and 16KB in size), and pages are grouped into blocks. While your data can be written or read at the page level, the stale (deleted) data can be erased only at the block level. Erasure requires a higher voltage than read and write operations, and it’s hard to target that voltage at the page level without impacting the adjacent cells.

So, if your Java app updates a file, the updated segment is, in fact, written to an empty page, potentially in a different block. The segment with the old data is marked as stale and garbage collected later. First, a garbage collector in SSDs traverses blocks of pages with stale data and moves good data to other blocks (similar to the compaction phase in Java’s G1 collector). Second, the collector erases blocks that have only stale data left and makes those blocks available to future data.
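
Here is a minimal Java sketch of those two steps. It is a toy model, not how any particular SSD controller works: the `ToySsd` class, the four-pages-per-block layout, and the explicit `collect(block)` call are assumptions made for illustration, and concerns such as wear leveling and victim selection are left out entirely.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy flash model: pages are written one at a time, but erased only a whole block at a time. */
final class ToySsd {
    enum Page { EMPTY, VALID, STALE }

    static final int PAGES_PER_BLOCK = 4;

    private final Page[][] blocks;
    private final int[][] owner;                                  // logical address held by each page
    private final Map<Integer, int[]> location = new HashMap<>(); // logical address -> {block, page}

    ToySsd(int blockCount) {
        blocks = new Page[blockCount][PAGES_PER_BLOCK];
        owner = new int[blockCount][PAGES_PER_BLOCK];
        for (Page[] block : blocks) {
            Arrays.fill(block, Page.EMPTY);
        }
    }

    /** Never overwrites in place: the old page turns stale and the data lands in an empty page. */
    void write(int logicalAddress) {
        int[] old = location.get(logicalAddress);
        if (old != null) {
            blocks[old[0]][old[1]] = Page.STALE;
        }
        place(logicalAddress, -1);
    }

    /** Puts the data into the first empty page outside the excluded block (-1 excludes none). */
    private void place(int logicalAddress, int excludedBlock) {
        for (int b = 0; b < blocks.length; b++) {
            if (b == excludedBlock) {
                continue;
            }
            for (int p = 0; p < PAGES_PER_BLOCK; p++) {
                if (blocks[b][p] == Page.EMPTY) {
                    blocks[b][p] = Page.VALID;
                    owner[b][p] = logicalAddress;
                    location.put(logicalAddress, new int[] {b, p});
                    return;
                }
            }
        }
        throw new IllegalStateException("no empty page left; run garbage collection first");
    }

    /** Step 1: move valid pages to other blocks. Step 2: erase the whole block. */
    void collect(int block) {
        for (int p = 0; p < PAGES_PER_BLOCK; p++) {
            if (blocks[block][p] == Page.VALID) {
                place(owner[block][p], block);
            }
        }
        Arrays.fill(blocks[block], Page.EMPTY);
    }

    int count(Page state) {
        int n = 0;
        for (Page[] block : blocks) {
            for (Page page : block) {
                if (page == state) {
                    n++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        ToySsd ssd = new ToySsd(2);
        ssd.write(1);
        ssd.write(2);
        ssd.write(3);
        ssd.write(1); // rewrite: the first copy of address 1 becomes stale
        ssd.write(2); // rewrite: the first copy of address 2 becomes stale
        // prints: before GC: valid=3 stale=2 empty=3
        System.out.printf("before GC: valid=%d stale=%d empty=%d%n",
                ssd.count(Page.VALID), ssd.count(Page.STALE), ssd.count(Page.EMPTY));
        ssd.collect(0);
        // prints: after GC:  valid=3 stale=0 empty=5
        System.out.printf("after GC:  valid=%d stale=%d empty=%d%n",
                ssd.count(Page.VALID), ssd.count(Page.STALE), ssd.count(Page.EMPTY));
    }
}
```

Note that collecting block 0 has to copy two valid pages elsewhere before the block can be erased, which is the extra write work that updates cause on flash storage.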

An example of garbage collection in SSDs

Curious how SSD manufacturers prevent or minimize the number of “stop-the-world” pauses? There is a concept of SSD over-provisioning, where each device comes with extra space that is unavailable to your apps. That space is a sort of safe buffer that allows apps to continue writing or modifying data while the garbage collector erases stale data concurrently. Read more about over-provisioning here.
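
Over-provisioning is commonly quoted as a percentage of the user-visible capacity. The arithmetic is sketched below in Java; the `OverProvisioning` class is a hypothetical helper, and the 512 GB and 480 GB figures are made-up example capacities rather than numbers for any specific drive.

```java
/** Worked example of the over-provisioning ratio; the capacities are illustrative only. */
final class OverProvisioning {
    /** Extra (hidden) capacity expressed as a percentage of the user-visible capacity. */
    static double percent(double physicalGb, double userGb) {
        if (userGb <= 0 || physicalGb < userGb) {
            throw new IllegalArgumentException("expected 0 < userGb <= physicalGb");
        }
        return (physicalGb - userGb) / userGb * 100.0;
    }

    public static void main(String[] args) {
        // A hypothetical drive with 512 GB of flash that exposes 480 GB to the OS.
        System.out.printf("over-provisioning: %.1f%%%n", percent(512, 480)); // 6.7%
    }
}
```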

Summary

So, next time someone asks you to explain the internals of Java garbage collection, go ahead and surprise them by expanding the topic to include databases and hardware. 

On a more serious note, garbage collection is a widespread technique that is used far beyond the Java ecosystem. If implemented properly, garbage collection can simplify the architecture of software and hardware without a significant performance impact. Java, PostgreSQL, and SSDs are all good examples of products that successfully take advantage of garbage collection and still remain among the top products in their categories.
