DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Reducing RAG Hallucinations With Relationship-Aware Retrieval
  • Building a Vector Index in Azure AI Search: HNSW, Profiles, and RAG Retrieval
  • Amazon OpenSearch Vector Search Explained for RAG Systems
  • Production-Grade RAG: Why Vector Search Isn't Enough (and How Hybrid Search Fills the Gaps)

Trending

  • LLM Agents and Getting Started with Them
  • How to Implement AI Agents in Rails With RubyLLM
  • Advanced Error Handling and Retry Patterns in Enterprise REST Integrations
  • Best Practices for Evaluating LLMs and RAG Systems
  1. DZone
  2. Data Engineering
  3. Data
  4. Resizing the HashMap: Dangers Ahead

Resizing the HashMap: Dangers Ahead

The problems with threading on Hashmap and leaky abstractions. The use of proper monitoring and careful use of data structures is very important in modern coding.

By 
Nikita Salnikov-Tarnovski user avatar
Nikita Salnikov-Tarnovski
·
Aug. 04, 16 · Tutorial
Likes (20)
Comment
Save
Tweet
Share
16.7K Views

Join the DZone community and get the full member experience.

Join For Free

I recently stumbled upon a bug caused by improper usage of java.util.HashMap from multiple threads. The bug was an excellent example of the leaking abstractions. Only the knowledge of the implementation-level details of the data structures helped me solve the issue at hand. So I hope that sharing the problem I faced will encourage some of our readers to familiarize themselves with the ways basic data structures are implemented.

The symptoms I faced raised their ugly head on a day where certain analysis processes which normally take just minutes to complete had been running for hours. Being a true believer in our craft I was timely notified by our own monitoring software and started to investigate the cause.

I also had several thread dumps available from the processing threads. They indicated that the code was just processing entries at the hashmap found inside the heap dump, seemingly in an unterminated loop. So it appeared that the data being analyzed was somehow corrupted, containing a circular reference.

To my surprise, this was indeed the case. The HashMap entries inside the analyzed heap content were referencing one another. When designing the heap analysis algorithms we never expected this to be possible. Apparently we were wrong.

As the HashMap implementation is known not to be threadsafe, I was now suspecting it is somehow related to concurrency issues with HashMap usage. And indeed, there was a problem hidden in the design of the java.util.HashMap. As I am sure you are aware, a HashMap consists of array of buckets with each bucket referring to a linked list of entries. The entries in turn refer to the next entry in list until the last entry refers to null:

how java hashmap is built

What our analyser got stuck with was the situation where two entries referred to one another forming a closed cycle.

circular references in hashmap after resize in multithreaded environment

With the help of Google I discovered how one can end up creating such circular references an issue in a multithreaded environment. As you again are likely aware, the HashMaps are resized dynamically during runtime, based on the number of entries in the map. By default, the HashMaps uses a load factor of 75%. This means that whenever the number of entries in the map exceeds 75% of the available capacity, the map size is increased to avoid too many collisions on map element entries.

So here I had it. Apparently multiple threads had attempted to resize the map at the same time, creating a loop in some of the buckets. The culprit was eventually hidden in the following lines in theJava HashMap source code:

void transfer(Entry[] newTable, boolean rehash) {
... skipped for brevity ...
Entry next = e.next;
if (rehash) {
e.hash = null == e.key ? 0 : hash(e.key);
}
... skipped for brevity ... 
}

The solution from our analytics endpoint was now easy. We just had to keep a ledger about the processed entries and not process any of the entries twice was all we needed.

I do believe this serves as a great example about failing abstractions. The HashMaps in Java are well built and tend serve you well, even if you do not understand the implementation details. Until they don’t. In such cases, the in-depth knowledge about the data structure implementation details will make all the difference for you.

Data structure

Published at DZone with permission of Nikita Salnikov-Tarnovski. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Reducing RAG Hallucinations With Relationship-Aware Retrieval
  • Building a Vector Index in Azure AI Search: HNSW, Profiles, and RAG Retrieval
  • Amazon OpenSearch Vector Search Explained for RAG Systems
  • Production-Grade RAG: Why Vector Search Isn't Enough (and How Hybrid Search Fills the Gaps)

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook