DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
Building Scalable Real-Time Apps with AstraDB and Vaadin
Register Now

Trending

  • Getting Started With the YugabyteDB Managed REST API
  • Top 10 Pillars of Zero Trust Networks
  • Unlocking the Power of AIOps: Enhancing DevOps With Intelligent Automation for Optimized IT Operations
  • Mastering Time Series Analysis: Techniques, Models, and Strategies

Trending

  • Getting Started With the YugabyteDB Managed REST API
  • Top 10 Pillars of Zero Trust Networks
  • Unlocking the Power of AIOps: Enhancing DevOps With Intelligent Automation for Optimized IT Operations
  • Mastering Time Series Analysis: Techniques, Models, and Strategies
  1. DZone
  2. Data Engineering
  3. Data
  4. A Disk-Backed ArrayList

A Disk-Backed ArrayList

Learn more about disk-backed ArrayLists and Java performance.

Bozhidar Bozhanov user avatar by
Bozhidar Bozhanov
·
Updated Jan. 02, 20 · Analysis
Like (1)
Save
Tweet
Share
11.93K Views

Join the DZone community and get the full member experience.

Join For Free

Learn more about disk-backed ArrayLists and Java performance.

A Java disk-based ArrayList can sometimes occur when your list becomes too big to fit in memory and you have to do something to avoid running out of memory.

The proper way to do that is streaming — instead of fitting everything in memory, you should stream data from the source and discard the entries that are already processed.

You may also like: Performance Evaluation of Java ArrayLists

However, there are cases when code that's outside of your control requires a List and you can't use streaming. These cases are rather rare, but in case you hit them, you have to find a workaround. One is to re-implement the code to work with streaming, but depending on the way the library is written, it may not be possible. So the other option is to use a disk-backed list — one that works as a list but underneath stores and loads elements from disk.

Searching for existing solutions results in several 3+ years old repos like this one and this one and this one.

And then there's MapDB, which is great and supported. It's mostly about maps, but it does support a List as well, as shown here.

And finally, you have the option to implement something simpler yourself, in case you need just iteration and almost nothing else. I've done it here - DiskBackedArrayList.java. It doesn't support many things (not all methods are overridden to throw an exception, but they should). But most importantly, it doesn't support random adding and random getting, and also toArray(). It's purely "fill the collection" and then "iterate the collection". It relies on ObjectOutputStream, which is not terribly efficient, but is simple to use.

Note that I've allowed a short in-memory prependList in case small amounts of data need to be prepended to the list.

The list gets filled in memory until a specified threshold and then gets flushed to disk, clearing the memory which starts getting filled again. This too can be more efficient — with background flushing in another thread that doesn't interfere with adding elements to the list, but optimizations complicate things, and in this case, the total running time was not an issue.

Most importantly, the iterator() method is overridden to return a custom iterator that first streams the prepended list, then reads everything from disk and finally iterates over the latest batch, which is still in memory. And finally, the clear() method should be called in the end in order to close the underlying stream.

An output stream could be opened and closed on each flush, but ObjectOutputStream can't be used in append mode due to some implementation-specific about writing headers first.

So basically, we hide the streaming approach underneath a List interface — it's still streaming elements and discarding them when not needed. Ideally, this should be done at the source of the data (e.g. a database, message queue, etc.) rather than using the disk as overflow space, but there are cases where using the disk is fine. This implementation is a starting point, as it's not tested in production, but illustrates that you can adapt existing classes to use different data access patterns if needed.

Further Reading

How to Create an ArrayList in Java

Performance Evaluation of Java ArrayLists

Using Java ArrayLists

Data structure

Published at DZone with permission of Bozhidar Bozhanov, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Trending

  • Getting Started With the YugabyteDB Managed REST API
  • Top 10 Pillars of Zero Trust Networks
  • Unlocking the Power of AIOps: Enhancing DevOps With Intelligent Automation for Optimized IT Operations
  • Mastering Time Series Analysis: Techniques, Models, and Strategies

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com

Let's be friends: