Writing Large JSON Files With Jackson

Let's take a look at how to easily write a large amount of JSON data to a file using everyone's favorite JSON library, Jackson!

By Bozhidar Bozhanov · Aug. 20, 18 · Tutorial


Sometimes, you need to export a lot of data to a JSON file. Maybe it's "export all of your data to JSON," or it's the GDPR "right to portability," where you effectively need to do the same.

And, as with any big dataset, you can't just load it all into memory and write it to a file. The export takes a while, it reads a lot of entries from the database, and you need to be careful not to let it overload the entire system or run out of memory.

Luckily, it's fairly straightforward to do that with the help of Jackson's SequenceWriter and, optionally, piped streams. Here's what it would look like:


    private ObjectMapper jsonMapper = new ObjectMapper();
    private ExecutorService executorService = Executors.newFixedThreadPool(5);

    @Async
    public ListenableFuture<Boolean> export(UUID customerId) {
        try (PipedInputStream in = new PipedInputStream();
                PipedOutputStream pipedOut = new PipedOutputStream(in);
                GZIPOutputStream out = new GZIPOutputStream(pipedOut)) {

            Stopwatch stopwatch = Stopwatch.createStarted();

            ObjectWriter writer = jsonMapper.writer().withDefaultPrettyPrinter();

            try (SequenceWriter sequenceWriter = writer.writeValues(out)) {
                sequenceWriter.init(true);

                Future<?> storageFuture = executorService.submit(() ->
                       storageProvider.storeFile(getFilePath(customerId), in));

                int batchCounter = 0;
                while (true) {
                    List<Record> batch = readDatabaseBatch(batchCounter++);
                    for (Record record : batch) {
                        sequenceWriter.write(record);
                    }
                    // stop once the database returns no more rows
                    if (batch.isEmpty()) {
                        break;
                    }
                }

                // wait for storing to complete
                storageFuture.get();
            }  

            logger.info("Exporting took {} seconds", stopwatch.stop().elapsed(TimeUnit.SECONDS));

            return AsyncResult.forValue(true);
        } catch (Exception ex) {
            logger.error("Failed to export data", ex);
            return AsyncResult.forValue(false);
        }
    }


The above code does a few things:

  • Uses a SequenceWriter to continuously write records. It is initialized with an OutputStream, to which everything is written. This could be a simple FileOutputStream or a piped stream, as discussed below. Note that the naming here is a bit misleading: writeValues(out) sounds like you are instructing the writer to write something now; instead, it configures the writer to use that stream later.
  • The SequenceWriter is initialized with true, which means "wrap in array." You are writing many records of the same type, so they should form an array in the final JSON.
  • Uses PipedOutputStream and PipedInputStream to link the SequenceWriter to an InputStream, which is then passed to a storage service. If we were explicitly working with files, there would be no need for that; simply passing a FileOutputStream would do. However, you may want to store the file differently, e.g. in Amazon S3, where the putObject call requires an InputStream from which it reads the data and stores it in S3. So, in effect, you are writing to an OutputStream whose contents are exposed through an InputStream, and whatever is read from that InputStream ends up written to the storage target.
  • Storing the file is invoked in a separate thread so that writing to it does not block the current thread, whose purpose is to read from the database. Again, this would not be needed if a simple FileOutputStream were used (a file-only variant is sketched just after this list).
  • The whole method is marked as @Async (Spring) so that it doesn't block the execution. It gets invoked and finishes when ready, using an internal Spring executor service with a limited thread pool.
  • The database batch reading code is not shown here, as it varies depending on the database. The point is that you should fetch your data in batches rather than with a single SELECT * FROM X (a hedged JDBC sketch appears below).
  • The OutputStream is wrapped in a GZIPOutputStream, as text formats like JSON, with their repetitive elements, benefit significantly from compression.
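
For the common case where the export just goes to a local file, the piped streams and the extra storage thread go away entirely. The following is a minimal sketch of that simpler variant, not the article's code: the Record type and readDatabaseBatch helper are placeholders standing in for your own data access layer.


    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.SequenceWriter;

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.util.List;
    import java.util.zip.GZIPOutputStream;

    // Sketch: write records straight to a gzipped local file. No piped streams or
    // second thread are needed, because nothing has to consume an InputStream
    // while we write.
    public class LocalJsonExport {

        private final ObjectMapper jsonMapper = new ObjectMapper();

        public void exportToFile(String path) throws Exception {
            try (OutputStream out = new GZIPOutputStream(new FileOutputStream(path));
                 SequenceWriter sequenceWriter = jsonMapper.writer().writeValues(out)) {

                // init(true) wraps all written values in one JSON array: [ {...}, {...}, ... ]
                sequenceWriter.init(true);

                int batchCounter = 0;
                while (true) {
                    List<Record> batch = readDatabaseBatch(batchCounter++);
                    for (Record record : batch) {
                        sequenceWriter.write(record);
                    }
                    if (batch.isEmpty()) {
                        break; // no more rows to export
                    }
                }
            }
        }

        // Placeholder row type and batch reader, standing in for your own data access code.
        public static class Record {
            public long id;
            public String payload;
        }

        private List<Record> readDatabaseBatch(int batchNumber) {
            return List.of(); // fetch one page of rows from the database here
        }
    }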

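As for the batch reading itself, one hedged way to do it is plain JDBC with LIMIT/OFFSET pagination. The table name, column names, batch size, and DataSource wiring below are assumptions for illustration; on very large tables, keyset pagination (WHERE id > ?) tends to scale better than a growing OFFSET.


    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    import javax.sql.DataSource;

    // Sketch: fetch one page of rows per call; an empty (or short) batch signals
    // that the export loop can stop.
    public class BatchReader {

        private static final int BATCH_SIZE = 1000;

        private final DataSource dataSource;

        public BatchReader(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public List<Record> readDatabaseBatch(int batchNumber) throws SQLException {
            String sql = "SELECT id, payload FROM records ORDER BY id LIMIT ? OFFSET ?";
            List<Record> batch = new ArrayList<>(BATCH_SIZE);
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, BATCH_SIZE);
                ps.setLong(2, (long) batchNumber * BATCH_SIZE);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        Record record = new Record();
                        record.id = rs.getLong("id");
                        record.payload = rs.getString("payload");
                        batch.add(record);
                    }
                }
            }
            return batch;
        }

        // Placeholder row type matching the export sketch above.
        public static class Record {
            public long id;
            public String payload;
        }
    }
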
The main work is done by Jackson's SequenceWriter, and the (kind of obvious) point to take home is: don't assume your data will fit in memory. It almost never does, so do everything in batches and with incremental writes. Hope this helps!


Published at DZone with permission of Bozhidar Bozhanov, DZone MVB. See the original article here.

