DZone

uniVocity-parsers: A powerful CSV/TSV/Fixed-width file parser library for Java

By Jerry Joe · Apr. 27, 15 · Interview

uniVocity-parsers is an open-source Java library for parsing and writing CSV, TSV, and fixed-width files. It offers a simple API for reading and writing files, along with the features described below.

Unlike other libraries, uniVocity-parsers is built on its own architecture for parsing text files, which focuses on maximum performance and flexibility while making it easy to extend and build new parsers.

Contents

  1. Overview
  2. Installation
  3. Features Overview
  4. Reading CSV/TSV/Fixed-width Files
  5. Writing CSV/TSV/Fixed-width Files
  6. Performance and Flexibility
  7. Design and Implementations

1. Overview

I'm a Java developer working on a web-based system that evaluates telecommunication carriers' networks and produces reports. The system relies heavily on CSV for network-related data, such as the real-time network status (online/offline) of broadband subscribers and the real-time traffic of each subscriber. A single CSV file typically exceeds 1GB and contains millions of rows. We had been using the JavaCSV library as our CSV parser.

As the capacity of the carriers' networks grew and the period our system monitors lengthened, the volume of CSV data increased sharply. My team and I had to find a solution that processed CSV files faster (ideally in seconds) and was easier to extend with customized functionality. After a lot of testing and analysis we settled on uniVocity-parsers, and we found it great. Besides better performance and extensibility, the library provides simple APIs, detailed documentation and tutorials, and commercial support for highly customized functionality.

The project is hosted on GitHub, with 62 stars and 8 forks at the time of writing. Extensive documentation and tutorials are provided here and here, and you can find more examples and news here as well. In addition, the well-known open-source project Apache Camel integrates uniVocity-parsers for reading and writing CSV/TSV/fixed-width files. Find more details here.

2. Installation

I'm using version 1.5.1, but check the official download page to see whether a more recent version is available. The project is also published to the Maven Central repository, so you can add this to your pom.xml:
<dependency>
    <groupId>com.univocity</groupId>
    <artifactId>univocity-parsers</artifactId>
    <version>1.5.1</version>
</dependency>

3. Features Overview

uniVocity-parsers provides a broad set of features for processing tabular data. The following chart gives an overview of them:

4. Reading CSV/TSV/Fixed-width Files

Read all rows of a CSV file:
// getReader(...) is assumed to be a helper, as in the library's examples, that returns a java.io.Reader
CsvParser parser = new CsvParser(new CsvParserSettings());
List<String[]> allRows = parser.parseAll(getReader("/examples/example.csv"));
For the full list of reading examples, refer to: https://github.com/uniVocity/univocity-parsers#reading-csv

5. Writing CSV/TSV/Fixed-width Files

Write data in CSV format with just two lines of code:
// someMethodToCreateRows() stands in for your own code; outputWriter is any java.io.Writer
List<String[]> rows = someMethodToCreateRows();

CsvWriter writer = new CsvWriter(outputWriter, new CsvWriterSettings());
writer.writeRowsAndClose(rows);
For the full list of writing examples, refer to: https://github.com/uniVocity/univocity-parsers/blob/master/README.md#writing

6. Performance and Flexibility

Here is the performance comparison we measured between uniVocity-parsers and JavaCSV in our system:
File size | Duration for JavaCSV parsing | Duration for uniVocity-parsers parsing
10MB, 145453 rows | 1138ms | 836ms
100MB, 809008 rows | 23s | 6s
434MB, 4499959 rows | 91s | 28s
1GB, 23803502 rows | 245s | 70s
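
To make the gap easier to read, the durations above can be turned into speedup ratios (JavaCSV time divided by uniVocity-parsers time). The snippet below is only an arithmetic sketch over the figures in the table; the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class SpeedupTable {
    /** How many times faster the second duration is than the first. */
    static double speedup(double javaCsvMillis, double univocityMillis) {
        return javaCsvMillis / univocityMillis;
    }

    /** Durations copied from the table above, converted to milliseconds. */
    static Map<String, double[]> measurements() {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("10MB", new double[] {1138, 836});
        m.put("100MB", new double[] {23_000, 6_000});
        m.put("434MB", new double[] {91_000, 28_000});
        m.put("1GB", new double[] {245_000, 70_000});
        return m;
    }

    /** One line per file size, for example "434MB: 3.25x". */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, double[]> e : measurements().entrySet()) {
            double[] d = e.getValue();
            sb.append(String.format(Locale.ROOT, "%s: %.2fx%n", e.getKey(), speedup(d[0], d[1])));
        }
        return sb.toString();
    }
}
```

In our measurements the ratio is about 1.4x for the smallest file and between roughly 3.3x and 3.8x for the larger ones.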
Here are some performance comparison tables covering almost all CSV parser libraries in existence; in them, uniVocity-parsers is significantly ahead of the other libraries.

uniVocity-parsers achieves its performance and flexibility through the following mechanisms:
  • Reading input on a separate thread (enabled by calling CsvParserSettings.setReadInputOnSeparateThread(true))
  • A concurrent row processor (see ConcurrentRowProcessor, which implements RowProcessor)
  • Extending ColumnProcessor to process columns with your own business logic
  • Extending RowProcessor to read rows with your own business logic
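
To illustrate the first mechanism, here is a minimal standalone sketch of reading input on a separate thread: one thread pulls lines from the Reader into a bounded queue while the calling thread turns them into rows. This is not the library's implementation. BackgroundLineReader is a hypothetical name, and the naive comma split ignores quoting, which a real CSV parser must handle.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

final class BackgroundLineReader {
    // Sentinel marking the end of input; compared by identity, never by content.
    private static final String END = new String("<end-of-input>");

    static List<String[]> parse(Reader input) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        AtomicReference<Exception> failure = new AtomicReference<>();

        // Producer: reads lines and hands them to the consumer through the queue.
        Thread reader = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(input)) {
                String line;
                while ((line = in.readLine()) != null) {
                    queue.put(line);
                }
            } catch (IOException | InterruptedException e) {
                failure.set(e);
            } finally {
                try {
                    queue.put(END); // always tell the consumer to stop
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "input-reader");
        reader.setDaemon(true);
        reader.start();

        // Consumer: splits each line into columns while the producer keeps reading.
        List<String[]> rows = new ArrayList<>();
        try {
            for (String line = queue.take(); line != END; line = queue.take()) {
                rows.add(line.split(",", -1)); // naive split, no quote handling
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while parsing", e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("Reading input failed", failure.get());
        }
        return rows;
    }
}
```

Calling BackgroundLineReader.parse(new StringReader("a,b\n1,2\n")) returns two rows of two columns each. The bounded queue keeps memory use flat when the reader outpaces the consumer, which matters for files of the size discussed above.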

7. Design and Implementations

A set of processors forms the core of uniVocity-parsers. They are responsible for reading and writing data in rows and columns and for executing data conversions. Here is the diagram of processors:

You can create your own processors easily by implementing the RowProcessor interface or extending the provided implementations. In the following example I simply used an anonymous class:
CsvParserSettings settings = new CsvParserSettings();

settings.setRowProcessor(new RowProcessor() {

    /**
    * initialize whatever you need before processing the first row, with your own business logic
    **/
    @Override
    public void processStarted(ParsingContext context) {
        System.out.println("Started to process rows of data.");
    }

    // accumulates the formatted rows; printed once in processEnded
    StringBuilder stringBuilder = new StringBuilder();

    /**
    * process the row with your own business logic
    **/
    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        stringBuilder.append("The row in line #").append(context.currentLine()).append(": ");
        for (String col : row) {
            stringBuilder.append(col).append("\t");
        }
        stringBuilder.append("\n");
    }

    /**
    * After all rows were processed, perform any cleanup you need
    **/
    @Override
    public void processEnded(ParsingContext context) {
        System.out.println("Finished processing rows of data.");
        System.out.println(stringBuilder);
    }
});

// new FileReader(...) throws FileNotFoundException; declare or handle it in the enclosing method
CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new FileReader("/myFile.csv"));
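
The same callback shape can also gather values per column instead of per row, which is the idea behind the column processor mentioned earlier. The following is a standalone sketch under stated assumptions: RowCallback, ColumnCollector, and TinyDriver are hypothetical names that only mirror the three lifecycle methods shown above. They are not the library's types, and the driver's comma split ignores quoting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback mirroring the started / row / ended lifecycle. */
interface RowCallback {
    void processStarted();
    void rowProcessed(String[] row, long lineNumber);
    void processEnded();
}

/** Collects values by column name, treating the first row as headers. */
final class ColumnCollector implements RowCallback {
    private final Map<String, List<String>> columns = new LinkedHashMap<>();
    private String[] headers;

    @Override
    public void processStarted() {
        columns.clear();
        headers = null;
    }

    @Override
    public void rowProcessed(String[] row, long lineNumber) {
        if (headers == null) { // first row: register one list per header
            headers = row;
            for (String header : headers) {
                columns.put(header, new ArrayList<>());
            }
            return;
        }
        for (int i = 0; i < headers.length; i++) {
            // short rows are padded with null so every column has one value per row
            columns.get(headers[i]).add(i < row.length ? row[i] : null);
        }
    }

    @Override
    public void processEnded() {
        // nothing to clean up in this sketch
    }

    Map<String, List<String>> columns() {
        return columns;
    }
}

/** Minimal driver that feeds already-split lines to a callback. */
final class TinyDriver {
    static void run(List<String> lines, RowCallback callback) {
        callback.processStarted();
        long lineNumber = 0;
        for (String line : lines) {
            callback.rowProcessed(line.split(",", -1), ++lineNumber);
        }
        callback.processEnded();
    }
}
```

For example, running TinyDriver over the lines "id,status", "1,online", "2,offline" with a ColumnCollector leaves the "status" column holding "online" and "offline", in that order.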
The library offers many more features than shown here. I recommend taking a look at it, as it really made a difference in our project.

Opinions expressed by DZone contributors are their own.
