Spring Batch CSV Processing

Explore how you can use Spring Batch to enable enterprise-grade batch processing with this example focusing on CSV files and anime.

By Michael Good · Nov. 20, 17 · Tutorial


Welcome! Topics we will be discussing today include the essential concepts of batch processing with Spring Batch and how to import the data from a CSV into a database.

Spring Batch CSV Processing Example Application

We are building an application that demonstrates the basics of Spring Batch for processing CSV files. Our demo application will allow us to process a CSV file that contains hundreds of records of Japanese anime titles.

The CSV

I downloaded the CSV we will be using from this GitHub repository. It provides a fairly comprehensive list of anime titles.

Here is a screenshot of the CSV opened in Microsoft Excel:

Animes CSV screenshot

View and download the code from GitHub.

Project Structure

Project structure of spring batch application

Project Dependencies

Besides the typical Spring Boot dependencies, we include two more. The first is spring-boot-starter-batch, which, as the name suggests, brings in Spring Batch. The second is hsqldb, which provides an in-memory database. We also include commons-lang3 for its ToStringBuilder class.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.michaelcgood</groupId>
    <artifactId>michaelcgood-spring-batch-csv</artifactId>
    <version>0.0.1</version>
    <packaging>jar</packaging>

    <name>michaelcgood-spring-batch-csv</name>
    <description>Michael C Good - Spring Batch CSV Example Application</description>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.5.7.RELEASE</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.hsqldb</groupId>
            <artifactId>hsqldb</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.commons/commons-lang3 -->
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.6</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>


</project>


Model

This is a POJO that models the fields of an anime. The fields are:

  • ID. For the sake of simplicity, we treat the ID as a String. However, this could be changed to another data type such as an Integer or Long.
  • Title. This is the title of the anime and it is appropriate for it to be a String.
  • Description. This is the description of the anime, which is longer than the title, and it can also be treated as a String.

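What is important to note is our constructor for the three fields: public AnimeDTO(String id, String title, String description). This will be used in our application, because AnimeProcessor calls it.

We also need a no-argument constructor. Once a class declares a constructor with parameters, Java no longer generates a default one. BeanWrapperFieldSetMapper needs a no-argument constructor to instantiate AnimeDTO before populating it through the setters. Without one, the mapping fails at runtime.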

package com.michaelcgood;

import org.apache.commons.lang3.builder.ToStringBuilder;

/**
 * Contains the information of a single anime
 *
 * @author Michael C Good michaelcgood.com
 */
public class AnimeDTO {

    private String id;
    private String title;
    private String description;

    // Required by BeanWrapperFieldSetMapper, which instantiates AnimeDTO
    // and then populates it through the setters
    public AnimeDTO() {
    }

    public AnimeDTO(String id, String title, String description) {
        this.id = id;
        this.title = title;
        this.description = description;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    @Override
    public String toString() {
        return new ToStringBuilder(this)
                .append("id", this.id)
                .append("title", this.title)
                .append("description", this.description)
                .toString();
    }

}


CSV File to Database Configuration

There is a lot going on in this class, so we will go through the code step by step. Visit GitHub to see the code in its entirety.

Reader

As the Spring Batch documentation states, FlatFileItemReader will “read lines of data from a flat file that typically describe records with fields of data defined by fixed positions in the file or delimited by some special character (e.g. Comma)”.

We are dealing with a CSV, so the data is delimited by commas. That makes this reader a natural fit for our file.
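Anime descriptions may well contain commas, so it helps to see why a real tokenizer is used instead of line.split(","). The sketch below is framework-free; the class name CsvTokenizerSketch is hypothetical, and this is not Spring Batch's implementation. It splits a line on commas but keeps quoted text, including embedded commas, inside a single field. A doubled "" inside quotes becomes a literal quote. DelimitedLineTokenizer uses a comma delimiter and a double-quote quote character by default, so it follows the same basic idea of quote-aware splitting.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, framework-free sketch of quote-aware comma splitting.
 * Commas inside double quotes do not end a field; "" inside quotes is
 * treated as a literal quote character.
 */
final class CsvTokenizerSketch {

    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // delimiter outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }
}
```

For example, the line 1,Title,"A story, with a comma" yields three fields with this sketch, whereas split(",") yields four. Also note that DelimitedLineTokenizer is strict by default. If a line produces a different number of tokens than the names you configure, it throws an IncorrectTokenCountException.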

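@Bean
public FlatFileItemReader<AnimeDTO> csvAnimeReader() {
    FlatFileItemReader<AnimeDTO> reader = new FlatFileItemReader<AnimeDTO>();
    // Read animescsv.csv from the classpath
    reader.setResource(new ClassPathResource("animescsv.csv"));

    // Split each line on commas and name the resulting tokens
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setNames(new String[] {"id", "title", "description"});

    // Map the named tokens onto AnimeDTO properties through its setters
    BeanWrapperFieldSetMapper<AnimeDTO> fieldSetMapper = new BeanWrapperFieldSetMapper<AnimeDTO>();
    fieldSetMapper.setTargetType(AnimeDTO.class);

    // Combine tokenizing and mapping into one line-to-object step
    DefaultLineMapper<AnimeDTO> lineMapper = new DefaultLineMapper<AnimeDTO>();
    lineMapper.setLineTokenizer(tokenizer);
    lineMapper.setFieldSetMapper(fieldSetMapper);

    reader.setLineMapper(lineMapper);
    return reader;
}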


Important points:

  • FlatFileItemReader is parameterized with a model. In our case, this is AnimeDTO.

  • FlatFileItemReader needs a resource to read from, which we provide with the setResource method. Here we set the resource to animescsv.csv on the classpath.

  • The setLineMapper method sets the LineMapper, which converts each line (a String) into an object representing the item. Each of our lines is an anime record consisting of an id, title, and description, and it becomes an AnimeDTO. Note that DefaultLineMapper is parameterized with our model, AnimeDTO.

  • However, LineMapper is given a raw line, which means there is work that needs to be done to map the fields appropriately. The line must be tokenized into a FieldSet, which DelimitedLineTokenizer takes care of. DelimitedLineTokenizer returns a FieldSet.

  • Now that we have a FieldSet, we need to map it. setFieldSetMapper is used for taking the FieldSet object and mapping its contents to a DTO, which is AnimeDTO in our case.
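The two mapping stages above can be sketched without Spring. First, each token is paired with its configured name, which is the FieldSet idea. Second, the target is created through its no-argument constructor, and the matching setter is called for each name, which is the BeanWrapperFieldSetMapper idea. This is a simplified illustration, not Spring Batch's code. It handles only String properties, and the names FieldSetMappingSketch and AnimeBean are hypothetical. AnimeBean is a stand-in for AnimeDTO so the block compiles on its own.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for AnimeDTO so this sketch is self-contained.
class AnimeBean {
    private String id;
    private String title;
    private String description;

    public AnimeBean() { } // the mapper below instantiates through this constructor

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }
}

final class FieldSetMappingSketch {

    /** Pairs each token with its configured name, in order (the FieldSet idea). */
    static Map<String, String> toFieldSet(String[] names, List<String> tokens) {
        if (names.length != tokens.size()) {
            throw new IllegalArgumentException(
                    "expected " + names.length + " tokens but got " + tokens.size());
        }
        Map<String, String> fieldSet = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            fieldSet.put(names[i], tokens.get(i));
        }
        return fieldSet;
    }

    /** Creates targetType via its no-arg constructor, then calls setXxx(String) per field. */
    static <T> T map(Map<String, String> fieldSet, Class<T> targetType) {
        try {
            // Fails with NoSuchMethodException if there is no no-arg constructor
            T target = targetType.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, String> field : fieldSet.entrySet()) {
                String name = field.getKey();
                String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                Method method = targetType.getMethod(setter, String.class);
                method.invoke(target, field.getValue());
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map field set onto " + targetType.getName(), e);
        }
    }
}
```

A class that declares only a parameterized constructor fails at the first reflective step. That is why AnimeDTO keeps its empty constructor.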

Processor

If we want to transform the data before writing it to the database, an ItemProcessor is necessary. Our code does not actually apply any business logic to transform the data, but we allow for the capability to.

Processor in CsvFileToDatabaseConfig.java

csvAnimeProcessor returns a new instance of the AnimeProcessor class, which we review below.

@Bean
ItemProcessor<AnimeDTO, AnimeDTO> csvAnimeProcessor() {
    return new AnimeProcessor();
}


AnimeProcessor.java

If we wanted to apply business logic before writing to the database, we could manipulate the Strings in the processor. For instance, we could call toUpperCase() on the result of getTitle() to make the title upper case before it is written. However, I decided not to apply that or any other business logic in this example, so the processor does no manipulation. It is here simply for demonstration.

package com.michaelcgood;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.springframework.batch.item.ItemProcessor;

public class AnimeProcessor implements ItemProcessor<AnimeDTO, AnimeDTO> {

    private static final Logger log = LoggerFactory.getLogger(AnimeProcessor.class);

    @Override
    public AnimeDTO process(final AnimeDTO anime) throws Exception {

        final String id = anime.getId();
        final String title = anime.getTitle();
        final String description = anime.getDescription();

        // No business logic is applied: the item is copied unchanged
        final AnimeDTO transformedAnimeDTO = new AnimeDTO(id, title, description);

        log.info("Converting ({}) into ({})", anime, transformedAnimeDTO);

        return transformedAnimeDTO;
    }

}
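To make the toUpperCase() idea concrete, here is a framework-free sketch of a processor that does apply business logic. It is written as a plain method with the same shape as ItemProcessor's process method: it takes one item and returns the transformed item. It can also return null, which Spring Batch treats as "filter this item out", so the writer never receives it. The class names, and the rule that skips records without a title, are hypothetical illustrations rather than part of the example application. Anime is a stand-in for AnimeDTO.

```java
import java.util.Locale;

// Hypothetical immutable stand-in for AnimeDTO, so the sketch is self-contained.
final class Anime {
    final String id;
    final String title;
    final String description;

    Anime(String id, String title, String description) {
        this.id = id;
        this.title = title;
        this.description = description;
    }
}

final class UpperCaseTitleProcessorSketch {

    /** Returns a copy with an upper-cased title, or null to drop the item. */
    static Anime process(Anime item) {
        if (item.title == null || item.title.trim().isEmpty()) {
            return null; // hypothetical rule: filter out records without a title
        }
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i)
        return new Anime(item.id, item.title.toUpperCase(Locale.ROOT), item.description);
    }
}
```

Inside the real AnimeProcessor, the equivalent change would be to build the new AnimeDTO from title.toUpperCase() instead of title.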


Writer

The csvAnimeWriter method is responsible for actually writing the values into our database. Our database is an in-memory HSQLDB. However, the writer depends only on the autowired dataSource, so this application allows us to easily swap out one database for another.

@Bean
public JdbcBatchItemWriter<AnimeDTO> csvAnimeWriter() {
    JdbcBatchItemWriter<AnimeDTO> writer = new JdbcBatchItemWriter<AnimeDTO>();
    // Resolve :id, :title and :description from the matching AnimeDTO getters
    writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<AnimeDTO>());
    writer.setSql("INSERT INTO animes (id, title, description) VALUES (:id, :title, :description)");
    writer.setDataSource(dataSource);
    return writer;
}
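The writer's SQL uses named parameters (:id, :title, :description) instead of JDBC's positional ? markers. Each name is resolved to the bean property of the same name; BeanPropertyItemSqlParameterSourceProvider supplies those properties from each AnimeDTO. The framework-free sketch below shows that translation under simplifying assumptions. The class name is hypothetical, a Map stands in for the bean, and quoted literals and other SQL edge cases are ignored. It is not Spring's parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedParameterSketch {

    // ":name" where name starts with a letter or underscore
    private static final Pattern NAMED_PARAM = Pattern.compile(":([A-Za-z_][A-Za-z0-9_]*)");

    /** Replaces every :name placeholder with a positional JDBC "?". */
    static String toPositionalSql(String sql) {
        return NAMED_PARAM.matcher(sql).replaceAll("?");
    }

    /** Returns the property values in the order their placeholders appear. */
    static List<Object> orderedValues(String sql, Map<String, ?> properties) {
        List<Object> values = new ArrayList<>();
        Matcher m = NAMED_PARAM.matcher(sql);
        while (m.find()) {
            String name = m.group(1);
            if (!properties.containsKey(name)) {
                throw new IllegalArgumentException("No value for parameter :" + name);
            }
            values.add(properties.get(name));
        }
        return values;
    }
}
```

Because values are looked up by name, the order of fields in the CSV does not need to match the column order in the INSERT statement.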


Step

A Step is a domain object that encapsulates an independent, sequential phase of a batch job and contains all of the information needed to define and control the actual batch processing.

Now that we’ve created the reader and processor, we need to write the data. Our step uses chunk-oriented processing, which means items are read one at a time and collected into ‘chunks’ that are written out within a transaction boundary. You set a commit interval. Once the number of items read equals that interval, the entire chunk is written out via the ItemWriter and the transaction is committed. We set the commit interval (the chunk size) to 1, so each record is written and committed individually.

I suggest reading the Spring Batch documentation about chunk-oriented processing.

Finally, we wire the step's reader, processor, and writer to the bean methods we wrote above.

@Bean
public Step csvFileToDatabaseStep() {
    return stepBuilderFactory.get("csvFileToDatabaseStep")
            .<AnimeDTO, AnimeDTO>chunk(1)
            .reader(csvAnimeReader())
            .processor(csvAnimeProcessor())
            .writer(csvAnimeWriter())
            .build();
}
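The chunk-oriented flow that chunk(1) configures can be sketched in plain Java. Items are read one at a time and run through the processor. After commitInterval items have been read, the collected chunk goes to the writer in a single call; in Spring Batch, that write happens inside a transaction that is then committed. This is a hypothetical, simplified control-flow illustration, not Spring Batch's implementation. It has no transactions, retries, or skips. As in Spring Batch, a null processor result filters the item out. As a simplification of this sketch, a chunk whose items were all filtered is not written.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class ChunkLoopSketch {

    /** Runs read-process-write in chunks and returns the number of chunks read. */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int commitInterval) {
        int chunks = 0;
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>();
            int read = 0;
            // Read and process until the commit interval is reached or input runs out
            while (read < commitInterval && reader.hasNext()) {
                I item = reader.next();
                read++;
                O processed = processor.apply(item);
                if (processed != null) { // null means "filter this item"
                    chunk.add(processed);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk); // one write (and, in Spring Batch, one commit) per chunk
            }
            chunks++;
        }
        return chunks;
    }
}
```

With a commit interval of 1, as in our step, every record gets its own write and commit. A larger value, such as chunk(100), groups records into fewer transactions. That typically reduces commit overhead, at the cost of more work being rolled back if a chunk fails.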


Job

A Job consists of Steps. The job bean method takes a JobCompletionNotificationListener as a parameter and registers it with listener(), because we want to track the completion of the Job.

@Bean
Job csvFileToDatabaseJob(JobCompletionNotificationListener listener) {
    return jobBuilderFactory.get("csvFileToDatabaseJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .flow(csvFileToDatabaseStep())
            .end()
            .build();
}
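The job above also registers a RunIdIncrementer. Spring Batch identifies a JobInstance by the job name plus its identifying job parameters. Launching again with identical parameters therefore refers to the same instance, and an instance that has already completed cannot simply be run again. The incrementer avoids this by deriving the next parameters from the previous ones: it takes the "run.id" value (0 if absent) and adds 1. The plain-Java sketch below shows only that arithmetic. It uses a Map instead of Spring Batch's JobParameters, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class RunIdIncrementerSketch {

    /** Copies the previous parameters and bumps "run.id" by one (starting from 0). */
    static Map<String, Long> next(Map<String, Long> previous) {
        Map<String, Long> next = new HashMap<>();
        if (previous != null) {
            next.putAll(previous); // keep any other parameters unchanged
        }
        long runId = next.getOrDefault("run.id", 0L) + 1;
        next.put("run.id", runId);
        return next;
    }
}
```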


Job Completion Notification Listener

The JobCompletionNotificationListener class autowires the JdbcTemplate because we’ve already set up the dataSource and want to query it easily. The result of our query is a list of AnimeDTO objects. For each object returned, we print a message to the console to show that the item has been written to the database.

SQL

We need to create a schema for our database. As mentioned, we have made all fields Strings for ease of use, so the corresponding columns are VARCHAR.

Main

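This is a standard class with a main() method. As the Spring Boot documentation states, @SpringBootApplication is a convenience annotation equivalent to using @Configuration, @EnableAutoConfiguration, and @ComponentScan together. It does not include @EnableWebMvc. Spring MVC is instead set up by auto-configuration because spring-boot-starter-web is on the classpath.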

Demo

Converting

Each AnimeDTO mapped from a FieldSet is fed through the processor, and a “Converting” line is printed to the console.
Converting CSV to database in Spring Batch

Discovering New Items In Database

When the Spring Batch Job is finished, we select all the records and print them out to the console individually.
Discovering newly imported items in database in Spring Batch application


Batch Process Complete

When the batch process is complete, this is what is printed to the console.

Conclusion

Spring Batch builds upon the POJO-based development approach and user-friendliness of the Spring Framework to make it easy for developers to create enterprise-grade batch processing.

The source code is on GitHub.


Published at DZone with permission of Michael Good, DZone MVB. See the original article here.
