Spring Boot: Boost JPA Bulk Insert Performance by 100x

Want to improve your insert records? In this article, you can learn how to improve bulk insert performance by 100x using Spring Data JPA.

By Amrut Prabhu · Updated Jan. 29, 2021 · Tutorial


I was facing a problem where I needed to insert millions of records, imported from a file, into the database.

So, I did some research, and I would like to share what I found, which helped me improve the insert throughput by nearly 100 times.

Initially, when I simply did the bulk insert using Spring Data JPA's saveAll method, it took about 185 seconds to insert 10,000 records. After making the changes below, inserting 10,000 records took just 4.3 seconds.

Yes, 4.3 seconds for 10k records.

So, to achieve this, I had to change the way I was inserting data.

1. Change the Number of Records While Inserting

When I was inserting initially, I was pushing all 10k records from the list directly by calling the saveAll method. I changed this to a batch size of 30. You could increase the batch size further, to 60 for instance, but doubling it doesn't halve the time taken to insert the records. See the table below.

For this, you need to set the Hibernate property batch_size=30.

Properties files

spring.jpa.properties.hibernate.jdbc.batch_size=30

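Two related Hibernate properties are often enabled alongside jdbc.batch_size. This is an addition of mine, not something the article configures: ordering inserts and updates lets Hibernate group statements for the same entity type together, so more of them fit into a single JDBC batch when several entity types are persisted in one transaction.

```properties
# Assumed companion settings (not from the article): group statements
# by entity type so more of them can share a single JDBC batch.
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
```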
Then, I added the following connection string properties:

Properties files

cachePrepStmts=true
useServerPrepStmts=true
rewriteBatchedStatements=true

For example:

jdbc:mysql://localhost:3306/BOOKS_DB?serverTimezone=UTC&cachePrepStmts=true&useServerPrepStmts=true&rewriteBatchedStatements=true

2. Send the Batched Records

Next, I changed the inserting code so that the saveAll method receives batches of 30 records, matching the batch size we set in the properties file. A very crude implementation looks like this:

Java

for (int i = 0; i < totalObjects; i = i + batchSize) {
    if (i + batchSize > totalObjects) {
        List<Book> books1 = books.subList(i, totalObjects);
        repository.saveAll(books1);
        break;
    }
    List<Book> books1 = books.subList(i, i + batchSize);
    repository.saveAll(books1);
}

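The same chunking logic can be sketched as a small standalone helper: computing each chunk's end index with Math.min removes the special case for the final partial batch. BatchInsertSketch and partition are illustrative names of mine, and a plain List&lt;Integer&gt; stands in for the Book entities:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchInsertSketch {

    // Split a list into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // Math.min caps the last chunk at the end of the list,
            // so no separate branch is needed for a partial batch.
            chunks.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 100; i++) records.add(i);

        List<List<Integer>> batches = partition(records, 30);
        // 100 records with a batch size of 30 -> chunks of 30, 30, 30, 10
        System.out.println(batches.size());        // 4
        System.out.println(batches.get(3).size()); // 10
        // In the real code, each chunk would go to repository.saveAll(chunk).
    }
}
```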
This reduced the time a little: it dropped from 185 seconds to 153 seconds, approximately an 18% improvement.

3. Change the ID Generation Strategy

This made a major impact.

Initially, I was using the @GeneratedValue annotation with the GenerationType.IDENTITY strategy on my entity class.

Hibernate disables JDBC batching with this strategy because the ID of each row is generated by the database at insert time, so Hibernate has to execute every insert statement individually to retrieve it. You can read more about it here.

I changed the strategy to SEQUENCE and provided a sequence generator.

Java

@Entity
public class Book {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seqGen")
    @SequenceGenerator(name = "seqGen", sequenceName = "seq", initialValue = 1)
    private Long id;

    // ... other fields, getters, and setters
}

This drastically changed the insert performance, as Hibernate was able to leverage bulk insert.

From the previous 153 seconds, the time to insert 10k records dropped to only 9 seconds. That's a performance improvement of nearly 95%.

Note: MySQL doesn't support sequences.

To get around this, I created a table with the same name as the sequence, containing a single column called next_val. Then, I added a single row with the initial value.

For the above sequence, I created the following:

SQL

CREATE TABLE `seq` (
  `next_val` bigint(20) DEFAULT NULL
);

INSERT INTO `seq` (`next_val`) VALUES (1);

Hibernate then used the table above as a sequence generator.

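One knob worth knowing about here, as an aside not covered in the article: @SequenceGenerator's allocationSize (which defaults to 50) controls how many IDs Hibernate reserves per round trip to the sequence. Larger values mean fewer round trips during bulk inserts, but the value must stay consistent with the sequence's increment to avoid ID collisions. A hypothetical variant of the generator from the article:

```java
// Assumed tuning, not the article's configuration: allocationSize tells
// Hibernate how many IDs to reserve per database round trip (default 50).
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seqGen")
@SequenceGenerator(name = "seqGen", sequenceName = "seq",
                   initialValue = 1, allocationSize = 50)
private Long id;
```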
Next, I pushed it further with higher batch sizes, and I noticed that doubling the batch size does not halve the insert time; the time only gradually decreases. You can see this below:

[Screenshot: insert time vs. batch size]

The optimal batch size in my case was 1,000, which took around 4.39 seconds for 10K records. After that, I saw performance degrade, as you can see in the graph below.

Here are the stats I got:

[Screenshot: benchmark stats]

As always, you can find the code on my GitHub repo.


Published at DZone with permission of Amrut Prabhu. See the original article here.

Opinions expressed by DZone contributors are their own.
