
Scalable Writes to Postgres With Spring

This post highlights how our company tackled one of many infrastructure scaling challenges: Scalable writes to the (Postgres) database using Spring & Spring Data.

By Aditya Bansal · Jun. 06, 22 · Tutorial


Every engineering organization that strikes a chord with customers eventually runs into scaling issues. Scaling a product and an organization puts new demands on both your processes and your infrastructure. In this post, we highlight how our company tackled one of many infrastructure scaling challenges: Scalable writes to the (Postgres) database using Spring & Spring Data.

As we grew our userbase, we began to run into some performance issues, primarily bottlenecked by our upstream Postgres database. Our RPS went over 180 (from < 50) in just a few months, and we started running into problems like SQL connection timeouts, dropped connections, and significantly increased latency. This led to a degraded customer experience, which was unacceptable.

So, we set to work investigating how we could iron out these Postgres bottlenecks. We quickly realized that we were spending far too many cycles on database writes, which was clogging up the system. Every row written to Postgres was its own call: if we wanted to save 50 rows, instead of issuing a single SQL statement for all 50, we made 50 separate calls, one per row!
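To make this concrete, here's a rough sketch of what the write path looked like in application code. The repository here is hypothetical (not from our codebase), and the Entity class is shown just below:

Kotlin
 
import org.springframework.data.jpa.repository.JpaRepository

// Hypothetical Spring Data repository for the Entity class shown below.
interface EntityRepository : JpaRepository<Entity, Long>

fun saveRows(repository: EntityRepository, rows: List<Entity>) {
    // Looks like one bulk operation, but with IDENTITY id generation Hibernate
    // must execute each INSERT immediately to read back the generated id,
    // so 50 entities here still means 50 separate statements.
    repository.saveAll(rows)
}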

Root Cause: Using IDENTITY for ID Value Generation in Hibernate

Why weren’t we able to make a batch update? As it turned out, the problem had to do with how we were using Hibernate to generate identifier values (AKA primary keys) for the entities in our database.

The approach we were using involved retrieving values from the IDENTITY columns that Hibernate dynamically maintains as new entities are inserted into the database. Our writes to the DB for new resources were done without specifying the id (primary key), and instead used GenerationType.IDENTITY.

Here’s what our Spring entity looked like:

Kotlin
 
// JPA annotations come from javax.persistence on Spring Boot 2.x (jakarta.persistence on Boot 3+)
import javax.persistence.*

@Entity
@Table(name = "entity")
data class Entity(
    @Id
    // IDENTITY: the database generates the id as each row is inserted
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    val id: Long? = null,
    val metadata: String,
) : TenantEntity()


With this strategy, it was incredibly simple to use the ORM to both create and update existing resources (sketched just after the list below):

  • If an id isn’t passed, a new row will be created.
  • If an id is passed, the existing row will be updated.
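Here's that sketch, using the hypothetical EntityRepository from earlier and made-up field values:

Kotlin
 
fun createThenUpdate(repository: EntityRepository) {
    // No id set: Hibernate inserts a new row and assigns the generated id.
    val created = repository.save(Entity(metadata = "first version"))

    // Id present: Hibernate updates the existing row instead.
    repository.save(created.copy(metadata = "second version"))
}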

Sounds straightforward, right? We thought so too! And it seemed to work well, until we realized that using IDENTITY introduced a major performance issue: it disables batch inserts. Because the database only generates the id at the moment the row is inserted, Hibernate has to execute each INSERT immediately to read the new id back, so it can never group multiple inserts into a single batched statement.

This posed a big problem for us because all our entities used IDENTITY identifier value generation. For each of our existing tables and their corresponding entities, we’d have to swap our strategy from IDENTITY to a different one that would support batch insert statements.

Migrating From IDENTITY to Sequence-Based ID Generation

Upon investigating other generation types that would support batching for our entities, we came across Hibernate's sequence-based identifier value generation. This strategy is backed by an underlying database sequence: Hibernate requests the next available value from the sequence (a nextval call in Postgres) and uses it as the new resource's id.

While the underlying mechanics of this strategy are beyond the scope of this article, the bottom line is that sequence-based generation would enable batch inserts for us: since Hibernate gets its ids from the sequence before the rows are flushed, it no longer needs to execute each INSERT individually and can send them as a single batch.

Now we needed to figure out how to migrate from our existing IDENTITY strategy to the new sequence-based approach.

Upon investigating this further, we realized that our existing tables already had a Postgres sequence. So if we had a table defined like so:

SQL
 
CREATE TABLE IF NOT EXISTS entity (
    id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    ...
)


…a sequence with the name entity_id_seq would be created!

You can run the following SQL command to check that the sequence exists:

SQL
 
SELECT *
FROM pg_sequence
WHERE seqrelid = 'entity_id_seq'::regclass;


Since we were able to easily access the sequence for our Postgres tables, we could make a very localized change to switch over to using the sequence-based strategy for id generation. 

For each entity, we only had to change a few lines of code to solve our performance bottleneck. An updated entity looks like this:

Kotlin
 
import javax.persistence.*

private const val TABLE = "entity"
private const val SEQUENCE = "${TABLE}_id_seq"

@Entity
@Table(name = TABLE)
data class Entity(
    @Id
    // SEQUENCE: ids come from the Postgres sequence, so inserts can be batched
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = SEQUENCE)
    @SequenceGenerator(name = SEQUENCE, sequenceName = SEQUENCE, allocationSize = 50)
    @Column(name = "id")
    val id: Long? = null,
    val metadata: String,
) : TenantEntity()


AllocationSize and Sequence Increment Size

One thing to note here is that the allocationSize property in Hibernate needs to be the same as the increment size for the underlying sequence in Postgres.

This is so that Hibernate and the underlying sequence don't go “out of sync” in terms of the ids they're holding: if allocationSize were larger than the sequence's increment, Hibernate would assume it owns a whole block of ids that the sequence never actually reserved, and a later nextval call could hand out ids from inside that block. Keeping the two values equal avoids those collisions, which matters even more in a distributed architecture where multiple servers write to the same table.

By default, the increment size for our Postgres sequences was 1. We wrote a very quick migration to change it to match our allocationSize:

ALTER SEQUENCE entity_id_seq INCREMENT 50;

Now, Hibernate only needs to make one call to the sequence to reserve a block of 50 ids, and, with JDBC batching enabled, one batched statement to insert those 50 rows.
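For that single batched INSERT to actually happen, Hibernate's JDBC batching also has to be switched on. Here's a minimal sketch of one way to configure it in Spring Boot; the configuration class is hypothetical (not part of the original change), and the same hibernate.jdbc.batch_size and hibernate.order_inserts properties can just as well be set in application.properties:

Kotlin
 
import org.springframework.boot.autoconfigure.orm.jpa.HibernatePropertiesCustomizer
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration
class JdbcBatchingConfig {
    // Group up to 50 inserts into a single JDBC batch, matching the
    // allocationSize used on the entity's sequence generator.
    @Bean
    fun batchingCustomizer() = HibernatePropertiesCustomizer { props ->
        props["hibernate.jdbc.batch_size"] = 50
        props["hibernate.order_inserts"] = true
    }
}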

Here’s a summary of what we learned from this issue: 

  • With Hibernate, start using database sequence-based identifier value generation as soon as possible — especially if you foresee the number of writes increasing.

  • Keep the allocationSize and the underlying Postgres sequence increment size params the same to avoid id collisions and support a distributed system.

Finally, here’s a screenshot of our RPS going from near 180 to about 90 after we implemented this change.

[Screenshot: RPS dropping from near 180 to about 90 after the change]



Published at DZone with permission of Aditya Bansal. See the original article here.

Opinions expressed by DZone contributors are their own.
