DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

How does AI transform chaos engineering from an experiment into a critical capability? Learn how to effectively operationalize the chaos.

Data quality isn't just a technical issue: It impacts an organization's compliance, operational efficiency, and customer satisfaction.

Are you a front-end or full-stack developer frustrated by front-end distractions? Learn to move forward with tooling and clear boundaries.

Developer Experience: Demand to support engineering teams has risen, and there is a shift from traditional DevOps to workflow improvements.

Related

  • Comparing Managed Postgres Options on The Azure Marketplace
  • Avoid Cross-Shard Data Movement in Distributed Databases
  • Database Query Service With OpenAI and PostgreSQL in .NET
  • PostgreSQL 12 End of Life: What to Know and How to Prepare

Trending

  • Designing AI Multi-Agent Systems in Java
  • The Rise of Self‐Service Platforms: How Cloud Development Environments Are Reshaping Dev Culture
  • It’s Not Magic. It’s AI. And It’s Brilliant.
  • Secure DevOps in Serverless Architecture
  1. DZone
  2. Data Engineering
  3. Databases
  4. Manual Sharding in PostgreSQL: A Step-by-Step Implementation Guide

Manual Sharding in PostgreSQL: A Step-by-Step Implementation Guide

Implement manual PostgreSQL sharding with simple SQL routing functions and measure performance gains through benchmarking.

By 
Aditya Karnam Gururaj Rao user avatar
Aditya Karnam Gururaj Rao
·
May. 20, 25 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
3.0K Views

Join the DZone community and get the full member experience.

Join For Free

Learn how to implement manual sharding in native PostgreSQL using Foreign Data Wrappers. This tutorial walks through creating distributed tables without additional extensions like Citus.

The Challenge With Database Scaling

As applications grow, single-node databases face several challenges:

  1. Limited storage capacity on a single machine
  2. Query performance degradation with growing datasets
  3. Higher concurrency demands exceeding CPU capabilities
  4. Difficulty maintaining acceptable latency for global users

Sharding — horizontally partitioning data across multiple database nodes — offers a solution to these scaling problems.

Why Manual Sharding in PostgreSQL?

While solutions like Citus and other distributed database systems exist, there are compelling reasons to implement manual sharding:

  1. More control: Customize the sharding logic to your specific application needs.
  2. No additional dependencies: Utilize only native PostgreSQL features.
  3. Learning opportunity: Gain a deeper understanding of distributed database concepts.
  4. Incremental adoption: Apply sharding only to specific high-volume tables.
  5. Cloud-agnostic: Implement your solution on any infrastructure.

Setting Up Our Sharded Architecture

Let's implement a simplified manual sharding approach that works with a single PostgreSQL instance. This makes it easier to test and understand the concept before potentially scaling to multiple instances.

Step 1: Create Sharded Tables

First, let's create our sharded tables in a single PostgreSQL database:

SQL
 
CREATE TABLE users_shard1 (
    id BIGINT PRIMARY KEY,
    name TEXT
);

CREATE TABLE users_shard2 (
    id BIGINT PRIMARY KEY,
    name TEXT
);


Note that we're using the BIGINT type for IDs to better handle large data volumes.

Step 2: Index the Shards for Better Performance

Adding indexes improves query performance, especially for our routing functions:

SQL
 
CREATE INDEX idx_user_id_shard1 ON users_shard1(id);


Step 3: Implement Insert Function With Routing Logic

This function routes data to the appropriate shard based on a simple modulo algorithm:

SQL
 
CREATE OR REPLACE FUNCTION insert_user(p_id BIGINT, p_name TEXT)
RETURNS VOID AS $
BEGIN
    IF p_id % 2 = 0 THEN
        INSERT INTO users_shard2 VALUES (p_id, p_name);
    ELSE
        INSERT INTO users_shard1 VALUES (p_id, p_name);
    END IF;
END;
$ LANGUAGE plpgsql;


Step 4: Create Read Function With Routing Logic

For reading data, we'll create a function that routes queries to the appropriate shard:

SQL
 
CREATE OR REPLACE FUNCTION read_user(p_id BIGINT)
RETURNS TABLE(id BIGINT, name TEXT) AS $
BEGIN
    IF p_id % 2 = 0 THEN
        RETURN QUERY SELECT u.id::BIGINT, u.name FROM users_shard2 u WHERE u.id = p_id;
    ELSE
        RETURN QUERY SELECT u.id::BIGINT, u.name FROM users_shard1 u WHERE u.id = p_id;
    END IF;
END;
$ LANGUAGE plpgsql;


Notice we use aliasing and explicit casting to handle any potential type mismatches.

Step 5: Create a Unified View (Optional)

To make queries transparent, create a view that unions the sharded tables:

SQL
 
CREATE VIEW users AS
  SELECT * FROM users_shard1
  UNION ALL
  SELECT * FROM users_shard2;


Step 6: Testing Our Sharded System

Let's test our system with a few simple inserts:

SQL
 
SELECT insert_user(1, 'Alice');
SELECT insert_user(2, 'Bob');
SELECT insert_user(3, 'Carol');
SELECT insert_user(4, 'Dave');


Now, read the data using our routing function:

SQL
 
SELECT * FROM read_user(1);
SELECT * FROM read_user(2);


Or query all data using the unified view:

SQL
 
SELECT * FROM users ORDER BY id;


Benchmarking Our Sharding Implementation

Let's benchmark our implementation to understand the performance characteristics. We'll use Python scripts to test both insertion and read performance.

Python Benchmark Script for Inserts (Sharded)

Here's the script for benchmarking inserts into our sharded tables (seed_pg_sharded.py):

Python
 
import psycopg2
from time import time

conn = psycopg2.connect("dbname=postgres user=postgres password=secret host=localhost port=5432")
cur = conn.cursor()

start = time()
for i in range(4, 100_001):
    cur.execute("SELECT insert_user(%s, %s)", (i, f'user_{i}'))
conn.commit()
end = time()

print("Sharded insert time:", end - start)


Python Benchmark Script for Inserts (Single Table)

For comparison, we'll also test insertion performance on a single table (seed_pg_single.py):

Python
 
import psycopg2
from time import time

conn = psycopg2.connect("dbname=postgres user=postgres password=secret host=localhost port=5432")
cur = conn.cursor()

start = time()
for i in range(100_001, 100_001 + 500_000):
    cur.execute("INSERT INTO users_base VALUES (%s, %s)", (i, f'user_{i}'))
conn.commit()
end = time()

print("Single-node insert time:", end - start)


Python Benchmark Script for Reads

Finally, we'll compare read performance between the single table and our sharded implementation (read_bench.py):

Python
 
import psycopg2
from time import time

# Configs
conn = psycopg2.connect("dbname=postgres user=postgres password=secret host=localhost port=5432")

def time_reads(cur, query, param_fn, label):
    start = time()
    for i in range(1000, 2000):  # Run 1000 point queries
        cur.execute(query, (param_fn(i),))
        cur.fetchall()
    end = time()
    print(f"{label}: {end - start:.3f} sec for 1000 point reads")

# Benchmark single table
with conn.cursor() as cur:
    print("Benchmarking Point Reads on Single Table")
    time_reads(cur, "SELECT * FROM users WHERE id = %s", lambda x: x, "Single Table")

# Benchmark sharded read_user function
with conn.cursor() as cur:
    print("\nBenchmarking Point Reads via read_user() Function")
    time_reads(cur, "SELECT * FROM read_user(%s)", lambda x: x, "Sharded Function")

conn.close()


Adding Range Read Functionality

For more complex queries, we can add a function to read a range of IDs:

PLSQL
 
CREATE OR REPLACE FUNCTION read_user_range(start_id BIGINT, end_id BIGINT)
RETURNS TABLE(id BIGINT, name TEXT) AS $
BEGIN
    -- Query from both shards and union the results
    RETURN QUERY
        (SELECT u.id::BIGINT, u.name
         FROM users_shard1 u
         WHERE u.id BETWEEN start_id AND end_id)
        UNION ALL
        (SELECT u.id::BIGINT, u.name
         FROM users_shard2 u
         WHERE u.id BETWEEN start_id AND end_id)
    ORDER BY id;
END;


This function allows us to read a range of users across both shards in a single query.

Performance Observations

Performance observations

Based on benchmarking results, we can observe several key patterns with manual sharding:

Key patterns with manual sharding


Conclusion

Manual sharding in PostgreSQL offers a powerful approach to horizontal scalability without requiring third-party extensions like Citus. Using a combination of function-based routing and separate tables, we can distribute data efficiently while maintaining a unified interface for our application.

Database Distributed database Shard (database architecture) PostgreSQL

Opinions expressed by DZone contributors are their own.

Related

  • Comparing Managed Postgres Options on The Azure Marketplace
  • Avoid Cross-Shard Data Movement in Distributed Databases
  • Database Query Service With OpenAI and PostgreSQL in .NET
  • PostgreSQL 12 End of Life: What to Know and How to Prepare

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • [email protected]

Let's be friends: