DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

SBOMs are essential to circumventing software supply chain attacks, and they provide visibility into various software components.

Related

  • Guide to Optimizing Your Snowflake Data Warehouse for Performance, Cost Efficiency, and Scalability
  • Optimizing Data Storage With Hybrid Partitioned Tables in Oracle 19c
  • Top Methods to Improve ETL Performance Using SSIS
  • Creating a Web Project: Key Steps to Identify Issues

Trending

  • Turn SQL into Conversation: Natural Language Database Queries With MCP
  • How to Reduce Technical Debt With Artificial Intelligence (AI)
  • Memory Leak Due To Mutable Keys in Java Collections
  • Building Resilient Go Apps: Mocking and Testing Database Error Responses
  1. DZone
  2. Data Engineering
  3. Big Data
  4. Data Storage and Indexing in PostgreSQL: Practical Guide With Examples and Performance Insights

Data Storage and Indexing in PostgreSQL: Practical Guide With Examples and Performance Insights

Explore how PostgreSQL handles large data using TOAST and improves query speed with indexes like B-tree, GIN, and BRIN, cutting query times by over 90%.

By 
arvind toorpu user avatar
arvind toorpu
DZone Core CORE ·
Jun. 06, 25 · Tutorial
Likes (5)
Comment
Save
Tweet
Share
2.6K Views

Join the DZone community and get the full member experience.

Join For Free

PostgreSQL employs sophisticated techniques for data storage and indexing to ensure efficient data management and fast query performance. This guide explores PostgreSQL's mechanisms, showcases practical examples, and includes simulated performance metrics to illustrate the impact of indexing.

Data Storage in PostgreSQL

Table Structure and TOAST (The Oversized-Attribute Storage Technique)

Table Structure: PostgreSQL stores table data in a format known as a heap. Each table's heap contains one or more pages (blocks), where each page is typically 8KB in size. This size can be altered when compiling PostgreSQL from source. PostgreSQL organizes table data in a heap structure with 8KB pages by default. Rows exceeding a page size are handled using TOAST, which compresses and stores oversized attributes in secondary storage.

Example: Managing Large Text Data

Consider a documents table:

SQL
 
CREATE TABLE documents (
    doc_id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT
);


  • Scenario: Storing a document with 10MB of content.
  • Without TOAST: The entire document resides in the table, slowing queries.
  • With TOAST: The content is compressed and stored separately, leaving a pointer in the main table.

Expected Performance Improvement

Metric Without TOAST With TOAST
Query Execution Time ~4.2 seconds ~2.1 seconds (50% faster)

TOAST significantly reduces table size, enhancing read and write efficiency.

MVCC (Multi-Version Concurrency Control):

  • Consistency with Row Versions:
    PostgreSQL uses MVCC to ensure data consistency and support concurrent transactions. Each transaction sees a snapshot of the database, isolating it from others and preventing locks during long queries.
  • Transaction Management with XIDs:
    Each row version includes Transaction IDs (XIDs) to indicate when it was created and when it expired. This enables PostgreSQL to manage concurrency and recovery efficiently. For example, while editing an inventory item during a sales report generation, MVCC ensures the sales report sees the original data while the update operates independently.

Indexing in PostgreSQL

Indexes in PostgreSQL optimize queries by reducing the need for full-table scans. Below are examples showcasing indexing techniques, their use cases, and expected improvements.

B-Tree Index: Default for Range Queries

B-tree indexes are efficient for equality and range queries.

Example: Product Price Filtering

Given a products table:

SQL
 
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name TEXT,
    price NUMERIC
);

Query Without Index

SQL
 
SELECT * FROM products WHERE price BETWEEN 50 AND 100;

Execution Time: ~8.3 seconds (full scan on 1 million rows).

Query With B-Tree Index

MariaDB SQL
 
CREATE INDEX idx_price ON products(price);

SELECT * FROM products WHERE price BETWEEN 50 AND 100;

Execution Time: ~0.6 seconds (direct row access).

Performance Improvement

Metric Without Index With Index Improvement (%)
Query Execution Time ~8.3 seconds ~0.6 seconds ~92.8% faster


Hash Index: Fast Equality Searches

Hash indexes are ideal for simple equality searches.

Example: User Email Lookup

Given a users table:

SQL
 
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    name TEXT,
    email TEXT UNIQUE
);

Query Without Index

SQL
 
SELECT * FROM users WHERE email = '[email protected]';

Execution Time: ~4.5 seconds (scans 500,000 rows).

Query With Hash Index

SQL
 
CREATE INDEX idx_email_hash ON users USING hash(email);
SELECT * FROM users WHERE email = '[email protected]';

Execution Time: ~0.3 seconds.

Performance Improvement

Metric Without Index With Index Improvement (%)
Query Execution Time ~4.5 seconds ~0.3 seconds ~93.3% faster


GiST Index: Handling Spatial Data

GiST indexes are designed for complex data types, such as geometric or spatial queries.

Example: Store Locator

Given a locations table:

SQL
 
CREATE TABLE locations (
    location_id SERIAL PRIMARY KEY,
    name TEXT,
    coordinates GEOMETRY(Point, 4326)
);

Query Without Index

SQL
 
SELECT * FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(40.748817, -73.985428), 5000);

Execution Time: ~6.7 seconds.

Query With GiST Index

SQL
 
CREATE INDEX idx_coordinates_gist ON locations USING gist(coordinates);
SELECT * FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(40.748817, -73.985428), 5000);

Execution Time: ~1.2 seconds.

Performance Improvement

Metric Without Index With Index Improvement (%)
Query Execution Time ~6.7 seconds ~1.2 seconds ~82% faster


GIN Index: Full-Text Search

GIN indexes optimize composite or multi-value data types, such as arrays or JSON.

Example: Tag Search

Given an articles table:

SQL
 
CREATE TABLE articles (
    article_id SERIAL PRIMARY KEY,
    title TEXT,
    tags TEXT[]
);

Query Without Index

SQL
 
SELECT * FROM articles WHERE tags @> ARRAY['technology'];

Execution Time: ~9.4 seconds.

Query With GIN Index

SQL
 
CREATE INDEX idx_tags_gin ON articles USING gin(tags);
SELECT * FROM articles WHERE tags @> ARRAY['technology'];

Execution Time: ~0.7 seconds.

Performance Improvement

Metric Without Index With Index Improvement (%)
Query Execution Time ~9.4 seconds ~0.7 seconds ~92.6% faster


BRIN Index: Large Sequential Datasets

BRIN indexes summarize data blocks, suitable for massive sequential datasets.

Example: Log File Queries

Given a logs table:

SQL
 
CREATE TABLE logs (
    log_id SERIAL PRIMARY KEY,
    log_time TIMESTAMP,
    message TEXT
);

Query Without Index

SQL
 
SELECT * FROM logs WHERE log_time BETWEEN '2023-01-01' AND '2023-01-31';

Execution Time: ~45 seconds.

Query With BRIN Index

SQL
 
CREATE INDEX idx_log_time_brin ON logs USING brin(log_time);
SELECT * FROM logs WHERE log_time BETWEEN '2023-01-01' AND '2023-01-31';

Execution Time: ~3.2 seconds.

Performance Improvement

Metric Without Index With Index Improvement (%)
Query Execution Time ~45 seconds ~3.2 seconds ~92.9% faster


Performance Considerations

Impact on Writes:
Indexes can slow down INSERT, UPDATE, or DELETE operations as they require updates to all associated indexes. Balancing the number and type of indexes is crucial.

Example: An orders table with multiple indexes may experience slower insert speeds, requiring careful optimization.

Index Maintenance:
Over time, indexes can fragment and degrade in performance. Regular maintenance with commands like REINDEX can restore efficiency:

SQL
 
REINDEX INDEX idx_salary;

Using Execution Plans:
Analyze queries with EXPLAIN to understand index usage and identify performance bottlenecks:

SQL
 
EXPLAIN SELECT * FROM employees WHERE salary BETWEEN 50000 AND 70000;

Conclusion

PostgreSQL employs effective storage and indexing strategies, such as the TOAST mechanism for handling oversized data and various specialized index types, to significantly enhance query performance. This guide provides examples and performance metrics that showcase the tangible benefits of using indexes in various scenarios. By applying these techniques, database engineers can optimize both read and write operations, leading to robust and scalable database systems.

Data storage Data (computing) PostgreSQL Performance

Opinions expressed by DZone contributors are their own.

Related

  • Guide to Optimizing Your Snowflake Data Warehouse for Performance, Cost Efficiency, and Scalability
  • Optimizing Data Storage With Hybrid Partitioned Tables in Oracle 19c
  • Top Methods to Improve ETL Performance Using SSIS
  • Creating a Web Project: Key Steps to Identify Issues

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • [email protected]

Let's be friends: