DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Why Angular Performance Problems Are Often Backend Problems
  • Fine-Tuning of Spring Cache
  • Optimizing Data Loader Jobs in SQL Server: Production Implementation Strategies
  • Oracle Data Loading Reimagined: Performance Strategies for Modern Workloads

Trending

  • Using LLMs to Automate Data Cleaning and Transformation Pipelines
  • Integrating AI-Driven Decision-Making in Agile Frameworks: A Deep Dive into Real-World Applications and Challenges
  • A Scalable Framework for Enterprise Salesforce Optimization: Turning Outcomes Into an Operating System
  • Bringing Intelligence Closer to the Source: Why Real-Time Processing is the Heart of Edge AI
  1. DZone
  2. Data Engineering
  3. Big Data
  4. Data Storage and Indexing in PostgreSQL: Practical Guide With Examples and Performance Insights

Data Storage and Indexing in PostgreSQL: Practical Guide With Examples and Performance Insights

Explore how PostgreSQL handles large data using TOAST and improves query speed with indexes like B-tree, GIN, and BRIN, cutting query times by over 90%.

By 
arvind toorpu user avatar
arvind toorpu
DZone Core CORE ·
Jun. 06, 25 · Tutorial
Likes (7)
Comment
Save
Tweet
Share
3.8K Views

Join the DZone community and get the full member experience.

Join For Free

PostgreSQL employs sophisticated techniques for data storage and indexing to ensure efficient data management and fast query performance. This guide explores PostgreSQL's mechanisms, showcases practical examples, and includes simulated performance metrics to illustrate the impact of indexing.

Data Storage in PostgreSQL

Table Structure and TOAST (The Oversized-Attribute Storage Technique)

Table Structure: PostgreSQL stores table data in a format known as a heap. Each table's heap contains one or more pages (blocks), where each page is typically 8KB in size. This size can be altered when compiling PostgreSQL from source. PostgreSQL organizes table data in a heap structure with 8KB pages by default. Rows exceeding a page size are handled using TOAST, which compresses and stores oversized attributes in secondary storage.

Example: Managing Large Text Data

Consider a documents table:

SQL
 
CREATE TABLE documents (
    doc_id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT
);


  • Scenario: Storing a document with 10MB of content.
  • Without TOAST: The entire document resides in the table, slowing queries.
  • With TOAST: The content is compressed and stored separately, leaving a pointer in the main table.

Expected Performance Improvement

Metric Without TOAST With TOAST
Query Execution Time ~4.2 seconds ~2.1 seconds (50% faster)

TOAST significantly reduces table size, enhancing read and write efficiency.

MVCC (Multi-Version Concurrency Control):

  • Consistency with Row Versions:
    PostgreSQL uses MVCC to ensure data consistency and support concurrent transactions. Each transaction sees a snapshot of the database, isolating it from others and preventing locks during long queries.
  • Transaction Management with XIDs:
    Each row version includes Transaction IDs (XIDs) to indicate when it was created and when it expired. This enables PostgreSQL to manage concurrency and recovery efficiently. For example, while editing an inventory item during a sales report generation, MVCC ensures the sales report sees the original data while the update operates independently.

Indexing in PostgreSQL

Indexes in PostgreSQL optimize queries by reducing the need for full-table scans. Below are examples showcasing indexing techniques, their use cases, and expected improvements.

B-Tree Index: Default for Range Queries

B-tree indexes are efficient for equality and range queries.

Example: Product Price Filtering

Given a products table:

SQL
 
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name TEXT,
    price NUMERIC
);

Query Without Index

SQL
 
SELECT * FROM products WHERE price BETWEEN 50 AND 100;

Execution Time: ~8.3 seconds (full scan on 1 million rows).

Query With B-Tree Index

MariaDB SQL
 
CREATE INDEX idx_price ON products(price);

SELECT * FROM products WHERE price BETWEEN 50 AND 100;

Execution Time: ~0.6 seconds (direct row access).

Performance Improvement

Metric Without Index With Index Improvement (%)
Query Execution Time ~8.3 seconds ~0.6 seconds ~92.8% faster


Hash Index: Fast Equality Searches

Hash indexes are ideal for simple equality searches.

Example: User Email Lookup

Given a users table:

SQL
 
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    name TEXT,
    email TEXT UNIQUE
);

Query Without Index

SQL
 
SELECT * FROM users WHERE email = '[email protected]';

Execution Time: ~4.5 seconds (scans 500,000 rows).

Query With Hash Index

SQL
 
CREATE INDEX idx_email_hash ON users USING hash(email);
SELECT * FROM users WHERE email = '[email protected]';

Execution Time: ~0.3 seconds.

Performance Improvement

Metric Without Index With Index Improvement (%)
Query Execution Time ~4.5 seconds ~0.3 seconds ~93.3% faster


GiST Index: Handling Spatial Data

GiST indexes are designed for complex data types, such as geometric or spatial queries.

Example: Store Locator

Given a locations table:

SQL
 
CREATE TABLE locations (
    location_id SERIAL PRIMARY KEY,
    name TEXT,
    coordinates GEOMETRY(Point, 4326)
);

Query Without Index

SQL
 
SELECT * FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(40.748817, -73.985428), 5000);

Execution Time: ~6.7 seconds.

Query With GiST Index

SQL
 
CREATE INDEX idx_coordinates_gist ON locations USING gist(coordinates);
SELECT * FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(40.748817, -73.985428), 5000);

Execution Time: ~1.2 seconds.

Performance Improvement

Metric Without Index With Index Improvement (%)
Query Execution Time ~6.7 seconds ~1.2 seconds ~82% faster


GIN Index: Full-Text Search

GIN indexes optimize composite or multi-value data types, such as arrays or JSON.

Example: Tag Search

Given an articles table:

SQL
 
CREATE TABLE articles (
    article_id SERIAL PRIMARY KEY,
    title TEXT,
    tags TEXT[]
);

Query Without Index

SQL
 
SELECT * FROM articles WHERE tags @> ARRAY['technology'];

Execution Time: ~9.4 seconds.

Query With GIN Index

SQL
 
CREATE INDEX idx_tags_gin ON articles USING gin(tags);
SELECT * FROM articles WHERE tags @> ARRAY['technology'];

Execution Time: ~0.7 seconds.

Performance Improvement

Metric Without Index With Index Improvement (%)
Query Execution Time ~9.4 seconds ~0.7 seconds ~92.6% faster


BRIN Index: Large Sequential Datasets

BRIN indexes summarize data blocks, suitable for massive sequential datasets.

Example: Log File Queries

Given a logs table:

SQL
 
CREATE TABLE logs (
    log_id SERIAL PRIMARY KEY,
    log_time TIMESTAMP,
    message TEXT
);

Query Without Index

SQL
 
SELECT * FROM logs WHERE log_time BETWEEN '2023-01-01' AND '2023-01-31';

Execution Time: ~45 seconds.

Query With BRIN Index

SQL
 
CREATE INDEX idx_log_time_brin ON logs USING brin(log_time);
SELECT * FROM logs WHERE log_time BETWEEN '2023-01-01' AND '2023-01-31';

Execution Time: ~3.2 seconds.

Performance Improvement

Metric Without Index With Index Improvement (%)
Query Execution Time ~45 seconds ~3.2 seconds ~92.9% faster


Performance Considerations

Impact on Writes:
Indexes can slow down INSERT, UPDATE, or DELETE operations as they require updates to all associated indexes. Balancing the number and type of indexes is crucial.

Example: An orders table with multiple indexes may experience slower insert speeds, requiring careful optimization.

Index Maintenance:
Over time, indexes can fragment and degrade in performance. Regular maintenance with commands like REINDEX can restore efficiency:

SQL
 
REINDEX INDEX idx_salary;

Using Execution Plans:
Analyze queries with EXPLAIN to understand index usage and identify performance bottlenecks:

SQL
 
EXPLAIN SELECT * FROM employees WHERE salary BETWEEN 50000 AND 70000;

Conclusion

PostgreSQL employs effective storage and indexing strategies, such as the TOAST mechanism for handling oversized data and various specialized index types, to significantly enhance query performance. This guide provides examples and performance metrics that showcase the tangible benefits of using indexes in various scenarios. By applying these techniques, database engineers can optimize both read and write operations, leading to robust and scalable database systems.

Data storage Data (computing) PostgreSQL Performance

Opinions expressed by DZone contributors are their own.

Related

  • Why Angular Performance Problems Are Often Backend Problems
  • Fine-Tuning of Spring Cache
  • Optimizing Data Loader Jobs in SQL Server: Production Implementation Strategies
  • Oracle Data Loading Reimagined: Performance Strategies for Modern Workloads

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook