Data Storage and Indexing in PostgreSQL: Practical Guide With Examples and Performance Insights

Explore how PostgreSQL handles large data using TOAST and improves query speed with indexes like B-tree, GIN, and BRIN, cutting query times by over 90%.

arvind toorpu

CORE ·

Jun. 06, 25 · Tutorial

Likes (7)

Comment

Save

3.8K Views

PostgreSQL employs sophisticated techniques for data storage and indexing to ensure efficient data management and fast query performance. This guide explores PostgreSQL's mechanisms, showcases practical examples, and includes simulated performance metrics to illustrate the impact of indexing.

Data Storage in PostgreSQL

Table Structure and TOAST (The Oversized-Attribute Storage Technique)

Table Structure: PostgreSQL stores table data in a format known as a heap. Each table's heap contains one or more pages (blocks), where each page is typically 8KB in size. This size can be altered when compiling PostgreSQL from source. PostgreSQL organizes table data in a heap structure with 8KB pages by default. Rows exceeding a page size are handled using TOAST, which compresses and stores oversized attributes in secondary storage.

Example: Managing Large Text Data

Consider a documents table:

    SQL
   
 

   CREATE TABLE documents (
    doc_id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT
);
  

Scenario: Storing a document with 10MB of content.
Without TOAST: The entire document resides in the table, slowing queries.
With TOAST: The content is compressed and stored separately, leaving a pointer in the main table.

Expected Performance Improvement

Metric	Without TOAST	With TOAST
Query Execution Time	~4.2 seconds	~2.1 seconds (50% faster)

TOAST significantly reduces table size, enhancing read and write efficiency.

MVCC (Multi-Version Concurrency Control):

Consistency with Row Versions:
PostgreSQL uses MVCC to ensure data consistency and support concurrent transactions. Each transaction sees a snapshot of the database, isolating it from others and preventing locks during long queries.
Transaction Management with XIDs:
Each row version includes Transaction IDs (XIDs) to indicate when it was created and when it expired. This enables PostgreSQL to manage concurrency and recovery efficiently. For example, while editing an inventory item during a sales report generation, MVCC ensures the sales report sees the original data while the update operates independently.

Indexing in PostgreSQL

Indexes in PostgreSQL optimize queries by reducing the need for full-table scans. Below are examples showcasing indexing techniques, their use cases, and expected improvements.

B-Tree Index: Default for Range Queries

B-tree indexes are efficient for equality and range queries.

Example: Product Price Filtering

Given a products table:

    SQL
   
 

   CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name TEXT,
    price NUMERIC
);
  

Query Without Index

    SQL
   
   SELECT * FROM products WHERE price BETWEEN 50 AND 100;

Execution Time: ~8.3 seconds (full scan on 1 million rows).

Query With B-Tree Index

    MariaDB SQL
   
   CREATE INDEX idx_price ON products(price);

SELECT * FROM products WHERE price BETWEEN 50 AND 100;

Execution Time: ~0.6 seconds (direct row access).

Performance Improvement

Metric	Without Index	With Index	Improvement (%)
Query Execution Time	~8.3 seconds	~0.6 seconds	~92.8% faster

Hash Index: Fast Equality Searches

Hash indexes are ideal for simple equality searches.

Example: User Email Lookup

Given a users table:

    SQL
   
 

   CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    name TEXT,
    email TEXT UNIQUE
);
  

Query Without Index

SQL

SELECT * FROM users WHERE email = '[email protected]';

Execution Time: ~4.5 seconds (scans 500,000 rows).

Query With Hash Index

SQL

CREATE INDEX idx_email_hash ON users USING hash(email);
SELECT * FROM users WHERE email = '[email protected]';

Execution Time: ~0.3 seconds.

Performance Improvement

Metric	Without Index	With Index	Improvement (%)
Query Execution Time	~4.5 seconds	~0.3 seconds	~93.3% faster

GiST Index: Handling Spatial Data

GiST indexes are designed for complex data types, such as geometric or spatial queries.

Example: Store Locator

Given a locations table:

    SQL
   
 

   CREATE TABLE locations (
    location_id SERIAL PRIMARY KEY,
    name TEXT,
    coordinates GEOMETRY(Point, 4326)
);
  

Query Without Index

    SQL
   
   SELECT * FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(40.748817, -73.985428), 5000);

Execution Time: ~6.7 seconds.

Query With GiST Index

    SQL
   
   CREATE INDEX idx_coordinates_gist ON locations USING gist(coordinates);
SELECT * FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(40.748817, -73.985428), 5000);

Execution Time: ~1.2 seconds.

Performance Improvement

Metric	Without Index	With Index	Improvement (%)
Query Execution Time	~6.7 seconds	~1.2 seconds	~82% faster

GIN Index: Full-Text Search

GIN indexes optimize composite or multi-value data types, such as arrays or JSON.

Example: Tag Search

Given an articles table:

    SQL
   
 

   CREATE TABLE articles (
    article_id SERIAL PRIMARY KEY,
    title TEXT,
    tags TEXT[]
);
  

Query Without Index

    SQL
   
   SELECT * FROM articles WHERE tags @> ARRAY['technology'];

Execution Time: ~9.4 seconds.

Query With GIN Index

    SQL
   
   CREATE INDEX idx_tags_gin ON articles USING gin(tags);
SELECT * FROM articles WHERE tags @> ARRAY['technology'];

Execution Time: ~0.7 seconds.

Performance Improvement

Metric	Without Index	With Index	Improvement (%)
Query Execution Time	~9.4 seconds	~0.7 seconds	~92.6% faster

BRIN Index: Large Sequential Datasets

BRIN indexes summarize data blocks, suitable for massive sequential datasets.

Example: Log File Queries

Given a logs table:

    SQL
   
 

   CREATE TABLE logs (
    log_id SERIAL PRIMARY KEY,
    log_time TIMESTAMP,
    message TEXT
);
  

Query Without Index

    SQL
   
   SELECT * FROM logs WHERE log_time BETWEEN '2023-01-01' AND '2023-01-31';

Execution Time: ~45 seconds.

Query With BRIN Index

    SQL
   
   CREATE INDEX idx_log_time_brin ON logs USING brin(log_time);
SELECT * FROM logs WHERE log_time BETWEEN '2023-01-01' AND '2023-01-31';

Execution Time: ~3.2 seconds.

Performance Improvement

Metric	Without Index	With Index	Improvement (%)
Query Execution Time	~45 seconds	~3.2 seconds	~92.9% faster

Performance Considerations

Impact on Writes:
Indexes can slow down INSERT, UPDATE, or DELETE operations as they require updates to all associated indexes. Balancing the number and type of indexes is crucial.

Example: An orders table with multiple indexes may experience slower insert speeds, requiring careful optimization.

Index Maintenance:
Over time, indexes can fragment and degrade in performance. Regular maintenance with commands like REINDEX can restore efficiency:

    SQL
   
   REINDEX INDEX idx_salary;

Using Execution Plans:
Analyze queries with EXPLAIN to understand index usage and identify performance bottlenecks:

    SQL
   
   EXPLAIN SELECT * FROM employees WHERE salary BETWEEN 50000 AND 70000;

Conclusion

PostgreSQL employs effective storage and indexing strategies, such as the TOAST mechanism for handling oversized data and various specialized index types, to significantly enhance query performance. This guide provides examples and performance metrics that showcase the tangible benefits of using indexes in various scenarios. By applying these techniques, database engineers can optimize both read and write operations, leading to robust and scalable database systems.

Data storage Data (computing) PostgreSQL Performance

Opinions expressed by DZone contributors are their own.

Related

Trending

Data Storage and Indexing in PostgreSQL: Practical Guide With Examples and Performance Insights

Explore how PostgreSQL handles large data using TOAST and improves query speed with indexes like B-tree, GIN, and BRIN, cutting query times by over 90%.

Data Storage in PostgreSQL

Table Structure and TOAST (The Oversized-Attribute Storage Technique)

Expected Performance Improvement

MVCC (Multi-Version Concurrency Control):

Indexing in PostgreSQL

B-Tree Index: Default for Range Queries

Query Without Index

Query With B-Tree Index

Performance Improvement

Hash Index: Fast Equality Searches

Query Without Index

Query With Hash Index

Performance Improvement

GiST Index: Handling Spatial Data

Query Without Index

Query With GiST Index

Performance Improvement

GIN Index: Full-Text Search

Query Without Index

Query With GIN Index

Performance Improvement

BRIN Index: Large Sequential Datasets

Query Without Index

Query With BRIN Index

Performance Improvement

Performance Considerations

Conclusion

Related

Partner Resources