DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Performing and Managing Incremental Backups Using pg_basebackup in PostgreSQL 17
  • From Concept to Cloud: Building With Cursor and the Heroku MCP Server
  • Database Query Service With OpenAI and PostgreSQL in .NET
  • Integrating Redis With Message Brokers

Trending

  • The Cypress Edge: Next-Level Testing Strategies for React Developers
  • 5 Subtle Indicators Your Development Environment Is Under Siege
  • A Developer's Guide to Mastering Agentic AI: From Theory to Practice
  • How to Convert XLS to XLSX in Java
  1. DZone
  2. Data Engineering
  3. Databases
  4. Introducing the MERGE Command in PostgreSQL 15

Introducing the MERGE Command in PostgreSQL 15

This article explores the advantages of the MERGE command in PostgreSQL, with real-world use cases, automation techniques, and performance benchmarking results.

By 
Vivek Singh user avatar
Vivek Singh
·
Dec. 02, 24 · Analysis
Likes (9)
Comment
Save
Tweet
Share
8.3K Views

Join the DZone community and get the full member experience.

Join For Free

Developers often need to merge data from external sources to the base table. The expectation of this merge operation is that data from external sources refresh data at the main table. Refresh means inserting new records, updating existing records, and deleting if a record is not found. 

Since the 9.4 version release, PostgreSQL has supported the INSERT command with the ‘ON CONFLICT’ clause. While this proved to be a workaround for the MERGE command, many features, such as conditional delete and simplicity of query, were missing. Other databases, such as Oracle and SQL Server, already support MERGE commands. Until PostgreSQL 15, it was still one of the long-awaited features for PostgreSQL users. 

This article discusses the advantages, use cases, and benchmarking of the MERGE command, which is newly supported by PostgreSQL 15. 

How the MERGE Command Works in PostgreSQL

The MERGE command selects rows from one or more external data sources to UPDATE, DELETE, or INSERT on the target table. Depending on the condition used in the MERGE command, the user running the MERGE command should have UPDATE, DELETE, and INSERT privileges on the target table. The user should also have SELECT privileges on the source table(s).

 PostgreSQL 15 MERGE Syntax


A typical PostgreSQL MERGE command looks like as below:

SQL
 
MERGE INTO target_table t1
USING source_table t2
ON t2.col1 = t2.col1
WHEN MATCHED THEN
UPDATE/INSERT/DELETE ...
WHEN NOT MATCHED THEN
 UPDATE/INSERT/DELETE;


Example of a MERGE Command

The image below explains a typical use case of the MERGE command in PostgreSQL. The source data that matches the target data can be updated or deleted, and the source data that does not match can be inserted into the target data. 

Understanding PostgreSQL MERGE command


For example, the image below shows a classic data warehouse populated with data from various data marts. 

A classic data warehouse wherein the data is populated with data coming from various data marts


Assume the data warehouse hosts data from IOT sensor stations. These stations host data marts 1, 2, 3, and so forth at respective stations. These data marts are hosted locally on the stations. In the steps below, we’ll merge data from different stations to the main stations.

Step-by-Step

Step 1: Create the Data Warehouse Table (Target Station Table)

Create the data warehouse table, the target station table that is loaded by other stations:

SQL
 
Postgres=> CREATE table station_main (

    station_id   integer   primary key

  , data text

  , create_time      timestamp default current_timestamp

  , update_time      timestamp default current_timestamp

);

CREATE TABLE


Step 2: Insert Sample Data into the Target Table

Load some sample records into the target table:

SQL
 
postgres=> INSERT into station_main VALUES (1, 'data11'), (2, 'data22'), (3, 'data44'), (4, 'data44'), (5, 'data55');

INSERT 0 5


Step 3: Verify Data Insertion

Check the data to ensure it's inserted correctly:

SQL
 
postgres=> SELECT * from station_main;

 station_id |  data   |        create_time        |         update_time

------------+--------+---------------------------+---------------------------

          1 | data11 | 2023-08-11 21:21:30.19364 | 2023-08-11 21:21:30.19364

          2 | data22 | 2023-08-11 21:21:30.19364 | 2023-08-11 21:21:30.19364

          3 | data44 | 2023-08-11 21:21:30.19364 | 2023-08-11 21:21:30.19364

          4 | data44 | 2023-08-11 21:21:30.19364 | 2023-08-11 21:21:30.19364

          5 | data55 | 2023-08-11 21:21:30.19364 | 2023-08-11 21:21:30.19364

(5 rows)


Step 4: Create a Table for Various Stations

Create a table for data from other stations:

SQL
 
postgres=> create table station_1 (

station_id   integer

, data text

);

CREATE TABLE

Time: 4.288 ms


Step 5: Insert Sample Data into the Station Table

Insert sample records into the newly created station table:

SQL
 
postgres=> INSERT INTO station_1 VALUES (1, 'data11'), (2, 'data22'), (3, 'data44'), (4, 'data44'), (5, 'data55');

INSERT 0 5

Time: 2.221 ms


Step 6: Verify Data in Station 1

Check the inserted data in station_1:

SQL
 
postgres=> SELECT * from station_1;

 station_id |  data

------------+--------

          1 | data11

          2 | data22

          3 | data44

          4 | data44

          5 | data55

(5 rows)


Step 7: Insert New Data and Update Existing Data in Station 1

Assuming station_1 updates some base values and inserts new values, insert new data and update existing data: 

SQL
 
postgres=> INSERT INTO station_1 VALUES (6, 'data66');

INSERT 0 1

postgres=>

postgres=> UPDATE station_1 set data='data10' where station_id=1;

UPDATE 1

postgres=>

postgres=> SELECT * from station_1;

 station_id |  data

------------+---------

          2 | data22

          3 | data44

          4 | data44

          5 | data55

          6 | data66

          1 | data10

(6 rows)


Step 8: Merge Data from Station 1 to Station Main

If station_1 only inserts new data, we could have used INSERT INTO SELECT FROM command. But, in this scenario, few data have been modified, so we need to merge the data:

SQL
 
 postgres=> MERGE INTO station_main sm

           USING  station_1 s

           ON sm.station_id=s.station_id

           when matched then

           update set data=s.data

            WHEN NOT MATCHED THEN

           INSERT (station_id, data)

            VALUES (s.station_id, s.data);

MERGE 6


Verify data after the merge:

SQL
 
postgres=> SELECT * from station_main ;
 postgres=> SELECT * from station_main ;

 station_id |  data   |        create_time         |        update_time

------------+--------+----------------------------+----------------------------

          2 | data22 | 2023-08-11 21:27:23.076226 | 2023-08-11 21:27:23.076226

          3 | data44 | 2023-08-11 21:27:23.076226 | 2023-08-11 21:27:23.076226

          4 | data44 | 2023-08-11 21:27:23.076226 | 2023-08-11 21:27:23.076226

          5 | data55 | 2023-08-11 21:27:23.076226 | 2023-08-11 21:27:23.076226

          6 | data66 | 2023-08-11 21:29:13.354859 | 2023-08-11 21:29:13.354859

          1 | data10 | 2023-08-11 21:27:23.076226 | 2023-08-11 21:27:23.076226

(6 rows)


Advantages of the Merge Command

MERGE command helps easily manage a set of external data files such as application log files. For example, you can host data and log files into separate tablespaces hosted at cheaper storage disks and then create a MERGE table to use them as one. 

Other advantages include:

  • Better speed: Small source tables perform better than a single large table. You can split a large source table based on some clause and then use individual tables to merge into the target table.
  • More efficient searches: Searching in underlying smaller tables is quicker than searching in one large table.
  • Easier table repair: Repairing individual smaller tables is easier than a large table. Here, repairing tables means removing data anomalies.
  • Instant mapping: A MERGE table does not need to maintain an index of its own, making it fast to create or remap. As a result, MERGE table collections are very fast to create or remap. (You must still specify the index definitions when you create a MERGE table, even though no indexes are created.)

 If you have a set of tables from which you create a large table on demand, you can instead create a MERGE table from them on demand. This is much faster and saves a lot of disk space.

Use Cases of MERGE Command in PostgreSQL 15 

Below are the common use cases of MERGE commands:

  • IoT data collecting from various machines: Data coming from various machines can be saved in smaller temporary tables and can be merged with target data at certain intervals. 
  • E-commerce data: Product availability data coming from various fulfillment centers can be merge with central repository of target product table. 
  • Customer data: Customer transaction data coming from different sources can be merged with main customer data. 

Automation of MERGE Command in PostgreSQL Using pg_cron

Automating MERGE is important for error-prone MERGE execution and reducing maintenance hassle. PostgreSQL supports pg_cron extension. This extension is used to configure scheduled tasks for a PostgreSQL database. A typical pg_cron command mentioned below shows the scheduling options:

  • SELECT cron.schedule('Minute Hour Date Month Day of the week', 'Task');

The pg_cron job below schedules MERGE command every minute:

SQL
 
SELECT cron.schedule('30 * * * *', $$MERGE INTO station_main sm

           USING  station_1 s

           ON sm.station_id=s.station_id

           when matched then

           update set data=s.data

            WHEN NOT MATCHED THEN

           INSERT (station_id, data)

            VALUES (s.station_id, s.data);$$);

Screenshot of the pg_cron job scheduling MERGE command every minute


Benchmarking: MERGE vs. UPSERT

In this section, we will benchmark and compare traditional UPSERT (i.e., INSERT with ‘ON CONFLICT’ clause) and the MERGE command. 

Setup

Step 1: Create a target table with 1,000,000 records, with id column as primary key:

SQL
 
CREATE table target_table as
(SELECT
generate_series(1,1000000) AS id,
      floor(random() * 1000) AS data);    

ALTER TABLE target_table   ADD PRIMARY KEY (id);


Screenshot of a target table with 1,000,000 records


Step 2: Create a source data table with all the contents of the target table and id column as a primary key:

SQL
 
CREATE table source_table as SELECT * from target_table;

ALTER TABLE source_table   ADD PRIMARY KEY (id);

 Screenshot of a source data table

Step 3: Create a temporary table with additional data:

SQL
 
CREATE table source_table_temp as
(SELECT
generate_series(1000001,1200000) AS id,
      floor(random() * 1000) AS data);

 Screenshot of a temporary table

Step 4: Insert data from the temporary table to the source table and update 400,000 rows:

Screenshot of data inserted from the temporary table to the source table


Step 5: Turn timing on and run the UPSERT command. The total time taken for the UPSERT operation is 5904 ms.

Screenshot of the turning the timing on and running the UPCERT command


Now, we repeat the above steps and run the MERGE command. The total time it takes for the MERGE operation is 4,484 ms.   

Screenshot of running the MERGE command


For our specific use case, we'll find that the MERGE command performed 30% better compared to the UPSERT operation.

Conclusion

MERGE command has been one of the most popular features added to PostgreSQL 15. It improved the performance of table refresh and makes table maintenance easier. Further, newly supported extensions, such as pg_cron can be used to automate MERGE operation.

Command (computing) Merge (version control) PostgreSQL

Opinions expressed by DZone contributors are their own.

Related

  • Performing and Managing Incremental Backups Using pg_basebackup in PostgreSQL 17
  • From Concept to Cloud: Building With Cursor and the Heroku MCP Server
  • Database Query Service With OpenAI and PostgreSQL in .NET
  • Integrating Redis With Message Brokers

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!