How to Use Self Join and WITH Clause in Oracle

The Oracle WITH clause is one of the most commonly used techniques to simplify the SQL source code and improve performance.

Sachin More

Oct. 18, 24 · Tutorial

Likes (6)

Comment

Save

5.2K Views

The Oracle WITH clause is one of the most commonly used techniques to simplify the SQL source code and improve performance. In Oracle SQL, the 'WITH' clause also known as a Common Table Expression (CTE) is a powerful tool which is also used to enhance the code readability.

WITH is commonly used to define temporary named result sets, also referred to as subqueries or CTEs as defined earlier. These temporary named sets can be referenced multiple times within the main SELECT SQL query. The CTEs are like virtual tables and are very helpful in organizing and modularizing the SQL code.

Understanding the WITH Clause

Syntax

The usage of the WITH clause is very simple. Create a namespace with the AS operator followed by the SELECT query and you can add as many SELECT queries as you want followed by a comma (,). It's a good practice to use meaningful terms for namespaces in order to distinguish in the main SELECT.

In terms of internal execution of the WITH clause, Oracle will internally execute the namespaces individually and cache the results in the memory which will then be utilized by the main SELECT SQL. It mimics a materialized view with intermediate results and reduces redundant calculations. This suggests that Oracle optimizes SQL queries with CTEs by storing the results of the subqueries temporarily, allowing for faster retrieval and processing in subsequent parts of the query.

    SQL
   
 

   WITH 
    cte_name1 as (SELECT * FROM Table1),
    cte_name2 as (SELECT * FROM Table2),
    ...
SELECT ...
FROM cte_name1, cte_name2
WHERE ...;
  

Use Case

In this use case, I am going to talk specifically about how you can effectively utilize inner joins alongside, using a WITH clause, which can tremendously help in performance tuning the process. Let's take a look at the dataset first and the problem statement before we delve deep into the solution.

The scenario is of an e-commerce retail chain for whom the bulk product sales price data needs to be loaded for a particular e-store location. Imagine that a product can have several price lines meant for regular prices, promotional and BOGO offer prices. In this case, the user is trying to create multiple promotional price lines and is unaware of the possible mistakes he/she could commit. Through this process, we will detect duplicate data that is functionally redundant and prevent the creation of poor data quality in the pricing system. By doing so, we will avoid the interface program failures in the Pricing repository staging layer, which acts as a bridge between the pricing computation engine and the pricing repository accessed by the e-commerce platform.

TABLE: e_promotions

Price_LINE	UPC_code	Description	Price	Start_DT	End_dt	Row_num
10001	049000093322	Coca-Cola 12 OZ	$6.86	01/01/2024	09/30/2024	1
10001	049000093322	Coca-Cola 12 OZ	$5.86	01/31/2024	03/30/2024	2
10001	049000028201	Fanta Pineapple Soda, 20 OZ	$2.89	01/01/2024	09/30/2024	3
10001	054000150296	Scott 1000	$1.19	01/01/2024	09/30/2024	4

PS: This a sample data, but in the real world, there could be thousands and millions of price lines being updated to mark down or mark up the prices on a weekly basis.

The table above captures the UPC codes and the respective items within the price line 10001. The issue with this data set is that the back office user is trying to create a duplicate line as part of the same price line through an upload process and the user does not know the duplicate data he/she may be creating.

The intent here is to catch the duplicate record and reject both entries 1 and 2 so that the user can decide which one among the two needs to go in the pricing system to be reflected on the website.

Using the code below would simplify error detection and also optimize the store proc solution for better performance.

    PLSQL
   
 

   WITH price_lines as
         (SELECT rowid, price_line, UPC, start_dt, end_dt
            FROM e_promotions
           WHERE price_line = 10001
             AND flag = 0)
        SELECT MIN(a.rowid) as row_id, a.price_line, a.UPC, a.start_dt, a.end_dt
          FROM price_lines a, price_lines b
         WHERE a.price_line = b.price_line
           AND a.flag = b.flag
           AND a.UPC = b.UPC
           AND a.rowid <> b.rowid
           AND (a.start_dt BETWEEN b.start_dt AND b.end_dt OR
               a.end_dt BETWEEN b.start_dt AND b.end_dt OR
               b.start_dt BETWEEN a.start_dt AND a.end_dt OR
               b.end_dt BETWEEN a.start_dt AND a.end_dt)
         GROUP BY a.price_line, a.UPC, a.start_dt, a.end_dt;
  

With the code above we did two things in parallel:

Queried the table once for the dataset we need to process using the WITH clause
Added the inner join to detect duplicates without having to query the table for the 2nd time, hence optimizing the performance of the store proc

This is one of the many use cases I have used in the past that gave me significant performance gain in my PLSQL and SQL coding.

Have fun and post your comments if you have any questions!

PL/SQL Data (computing) Joins (concurrency library) sql

Opinions expressed by DZone contributors are their own.

Related

Trending