Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Fun with SQL: Common Table Expressions for More Readable Queries

DZone's Guide to

Fun with SQL: Common Table Expressions for More Readable Queries

Let's get back-to-basics with Postgres and look at some standard Common Table Expressions and what you can do with them!

· Database Zone ·
Free Resource

Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.  

This week, we’re continuing our fun with SQL series. In past posts, we’ve looked at generate_serieswindow functions, and recursive CTEs. This week, we’re going to take a step backward and look at standard CTEs (common table expressions) within Postgres.

Admittedly SQL isn’t always the most friendly language to read. It’s a little more friendly to write, but even still, it's not as naturally readable as something like Python. Despite its shortcomings there, it is the lingua franca when it comes to data. SQL is the language and API that began with relational databases and now even non-traditional databases are aiming to imitate it with their own SQL-like things. But with CTEs, even queries hundreds of lines long can become readable to someone without detailed knowledge of the application.

CTEs, often referred to as with clauses/queries, are essentially views that are valid during the course of a transaction. They can reference earlier CTEs within that same transaction or query, essentially giving you separate building blocks upon which you compose your queries. It is of note that CTEs are an optimization boundary, so in some cases, they may have worse performance than their alternative non-CTE queries. Even still, they’re incredibly useful for readability and should be considered when constructing large, complex queries. Let’s dig in with an example.

We’re going to assume a basic CRM schema where we have organizations that have contacts that are tied to accounts — and those accounts have opportunities. In this CRM world, we want to create a report that has all the opportunities that were opened between 30 and 60 days ago and have a contact that was contacted within the last 30 days, grouped by the sales rep owner. The goal is to see that our sales reps are actively chasing the deals that they said exist.

Query for Opportunities Created 1-2 Months Ago

First, we’re going to construct a query that gives us all the opportunities opened in that range:

WITH opp_list AS (
  SELECT opportunities.id as opportunity_id,
         account_id,
         accounts.name as account_name,
         opportunities.amount as opportunity_amount,
         opportunities.created_at as opportunity_created
  FROM opportunities,
       accounts
  WHERE opportunities.created_at <= now() - '30 days'::interval
    AND opportunities.created_at >= now() - '60 days'::interval
    AND opportunities.account_id = accounts.id
)

SELECT *
FROM opp_list


You can see that we start our query with a WITH clause followed by a name we define for the query opp_list. We then put our query in there and, later, we can query that CTE with SELECT * FROM opp_list. This query gives us the list of opportunities, which account they are tied to, and their amount for our given conditions:

 opportunity_id | account_id |  account_name   | opportunity_amount |      opportunity_created
----------------+------------+-----------------+--------------------+-------------------------------
              1 |          1 | ACME Explosives |              45000 | 2018-06-14 09:06:10.06669-07
              4 |          2 | ACME Bread      |              17000 | 2018-07-09 09:06:55.312687-07
              2 |          2 | ACME Bread      |              27000 | 2018-06-26 09:06:27.182375-07
              3 |          3 | ACME Databases  |              35000 | 2018-07-09 09:06:41.615194-07
(4 rows)


Querying for Recently Contacted Contacts

Now we need to construct the next part of our query. Let’s first do this in its own CTE and find all contacts that were emailed within the last 30 days:

WITH recently_contacted AS (
  SELECT email,
         account_id,
         last_contacted
  FROM contacts
  WHERE contacts.last_contacted >= now() - '30 days'::interval
)

SELECT *
FROM recently_contacted


And here we can see we have two contacts we’ve contacted within the last 30 days:

      email       | account_id |        last_contacted
------------------+------------+-------------------------------
 jane@acme.com    |          2 | 2018-07-12 09:10:45.349093-07
 craig@acmedb.com |          3 | 2018-08-02 09:11:16.624737-07
(2 rows)


Putting Our CTE Building Blocks Together

Now we can start to combine our two CTEs to get the intersection of opportunities and who we’ve contacted in the last 30 days. A note, when chaining together multiple CTEs, you don’t repeat the WITH keyword. Instead, you continue with a comma and place the next CTE. Let’s look at it put together:

WITH opp_list AS (
  SELECT opportunities.id as opportunity_id,
         account_id,
         accounts.name as account_name,
         opportunities.amount as opportunity_amount,
         opportunities.created_at as opportunity_created
  FROM opportunities,
       accounts
  WHERE opportunities.created_at <= now() - '30 days'::interval
    AND opportunities.created_at >= now() - '60 days'::interval
    AND opportunities.account_id = accounts.id
),

recently_contacted AS (
  SELECT email,
         account_id,
         last_contacted
  FROM contacts
  WHERE contacts.last_contacted >= now() - '30 days'::interval
)

SELECT account_name,
       opportunity_created,
       email,
       last_contacted
FROM      opp_list
     JOIN recently_contacted
       ON opp_list.account_id = recently_contacted.account_id;


And with our final result, we get 3 open opportunities across 2 accounts that have had some interaction with the rep within the last 30 days:

  account_name  |      opportunity_created      |      email       |        last_contacted
----------------+-------------------------------+------------------+-------------------------------
 ACME Bread     | 2018-06-26 09:06:27.182375-07 | jane@acme.com    | 2018-07-12 09:10:45.349093-07
 ACME Bread     | 2018-07-09 09:06:55.312687-07 | jane@acme.com    | 2018-07-12 09:10:45.349093-07
 ACME Databases | 2018-07-09 09:06:41.615194-07 | craig@acmedb.com | 2018-08-02 09:11:16.624737-07
(3 rows)


CTEs for Better Readability

CTEs are a powerful construct to help build logic blocks and gradually construct more complex queries. Others that come after you will appreciate reading your query (even if it's hundreds of lines) if you’ve used CTEs. But you will want to be careful if CTEs are heavily used within your application as they are an optimization boundary, but for use cases where readability is as important as a few ms. then give them a strong consideration.

Download AnzoGraph now and find out for yourself why it is acknowledged as the most complete all-in-one data warehouse for BI style and graph analytics.  

Topics:
database ,cte ,tutorial ,sql queries ,optimization boundary ,postgresql

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}