DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
The Latest "Software Integration: The Intersection of APIs, Microservices, and Cloud-Based Systems" Trend Report
Get the report
  1. DZone
  2. Data Engineering
  3. Databases
  4. SQL Joins (In Postgres)

SQL Joins (In Postgres)

Let's take a look at inner and outer Joins in SQL as well as planning for joins with the PostgreSQL database.

Craig Kerstiens user avatar by
Craig Kerstiens
·
Sep. 28, 18 · Tutorial
Like (3)
Save
Tweet
Share
4.91K Views

Join the DZone community and get the full member experience.

Join For Free

Our beloved Structured Query Language may be the lingua franca for relational databases-but like many languages, SQL is in a state of constant evolution. The first releases of SQL didn't even have a notation for joins. At the time, SQL only supported inner joins.

Cross Joins and Where Filters

As a result, back in the early eighties, the only way to express a join condition between tables would be in the WHERE clause.

select companies.name, campaigns.name, campaigns.monthly_budget
  from companies, campaigns 
 where companies.id = campaigns.company_id
   and company_id = 5;

In that query, the PostgreSQL planner realizes that we are doing an INNER JOIN in between the companies table and the campaigns table, even though what we are actually writing is a CROSS JOIN in between the tables. In PostgreSQL documentation chapter Table Expressions we can read more about the CROSS JOIN operation:

For every possible combination of rows from T1 and T2 (i.e., a Cartesian product), the joined table will contain a row consisting of all columns in T1 followed by all columns in T2. If the tables have N and M rows respectively, the joined table will have N * M rows.

FROM T1 CROSS JOIN T2 is equivalent to FROM T1 INNER JOIN T2 ON TRUE (see below). It is also equivalent to FROM T1, T2.

In my opinion, it is so rare that what you want to achieve actually is a cross join that cross joins should fail your SQL unit testing. After all, code should always express the intentions of the programmer, and if your intention is not to implement a CROSS JOIN, then using a comma-separated list of tables in your FROM clause looks deadly suspicious.

Inner and Outer Joins in SQL

The previous SQL query can be written properly with an inner join syntax:

select companies.name, campaigns.name, campaigns.monthly_budget
  from            companies
       inner join campaigns 
               on companies.id = campaigns.company_id
 where company_id = 5;

Lucky for us, the PostgreSQL planner is smart enough to realize that both queries are specifying the same result set and accordingly executes the same plan to fetch it.

Now, if a company didn't program any advertising campaigns yet, then the company won't be listed at all in the output. What if you want to list all companies, whether or not they have campaigns attached to them? In that case, we would use an SQL outer join:

select companies.name, campaigns.name, campaigns.monthly_budget
  from                 companies
       left outer join campaigns 
                    on companies.id = campaigns.company_id
 where company_id = 5;

The keywords INNER and OUTER in the SQL syntax above are noise words that the SQL standard allows. These noise words are not mandatory. After all, a LEFT JOIN is always an OUTER join, there's no choice here. So we can rewrite the query that way:

select companies.name, campaigns.name, campaigns.monthly_budget
  from           companies
       left join campaigns 
              on companies.id = campaigns.company_id
 where company_id = 5;

Planning for Joins With the PostgreSQL Database

We said that PostgreSQL is smart enough to realize that there's more than one way to specify the same result set with SQL queries written in different ways.

So it follows that one of the tasks that the PostgreSQL planner must implement consists of finding the most efficient way to execute the JOIN specified in the query.

After all JOIN(A, B, C) gives the same results as JOIN(JOIN(A, B), C) and the same result as JOIN(A, JOIN(B, C)). So how does PostgreSQL decide which order to prefer?

It Turns out This Problem Was Solved by a Pioneer in Relational Databases

Patricia Selinger co-published Access path selection in a relational database management system in the Proceedings of the 1979 ACM SIGMOD international conference on Management of data, pp 23-34, where she exposes the algorithm she invented to plan for JOIN operations in SQL. Here's the summary of the paper:

In a high level query and data manipulation language such as SQL, requests are stated non-procedurally, without reference to access paths. This paper describes how System R chooses access paths for both simple (single relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates. System R is an experimental database management system developed to carry out research on the relational model of data. System R was designed and built by members of the IBM San Jose Research Laboratory.

The Selinger Algorithm is still in use as of today in PostgreSQL source code, and Daniel Gustafsson did an awesome job at presenting the Selinger algorithm in his presentation titled Through The Joining Glass that he delivered at PGConf.EU 2017, last year. If you want to learn more about how PostgreSQL implements JOINs, read Daniel's slides!

Planning for Distributed Joins With PostgreSQL and Citus

When using Citus to scale out PostgreSQL horizontally (often to reap the performance benefits of a scale-out database architecture), the same query as above might be fetching data from distributed tables. Because Citus is a PostgreSQL extension, Citus benefits from all the smarts that PostgreSQL provides when planning the query. A nice thing about this step in the planner is the Equivalence Class processing, which allows Citus to leverage the fact that PostgreSQL knows how to read the join condition and where clauses and inject the filters everywhere needed.

When PostgreSQL is done planning the query, Citus knows that there is a join in between the companies and campaigns table, and also that both campaigns.company_id = 5 and company_id = 5.

If you're developing a multi-tenant application (like a SaaS app) this means you can use Citus to scale out your data and your queries across multiple nodes. You can shard by company ID and Citus will know to route the query down to a single shard (on a single node.) This Citus multi-tenant app tutorial does a good job guiding you through the process of using the Citus database to scale out a multi-tenant application.

Database Relational database sql Joins (concurrency library) PostgreSQL

Published at DZone with permission of Craig Kerstiens, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • 5 Steps for Getting Started in Deep Learning
  • Journey to Event Driven, Part 1: Why Event-First Programming Changes Everything
  • 10 Things to Know When Using SHACL With GraphDB
  • Microservices 101: Transactional Outbox and Inbox

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: