Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Say No to Venn Diagrams When Explaining JOINs

DZone's Guide to

Say No to Venn Diagrams When Explaining JOINs

It's easy to use Venn diagrams to explain JOINs, but when you think about it, they're not quite the right tool to use. That's where JOIN diagrams come in.

· Database Zone
Free Resource

Read why times series is the fastest growing database category.

In recent times, there have been a couple of tremendously popular blog posts explaining JOINs using Venn diagrams. After all, relational algebra and SQL are set oriented theories and languages, so it only makes sense to illustrate set operations like JOINs using Venn Diagrams. Right?

Google seems to say so:

venn-google


Everyone uses Venn Diagrams to explain JOINs. But that's…

PLAIN WRONG!

Venn Diagrams are perfect to illustrate … actual set operations! SQL knows three of them:

And they can be explained as such:

venn-union

venn-intersection

venn-difference

(All of these slides are taken from our Data Geekery SQL Training, do check it out!)

Most of you use UNION occasionally. INTERSECT and EXCEPT are more exotic, but do come in handy every now and then.

The point here is: These set operations operate on sets of elements (tuples), which are all of the same type. As in the examples above, all elements are people with first and last names. This is also why INTERSECT and EXCEPT are more exotic, because they're usually not very useful. JOIN is much more useful. For instance, you want to combine the set of actors with their corresponding set of films.

A JOIN is really a Ccartesian product (also cross product) with a filter. Here’s a nice illustration of a cartesian product:

venn-cross-product

So, What’s a Better Way to Illustrate JOIN Operations?

JOIN diagrams! Let's look at CROSS JOIN first, because all other JOIN types can be derived from CROSS JOIN:

venn-cross-join

Remember, a cross join (in SQL also written with a comma separated table list, historically) is just taking every item on the left side and combining it with every item on the right side. When you CROSS JOIN a table of three rows with a table of four rows, you will get 3×4=12 result rows. See, I'm using an “x” character to write the multiplication. I.e. a “cross”.

INNER JOIN

All other joins are still based on cross joins, but with additional filters, and perhaps unions. Here's an explanation of each individual JOIN type.

venn-join


In plain text, an INNER JOIN is a CROSS JOIN in which only those combinations are retained which fulfil a given predicate. For instance:

-- "Classic" ANSI JOIN syntax
SELECT *
FROM author a
JOIN book b ON a.author_id = b.author_id

-- "Nice" ANSI JOIN syntax
SELECT *
FROM author a
JOIN book b USING (author_id)

-- "Old" syntax using a "CROSS JOIN"
SELECT *
FROM author a, book b
WHERE a.author_id = b.author_id

Remember though, that the INNER JOIN is still a Cartesian product, i.e. it retains all the "duplicate" rows where the predicate evaluates to true. In the visual example, we can see that there are two resulting blue rows because the right side had two blue rows, and the left blue row is repeated. We get a cross product of 1×2=2 rows.

OUTER JOIN

OUTER JOIN types help where we want to retain those rows from either the LEFT side or the RIGHT or both (FULL) sides, for which there was no matching row where the predicate yielded true.

A LEFT OUTER JOIN in relational algebra is defined as such:

dd81ee1373d922122ce1b3e0da74cb28

Or more verbosely in SQL:

SELECT *
FROM author a
LEFT JOIN book b USING (author_id)


This will produce all the authors and their books, but if an author doesn't have any books, we still want to get the author with NULL as their only book value. So, it's the same as writing:

SELECT *
FROM author a
JOIN book b USING (author_id)

UNION

SELECT a.*, NULL, NULL, NULL, ..., NULL
FROM (
    SELECT a.*
    FROM author a
 
    EXCEPT

    SELECT a.*
    FROM author a
    JOIN book b USING (author_id)
) a


But no one wants to write that much SQL, so OUTER JOIN was implemented.

Conclusion: Say NO to Venn Diagrams

JOINs are relatively easy to understand intuitively. And they're relatively easy to explain using Venn diagrams. But whenever you do that, remember, that you’re making a wrong analogy. A JOIN is not strictly a set operation that can be described with Venn diagrams. A JOIN is always a cross product with a predicate, and possibly a UNION to add additional rows to the OUTER JOIN result.

So, if in doubt, please use JOIN diagrams rather than Venn Diagrams. They’re more accurate and visually more useful.

venn-google-say-no

Learn how to get 20x more performance than Elastic by moving to a Time Series database.

Topics:
result ,predicate ,product ,blue ,diagrams ,illustration ,rows ,blog ,popular ,nice

Published at DZone with permission of Lukas Eder, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}