DZone
Database Zone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
  • Refcardz
  • Trend Reports
  • Webinars
  • Zones
  • |
    • Agile
    • AI
    • Big Data
    • Cloud
    • Database
    • DevOps
    • Integration
    • IoT
    • Java
    • Microservices
    • Open Source
    • Performance
    • Security
    • Web Dev
DZone > Database Zone > How SQL GROUP BY Should Have Been Designed: Like Neo4j’s Implicit GROUP BY

How SQL GROUP BY Should Have Been Designed: Like Neo4j’s Implicit GROUP BY

SQL GROUP BY should have been designed like Neo4j's implicit GROUP BY with its Cypher query language, which is still inspired by SQL.

Lukas Eder user avatar by
Lukas Eder
·
May. 19, 15 · Database Zone · Interview
Like (0)
Save
Tweet
4.60K Views

Join the DZone community and get the full member experience.

Join For Free

In the recent past, we’ve explained the syntactic implications of the SQL GROUP BY clause. If you haven’t already, you should read our article “Do You Really Understand SQL’s GROUP BY and HAVING clauses?“.

In essence, adding a GROUP BY clause to your query transforms your query on very implicit levels. The following reminder summarises the previous article:

  • Only column expressions referenced in the GROUP BY clause, or aggregations of other column expressions may appear in the SELECT clause
  • Aggregations without explicit GROUP BY clause imply the “grand total” GROUP BY () clause
  • Some databases (e.g. MySQL, and to some extent: the SQL standard) don’t follow these rules and allow for arbitrary column expressions (or at least functionally dependent column expressions) in the SELECT clause

How SQL GROUP BY Should Have Been Designed

There is another way of looking at GROUP BY, and it has been implemented in the equally fascinating, beautiful, and weird Cypher query language (those are good attributes) as supported by the Neo4j graph database. This alternative (yet SQL inspired) query language probably deserves a whole blog post series on its own, but let’s focus on aggregation. Because aggregation is the primary use case for grouping.

(for the record, check out the Neo4j docs about aggregation for details)

A Quick Wrap-Up to Understand Cypher

Consider this simple Cypher query:

MATCH (me:Person)-->(friend:Person)
                 -->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), 
       count(friend_of_friend)
  • Cypher MATCH corresponds to SQL FROM
  • Cypher WHERE … take an educated guess
  • Cypher RETURN corresponds to SQL SELECT (and it’s placed in a semantically more useful place)

Furthermore

  • Cypher
    (me:Person)-->(friend:Person)
               -->(friend_of_friend:Person)
    
    corresponds roughly to SQL
         Person AS me 
    JOIN Person AS friend 
      ON [ implicit equi-join predicate ]
    JOIN Person as friend_of_friend
      ON [ implicit equi-join predicate ]

Cypher’s way of writing JOIN is actually extremely useful and could also be applied to SQL. It is only a matter of time until someone will write a Cypher-to-SQL transformer that implements the syntax, at least as syntactic sugar for the equivalent ANSI equi-join notation.

Let’s Investigate Aggregation in Cypher

Here’s the query again:

MATCH (me:Person)-->(friend:Person)
                 -->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), 
       count(friend_of_friend)

So, in SQL terms, this is exactly the same as:

SELECT count(DISTINCT friend_of_friend), 
       count(friend_of_friend)
FROM   [ Persons ... ]
WHERE  me.name = 'A'

In other words, the same implicit grand total GROUP BY () is implied and all values are aggregated into a single row.

The next example from the Neo4j docs is more intriguing. This will count the number of nodes connected to a node n with name = 'A':

MATCH (n { name: 'A' })-->(x)
RETURN n, count(*)

Which is a shorter form for writing:

MATCH (n)-->(x)
WHERE n.name = 'A'
RETURN n, count(*)

This example will also perform aggregation, but this time with an implicit GROUP BY n clause. In SQL, you’d write something like:

SELECT   n.id, count(*)
FROM     n
JOIN     x
  ON     [ implicit equi-join predicate ]
WHERE    n.name = 'A'
GROUP BY n.id

The nice thing in Cypher is that the obvious GROUP BY clause (it can only be GROUP BY n.id) is implied. It doesn’t have to be written explicitly.

Takeaway for SQL

We’ve seen a couple of nice Cypher language features, especially the incredibly nice way to write “JOIN” (or rather graph traversal in Neo4j). But a much more obvious, low-hanging fruit with actual chances to make it into the SQL standard would be to make the SQL GROUP BY clause optional, and dependent on the SELECT clause using the following rules:

  • If SELECT contains no aggregation functions, there shall be no implied GROUP BY clause
  • If SELECT contains 1-N aggregation functions, there shall be an implied GROUP BY clause formed from the remaining columns
  • If SELECT contains only aggregation functions, the “grand total” GROUP BY () shall apply
  • An explicit GROUP BY clause will always be preferred to any implied GROUP BY clause

If any of you ISO / IEC committee members are reading this, this is on my wish list for a future SQL standard. And please, PostgreSQL. Implement this right away.

Liked this article?

Here’s some further reading about the SQL GROUP BY clause and aggregation:

  • The Neo4j manual about aggregation
  • How to Translate SQL GROUP BY and Aggregations to Java 8
  • Do You Really Understand SQL’s GROUP BY and HAVING clauses?
  • GROUP BY ROLLUP / CUBE
  • How to use SQL PIVOT to Compare Two Tables in Your Database
  • The Awesome PostgreSQL 9.4 / SQL:2003 FILTER Clause for Aggregate Functions
  • A True SQL Gem You Didn’t Know Yet: The EVERY() Aggregate Function
Database sql MySQL Neo4j

Published at DZone with permission of Lukas Eder, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • 7 Database Optimization Hacks for Web Developers
  • How to Design a CRUD Web Service for Inheritable Entity
  • Comparing Distributed Databases
  • 11 Best Practices to Do Functional Testing on the Cloud

Comments

Database Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • MVB Program
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends:

DZone.com is powered by 

AnswerHub logo