The Dangers of SQL JOINs & Aggregate Functions: How to Avoid Fanouts
Every SQL developer uses JOINs and aggregate functions, but some may not be aware that combining the two can return incorrect results if not handled carefully. Brett Sauve at Looker covers the issue in a recent post, focusing specifically on "fanouts," an easy problem to overlook.
A fanout occurs when a JOIN produces more rows than the primary table in the query contains, because some rows of the primary table match several rows in the joined table. Those rows are repeated in the combined result, so aggregate functions like COUNT or SUM may count the same value more than once, throwing off the results.
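To make the effect concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns, and the values are hypothetical and chosen only for illustration; they do not come from Sauve's post.

```python
import sqlite3

# Hypothetical schema: one customer can have many orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, credit_limit INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ada', 100), (2, 'Bob', 200);
    INSERT INTO orders VALUES (10, 1, 5), (11, 1, 7), (12, 1, 9), (13, 2, 4);
""")

# Aggregate on the primary table alone: each customer is counted once.
true_total = conn.execute(
    "SELECT SUM(credit_limit) FROM customers"
).fetchone()[0]

# Same aggregate after a 1-to-many JOIN: Ada's row appears three times,
# once per order, so her credit limit is summed three times.
fanned_total = conn.execute("""
    SELECT SUM(c.credit_limit)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
""").fetchone()[0]

print(true_total, fanned_total)  # 300 500
```

The query runs without any error or warning, which is why the problem is so easy to miss.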
According to Sauve, the answer is careful ordering of JOIN tables:
Herein lies a key point: to help avoid fanouts, begin your joins with the most granular table.
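One way to read this advice, sketched below with Python's built-in sqlite3 module: when the query is built around the most granular table, every other table is joined many-to-1, so the result has exactly as many rows as that table and aggregates over its columns are unaffected. The customers and orders tables are hypothetical, and this is an illustration of the idea rather than Sauve's own example.

```python
import sqlite3

# Hypothetical schema: orders is the most granular table,
# and each order points to exactly one customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 5), (11, 1, 7), (12, 1, 9), (13, 2, 4);
""")

order_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Start from orders and join customers many-to-1:
# no order row is repeated, so COUNT and SUM on orders stay correct.
joined_rows, joined_amount = conn.execute("""
    SELECT COUNT(*), SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
""").fetchone()

print(order_rows, joined_rows, joined_amount)  # 4 4 25
```

Note that this only protects aggregates on the granular table's own columns; summing a customer-level column in the same query would still repeat values once per order.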
Sauve also provides some tips to help SQL developers understand how to avoid fanouts and ensure the validity of their data. He points to three types of JOINs:
- 1-to-1
- Many-to-1
- 1-to-Many
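If it is unclear which of these relationships a JOIN has, one rough check is to test whether the join key is unique on each side. The sketch below uses Python's built-in sqlite3 module; the helper names key_is_unique and classify_join and the sample tables are hypothetical. It inspects only the data currently in the tables, not declared constraints, and it also reports a many-to-many case, which falls outside the three types listed above.

```python
import sqlite3

def key_is_unique(conn, table, column):
    """Return True if no non-NULL value of `column` repeats in `table`.

    Table and column names are interpolated directly into the SQL,
    so pass only trusted identifiers.
    """
    total, distinct = conn.execute(
        f"SELECT COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    return total == distinct

def classify_join(conn, left, left_key, right, right_key):
    """Classify the relationship from the primary (left) table's point of view."""
    left_unique = key_is_unique(conn, left, left_key)
    right_unique = key_is_unique(conn, right, right_key)
    if left_unique and right_unique:
        return "1-to-1"
    if right_unique:
        return "many-to-1"
    if left_unique:
        return "1-to-many"
    return "many-to-many"

# Hypothetical sample data: one customer can have many orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 5), (11, 1, 7), (12, 1, 9), (13, 2, 4);
""")

print(classify_join(conn, "orders", "customer_id", "customers", "id"))  # many-to-1
print(classify_join(conn, "customers", "id", "orders", "customer_id"))  # 1-to-many
```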
Knowing which type of JOIN you are using can rule out a fanout: it is not possible in a 1-to-1 or Many-to-1 JOIN, where each row of the primary table matches at most one row of the joined table. If you are using a 1-to-Many JOIN, though, Sauve has some tips there, too. For example, a messy JOIN can be cleaned up by grouping the data on the "many" side first to create a 1-to-1 relationship.
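The grouping approach can be sketched as follows, again with Python's built-in sqlite3 module and hypothetical customers and orders tables. The orders are collapsed to one row per customer in a subquery before the JOIN, so each customer row matches at most one row and nothing is duplicated.

```python
import sqlite3

# Hypothetical schema: one customer can have many orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, credit_limit INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ada', 100), (2, 'Bob', 200);
    INSERT INTO orders VALUES (10, 1, 5), (11, 1, 7), (12, 1, 9), (13, 2, 4);
""")

# Group orders down to one row per customer first, then join.
# The join is now 1-to-1, so both sums are correct.
credit_total, order_total = conn.execute("""
    SELECT SUM(c.credit_limit), SUM(o.order_total)
    FROM customers AS c
    JOIN (
        SELECT customer_id, SUM(amount) AS order_total
        FROM orders
        GROUP BY customer_id
    ) AS o ON o.customer_id = c.id
""").fetchone()

print(credit_total, order_total)  # 300 25
```

Because this sketch uses an inner JOIN, customers with no orders would drop out of the result; a LEFT JOIN would keep them, with a NULL order total.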
Check out the full article for the rest of Sauve's tips to make sure your aggregate functions are working correctly, and if you're looking for a few more tips to keep your SQL skills sharp, you might try one of these:
- Yet Another 10 Common Mistakes Java Developers Make When Writing SQL
- 6 Simple Performance Tips for SQL SELECT Statements