Here's a typical problem that results from "SQL Hegemony" -- all data
must be in a database, and all access must be via SQL. This can also be
called the "SQL Fetish" school of programming.
On a Data Warehousing project, we had to load and process the
organizational hierarchy. SQL doesn't do hierarchies well because they
can (and should) involve an join of indefinite depth. One of the DBA's
wanted to use a "pure SQL" traversal of the hierarchy.
opinion was that it was a waste of code. We were writing Java
programs. We could -- trivially -- fetch the entire tree into Java
objects and work with the hierarchy as a hierarchy.
DBA finally "won" because of the SQL Hegemony argument -- all access
must be in SQL, right? I say "won" because we eventually had to throw
all the SQL away and use flat files. A "pure SQL" data warehouse is
generally unacceptably slow for loading. Data mart subsets can be done
in pure SQL, but loads can't.
Recent Events. "a table called [LOTSADATA] and it has 14.7 million rows. One of the columns in [LOTSADATA] table is BRAND" for which they need to do a select distinct. "The disadvantage of [SELECT DISTINCT] is that the Oracle database engine will do a sort which is an insanely expensive operation.
Question: Are there alternative approaches to obtaining the unique brands in
Response 1. Duh. Of course there are alternatives. What are you, stupid? You have programming languages. Use them.
Response 2. You're kidding, right? Why ask me? Why not just run it? How hard can it be to benchmark this? What are you, stupid? Seriously.
Response 3. Oh. SQL Hegemony. Folks are actually arguing about the cost of a query and -- it appears -- no one can actually write the eight lines of code required to demonstrate that SELECT ALL is faster than SELECT DISTINCT.
for calling you stupid. You're paralyzed by fear, not stupidity. What
if SQL isn't the perfect end-all, do-all language? If SQL isn't
perfect for all data processing, what other lies have we been living?
Is this the end of organized data processing? The collapse of western
Indeed, I'm repeatedly shocked
that the question even comes up. And I'm more shocked that the "appeal
to authority" argument has to be used. It's trivial to measure. It
appears that it's easier to ask me than to gather data.]
SQL Hegemony? Yes. Rather than run a demonstration program, written
in Java or C# or Python, they argued about the SQL. Doing this with
minimalist SQL didn't seem to make anyone's radar. Why not? SQL
Hegemony. Rather than consider real alternatives, everyone was reduced
to looking for sneaky SQL tricks.
. Here is what I did. It's 5 lines of code for each case. [How hard can this be? Apparently, SQL hegemony makes it impossible
for some organizations to do even this.]
q1.execute( "SELECT DISTINCT BRAND FROM LOTSADATA" )
q2.execute( "SELECT ALL BRAND FROM LOTSADATA" )
print set( q2.fetchall() )
- I only simulated 100,000 rows. [I don't have the patience to wait for 15 million rows to be created, loaded and queried.]
- The table only had four columns.
- I used SQLite3 -- which is mostly in-memory -- and runs much, much faster than Oracle.
- The select all is not a specious result based on cache being filled; the results are repeatable in any ordering of the queries.
For this data, the SQL SELECT DISTINCT took almost 3x as long as the SELECT ALL. It's just that simple.
Want more speed? Use array fetch features to get more rows in bigger buffers.
This is not rocket science. SQL can be Slow. Don't Argue: Benchmark. Your Mileage May Vary.
SQL databases do locking, transaction management, backup and recovery and a bunch of things well. SQL databases are helpful and necessary. However, SQL isn't always fast.
SQL means Slow Query Language. You've been told.