"When a developer tries to befriend a DBA for the first time"
My colleague Martin Masarik, CEO of SQLdep, was speaking with a DBA friend of his the other day who runs an Oracle data-warehouse that’s about 2 terabytes big, at an international bank. He asked him: “What are some typical problems that users, data analysts, and DB developers have, that make you want to pull your hair out?” (to be fair, he doesn’t have much hair left, so if he’s willing to make that final sacrifice, these must be some big issues). In his experience, the vast majority of performance problems fall into these categories.
1) Don’t call a function on an indexed column
This effectively prevents the database from using that index. The problem can reach even deeper on partitioned tables because then you don't read the data from a specific partition but from the whole table. For large fact tables, this is a brutal performance problem.
Don’t do this:
WHERE TIME_ID+14 > to_number(to_char(sysdate,'J'))
WHERE TIME_ID > to_number(to_char(sysdate-14,'J'))
2) Analyze the tables you create
If you don’t, here’s what happens: You confuse the database query optimizer, and the chance for using a sub-optimal join technique increases. Let’s say that you create a temp table with one million rows. Without table statistics (output of the analysis), the optimizer has no clue what is really inside the table. It might decide that a nested loop join should be used, which means that for every row in one table it will access every row in the other table. In general, nested joins are the most efficient on small dataset but perform poorly when the dataset grows bigger.
ANALYZE TABLE <TABLE_NAME> COMPUTE STATISTICS
3) Break down your work into bite-sized SQL steps
Imagine your SQL is like a pizza. Whether you eat it with your hands or a fork (like the Mayor of New York, Bill de Blasio, in slow motion here), you certainly don’t shove the whole thing into your throat and try to choke it down.
Instead of creating a super-sized SQL query, break it down into 3-4 separate steps. The more simple SQL you write, the less chance you’ll have of perplexing the query optimizer. It’s also easier to debug as you can verify data in each of those tables.
4) Use Distinct only when needed
This is a good rule of thumb. Distinct is often used when you have two or more identical rows returned by a query. By adding Distinct, you can de-duplicate these rows from your query results. However you should use this with a clear purpose in mind which is most likely creating unique list based on a single column like customer ID. The abuse of Distinct over more columns sometimes leads unskilled users to making other mistakes, such as forgetting to join tables (yup, it really happens).
5) Create indexes
As a final tip to improve your own tables: Use indexes. If we simplify, consider them especially on larger tables (>100k rows). It proves very useful when you do a check like "Does a customer exist in my temp table?", there you can improve the performance a lot. Instead of the full table scan you allow the database engine to work with a much smaller and better optimized data structure -- the index.
Database Admins, developers, data analysts, and other users are all part of the same team. As with nearly every other situation in life, patience, communication, and education are often the keys to working together smoothly to achieve common goals... oh, and plenty of pizza.
- If you’re a DBA, what bugs you the most? What tips do you have for non-DBAs?
- If you’re not a DBA, and you’ve been bitten by one in the past, tell us about it. Did you learn anything from the experience?
- Full conversation here