Why You Should Not Use SELECT *
Why You Should Not Use SELECT *
Most of us don't think twice about using SELECT * in our RDBMS queries, but maybe we should. This post discusses why.
Join the DZone community and get the full member experience.Join For Free
Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph™ is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.
A shorter post this week, but an important one.
Last week, Erik Darling commented on my post saying that we shouldn't use
SELECT *, which was both amusing and accurate. Amusing, because a number of the example T-SQL queries in that post made use of this construct.
Why was Erik's comment accurate? A lot of SQL Server and other RDBMS (relational database management systems) folks advise never to use
SELECT *, and it has become a running gag in my talks when I use
SELECT * in a demo and then tell my attendees not to use it.
Let's go through the main reasons why it's not a good idea to use
SELECT *, specifically in a production environment.
Querying the System Tables
When we write
SELECT * FROM table, the database engine has to go into the system tables to read the column metadata in order to materialize the results. This has a small but measurable performance impact when reading the system tables. If lots of queries use
SELECT *, this can cause noticeable locking on the system tables.
SELECT * returns the columns in the order they were created. This might result in a surprise if a particular order is assumed from output in the past, but the columns have been created in a different order during application upgrades and modifications, which can be reasonably common. Imagine a scenario where one or more columns is appended to the end to avoid rebuilding the whole table, however in a clean install of the application those columns might be in a different order. A
SELECT * query will, therefore, return the columns in a different order depending on how that table was created and/or modified.
Do we really need all the columns, all the time? By limiting the columns returned, we can make better use of indexes that consume less space in memory when our queries execute. This is by far the best reason to limit the columns in a
SELECT statement. Less memory means fewer storage reads, fewer CPU cycles, and faster queries. Since most databases are accessed over a network, this is another major performance bottleneck we can avoid.
When Should We?
As with all best practices, there are exceptions to the rule, which is why you'll often hear a consultant say "it depends."
If our application is a database design tool for example (like phpMyAdmin for MySQL and MariaDB), we should probably bring back all the columns all the time, and make use of row limits and caching to ensure that the application brings back only what it needs.
Another (common) exception is in a development and testing environment, or if we need to troubleshoot a problem in production. There are times when it does make sense to use
SELECT *. These decisions should be based on the best available information available to us, and only in the appropriate circumstances.
Share your favorite
SELECT * story with me on Twitter at @bornsql, or below in the comments.
Published at DZone with permission of Randolph West , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.