Don't Do These Things in SQL Server
Here's one professional's thoughts on what to do, what not to do, and when to do (or not do) certain things in SQL Server.
Join the DZone community and get the full member experience.
Join For FreeRecently Brent Ozar posted a link to the PostgreSQL “Don’t do this” page, which I am shamelessly reproducing below, re-tailored for a SQL Server audience.
Don’t Use -P
With sqlcmd
sqlcmd
is a cross-platform interactive command-line utility that allows you to connect to a SQL Server (or Azure SQL Database) instance and perform database operations. One of the parameters is -P
which allows you to supply a password.
Why Not?
If sqlcmd
requires a password at the command-line, it will ask for one. Generally speaking we should be using Active Directory authentication (which is supported on Windows and Linux), so specifying a command-line password is redundant in that case. As well, if we have a password hard-coded into a script, we’re doing it wrong.
When Should We?
Never. It’s not worth it to save a network round-trip to the server.
Don’t Use NOT IN
It’s a common practice to use the IN
condition in a WHERE
clause, so that we can limit our search arguments to specific values. However using NOT IN
can have some unexpected results for the same reasons that we see in PostgreSQL.
Why Not?
The first reason has to do with NULL
values. Let’s use the same example as the PostgreSQL page:
SELECT * FROM foo WHERE col NOT IN (1, NULL);
This type of query will always return zero rows. It is impossible to know if something is NOT NULL
because NULL
is always unknown, thus NOT (NULL)
returns NULL
. The WHERE
condition will never resolve to a valid value in the above example. Since col
cannot be evaluated, the query optimizer simplifies that condition to a Constant Scan against an empty internal table.
The second reason why this is a bad idea is because it makes the query optimizer pick a really bad plan. Consider this example:
SELECT * FROM foo WHERE col1 NOT IN (SELECT col1 FROM bar);
If one of the sub-query values is a NULL
, this query will return zero rows as noted above. More importantly and perhaps surprisingly, the execution plan will use a Hash Match (Right Anti Semi Join). On smaller sets this won’t be noticeable, but it will matter a lot when you have more than a few thousand rows and your performance tanks, as a Hash Match is a very expensive join operator.
When Should We?
If we use NOT IN
, we should ensure our list of elements doesn’t contain a NULL
value, and we should also avoid using a sub-query in the WHERE
clause to avoid that blocking Hash Match.
Don’t Use TIMESTAMP
We covered this in detail in a previous post, What about TIMESTAMP
? It’s better to pretend that this data type doesn’t exist.
Why Not?
It is not what you think it is. TIMESTAMP
is actually a row version value based on the amount of time since SQL Server was started. If you need to record an actual date and time, use DATETIME2
instead.
When Should We?
Never.
Don’t Use SMALLMONEY
, FLOAT
or REAL
for Money
If we need to store values that represent amounts of money (or any values that require accuracy), avoid SMALLMONEY
, FLOAT
and REAL
data types.
Why Not?
In SQL Server and Azure SQL Database, the SMALLMONEY
data type has a maximum range of -214,748.3648 to 214,748.3647. It’s clearly inappropriate for larger amounts. It may be tempting to use the MONEY
data type instead, but be wary of rounding when performing calculations, since MONEY
only stores four decimal places.
Speaking of rounding during calculations, FLOAT
and REAL
are terrible choices for calculations on any value that require accurate results. Floating point arithmetic will result in lower accuracy (sometimes significant), and on financial values this can cause incorrect results.
When Should We?
Any time we have to deal with money, use the DECIMAL
data type with an appropriate scale. In a pinch we could use the MONEY
data type, but certainly never SMALLMONEY
, FLOAT
or REAL
for money.
Don’t Use BETWEEN
This is an easy one to get wrong, especially with dates.
Why Not?
BETWEEN
uses a closed-interval comparison: the values of both ends of the specified range are included in the result. That might sound like what we’d want, but consider the following example where we want to get all the values in the next week:
SELECT * FROM table WHERE datecol BETWEEN '2018-08-01' AND '2018-08-08';
This will include results up to where the date and time is exactly 2018-08-08 00:00:00.0000000
, but not dates and times later in that same day. We may not actually want any events that might have occurred at exactly midnight on the second value.
Instead, do:SELECT * FROM table WHERE datecol >= '2018-08-01' AND datecol < '2018-08-08';
With this construction, we are getting all the values up to but not including midnight on the morning of August 8th.
When Should We?
BETWEEN
is safe for discrete quantities like integers or dates, as long as you remember that both ends of the range are included in the result. It is better to use combinations of >
, =
and <
operators instead.
Summary
SQL Server and PostgreSQL have a lot in common, especially when it comes to what not to do. I’d love to hear your own rules for either platform, with a focus on SQL Server of course. Sound out in the comments, or find me on Twitter at @bornsql.
Published at DZone with permission of Randolph West, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments