Insert Statement Without Column List (BP004)

Explore a 'defense-in-depth' approach to writing SQL in order to avoid a problem when using INSERT code.

Phil Factor

Apr. 29, 19 · Tutorial

Likes (2)

Comment

Save

9.3K Views

Many production databases have failed embarrassingly as a result of INSERT code that omits a column list, usually in mysterious ways and often without generating errors. Phil Factor demonstrates the problem and advocates a 'defense-in-depth' approach to writing SQL in order to avoid it.

You can insert rows into a table without providing a column list, but this has no valid use at all. As a rule, you should avoid all temptation to make your SQL code shorter by leaving things out. There are always exceptions to every rule, of course, but the INSERT column list is so essential that I'd advise you to stop the build if SQL Prompt (BP004), or whatever static code-analysis tool you use, detects such a crime. At the same time, I'd advise a defensive and pessimistic attitude to writing SQL code that would naturally rear up at the idea of 'chancy' code that leaves out column lists.

The Perils of the Blind Insert

Imagine that we have a Purchase table, and then tucked away in a stored procedure far away in the database code, some code that inserts into it the results from a derived table. Here, we'll just insert some data from the ever-obliging AdventureWorks2016.

USE business;
  IF Object_Id('dbo.purchase') IS NOT NULL DROP TABLE dbo.Purchase;
  CREATE TABLE Purchase
    (
    SubTotal NUMERIC(19, 4) NOT NULL,
    TaxAmt NUMERIC(19, 4) NOT NULL,
    Freight NUMERIC(19, 4) NOT NULL,
    total NUMERIC(19, 4) NOT NULL,
    OrderDate DATETIME NOT NULL
    );
  INSERT INTO Purchase
    SELECT TOP 10 SubTotal, TaxAmt, Freight,
      SubTotal + TaxAmt + Freight AS total, OrderDate
      FROM AdventureWorks2016.Sales.SalesOrderHeader;
  GO
  SELECT * FROM purchase

There was a time when there was no check that the number of columns in the table source (the derived table to be loaded) was compatible with the number in the destination table. Nowadays, the number of columns in the source must be compatible with the columns in the table or in column_list. However, there is plenty of room for chaos if someone changes the purpose of a column or changes the order of columns.

Changing the Purpose of a Column

To demonstrate that, we must imagine that the team leader suddenly realizes that that total column is redundant and that he needs, instead, the ShipDate column.

IF Object_Id('dbo.purchase') IS NOT NULL DROP TABLE dbo.Purchase;
  CREATE TABLE Purchase
    (
    SubTotal NUMERIC(19, 4) NOT NULL,
    TaxAmt NUMERIC(19, 4) NOT NULL,
    Freight NUMERIC(19, 4) NOT NULL,
    ShipDate DATETIME NOT NULL,
    OrderDate DATETIME not NULL
    );

Unfortunately, he forgot about the INSERT routine tucked away in a stored procedure. When it runs, it triggers no errors, but in the Purchase table, you'll find nastiness.

INSERT INTO Purchase
    SELECT TOP 10 SubTotal, TaxAmt, Freight,
      Total + TaxAmt + Freight AS total, OrderDate
      FROM AdventureWorks2016.Sales.SalesOrderHeader;
  --result set must be compatible with the columns in the table or in column_list.
  SELECT * FROM Purchase;

Some of those shipping dates ought to raise eyebrows. What's happened? The value for total, which is a money datatype, has become a date. How? Two points here. Some developers think that SQL Server will check the column names of the result set from the table source against the column names of the destination table. No, it doesn't. What is worse, if the data types of the columns in the source and destination don't match, SQL code will do its best to obligingly convert the former into the latter, within the strict rules about when implicit conversion is allowed.

This is what has happened here: an implicit conversion turned data that is intended as a sum of money into a date. We can show it even more simply:

DECLARE @FirstTable TABLE (TheDate DateTime, TheCredit NUMERIC(19,4),TheDebit NUMERIC(19,4))
  INSERT INTO @FirstTable 
   SELECT $43183.8419, $42856.56, $43245.78
  SELECT * FROM @FirstTable

If you were to try it the other way around (popping a date into a money column), then you'd get an error because there is a rule disallowing this implicit conversion.

Msg 257, Level 16, State 3, Line 28
  Implicit conversion from data type datetime to numeric is not allowed. Use the CONVERT function to run this query.

As the error informs us, we'd need to use an explicit conversion to convert a date into a sum of money

SELECT Convert(NUMERIC(19,4),Convert(DATETIME,'26 Mar 2018 20:12:23')) 
      AS TheDateAsMoney

Changing Column Order

We've shown that the INSERT statement without a column list, the 'blind insert', is vulnerable to changes in a table's columns, but even getting the column order wrong can cause calamity. Worse, you only get to discover this if the data doesn't reconcile because it can happen without triggering an error.

If the column into which you accidentally insert data is the same datatype as the intended column, only a well-constructed CHECK constraint would ever be able to save you from calamity. To demonstrate the point, we'll create a journal table.

IF Object_Id('dbo.JournalEntries') IS NOT NULL DROP TABLE dbo.JournalEntries;
  CREATE TABLE dbo.JournalEntries
    (
    DateOfEntry DATETIME NOT NULL DEFAULT GetDate() unique,
    description NVARCHAR(400) NOT NULL DEFAULT 'unexplained',
    account NVARCHAR(20) NOT NULL DEFAULT 'Ac44M',
    Dr NUMERIC(19, 4) NOT NULL DEFAULT 0,
    Cr NUMERIC(19, 4) NOT NULL DEFAULT 0
    );

Now we add in some journal entries using a blind insert:

INSERT INTO dbo.JournalEntries
   VALUES 
   ('23 Mar 2018','sale of Vans','ac30', 00,40345),
   ('24 Mar 2018','pay creditors','ac30', 30000,00),
   ('25 Mar 2018','payment from debtor','ac30',00,60517.45),
   ('26 Mar 2018','purchase of transport','ac30',45462.45,00),
   ('27 Mar 2018','fixtures','ac30',65.45,00),
   ('28 Mar 2018','Stock','ac30',42.60,00),
   ('29 Mar 2018','tax refund','ac30',00,45008.60)

We can now see our balance.

SELECT Convert(CHAR(11),DateOfEntry,113) AS "Date", description, account, Cr, Dr, 
    Sum(Cr - Dr) OVER (ORDER BY DateOfEntry) AS CalculatedRunningTotal
    FROM journalEntries;

In journal tables, entries are assigned to specific accounts using a FOREIGNKEY, and the journal keeps track of many accounts. In most countries, journal entries are entered in chronological order and debits are entered before credits. As such, someone decides that the credit ( Cr) column ought to be before the debit ( Dr) column:

IF Object_Id('dbo.JournalEntries') IS NOT NULL DROP TABLE dbo.JournalEntries;
  CREATE TABLE dbo.JournalEntries
    (
    DateOfEntry DATETIME NOT NULL DEFAULT GetDate() unique,
    description NVARCHAR(400) NOT NULL DEFAULT 'unexplained',
    account NVARCHAR(20) NOT NULL DEFAULT 'Ac44M',
    Cr NUMERIC(19, 4) NOT NULL DEFAULT 0, --we switched this with Dr
    Dr NUMERIC(19, 4) NOT NULL DEFAULT 0  --we switched this with Cr
    );
  INSERT INTO dbo.JournalEntries
   VALUES 
   ('23 Mar 2018','sale of Vans','ac30', 00,40345),
   ('24 Mar 2018','pay creditors','ac30', 30000,00),
   ('25 Mar 2018','payment from debtor','ac30',00,60517.45),
   ('26 Mar 2018','purchase of transport','ac30',45462.45,00),
   ('27 Mar 2018','fixtures','ac30',65.45,00),
   ('28 Mar 2018','Stock','ac30',42.60,00),
   ('29 Mar 2018','tax refund','ac30',00,45008.60)
   SELECT Convert(CHAR(11),DateOfEntry,113) AS "Date", description, account, Cr, Dr, 
    Sum(Cr - Dr) OVER (ORDER BY DateOfEntry) AS CalculatedRunningTotal
    FROM journalEntries;

In the absence of a column list, the INSERT statement simply assumes that the order of the columns in the VALUES clause matches the order of the columns in the table. If some developer switches the order of the columns, the routine will still work fine, in this case, but the books won't balance, and the office will be full of grim-faced people in black shoes and charcoal-grey suits.

There is nothing here to even indicate the order of the values in the VALUES statement, so it will take a while to spot the problem. Worst of all, you'll get the blame rather than the dev who changed the order of the columns. Not specifying column names is fine for interactive work, but if you write code that relies on the hope that nothing will ever change, then refactoring could prove to be impossible.

Adding a column list will make it clear what column of the result set goes into which column of the target table, but how can you be certain that the order in the multi-row VALUES clause, or whatever other table-source you are using, matches that of the column list? Here — just as a suggestion — is a more defensive and maintainable way of doing it, by not only adding a column list but also documenting the intended order of the VALUES clause.

INSERT INTO dbo.journalEntries (DateOfEntry, description, account, dr, cr)
    SELECT DateOfEntry, description, account, dr, cr
      FROM
        (
        VALUES ('23 Mar 2018', 'sale of Vans', 'ac30', 00, 40345),
          ('24 Mar 2018', 'pay creditors', 'ac30', 30000, 00),
          ('25 Mar 2018', 'payment from debtor', 'ac30', 00, 60517.45),
          ('26 Mar 2018', 'purchase of transport', 'ac30', 45462.45, 00),
          ('27 Mar 2018', 'fixtures', 'ac30', 65.45, 00),
          ('28 Mar 2018', 'Stock', 'ac30', 42.60, 00),
          ('29 Mar 2018', 'tax refund', 'ac30', 00, 45008.60)
        ) AS f (DateOfEntry, description, account, dr, cr);

The extra specifications of the column lists do nothing more than to emphasize what the table source intended each column to be, and it is easy to check by eye that they actually do so. It is more like documentation.

Defense in Depth: Constraints

As well as the obvious defense against this sort of thing happening, which is to specify the list of columns in order, you need constraints. Omitting them because you feel certain that they could never throw an error is like reasoning that smoke detectors aren't necessary because they are so seldom triggered by smoke.

Let's take that first example; the purchase table. What's missing? The constraints, of course. The problem was bizarre and obvious, and the purchase table should have been liberally sprinkled with constraints in order to prevent it.

IF Object_Id('dbo.purchase') IS NOT NULL DROP TABLE dbo.Purchase;
  CREATE TABLE Purchase
    (
    SubTotal NUMERIC(19, 4) NOT NULL CHECK (Subtotal>0), 
    TaxAmt NUMERIC(19, 4) NOT NULL ,
    Freight NUMERIC(19, 4) NOT NULL ,
    ShipDate DATETIME NOT NULL,
    OrderDate DATETIME not NULL,
    CONSTRAINT Shipdate_Before_Orderdate CHECK (Shipdate>OrderDate),
    CONSTRAINT Tax_Charge_Too_High CHECK (TaxAmt>(SubTotal*30/100)),
    CONSTRAINT OrderDate_Is_Impossible CHECK (Year(OrderDate)<2000),
    CONSTRAINT Freight_Charge_Too_High CHECK (Freight>(SubTotal/2))
    );

Now we test it out to see what happens

INSERT INTO Purchase
    SELECT TOP 10 SubTotal, TaxAmt, Freight,
      SubTotal + TaxAmt + Freight AS total, OrderDate
      FROM AdventureWorks2016.Sales.SalesOrderHeader;
  --result set must be compatible with the columns in the table or in column_list.
  SELECT * FROM Purchase;

And of course, the alarm bells ring out immediately:

Msg 547, Level 16, State 0, Line 31
  The INSERT statement conflicted with the CHECK constraint "Shipdate_Before_Orderdate". The conflict occurred in database "business", table "dbo.Purchase".
  The statement has been terminated.

When someone makes a mistake like this, it is much better to allow it to trigger an error in testing than to leave open even a small chance it will get into production use.

The constraints in the journalEntries table would be much more dependent on the business rules in place for the business, and it is more usual for a trigger to pick up an unusual deviation from a baseline for the balance.

Summary

Just because you can leave out the column list in the INSERT statement doesn't mean you should. It will introduce a fragility into your code that at some time will catch out either you or, more importantly, whoever must cope with your code, subsequently, as it's likely to fail in ways that are hard to predict or to detect retrospectively.

After many years of experience, I've seen the most extraordinarily unlikely things to go wrong. The wildest and most fantastic CHECK constraint conditions get triggered, and exceptions that seem impossible will always get their day of glory, written in red letters on your screen.

Database sql

Published at DZone with permission of Phil Factor. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending