Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Finding Code Smells Using SQL Prompt

DZone's Guide to

Finding Code Smells Using SQL Prompt

Let's at some creative SQL statements you can use to help root out code smells and get the data you really want.

· Database Zone ·
Free Resource

New whitepaper: Database DevOps – 6 Tips for Achieving Continuous Delivery. Discover 6 tips for continuous delivery with Database DevOps in this new whitepaper from Redgate. In 9 pages, it covers version control for databases and configurations, branching and testing, automation, using NuGet packages, and advice for how to start a pioneering Database DevOps project. Also includes further research on the industry-wide state of Database DevOps, how application and database development compare, plus practical steps for bringing DevOps to your database. Read it now free.

Using TOP in a SELECT statement without a subsequent ORDERBY clause is legal in SQL Server, but meaningless because asking for the TOP10 rows implies that the data is guaranteed to be in a certain order, and tables have no implicit logical order. You must specify the order.

In a SELECT statement, you should always use an ORDERBY clause with the TOP clause, to specify which rows are affected by the TOP filter. If you need to implement a paging solution in an application widget, to send chunks or “pages” of data to the client so a user can scroll through data, it is better and easier to use the OFFSETFETCH subclause in the ORDERBY clause, instead of the TOP clause.

A recommendation to avoid use TOP in a SELECT statement, without an ORDERBY, is included as a “Best Practice” code analysis rule in SQL Prompt (BP006).

Limiting Rows With TOP

TOP isn’t standard SQL, but it is intuitive. If you want just a few sample rows from a table source, then it is tempting to use the TOP keyword without the ORDERBY clause. A single table is likely to conform to the order of the clustered index but even that isn’t guaranteed due to parallelism.

If we go beyond the query for a single table and do a few joins, the ‘natural’ order is less obvious. Maybe you are in AdventureWorks and want just five customers, any five customers, and their addresses. It is perfectly legal to do this, but it is slightly dangerous if you subsequently forget why you are doing it.

Listing 1:

SELECT TOP 5 Person.Title, Person.FirstName, Person.MiddleName,
  Person.LastName, Address.AddressLine1, Address.AddressLine2, Address.City,
  Address.PostalCode, AddressType.Name
  FROM Sales.Customer
    INNER JOIN Person.Person
      ON Customer.PersonID = Person.BusinessEntityID
    INNER JOIN Person.BusinessEntityAddress
      ON Person.BusinessEntityID = BusinessEntityAddress.BusinessEntityID
    INNER JOIN Person.Address
      ON BusinessEntityAddress.AddressID = Address.AddressID
    INNER JOIN Person.AddressType
      ON BusinessEntityAddress.AddressTypeID = AddressType.AddressTypeID;


You get what you expect, just the first five customers that are returned by the query. The order I get is that of the clustered index of the Sales.Customer table’s PK_Customer_CustomerID, starting with the customer with the lowest customer_id that is a person rather than a store. A different execution strategy could well change that. You are not guaranteed a deterministic result. This may be fine if you just want a sample during development, but in production systems what you really want are the top five customers and addresses according to their rank order, as determined by some attribute such as how much they spend. You really need that ORDERBY.

In short, SQL tables have no guaranteed consistency in its intrinsic order. You may have set a PRIMARYKEYthat gives your table values some fundamental ordering, but this isn’t guaranteed to happen consistently. SQL Server reserves the right to introduce any optimisation it wants, in creating an execution plan that will return the results, even if it means delivering those results in a different order. In short, you can’t guarantee that the results will be returned in the order you expect, unless you make it explicit with an ORDERBY statement.

So, we return to the perfectly reasonable requirement that the developer has of just being able to eyeball a representative sample of rows from the query. How should this be done?

SET ROWCOUNT and TABLESAMPLE: Do They Help?

There was a time when we had to use the SETROWCOUNT statement to put a limit on the number of rows returned. One drawback to that was that the query optimiser couldn’t create an effective plan in the light of the number of rows requested, because the ROWCOUNT is a session or procedure/trigger-wide setting that wasn’t visible to the Query Optimiser within the query.

Also, it was possible to forget that you’d set the ROWCOUNT and neglected to ‘unset’ it. Another disadvantage was that you couldn’t pass the value a variable. TOP is better, because it works at the statement level, and you can pass either a row value, or a percentage, as a variable or expression.

You might think that you can reliably get a limited number of rows from a table with the TABLESAMPLE clause. The only problem with it is that it doesn’t work as advertised, and even if it did work as advertised it only works on tables, rather than the whole rich variety of table sources.

Listing 2:

SELECT * FROM Sales.Customer TABLESAMPLE SYSTEM (5);


This is supposed to limit the number of rows returned from a table in the FROM clause to a sample number or PERCENT of rows. A quick test will show you why nobody uses it.

Listing 3:

DROP TABLE IF EXISTS #Result;
CREATE TABLE #Result (TheOrder INT IDENTITY, TheRowsReturned INT);
GO
INSERT INTO #Result (TheRowsReturned)
  SELECT Count(*) FROM Sales.Customer TABLESAMPLE(200 ROWS); 
GO 30
SELECT #Result.TheOrder, #Result.TheRowsReturned FROM #Result;


Getting Meaningful Table Samples With TOP…ORDER BY

So far, because TABLESAMPLE is broken, we are stuck with the slight awkwardness of this for getting a sample from a table.

Listing 4:

SELECT TOP 5 * FROM Sales.Customer


Why is this awkward? Listing 4 will give you back five rows, but you cannot entirely rely on which rows are returned, though it is likely to be in the order of the PRIMARYKEY, as we are just accessing a single table. However, sales.customer is a bit of a ‘trick’ table in that it uses polymorphic association, and the first 700 of the 19820 rows represent stores, not people. So, Listing 4 is likely to give a very unrepresentative sample of this table because You could easily get the incorrect impression of the data in the table, thinking that customers are stores when the majority are people!

What most developers will want is to see a few rows of the table they are investigating, taken at random, but if you want the sample in random order, then you have to make that explicit.

Listing 5:

SELECT TOP 5 * FROM Sales.Customer ORDER BY NewId()


This will return you five rows in a random order, but it takes more resources to return the result. If you aren’t dealing with a ‘trick’ table and just don’t care about order, and need to indicate that fact in the code, then SQL Server will accept any system function, such as @@version or host_name(), even ORDERBY (SELECT NULL). This trick is often seen where SQL Server refuses to accept code in a window function that requires an ORDERBY. It means 'I know, I know but I’m doing it deliberately.'

Listing 6:

SELECT TOP 10 * FROM Sales.Customer ORDER BY @@identity


If you are happy with the records you get by using TOP without the ORDERBY, then it is best to be entirely explicit, and point out that you really do want it by the PRIMARYKEY field

Listing 7:

SELECT TOP 5 *
  FROM Sales.Customer
  ORDER BY Customer.CustomerID;


Using TOP With ORDER BY for Reporting Queries

TOP comes into its own for reporting purposes. Managers love lists of top customers and top salesmen. At this point, the ORDERBY part becomes crucial.

Listing 8:

SELECT TOP 10 Person.BusinessEntityID, Sum(SalesOrderHeader.TotalDue) AS expenditure
  FROM Sales.SalesPerson
    INNER JOIN Sales.SalesOrderHeader
      ON SalesPerson.BusinessEntityID = SalesOrderHeader.SalesPersonID
    INNER JOIN Person.Person
      ON SalesPerson.BusinessEntityID = Person.BusinessEntityID
  GROUP BY Person.BusinessEntityID
  ORDER BY Sum(SalesOrderHeader.TotalDue) DESC;


This gives you the top ten performing salespeople.

We might decide that this report doesn’t really tell us who the salesperson was, so we’d tweak it.

Listing 9:

SELECT SalesPerformance.SalesValue,
  Coalesce(Person.Title + ' ', '') + Person.FirstName
  + Coalesce(' ' + Person.MiddleName, '') + ' ' + Person.LastName
  + Coalesce(' ' + Person.Suffix, '') AS SalesPerson
  FROM
    (
    SELECT TOP 10 SalesPerson.BusinessEntityID AS salesPerson,
      Sum(SalesOrderHeader.TotalDue) AS SalesValue
      FROM Sales.SalesPerson
        INNER JOIN Sales.SalesOrderHeader
          ON SalesPerson.BusinessEntityID = SalesOrderHeader.SalesPersonID
        INNER JOIN Person.Person
          ON SalesPerson.BusinessEntityID = Person.BusinessEntityID
      GROUP BY SalesPerson.BusinessEntityID
      ORDER BY Sum(SalesOrderHeader.TotalDue) DESC
    ) AS SalesPerformance(SalesPerson, SalesValue)
    INNER JOIN Person.Person
      ON SalesPerformance.SalesPerson = Person.BusinessEntityID
  ORDER BY SalesPerformance.SalesValue DESC


Why would we need to do that second ORDERBY? The original SQL was an aggregate query and we needed the top 10 aggregate sales totals, so we had to impose an order on it. This was passed, with no fixed order, to the outer query that added the person’s name. To be certain of the order of this outer query, it will need an explicit ORDERBY clause too. This is sometimes referred to as a 'presentation ORDERBY', or 'presentation ordering'.

Using FETCH-OFFSET Instead of TOP

The better way of doing this is, of course, to use the ORDERBY with the optional OFFSETFETCH clause, available in SQL Server 2012 onwards, instead of TOP. It is much more versatile and is standard ANSII SQL too. Here are the twenty longest-serving employees of AdventureWorks.

Listing 10:

SELECT Employee.JobTitle, Employee.HireDate,
  Coalesce(Person.Title + ' ', '') + Person.FirstName
  + Coalesce(' ' + Person.MiddleName, '') + ' ' + Person.LastName
  + Coalesce(' ' + Person.Suffix, '') AS Name
  FROM HumanResources.Employee
    INNER JOIN Person.Person
      ON Person.BusinessEntityID = Employee.BusinessEntityID
  ORDER BY Employee.HireDate ASC 
    OFFSET 0 ROWS FETCH FIRST 20 ROWS ONLY;



Now, with ORDERBY…OFFSET…ROWSFETCHFIRST…ROWSONLY, you can provide a means of scrolling or paging through this hall of fame.

Using TOP With INSERT, UPDATE, MERGE, or DELETE

It seems odd that you are discouraged from using TOP without and ORDERBY, when in certain circumstances you are actively prevented from doing so. As well as the SELECT statement, the DELETE, INSERT, MERGE and UPDATE statements all have a TOP clause. In contrast to SELECT, you can’t have an associated ORDERBYclause. Let’s take this example.

Listing 11:

DROP TABLE IF EXISTS #tempCustomer; --in case it exists
SELECT Customer.CustomerID, Customer.PersonID, Customer.StoreID,
  Customer.TerritoryID, Customer.AccountNumber, Customer.rowguid,
  Customer.ModifiedDate
INTO #tempCustomer
  FROM Sales.Customer --just for the test

UPDATE TOP (10) #tempCustomer
  SET #tempCustomer.AccountNumber = 
    Replace(#tempCustomer.AccountNumber, 'AW', 'PF')
OUTPUT Deleted.CustomerID, Deleted.AccountNumber AS before,
  Inserted.AccountNumber AS after 



Now try putting in an ORDERBY clause! It won’t allow it. As the documentation says:

“The rows referenced in the TOP expression used with INSERT, UPDATE, MERGE or DELETE are not arranged in any order”.

No, you must do something like this.

Listing 12:

UPDATE #tempCustomer
  SET #tempCustomer.AccountNumber = --
  Replace(#tempCustomer.AccountNumber, 'AW', 'PF')
OUTPUT Deleted.CustomerID, Deleted.AccountNumber AS before,
  Inserted.AccountNumber AS AFTER
  FROM
    (
    SELECT TOP 10 CustomerID
      FROM #tempCustomer
      ORDER BY #tempCustomer.CustomerID DESC
    ) AS ordered
  WHERE #tempCustomer.CustomerID = ordered.CustomerID
  GO  


Likewise, an INSERT statement. We can’t use TOP to insert rows in a meaningful chronological order. As the book says:

“When TOP is used with INSERT the referenced rows are not arranged in any order and the ORDERBY clause cannot be directly specified in this statement.”

If you need to do this, you must use TOP together with an ORDERBY clause that is specified in a sub-select statement.

DELETE has a TOP clause, but we can’t use it either. What if you wanted to clean out old purchase order details? You’d want to be certain to clean out the oldest ones first. We can’t put an ORDERBY in the delete statement, but then we don’t have to.

Let’s set up the test.

Listing 13:

DROP TABLE IF EXISTS #tempPurchaseOrderDetail; --in case it exists
SELECT POD.PurchaseOrderID, POD.PurchaseOrderDetailID, POD.DueDate,
  POD.OrderQty, POD.ProductID, POD.UnitPrice, POD.LineTotal, POD.ReceivedQty,
  POD.RejectedQty, POD.StockedQty, POD.ModifiedDate
INTO #tempPurchaseOrderDetail
  FROM Purchasing.PurchaseOrderDetail AS POD


We now delete the ten oldest purchase order details.

Listing 14:

DELETE FROM #tempPurchaseOrderDetail
OUTPUT Deleted.DueDate, Deleted.LineTotal, Deleted.PurchaseOrderID
  WHERE PurchaseOrderDetailID IN
          (
          SELECT TOP 10 PurchaseOrderDetailID
            FROM #tempPurchaseOrderDetail
            ORDER BY DueDate ASC
          );
GO


So, what is the point of having that TOP filter if it can’t be used to DELETE, INSERT, MERGE or UPDATE? Well, in fact, it can be used, in circumstances where a particular set of records don’t need to be deleted in a specific order, as long as they eventually get deleted.

The use of the TOP filter without the ORDERBY is a life-saver if you need, for example, to delete lots of rows, regularly, from an operational production system. Deletes are logged, and can also result in escalating locks. I once had to design a system that regularly cleared out a million rows from a SQL Server database. The optimum method involved taking a lot of successive bites when eating the elephant rather than attempting it in a mouthful.

We can illustrate this very easily, though you wouldn’t see the advantage until you have a working system, especially one that was needing to access the table while you were deleting, updating, inserting or merging. Again, we will illustrate the point with a temporary table so as not to interfere with the good working of AdventureWorks.

Listing 15:

DROP TABLE IF EXISTS #tempPurchaseOrderDetail; --in case it exists
SELECT POD.PurchaseOrderID, POD.PurchaseOrderDetailID, POD.DueDate,
  POD.OrderQty, POD.ProductID, POD.UnitPrice, POD.LineTotal, POD.ReceivedQty,
  POD.RejectedQty, POD.StockedQty, POD.ModifiedDate
INTO #tempPurchaseOrderDetail
  FROM Purchasing.PurchaseOrderDetail AS POD

--we delete rows successively
DECLARE @rowcount INT = 1
WHILE @rowcount > 0
  BEGIN
    DELETE TOP (200) FROM #tempPurchaseOrderDetail
      WHERE #tempPurchaseOrderDetail.DueDate < DateAdd(YEAR, -2, GetDate())
    SELECT @rowcount = @@RowCount
  END


I’ve found, in the past, that large-scale operations like this often benefit from being done in chunks, and the size of the chunk is a matter of fine-tuning with the operational system to get it right. For this sort of work, the TOPclause without the ORDERBY in a DELETE, INSERT or UPDATE can be very valuable for doing large-scale changes, a short step at a time, in a hard-working transactional system

Summary

SQL Server’s TOP clause in a SELECT statement is very useful and intuitive, but it allows you to leave out the associated ORDERBY clause that clarifies what you had in mind: TOP by what aspect? After all, your TOP ten songs aren’t the ten loudest, or the ten sung in the highest voice. They are the ten most popular in terms of the records sold. You may, fortuitously get the right result in development work, but in production, the workload, the server and the data size could result in the queries being optimised in very different ways, resulting in a different result.

For a more versatile way of dealing with such thing, I’d suggest using the ORDERBYOFFSETFETCH syntax introduced in SQL Server 2012 because it is more versatile and conformant. It isn’t too much harder to remember than the TOP filter either.

New whitepaper: Database DevOps – 6 Tips for Achieving Continuous Delivery. Discover 6 tips for continuous delivery with Database DevOps in this new whitepaper from Redgate. In 9 pages, it covers version control for databases and configurations, branching and testing, automation, using NuGet packages, and advice for how to start a pioneering Database DevOps project. Also includes further research on the industry-wide state of Database DevOps, how application and database development compare, plus practical steps for bringing DevOps to your database. Read it now free.

Topics:
database ,sql prompt ,code smells ,top ,select statement ,tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}