When to Use the SELECT...INTO Statement
When to Use the SELECT...INTO Statement
SELECT...INTO is a good way of making a table-source temporarily persistent as part of a process — if you don't care about constraints, indexes, or special columns.
Join the DZone community and get the full member experience.Join For Free
New whitepaper: Database DevOps – 6 Tips for Achieving Continuous Delivery. Discover 6 tips for continuous delivery with Database DevOps in this new whitepaper from Redgate. In 9 pages, it covers version control for databases and configurations, branching and testing, automation, using NuGet packages, and advice for how to start a pioneering Database DevOps project. Also includes further research on the industry-wide state of Database DevOps, how application and database development compare, plus practical steps for bringing DevOps to your database. Read it now free.
We can use
SELECT...INTO in SQL Server to create a new table from a table source. SQL Server uses the attributes of the expressions in the
SELECT list to define the structure of the new table.
Prior to SQL Server 2005, the use of
SELECT...INTO in production code was a performance "code smell" because it acquired schema locks on system tables of the database, causing SQL Server to appear unresponsive while the query was being executed. This is because it is a DDL statement in an implicit transaction, which is inevitably long-running since the data is inserted within the same SQL Statement. However, this behavior was fixed in SQL Server 2005, when the locking model changed.
SELECT...INTO became popular because it was a faster way of inserting data than using
INTO...SELECT.... This was mainly due to the
SELECT...INTO operation being, where possible, bulk-logged. Although can now be bulk-logged, you may still see this performance advantage in SQL Server 2012 and 2014, because
SELECT...INTO can be parallelized on these versions, whereas support for parallelization of only emerged in SQL Server 2016. However, with
SELECT...INTO, you still have the task of defining all required indexes and constraints, and so on, on the new table.
SELECT...INTO is still a useful shortcut for development work, but it no longer has a clear performance advantage, and is of doubtful value for a live database. It is not part of the SQL Standard. It is usually easier to work with tables created with a statement because you then have the advantage of specifying constraints and datatypes in advance, and this also means that you're less likely to allow inconsistencies to sneak into the data.
A recommendation to avoid the use of
SELECT...INTO, for production code, is included as a code analysis rule in SQL Prompt (PE003).
Creating Tables Using the SELECT INTO Statement
SELECT...INTO feature was designed to store or "persist" a table source as part of a process. Here is a simple example:
However, a table source can be many things other than a conventional table, such as a user-defined function, an
OPENXML clause, a derived table, a joined table, a pivoted table, a remote data source, a table variable, or a variable function. It is with these more exotic table sources that the
SELECT...INTO syntax becomes more useful.
It is often used with the misunderstanding that it is a quick way of copying tables, and so it comes as a surprise that none of the indexes, constraints, calculated columns, or triggers defined in the source table are transferred to the new table. They can't be specified in the
SELECT...INTO statement, either. It also does nothing about nullability or preserving computed columns. All these tasks must be done retrospectively with the data in place, which inevitably takes time.
You can, though, use the function
IDENTITY (datatype, seed, increment) to set up an identity field, and it is possible — when the source is a single table — to make a column of the destination table into an identity column. It is this fact that probably leads developers to assume that it will transfer other column attributes.
Furthermore, it also can't create partitioned tables, sparse columns, or any other attribute inherited from a source table. How could it when the data might be coming from a query involving many joins, or from some exotic external data source?
As of SQL 2012 SP1 CU10,
SELECT...INTO can be executed in parallel. However, since SQL Server 2016, Parallel Insert has been allowed on the conventional
INSERT INTO...SELECT statement, with certain restrictions, so any performance advantage of using
SELECT...INTO is now rather diminished. The
INSERTINTO process can also be speeded up if it can be bulk-logged rather than fully recovered by setting the recovery model to simple or bulk-logged, inserting into an empty table or a heap, and setting the
TABLOCK hint for the table.
The following summarizes some of the restrictions and limitations when using
IDENTITYproperty of a column is transferred, but not if:
SELECTstatement contains joined tables (using either
GROUPBYclause, or aggregate function. If you need to avoid an
IDENTITYproperty from being carried over to the new table but need the values of the column, it is worth adding a
JOINto your table-source on a condition that is never true, or a
UNIONthat provides no rows.
IDENTITYcolumn is listed more than one time in the
IDENTITYcolumn is part of an expression.
IDENTITYcolumn is from a remote data source.
- You cannot
SELECT...INTOeither a table-valued parameter or a table variable as the destination, though you can select
- Even if your source is a partitioned table, the new table is created in the default filegroup. However, in SQL Server 2017, it is possible to specify the filegroup in which the destination table is created, via the
- You can specify an
ORDERBYclause, but it is generally ignored. Because of this, the order of
- When a computed column is included in the
SELECTlist, the corresponding column in the new table is not a computed column. The values in the new column are the values that were computed at the time
- As with a
CREATETABLEstatement, if a
SELECT...INTOstatement is contained within an explicit transaction, the underlying row(s) in the affected system tables are exclusively locked until the transaction is explicitly committed. In the meantime, this will result in blocks on other processes that use these system tables.
There is some confusion about problems that can happen with the use of
SELECT...INTO using temporary tables.
SELECT...INTO has gained a somewhat unfair reputation for this, but it was part of a more general problem involving latch contention in
tempdb under a heavy load of small temp table creation and deletion. When
SELECT...INTO was adopted with enthusiasm, it could greatly increase this type of activity. The problem could be easily fixed in SQL Server 2000 onwards with the introduction of the use of trace flag TF1118, which is no longer required from SQL Server 2016 onwards. For a full explanation, see Misconceptions Around TF 1118.
SELECT...INTO is a good way of making a table-source temporarily persistent as part of a process — if you don't care about constraints, indexes, or special columns. It is not a good way of copying a table because only the barest essentials of the table schema can be copied. Over the years, there have been factors that have increased or decreased the attraction of
SELECT...INTO, but it isn't part of the ANSII standard and it is definitely a good idea to do without it whenever possible. Instead, create a table explicitly, with the full range of features that the table possesses that are designed to ensure that data is consistent.
Published at DZone with permission of Phil Factor . See the original article here.
Opinions expressed by DZone contributors are their own.