There Is No Difference Between Table Variables, Temporary Tables, and Common Table Expressions
If you think the title of this article is true, you have another thing coming. Read on to find out why.
I actually saw the above statement posted online. The person making the claim further stated that choosing between these three constructs was "personal preference" and didn't change at all the way SQL Server would deal with them in a query.
Let's say it right up front: the title is wrong. Yes, there are very distinct differences between these three constructs. Yes, SQL Server absolutely deals with them in different ways. No, picking the correct one for a given situation is not about personal preference, but about the differences in behavior among the three.
To illustrate just a few of the differences between these three constructs, I'll use variations of this query:
select * from sales.orders as o
join sales.orderlines as ol
on ol.orderid = o.orderid
where ol.stockitemid = 227;
The execution plan for this query looks like this:
The number of reads is 1,269, and the duration is around 234ms on average.
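The article doesn't say which tool produced these reads and duration figures. One quick way to get comparable numbers in SSMS is the sketch below, which wraps the query in SET STATISTICS IO and SET STATISTICS TIME. Reads are reported per table, so you'd add them up, and the totals may not match another tool's measurements exactly.

```sql
-- Report logical reads per table, plus parse/compile and execution times,
-- as informational messages (the Messages tab in SSMS).
set statistics io on;
set statistics time on;

select * from sales.orders as o
join sales.orderlines as ol
    on ol.orderid = o.orderid
where ol.stockitemid = 227;

-- Turn the reporting back off for the rest of the session.
set statistics time off;
set statistics io off;
```

Run it several times and ignore the first execution if you want an average that excludes compilation and cold-cache effects.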
Let's modify the query to use a table variable. Note that I include a primary key on the table variable, which the optimizer can use to make decisions based on unique values:
declare @orderlines table
(orderlineid int not null primary key,
orderid int not null,
stockitemid int not null,
description nvarchar(100) not null,
packagetypeid int not null,
quantity int not null,
unitprice decimal(18,2) null,
taxrate decimal(18,3) not null,
pickedquantity int not null,
pickingcompletedwhen datetime2 null,
lasteditedby int not null,
lasteditedwhen datetime2 not null);
insert @orderlines
(orderlineid,
orderid,
stockitemid,
description,
packagetypeid,
quantity,
unitprice,
taxrate,
pickedquantity,
pickingcompletedwhen,
lasteditedby,
lasteditedwhen
)
select *
from sales.orderlines as ol
where ol.stockitemid = 227;
select * from sales.orders as o
join @orderlines as ol
on ol.orderid = o.orderid
where ol.stockitemid = 227;
I'm not concerned with how long it takes to load the data, only with the behavior of the query after the data is loaded. Here's the execution plan:
There's not much to say. Clearly, it's different from the regular query, but that shouldn't be a shock, because we're dealing with different tables. Overall, the number of reads goes up to 1,508 because we're touching the data twice: once to load the table variable and once to query it. The whole process takes about 260ms. Breaking it down by statement within the batch, to get a fair comparison, the part of the query we actually care about, the join between the table and the table variable, runs in about 250ms with only 356 reads.
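Statement-level numbers like these can be collected with an Extended Events session that captures both sql_batch_completed and sql_statement_completed. This is a minimal sketch, not necessarily the setup used for the figures above: the session name QueryPerformance is hypothetical, and the database filter assumes the tables live in the WideWorldImporters sample database.

```sql
-- Hypothetical session name; the filter assumes the WideWorldImporters database.
create event session QueryPerformance on server
add event sqlserver.sql_batch_completed
(
    where (sqlserver.database_name = N'WideWorldImporters')
),
add event sqlserver.sql_statement_completed
(
    where (sqlserver.database_name = N'WideWorldImporters')
)
-- With no path, the file is written to the instance's default log folder.
add target package0.event_file
(
    set filename = N'QueryPerformance'
);

alter event session QueryPerformance on server state = start;

-- Run the batches being compared here, then stop collecting.
alter event session QueryPerformance on server state = stop;
```

Both events report duration (in microseconds) and logical_reads, so the batch event gives the whole-process figure while the statement event isolates the join. You can view the results with Watch Live Data in SSMS or read the file with sys.fn_xe_file_target_read_file.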
Modified again to use a temporary table, the query looks like this:
create table #orderlines
(orderlineid int not null primary key,
orderid int not null,
stockitemid int not null,
description nvarchar(100) not null,
packagetypeid int not null,
quantity int not null,
unitprice decimal(18,2) null,
taxrate decimal(18,3) not null,
pickedquantity int not null,
pickingcompletedwhen datetime2 null,
lasteditedby int not null,
lasteditedwhen datetime2 not null);
insert #orderlines
(orderlineid,
orderid,
stockitemid,
description,
packagetypeid,
quantity,
unitprice,
taxrate,
pickedquantity,
pickingcompletedwhen,
lasteditedby,
lasteditedwhen
)
select * from sales.orderlines as ol
where ol.stockitemid = 227;
select * from sales.orders as o
join #orderlines as ol
on ol.orderid = o.orderid
where ol.stockitemid = 227;
drop table #orderlines;
The new execution plan looks like this:
Don't go getting all excited. I recognize that these two plans look similar, but they are different. For a start, we have more reads, 1,546, and the duration increases to 273ms. This comes from two places. First, SQL Server creates statistics on the data in the temporary table, whereas none exist on the table variable. Second, because I want to run this script over and over, I include the DROP TABLE statement, which adds overhead I wouldn't see if I handled the temporary table like the table variable and skipped the explicit drop (which I could have, but didn't here). However, breaking it down to the statement level, I get a duration of 250ms, just like with the table variable, but 924 reads.
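To see the statistics SQL Server has created on the temporary table, you could query tempdb's catalog views. This sketch assumes it runs in the same session as the temporary table script, after the join and before the DROP TABLE statement, while #orderlines still exists.

```sql
-- List each statistics object on #orderlines and the column(s) it covers.
-- Must run before drop table #orderlines, in the session that created it.
select s.name as stats_name,
       s.auto_created,
       c.name as column_name
from tempdb.sys.stats as s
join tempdb.sys.stats_columns as sc
    on sc.object_id = s.object_id
   and sc.stats_id = s.stats_id
join tempdb.sys.columns as c
    on c.object_id = sc.object_id
   and c.column_id = sc.column_id
where s.object_id = object_id(N'tempdb..#orderlines');
```

You'd typically see the statistics behind the primary key (auto_created = 0) plus auto-created column statistics (auto_created = 1) on columns the query filtered or joined on, assuming AUTO_CREATE_STATISTICS is enabled for tempdb, which is the default. There's no equivalent set of statistics to find for the table variable.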
What's going on?
Note first the estimated costs in the two execution plans: 50/50 for the query with the table variable and 2/98 for the temporary table. Why? Well, let's compare the two plans (and yes, I love the new SSMS plan comparison functionality). Specifically, let's look at each Clustered Index Scan operator. There are a number of differences, but the most telling is right here:
On the left is the temporary table; on the right is the table variable. Note the TableCardinality values. The table variable shows zero because it has no statistics, despite having been created with a primary key, and because the statement was compiled before any rows were inserted into it. In this case, that makes no appreciable difference from a pure performance standpoint (250ms versus 250ms), but you can clearly see the difference in behavior.
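If the plan comparison feature isn't handy, you can read TableCardinality straight from the actual plan XML. This is a trimmed, hypothetical version of the table variable batch (only three columns kept) that turns on SET STATISTICS XML for the join alone; look for the TableCardinality attribute on the RelOp element for the @orderlines scan.

```sql
-- Trimmed, hypothetical version of the table variable batch (three columns only).
declare @orderlines table
(orderlineid int not null primary key,
orderid int not null,
stockitemid int not null);

insert @orderlines (orderlineid, orderid, stockitemid)
select ol.orderlineid, ol.orderid, ol.stockitemid
from sales.orderlines as ol
where ol.stockitemid = 227;

-- Return the actual execution plan as XML for the join only.
set statistics xml on;

select * from sales.orders as o
join @orderlines as ol
    on ol.orderid = o.orderid
where ol.stockitemid = 227;

set statistics xml off;
```

On versions without table variable deferred compilation, expect 0 here. On SQL Server 2019 and later, at database compatibility level 150 or higher, the statement is compiled after the table variable is populated, so the value you see may differ.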
Oh, and the CTE? It had the same execution plan as the original query, because a CTE is not a table; it's an expression that the optimizer folds into the query that references it.
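The CTE version of the query isn't shown above. A hypothetical version that mirrors the others might look like this, with the filtered order lines defined as a CTE instead of being loaded into a table variable or temporary table:

```sql
-- Hypothetical CTE version; nothing is stored, the CTE is expanded
-- into the outer query at compile time.
with orderlines as
(
    select *
    from sales.orderlines as ol
    where ol.stockitemid = 227
)
select * from sales.orders as o
join orderlines as ol
    on ol.orderid = o.orderid
where ol.stockitemid = 227;
```

Because the CTE is expanded rather than materialized, the optimizer sees essentially the same query as the original join, which is consistent with getting the identical execution plan.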
In short, yes, there are very distinct differences in behavior between a table variable, a temporary table, and a common table expression. These are not constructs you can interchange on a whim. You need to understand what each one does in order to use it appropriately.
Published at DZone with permission of Grant Fritchey, DZone MVB.