Same Query, Different Servers, Different Performance. Now What?
On your production server, the query runs for three hours before it returns, but on the test server it returns in about three seconds. What. The. Hell.
Join the DZone community and get the full member experience.Join For Free
based on the number of times i see this question on forums, it must be occurring all the time. you have two different servers that, as far as you know, are identical in terms of their options and setup (although not necessarily in terms of power, think of a test or pre-production system versus production). on these servers, you have a database on each that, as far as you know, is the same as the other in terms of options, objects, or maybe even data (although, this does mean that you have unmasked production information in your qa environment, which potentially means you’re going to jail , and you might want to address this, especially now that i’ve told you about it, mens rea—you’re welcome). on each database you run, as far as you know, the exact same query (whether a straight up ad hoc query, stored procedure, parameterized query, whatever). on your production server, let’s call it prod, the query runs for three hours before it returns, but on the test server, let’s call it test, it returns in about three seconds.
what. the. hell.
this is especially troubling because prod has eight processors that are much more powerful than test, which has only two (either can go parallel, so this is the same). test only has 16gb of ram and prod has 64gb. furthermore, you know that the databases, data, and statistics are the same because you restored the test database from prod (although, see that jail thing above). however, you’re on top of things. you capture the query metrics so you know exactly the amount of time and the reads or writes from each query and you even do this using extended events so you’re minimizing observer effects. you also have monitoring in place, so you know that there is no blocking in prod. the query is just a pig there, but not on test. you’ve even gone so far as to get an estimated execution plan from each server and they are absolutely different.
well, obviously, something somewhere is different. start by comparing everything on both servers and both databases down to…hang on, here, we’ll write a powershell script and then….
wait, wait, wait!
you have the execution plans? before we start digging through all the properties everywhere and comparing everything to everything, updating statistics 14 times, and all the rest, what if we look at the execution plans. they’re different, so we should start looking at scans and indexes and statistics and….
let’s start simple and clear. let’s take a look at the properties of the first operator:
this contains a ton of information about the settings of the systems where each plan was generated. let’s assume that, probably, the databases are the same, as is the data and the statistics, but the problem is a system or database setting. these differences can result in different execution plans and therefore different execution times. if we are dealing with two different servers and we are fairly sure the data, structure, and the statistics are the same, the properties of the first operator are a great starting point for understanding what went wrong.
oh, and the first operator is this one with the red square around it:
let’s use the new ssms compare showplan to see the differences between our two execution plans that we captured:
there are a bunch of differences highlighted, but one ought to jump out pretty quick. that’s right, these two databases have different compatibility levels which resulted in one of them using the new optimizer and one using the old optimizer as evidenced by the—
fun point, both have identical query hash values. not surprising, but additional, quick, validation that we really are talking about the same query on each server (and one of them isn’t using a trace flag to change the cardinality estimator). you now have enough information to go and make some changes to your system without lots of further fumbling and work.
when dealing with the same query from two servers that ought to result in similar behavior, but doesn’t, get the execution plans (estimated plans are fine here) and compare the the properties of the first operator. that’s the quickest way to identify the issues that could be leading to the differences between the servers.
want to talk more about execution plans and query tuning? let’s do it.
in august, i’ll be doing an all day pre-con at sqlserver geeks annual summit in bangalore india.
i’m also going to be in oslo norway for a pre-con before sql saturday oslo in september.
Published at DZone with permission of Grant Fritchey, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.