Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Query to Retrieve Statistics Data: dm_db_stats_histogram

DZone's Guide to

Query to Retrieve Statistics Data: dm_db_stats_histogram

While it's not going to make you coffee in the morning, the new DMF opens up possibilities to easily and quickly access information about your statistics.

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

Starting with SQL Server 2016 Sp1 CU2, a new way of directly querying statistics, specifically the histogram, has been introduced: dm_db_stats_histogram. We've always been able to query the statistics using DBCC SHOW_STATISTICS. However, the output, three different result sets with three different structures, made automating access to statistics information a pain. Now, we have more capability through dm_db_stats_histogram.

dm_db_stats_histogram

To access the information in dm_db_stats_histogram, you just have to pass in the object_id and the statistics_id values for the statistics you're interested in, like this:

SELECT *
FROM sys.dm_db_stats_histogram(OBJECT_ID('HumanResources.Employee'),
                               1) AS ddsh;

It's very straightforward to use. The results look like this:

Handy right? Now you can query the histogram directly. Yeah, I hear a few of you saying... and this helps me... how? Here's an example. This query will quickly find the rang_hi_key value set that a given value falls within:

WITH histo
AS (SELECT ddsh.step_number,
           ddsh.range_high_key,
           ddsh.range_rows,
           ddsh.equal_rows,
           ddsh.average_range_rows
    FROM sys.dm_db_stats_histogram(OBJECT_ID('HumanResources.Employee'),
                                   1) AS ddsh ),
     histojoin
AS (SELECT h1.step_number,
           h1.range_high_key,
           h2.range_high_key AS range_high_key_step1,
           h1.range_rows,
           h1.equal_rows,
           h1.average_range_rows
    FROM histo AS h1
        LEFT JOIN histo AS h2
            ON h1.step_number = h2.step_number + 1)
SELECT hj.range_high_key,
       hj.equal_rows,
       hj.average_range_rows
FROM histojoin AS hj
WHERE hj.range_high_key >= 17
      AND (   hj.range_high_key_step1 < 17
              OR hj.range_high_key_step1 IS NULL);

With a little work you could create a function and instead of browsing through the histogram looking at range_hi_key values to try to figure out which one applies, you can just enter the value and you get immediate feedback. This is very useful to quickly understand why the optimizer chose a plan because you can quickly determine row counts for a given value from the statistics referenced inside the execution plan (yeah, you can get those now too). You get the idea. dm_db_stats_histogram just makes things more efficient.

Conclusion

While dm_db_stats_histogram is not going to make you coffee in the morning, the new DMF opens up possibilities to easily and quickly access information about your statistics. There's also dm_db_stats_properties to retrieve the header information on the statistics. These new bits of functionality just make life a wee bit easier.

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

Topics:
big data ,database query ,statistics ,data access

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}