Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Query to Retrieve Statistics Data: dm_db_stats_histogram

DZone's Guide to

Query to Retrieve Statistics Data: dm_db_stats_histogram

While it's not going to make you coffee in the morning, the new DMF opens up possibilities to easily and quickly access information about your statistics.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Starting with SQL Server 2016 Sp1 CU2, a new way of directly querying statistics, specifically the histogram, has been introduced: dm_db_stats_histogram. We've always been able to query the statistics using DBCC SHOW_STATISTICS. However, the output, three different result sets with three different structures, made automating access to statistics information a pain. Now, we have more capability through dm_db_stats_histogram.

dm_db_stats_histogram

To access the information in dm_db_stats_histogram, you just have to pass in the object_id and the statistics_id values for the statistics you're interested in, like this:

SELECT *
FROM sys.dm_db_stats_histogram(OBJECT_ID('HumanResources.Employee'),
                               1) AS ddsh;

It's very straightforward to use. The results look like this:

Handy right? Now you can query the histogram directly. Yeah, I hear a few of you saying... and this helps me... how? Here's an example. This query will quickly find the rang_hi_key value set that a given value falls within:

WITH histo
AS (SELECT ddsh.step_number,
           ddsh.range_high_key,
           ddsh.range_rows,
           ddsh.equal_rows,
           ddsh.average_range_rows
    FROM sys.dm_db_stats_histogram(OBJECT_ID('HumanResources.Employee'),
                                   1) AS ddsh ),
     histojoin
AS (SELECT h1.step_number,
           h1.range_high_key,
           h2.range_high_key AS range_high_key_step1,
           h1.range_rows,
           h1.equal_rows,
           h1.average_range_rows
    FROM histo AS h1
        LEFT JOIN histo AS h2
            ON h1.step_number = h2.step_number + 1)
SELECT hj.range_high_key,
       hj.equal_rows,
       hj.average_range_rows
FROM histojoin AS hj
WHERE hj.range_high_key >= 17
      AND (   hj.range_high_key_step1 < 17
              OR hj.range_high_key_step1 IS NULL);

With a little work you could create a function and instead of browsing through the histogram looking at range_hi_key values to try to figure out which one applies, you can just enter the value and you get immediate feedback. This is very useful to quickly understand why the optimizer chose a plan because you can quickly determine row counts for a given value from the statistics referenced inside the execution plan (yeah, you can get those now too). You get the idea. dm_db_stats_histogram just makes things more efficient.

Conclusion

While dm_db_stats_histogram is not going to make you coffee in the morning, the new DMF opens up possibilities to easily and quickly access information about your statistics. There's also dm_db_stats_properties to retrieve the header information on the statistics. These new bits of functionality just make life a wee bit easier.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
big data ,database query ,statistics ,data access

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}