Query optimizer depends on statistics to estimate the cardinality or number of rows of a query result. In other words, it is the knowledge of how many rows are in a table, and how many of those rows satisfy the various search and join conditions, and so on. Based on this, optimizer creates the best query plan. Sometimes, tweaking the statistics helps to produce efficient query plan. In this article, we are going to explore the fundamentals of SQL Server statistics.
Getting Started
Before giving the deep drive into the topic, let's check some key concepts about the SQL Server statistics.
Statistics
Statistics are stored as BLOB objects which contain statistical information about the distribution of values in one or more columns of a table or indexed view. Query optimizer updates necessary statistics. If statistics don't exist then they will be created immediately and consumed by optimizer.
Histogram
A histogram is a visual representation of the distribution of quantitative data. In histogram, the entire range of values are divided into a series of intervals—and then find out how many values fall into each interval. For generating the histogram, the Query Optimizer sorts column values, counts how many match each distinct value, and groups them into up to 200 histogram steps. Each step covers a range of column values, excluding the boundary values, and ends with an upper bound. The first step's upper bound is the lowest sorted column value.
Density Vector
Density represents the number of duplicate values in a single or multiple column(s). It is calculated as 1/(number of distinct values). The Query Optimizer uses it to improve cardinality estimation of queries involving multiple columns from the same table or indexed view. As density decreases, selectivity increases. For example, assume there is a Students table where every student has a unique ID and many students belong to same Class. Creating an index on ID is more selective than index on Class. As, ID has lower density than Class.
Filtered Statistics
It depends on a filter predicate for retrieving the subset of well-defined data that is included in the statistics. It has better execution plan compared with full-table statistics.
Inspecting Statistics
You can check statistics using DBCC SHOW_STATISTICS statement. Below query was executed on AdventureWorks2022 database.
DBCC SHOW_STATISTICS ('[Person].[PersonPhone]','PK_PersonPhone_BusinessEntityID_PhoneNumber_PhoneNumberTypeID')
Diagram:- Displaying Statistics using DBCC SHOW_STATISTICS
You can also check it from SSMS. From Database Name -> Table -> Statistics.
Diagram:- Displaying Statistics using SSMS
Statistics Options
There are some database level settings which can be used to configure when statistics will be created or updated.
Diagram:- Statistics options
- Auto Create Statistics - when True, statistics is created on individual columns in the query predicate to improve cardinality estimation of the query plan. The statistics name starts with _WA.
- Auto Update Statistics - after insert/update/delete/merge operations, statistics may become out-of-date. When True, statistics is updated based on some threshold value. This is also called statistics recompilation.
- Auto Update Statistics Async - indicates whether statistics are updated synchronously or asynchronously. Default value is False i.e. statistics is updated synchronously.
- Auto Create Incremental Statistics - when True, statistics are created as per partition statistics. The default is False.
Creating statistics
Query optimizer generates statistics for below incidents:
- When new index is created. For filtered index, filtered statistics is created.
- When auto create statistics is True and query predicates has single columns.
Apart from this, you might opt for creating statistics manually to improve query plan in below scenarios:
- SQL Server Tuning Advisor recommends to create the statistics.
- Query predicate has multiple columns which have cross-column relationships and dependencies and those are separate index.
- Query selects from a subset of data.
- Query has some missing statistics to generate optimal execution plan.
While creating statistics, remember to set Auto Create Statistics to True.
For creating a new statistics on a column, you can use below statement:
use [AdventureWorks2022];
CREATE STATISTICS ModifiedDate ON Person.PersonPhone (ModifiedDate);
Missing Statistics
Query optimizer might suspends generating statistics due to any error or event. it will try to generate again next time the query is executed. In execution plan, it will be shown as warning. Apart from this, Missing Column Statistics event class of SQL Server Profiler hints missing statistics. For mitigation, you can perform:
- Check Auto Create Statistics and Auto Update Statistics are set as True.
- Ensure that the database is not read-only
- Re-generate statistics manually by using the CREATE STATISTICS statement.
Identifying out-of-date statistics
SQL Server itself updates statistics when required. You can identify when it was last updated by running sys.dm_db_stats_properties. Below query returns statistics of all tables, indexed views of your database.
SELECT DB_NAME() AS DatabaseName, o.name, o.object_id, s.name, s.stats_id, rows, rows_sampled, last_updated, modification_counter
FROM sys.objects AS o
INNER JOIN sys.stats AS s ON s.object_id = o.object_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
ORDER BY last_updated DESC
Diagram:- Statistics information
Some key columns of sys.dm_db_stats_properties are:
- last_updated - when the statistics was last updated
- rows - number of rows were in the table when when the statistics was last updated
- rows_sampled - how many rows were used for sampling purpose to generate the statistics
- modification_counter - how many modifications were done after the statistics were generated
Updating statistics
In some cases, you will need to intervene to update the statistics manually for improving query performance. You may think of updating statistics in below scenarios:
- Query is running slower than expectations
- Inserting data into ascending or descending key columns such as IDENTITY or timestamp columns
- Truncating a table or performing a bulk insert
While updating statistics, remember to set Auto Update Statistics to True.
Frequently updating statistics will incur performance issue as it will force to recompile execution plan. Remember, generating an execution plan has some costs. So, before updating statistics manually, think whether you actually need it.
For updating, you can use below statement:
UPDATE STATISTICS Person.PersonPhone [ModifiedDate] WITH AUTO_DROP = ON;
Deleting statistics
For deletion, you can use:
DROP STATISTICS Person.PersonPhone.[ModifiedDate];
Conclusion
Hope you understand the in-depth concept of statistics. In case a query is performing poorly, you must ensure that it is not using out-of-date statistics.