Mastering SQL Server Statistics Updates for Query Optimization

Applies to: SQL Server, Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, Analytics Platform System (PDW), SQL analytics endpoint in Microsoft Fabric, Warehouse in Microsoft Fabric, SQL database in Microsoft Fabric

In the realm of SQL Server database management, ensuring optimal query performance is paramount. Query optimization statistics are fundamental metadata that the SQL Server query optimizer uses to create efficient query execution plans. Outdated or missing statistics can lead to suboptimal plans, resulting in slow query performance and increased resource consumption. The UPDATE STATISTICS statement in Transact-SQL is a critical tool for database administrators and developers to maintain and enhance database performance by refreshing these vital statistics. While SQL Server automatically updates statistics as needed, understanding when and how to manually trigger updates using UPDATE STATISTICS provides greater control over query optimization.

This article delves into the intricacies of the UPDATE STATISTICS statement, exploring its syntax, options, and best practices for leveraging it to achieve peak SQL Server performance. We will cover scenarios where manual statistics updates are beneficial, the various sampling methods available, and advanced options for fine-tuning statistics management.

Syntax

-- Syntax for SQL Server and Azure SQL Database
UPDATE STATISTICS table_or_indexed_view_name
[ { { index_or_statistics__name } | ( { index_or_statistics_name } [ ,...n ] ) } ]
[ WITH
    [ FULLSCAN [ [ , ] PERSIST_SAMPLE_PERCENT = { ON | OFF } ]
    | SAMPLE number { PERCENT | ROWS } [ [ , ] PERSIST_SAMPLE_PERCENT = { ON | OFF } ]
    | RESAMPLE [ ON PARTITIONS ( { <partition_number> | <range> } [ , ...n] ) ]
    | <update_stats_stream_option> [ ,...n ]
    ]
    [ [ , ] [ ALL | COLUMNS | INDEX ] ]
    [ [ , ] NORECOMPUTE ]
    [ [ , ] INCREMENTAL = { ON | OFF } ]
    [ [ , ] MAXDOP = max_degree_of_parallelism ]
    [ [ , ] AUTO_DROP = { ON | OFF } ]
] ;

<update_stats_stream_option> ::=
    [ STATS_STREAM = stats_stream ]
    [ ROWCOUNT = numeric_constant ]
    [ PAGECOUNT = numeric_constant ]
-- Syntax for Azure Synapse Analytics and Parallel Data Warehouse
UPDATE STATISTICS [ schema_name . ] table_name
[ ( { statistics_name | index_name } ) ]
[ WITH { FULLSCAN | SAMPLE number PERCENT | RESAMPLE } ]
[;]
-- Syntax for Microsoft Fabric
UPDATE STATISTICS [ schema_name . ] table_name
[ ( { statistics_name } ) ]
[ WITH { FULLSCAN | SAMPLE number PERCENT } ]
[;]

Note: Serverless SQL pool in Azure Synapse Analytics does not support this syntax.

Arguments

table_or_indexed_view_name

Specifies the name of the table or indexed view containing the statistics object that you want to update.

index_or_statistics_name or statistics_name | index_name or statistics_name

Indicates the name of a specific index or statistics object to update. If you don’t provide a name, UPDATE STATISTICS will refresh all statistics for the specified table or indexed view. This encompasses statistics created via CREATE STATISTICS, single-column statistics generated when AUTO_CREATE_STATISTICS is enabled, and statistics associated with indexes.

You can use the stored procedure sp_helpindex to list all indexes for a table or view. For details on AUTO_CREATE_STATISTICS, refer to ALTER DATABASE SET Options.

FULLSCAN

This option instructs SQL Server to compute statistics by reading all rows in the table or indexed view. FULLSCAN and SAMPLE 100 PERCENT produce identical results. You cannot use FULLSCAN in conjunction with the SAMPLE option.

SAMPLE number { PERCENT | ROWS }

The SAMPLE option lets you specify an approximate percentage or number of rows for the query optimizer to sample when updating statistics. For PERCENT, number can range from 0 to 100, and for ROWS, number can be from 0 to the total number of rows. It’s important to note that the actual sample size might not precisely match the specified percentage or number, as the optimizer reads entire data pages.

SAMPLE is most beneficial in specific scenarios where the default sampling might lead to suboptimal query plans. However, in most cases, specifying SAMPLE is unnecessary. The query optimizer’s default sampling is generally sufficient to determine a statistically significant sample size for generating high-quality query plans.

Note: In SQL Server 2016 (13.x) with database compatibility level 130, parallel data sampling was introduced to enhance statistics collection performance for large tables. However, starting with SQL Server 2017 (14.x), the behavior reverted to serial scanning to mitigate potential latch contention issues, regardless of compatibility level. The rest of the statistics update process can still utilize parallel execution if applicable.

You cannot use SAMPLE with the FULLSCAN option. When neither SAMPLE nor FULLSCAN is specified, the query optimizer defaults to sampled data and automatically determines the sample size.

It’s generally discouraged to use 0 PERCENT or 0 ROWS. While these options will update the statistics object, they will result in empty statistics with no data.

While full scans can offer more accurate estimates than sampled scans, especially for skewed data distributions, the performance benefits for complex query plans might be marginal. For most workloads, default sampling is adequate, and full scans are not required. For a deeper understanding, refer to Statistics.

RESAMPLE

The RESAMPLE option directs SQL Server to update each statistic using its previously used sample rate. This can be useful to maintain consistency in sampling, especially after data modifications.

Using RESAMPLE can sometimes lead to a full-table scan. For instance, index statistics typically employ a full-table scan for their sample rate. When no sampling options (SAMPLE, FULLSCAN, RESAMPLE) are specified, the query optimizer will sample the data and calculate the sample size by default.

Important: RESAMPLE is not supported in Warehouse in Microsoft Fabric.

PERSIST_SAMPLE_PERCENT = { ON | OFF }

Applies to: SQL Server 2016 (13.x) SP1 CU4, SQL Server 2017 (14.x) SP1, SQL Server 2019 (15.x) and later versions, Azure SQL Database, Azure SQL Managed Instance

The PERSIST_SAMPLE_PERCENT option, when set to ON, ensures that the statistics retain the explicitly set sampling percentage for subsequent updates that don’t specify a sampling percentage. When set to OFF (the default), the sampling percentage will revert to the default sampling in subsequent updates without an explicit percentage specified.

You can inspect the persisted sample percent value for a statistic using DBCC SHOW_STATISTICS and sys.dm_db_stats_properties.

If AUTO_UPDATE_STATISTICS is triggered, it will utilize the persisted sampling percentage if available; otherwise, it will fall back to the default sampling percentage. The behavior of RESAMPLE is not affected by this option.

Truncating a table will reset all statistics on the underlying heap or B-tree (HoBT) to use the default sampling percentage.

Note: In earlier SQL Server versions, rebuilding an index with statistics updated using PERSIST_SAMPLE_PERCENT would reset the persisted sample percent to default. However, starting with SQL Server 2016 (13.x) SP2 CU17, SQL Server 2017 (14.x) CU26, and SQL Server 2019 (15.x) CU10, the persisted sample percent is preserved even after index rebuilds.

ON PARTITIONS ( { <partition_number> | } [,…n] )

Applies to: SQL Server 2014 (12.x) and later

For partitioned tables and indexed views, the ON PARTITIONS clause forces recomputation of leaf-level statistics for the specified partitions. These partition statistics are then merged to construct global statistics. WITH RESAMPLE is mandatory when using ON PARTITIONS because partition statistics built with varying sample rates cannot be merged directly.

ALL | COLUMNS | INDEX

These options control which statistics are updated.

  • ALL: Updates all existing statistics on the table or indexed view.
  • COLUMNS: Updates statistics created on individual columns.
  • INDEX: Updates statistics created for indexes.

If none of these options are specified, UPDATE STATISTICS updates all statistics on the table or indexed view, encompassing all types.

NORECOMPUTE

The NORECOMPUTE option disables the automatic statistics update feature, AUTO_UPDATE_STATISTICS, for the specified statistics. When used, SQL Server performs the statistics update once and then deactivates future automatic updates for these statistics.

To re-enable AUTO_UPDATE_STATISTICS, execute UPDATE STATISTICS again without the NORECOMPUTE option or use the stored procedure sp_autostats.

Warning: Using NORECOMPUTE can potentially lead to suboptimal query plans over time as data changes and statistics become stale. Exercise caution when using this option and reserve it for qualified database administrators in specific scenarios where manual control over statistics updates is desired.

For more information about AUTO_STATISTICS_UPDATE, see ALTER DATABASE SET Options.

INCREMENTAL = { ON | OFF }

Applies to: SQL Server 2014 (12.x) and later

The INCREMENTAL option is relevant for partitioned tables and indexed views.

  • ON: Statistics are recreated as per partition statistics. This is generally more efficient for large partitioned objects as it only updates statistics for partitions that have changed.
  • OFF: The entire statistics tree is dropped, and SQL Server recomputes statistics for the entire table or indexed view. This can be more resource-intensive, especially for large objects. OFF is the default behavior.

An error will occur if you attempt to use incremental statistics on statistics types that do not support them. Incremental statistics are not supported for:

  • Statistics on indexes that are not partition-aligned with the base table.
  • Statistics on Always On readable secondary databases.
  • Statistics on read-only databases.
  • Statistics on filtered indexes.
  • Statistics on views.
  • Statistics on internal tables.
  • Statistics on spatial or XML indexes.

MAXDOP = max_degree_of_parallelism

Applies to: SQL Server (Starting with SQL Server 2016 (13.x) SP2 and SQL Server 2017 (14.x) CU3)

The MAXDOP option overrides the max degree of parallelism server configuration option specifically for the duration of the statistics operation. This allows you to control the number of processors used for parallel statistics operations. For more details, see Configure the max degree of parallelism Server Configuration Option. MAXDOP helps limit resource consumption during statistics updates. The maximum value is 64 processors.

  • 1: Suppresses parallel plan generation, forcing serial execution.
  • >1: Restricts the maximum number of processors used for parallel statistics operations to the specified number or fewer, depending on the current system workload.
  • 0 (default): Uses the actual number of processors available or fewer, based on the system workload.

update_stats_stream_option

These options (STATS_STREAM, ROWCOUNT, PAGECOUNT) are for internal use only and are not intended for general use. They are mentioned for informational purposes and might not be supported in future versions.

AUTO_DROP = { ON | OFF }

Applies to: SQL Server 2022 (16.x) and later.

The AUTO_DROP option addresses scenarios where statistics created by third-party tools might inadvertently block schema modifications.

When AUTO_DROP = ON is specified during statistics creation or update, the statistics object is created in a mode where schema changes will not be blocked. Instead, the statistics will be automatically dropped if they would otherwise impede a schema change. This behavior mirrors that of auto-created statistics.

Note: Attempting to explicitly set or unset the AUTO_DROP property on auto-created statistics might result in errors, as auto-created statistics always inherently use auto-drop behavior. In some cases, restoring backups might lead to an incorrect setting of this property until the next statistics update, but auto-created statistics will consistently behave as auto-drop statistics.

Remarks

When to UPDATE STATISTICS

While SQL Server automatically manages statistics updates, there are scenarios where manual updates using UPDATE STATISTICS are beneficial:

  • After substantial data modifications: If a table has undergone significant INSERT, UPDATE, or DELETE operations, the existing statistics might no longer accurately reflect the data distribution. Manually updating statistics ensures the query optimizer has current information for plan generation.
  • After index rebuilds or reorganizations: Rebuilding or reorganizing indexes can change data distribution, making it prudent to update index statistics afterward.
  • Before critical query execution: For performance-sensitive queries, especially after known data changes, proactively updating statistics can help ensure optimal plan selection.
  • Troubleshooting query performance issues: If you suspect outdated statistics are contributing to slow queries, manually updating statistics is a crucial troubleshooting step.
  • Controlling statistics update frequency: In specific environments, you might want to manage statistics update schedules more precisely than the automatic updates provide.

However, avoid overly frequent statistics updates. There’s a trade-off: while up-to-date statistics improve query plans, the recompilation of query plans triggered by statistics updates consumes resources. The optimal update frequency depends on your application’s data modification patterns and performance sensitivity.

For a deeper understanding of when to update statistics, refer to Statistics.

Limitations

  • UPDATE STATISTICS is not supported for external tables. To update statistics on external tables, you must drop and recreate the statistics.
  • The MAXDOP option is incompatible with STATS_STREAM, ROWCOUNT, and PAGECOUNT options.
  • The MAXDOP option’s effectiveness is limited by the Resource Governor workload group MAX_DOP setting, if configured.

Update all statistics with sp_updatestats

To update statistics for all user-defined and internal tables within a database, you can use the stored procedure sp_updatestats. The following example demonstrates how to update all statistics in the current database:

EXEC sp_updatestats;

Automatic index and statistics management

Consider leveraging automated solutions like Adaptive Index Defrag to streamline index defragmentation and statistics updates across databases. These tools can dynamically decide whether to rebuild or reorganize indexes based on fragmentation levels and apply linear thresholds for statistics updates.

Determining the Last Statistics Update

To find out when statistics were last updated, use the STATS_DATE function. This function is invaluable for monitoring statistics staleness and planning update schedules.

PDW / Azure Synapse Analytics Specifics

The following syntax elements are not supported in Analytics Platform System (PDW) / Azure Synapse Analytics:

UPDATE STATISTICS t1 (a,b);
UPDATE STATISTICS t1 (a) WITH SAMPLE 10 ROWS;
UPDATE STATISTICS t1 (a) WITH NORECOMPUTE;
UPDATE STATISTICS t1 (a) WITH INCREMENTAL = ON;
UPDATE STATISTICS t1 (a) WITH stats_stream = 0x01;

Permissions

Executing UPDATE STATISTICS requires ALTER permission on the target table or view.

Examples

A. Update all statistics on a table

This example updates all statistics for the SalesOrderDetail table in the AdventureWorks2022 database.

USE AdventureWorks2022;
GO
UPDATE STATISTICS Sales.SalesOrderDetail;
GO

B. Update statistics for a specific index

This example updates only the statistics associated with the AK_SalesOrderDetail_rowguid index on the SalesOrderDetail table.

USE AdventureWorks2022;
GO
UPDATE STATISTICS Sales.SalesOrderDetail AK_SalesOrderDetail_rowguid;
GO

C. Update statistics using 50 percent sampling

This example first creates statistics named Products on the Name and ProductNumber columns of the Product table, using a 50 percent sample. Then, it updates these Products statistics again, maintaining the 50 percent sampling rate.

USE AdventureWorks2022;
GO
CREATE STATISTICS Products ON Production.Product ([Name], ProductNumber) WITH SAMPLE 50 PERCENT
-- Time passes. The UPDATE STATISTICS statement is then executed.
UPDATE STATISTICS Production.Product(Products) WITH SAMPLE 50 PERCENT;

D. Update statistics using FULLSCAN and NORECOMPUTE

This example updates the Products statistics on the Product table, forcing a full scan of all rows and disabling automatic future updates for these statistics using NORECOMPUTE.

USE AdventureWorks2022;
GO
UPDATE STATISTICS Production.Product(Products) WITH FULLSCAN, NORECOMPUTE;
GO

Examples: Azure Synapse Analytics and Analytics Platform System (PDW)

E. Update statistics on a table in Azure Synapse Analytics

This example updates the CustomerStats1 statistics on the Customer table in Azure Synapse Analytics or Analytics Platform System (PDW).

UPDATE STATISTICS Customer (CustomerStats1);

F. Update statistics using a full scan in Azure Synapse Analytics

This example updates the CustomerStats1 statistics by scanning all rows of the Customer table in Azure Synapse Analytics or Analytics Platform System (PDW).

UPDATE STATISTICS Customer (CustomerStats1) WITH FULLSCAN;

G. Update all statistics on a table in Azure Synapse Analytics

This example updates all statistics for the Customer table in Azure Synapse Analytics or Analytics Platform System (PDW).

UPDATE STATISTICS Customer;

H. Using CREATE STATISTICS with AUTO_DROP

To create or update statistics with the auto-drop property enabled, add AUTO_DROP = ON to the WITH clause.

UPDATE STATISTICS Customer (CustomerStats1) WITH AUTO_DROP = ON

Related content

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *