Ensuring optimal query performance in SQL Server hinges on the accuracy of query optimizer statistics. Outdated or inaccurate statistics can lead the query optimizer to generate suboptimal execution plans, resulting in slow queries and inefficient resource utilization. The UPDATE STATISTICS
command is your primary tool to refresh these statistics, guiding the query optimizer towards better decisions. This comprehensive guide delves into the intricacies of UPDATE STATISTICS
in SQL Server, providing you with the knowledge to effectively manage and enhance your database performance.
Understanding SQL Server Statistics
Before diving into the specifics of UPDATE STATISTICS
, it’s crucial to grasp the role of statistics in SQL Server. Statistics are data distributions that the query optimizer uses to estimate the number of rows that will satisfy a query predicate. These estimations are fundamental in choosing the most efficient query execution plan.
Imagine a scenario where you’re querying a table with customer order data. The query optimizer uses statistics about the OrderDate
column to estimate how many orders fall within a specific date range. Accurate statistics enable the optimizer to decide whether to use an index seek, a table scan, or other optimization techniques.
Why Update Statistics?
Data within databases is dynamic. As data is inserted, updated, and deleted, the distribution of values in columns changes. Over time, statistics can become outdated, leading the query optimizer to make inaccurate estimations. This can result in:
- Suboptimal Query Plans: The query optimizer might choose inefficient indexes or join algorithms.
- Slow Query Performance: Queries take longer to execute due to inefficient plans.
- Increased Resource Consumption: More CPU, memory, and I/O resources are used than necessary.
Regularly updating statistics ensures that the query optimizer has the most current information about data distribution, enabling it to generate efficient query plans and maintain optimal database performance.
The UPDATE STATISTICS
Command: Syntax and Options
The UPDATE STATISTICS
Transact-SQL statement is used to update query optimization statistics on tables and indexed views. Here’s a breakdown of the syntax and key options:
UPDATE STATISTICS table_or_indexed_view_name
[
{
{ index_or_statistics__name } | ( { index_or_statistics_name } [ ,...n ] )
}
]
[ WITH
[
FULLSCAN [ [ , ] PERSIST_SAMPLE_PERCENT = { ON | OFF } ]
| SAMPLE number { PERCENT | ROWS } [ [ , ] PERSIST_SAMPLE_PERCENT = { ON | OFF } ]
| RESAMPLE [ ON PARTITIONS ( { <partition_number> | <range> } [, ...n] ) ]
| <update_stats_stream_option>
[ ,...n ]
]
[ [ , ] [ ALL | COLUMNS | INDEX ] ]
[ [ , ] NORECOMPUTE ]
[ [ , ] INCREMENTAL = { ON | OFF } ]
[ [ , ] MAXDOP = max_degree_of_parallelism ]
[ [ , ] AUTO_DROP = { ON | OFF } ]
] ;
<update_stats_stream_option> ::=
[ STATS_STREAM = stats_stream ]
[ ROWCOUNT = numeric_constant ]
[ PAGECOUNT = numeric_constant ]
Let’s examine the crucial arguments and options:
1. table_or_indexed_view_name
:
- Specifies the name of the table or indexed view for which you want to update statistics.
2. index_or_statistics_name
:
- Allows you to target specific statistics to update. You can specify:
- The name of an index to update statistics for that index.
- The name of statistics created using
CREATE STATISTICS
. - If omitted, all statistics for the table or indexed view are updated, including auto-created statistics and index statistics.
3. Sampling Options:
-
FULLSCAN
: Calculates statistics by scanning all rows in the table or indexed view. This provides the most accurate statistics but can be resource-intensive on large tables.FULLSCAN
is equivalent toSAMPLE 100 PERCENT
. -
SAMPLE number { PERCENT | ROWS }
: Updates statistics based on a sample of the data.PERCENT
: Specifies the approximate percentage of rows to sample (0 to 100).ROWS
: Specifies the approximate number of rows to sample (0 to the total number of rows).- The query optimizer determines a statistically significant sample size by default, making explicit sampling often unnecessary. However, in cases with highly skewed data distributions, adjusting the sample size might be beneficial.
-
RESAMPLE
: Updates each statistic using its previously used sampling rate. This can trigger a full-table scan if the previous sample rate was based on a full scan (e.g., index statistics).
4. PERSIST_SAMPLE_PERCENT = { ON | OFF }
:
- (Applies to SQL Server 2016 SP1 CU4 and later, Azure SQL Database, Azure SQL Managed Instance)
- When
ON
, subsequent statistics updates (without an explicit sample rate) will use the same sampling percentage. - When
OFF
(default), subsequent updates will revert to default sampling. - This option is useful for maintaining consistent sampling strategies, especially after carefully tuning a sample percentage for specific statistics.
5. ON PARTITIONS ( { <partition_number> | <range> } [, ...n] )
:
- (Applies to SQL Server 2014 and later)
- For partitioned tables, this option forces recalculation of leaf-level statistics for specified partitions and merges them to build global statistics.
WITH RESAMPLE
is required because partition statistics with different sample rates cannot be merged.
6. ALL | COLUMNS | INDEX
:
- Specifies which types of statistics to update:
ALL
(default): Updates all statistics.COLUMNS
: Updates statistics created on columns (single or multi-column).INDEX
: Updates statistics created for indexes.
7. NORECOMPUTE
:
- Disables automatic statistics updates (
AUTO_UPDATE_STATISTICS
) for the specified statistics. - While it updates statistics once, future automatic updates are prevented.
- Use this option cautiously, as it can lead to suboptimal query plans if data changes significantly and statistics are not manually updated. Re-enable automatic updates by running
UPDATE STATISTICS
withoutNORECOMPUTE
or usingsp_autostats
.
8. INCREMENTAL = { ON | OFF }
:
- (Applies to SQL Server 2014 and later)
- For partitioned tables,
ON
recreates statistics as per partition statistics, whileOFF
(default) drops and recomputes the entire statistics tree. - Incremental statistics are not supported for all statistic types (e.g., non-partition-aligned indexes, read-only databases, filtered indexes, views, etc.).
9. MAXDOP = max_degree_of_parallelism
:
- (Applies to SQL Server 2016 SP2 and later, SQL Server 2017 CU3 and later)
- Overrides the
max degree of parallelism
server configuration option for the duration of the statistics update operation. - Limits the number of processors used for parallel statistics operations.
10. AUTO_DROP = { ON | OFF }
:
- **(Applies to SQL Server 2022 and later)**
- When `ON`, statistics objects created by third-party tools will not block schema changes. Instead, these statistics will be dropped automatically when schema changes are needed.
- Auto-created statistics always behave as if `AUTO_DROP = ON`.
Best Practices for Using UPDATE STATISTICS
While UPDATE STATISTICS
is essential, frequent or unnecessary updates can introduce overhead. Here are key guidelines:
-
Automatic vs. Manual Updates:
- SQL Server automatically updates statistics when a significant amount of data modification occurs (based on a threshold).
AUTO_UPDATE_STATISTICS
database option controls this behavior. - For most workloads, automatic updates are sufficient. However, in certain scenarios, manual updates are necessary.
- SQL Server automatically updates statistics when a significant amount of data modification occurs (based on a threshold).
-
When to Manually Update Statistics:
- After Large Data Modifications: If a table undergoes substantial changes (e.g., bulk load, significant updates/deletes), manually updating statistics ensures the query optimizer reflects these changes promptly.
- Before Performance-Critical Periods: For systems with predictable peak loads, updating statistics before these periods can prevent performance degradation due to outdated statistics.
- Troubleshooting Query Performance Issues: If you suspect outdated statistics are contributing to slow queries, manually updating statistics is a crucial troubleshooting step.
- After Index Maintenance: Rebuilding or reorganizing indexes can change data distribution. Updating statistics after index maintenance is recommended.
-
Choosing the Right Sampling Method:
- Default Sampling: In most cases, rely on the query optimizer’s default sampling. It’s designed to be efficient and statistically sound.
FULLSCAN
: ConsiderFULLSCAN
for:- Small to medium-sized tables where the overhead is acceptable.
- Tables with highly skewed data distributions where a full scan might provide significantly more accurate statistics.
- Critical statistics that heavily influence query performance.
SAMPLE
: UseSAMPLE
with a specified percentage or number of rows when:FULLSCAN
is too resource-intensive for large tables.- You need to adjust the sample size for specific statistics, potentially based on testing and analysis.
RESAMPLE
: Can be useful when you want to refresh statistics using the prior sampling method, especially after incremental data changes.
-
Minimize Unnecessary Updates:
- Avoid overly frequent updates, especially on volatile tables. Excessive recompilations due to statistics updates can introduce performance overhead.
- Monitor
AUTO_UPDATE_STATISTICS
behavior and adjust update strategies as needed.
-
Using
sp_updatestats
:- The
sp_updatestats
stored procedure provides a convenient way to update statistics for all user and internal tables in a database. - Use it with caution on large databases during peak hours as it can be resource-intensive. Consider running it during maintenance windows.
EXEC sp_updatestats;
- The
-
Monitoring Statistics Updates:
-
Use the
STATS_DATE()
function to determine the last time statistics were updated:SELECT STATS_DATE(OBJECT_ID('YourTable'), STATS_ID('YourStats'));
-
Query
sys.dm_db_stats_properties
DMV for detailed statistics properties, including sample percentage and modification counter:SELECT sp.stats_id, name, modification_counter, persisted_sample_percent, last_updated FROM sys.stats AS st CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) AS sp WHERE st.object_id = OBJECT_ID('YourTable');
-
-
Leverage Automated Solutions:
- Consider using solutions like Adaptive Index Defrag or other database maintenance tools to automate index defragmentation and statistics updates based on predefined thresholds and best practices. This can streamline maintenance tasks and ensure consistent database performance.
Examples of UPDATE STATISTICS
in Action
Example 1: Update all statistics on a table
USE AdventureWorks2022;
GO
UPDATE STATISTICS Sales.SalesOrderDetail;
GO
Example 2: Update statistics for a specific index
USE AdventureWorks2022;
GO
UPDATE STATISTICS Sales.SalesOrderDetail AK_SalesOrderDetail_rowguid;
GO
Example 3: Update statistics with 50 percent sampling
USE AdventureWorks2022;
GO
CREATE STATISTICS Products ON Production.Product ([Name], ProductNumber) WITH SAMPLE 50 PERCENT;
GO
-- Time passes, data changes...
UPDATE STATISTICS Production.Product(Products) WITH SAMPLE 50 PERCENT;
GO
Example 4: Update statistics with FULLSCAN
and NORECOMPUTE
USE AdventureWorks2022;
GO
UPDATE STATISTICS Production.Product(Products) WITH FULLSCAN, NORECOMPUTE;
GO
Example 5: Update statistics with AUTO_DROP = ON
UPDATE STATISTICS Customer (CustomerStats1) WITH AUTO_DROP = ON
Conclusion
Mastering UPDATE STATISTICS
is crucial for maintaining peak SQL Server performance. By understanding its syntax, options, and best practices, you can ensure that the query optimizer has the accurate statistics it needs to generate efficient query plans. Regularly reviewing your statistics update strategy and adapting it to your specific workload is a key aspect of proactive database administration and performance tuning. By strategically using UPDATE STATISTICS
, you empower your SQL Server to execute queries faster, consume fewer resources, and deliver optimal performance for your applications.