Enhance Query Performance with Materialized Views in Azure Synapse Analytics

Applies to: Azure Synapse Analytics

This article describes the CREATE MATERIALIZED VIEW AS SELECT T-SQL statement in Azure Synapse Analytics for solution development. It also includes code examples that show how to use the statement.

Materialized views in Azure Synapse Analytics improve the performance of complex queries, especially queries with joins and aggregations. A materialized view persists the data returned by its definition query and automatically updates that data as the underlying tables change. The query optimizer can also match materialized views automatically in execution plans: it considers a materialized view as a substitute even when your query doesn't reference it. As a result, data engineers can improve query response times without modifying existing queries.

Transact-SQL syntax conventions

Syntax

CREATE MATERIALIZED VIEW [ schema_name. ] materialized_view_name
    WITH ( <distribution_option> )
    AS <select_statement>
[;]

<distribution_option> ::= {
    DISTRIBUTION = HASH ( distribution_column_name )
  | DISTRIBUTION = HASH ( [distribution_column_name [, ...n]] )
  | DISTRIBUTION = ROUND_ROBIN
}

<select_statement> ::=
    SELECT select_criteria

Note: This syntax is not supported by serverless SQL pool in Azure Synapse Analytics.

Arguments

schema_name

Specifies the name of the schema to which the materialized view will be associated.

materialized_view_name

Specifies the name of the materialized view. View names must follow the rules for identifiers. Specifying the view owner name is optional.

distribution_option

When creating a materialized view, you can choose between HASH and ROUND_ROBIN distribution. For details, see Table distribution options in CREATE TABLE. For recommendations on the best distribution strategy for your tables based on usage patterns or sample queries, see the Distribution Advisor in Azure Synapse SQL.

DISTRIBUTION = HASH ( distribution_column_name )
Distributes the rows of the materialized view based on the hash values of a single designated column.

DISTRIBUTION = HASH ( [distribution_column_name [, ...n]] )
Distributes the rows based on the hash values of up to eight columns. Hashing on several columns can spread rows more evenly across distributions, which reduces data skew as data grows and improves query performance.

Note:

  • Multi-column distribution (MCD) requires the database compatibility level to be set to 50. To enable MCD, run: ALTER DATABASE SCOPED CONFIGURATION SET DW_COMPATIBILITY_LEVEL = 50; For details on setting the database compatibility level, see ALTER DATABASE SCOPED CONFIGURATION.
  • To disable MCD, set the compatibility level back to AUTO by running: ALTER DATABASE SCOPED CONFIGURATION SET DW_COMPATIBILITY_LEVEL = AUTO; Existing MCD materialized views remain, but they can't be read until you enable the feature again.
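The following is a minimal sketch of enabling MCD and creating a materialized view that is hash-distributed on two columns. The table dbo.sales, its columns, and the view name are hypothetical and used only for illustration.

```sql
-- Enable multi-column distribution (MCD) for the current database.
ALTER DATABASE SCOPED CONFIGURATION SET DW_COMPATIBILITY_LEVEL = 50;

-- Hypothetical base table, for illustration only.
CREATE TABLE dbo.sales
(
    store_id    int            NOT NULL,
    product_id  int            NOT NULL,
    sale_date   date           NOT NULL,
    amount      decimal(18, 2) NOT NULL
)
WITH (DISTRIBUTION = HASH(store_id), CLUSTERED COLUMNSTORE INDEX);

-- Materialized view hash-distributed on two columns (up to eight are allowed).
CREATE MATERIALIZED VIEW dbo.mv_sales_by_store_product
WITH (DISTRIBUTION = HASH(store_id, product_id))
AS
SELECT store_id,
       product_id,
       SUM(amount)  AS total_amount,
       COUNT_BIG(*) AS row_count
FROM dbo.sales
GROUP BY store_id, product_id;
```

Both GROUP BY columns also appear in the SELECT list, and the list includes aggregate functions, so the definition meets the select_statement criteria described next.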

select_statement

The SELECT list within the materialized view definition must satisfy at least one of the following criteria:

  • Aggregate Function Inclusion: The SELECT list must include at least one aggregate function.
  • GROUP BY Clause with All Columns Selected: If a GROUP BY clause is used in the materialized view definition, then all columns specified in the GROUP BY clause must also be included in the SELECT list. You can use up to 32 columns in the GROUP BY clause.

The supported aggregate functions are MAX, MIN, AVG, COUNT, COUNT_BIG, SUM, VAR, and STDEV.

When utilizing MIN or MAX aggregate functions in the SELECT list of your materialized view definition, these specific requirements apply:

  • FOR_APPEND Clause is Mandatory: You must include the FOR_APPEND option when creating the materialized view. For example:
CREATE MATERIALIZED VIEW mv_test2 WITH (distribution = hash(i_category_id), FOR_APPEND) AS
SELECT MAX(i.i_rec_start_date) as max_i_rec_start_date, MIN(i.i_rec_end_date) as min_i_rec_end_date,
       i.i_item_sk, i.i_item_id, i.i_category_id
FROM syntheticworkload.item i
GROUP BY i.i_item_sk, i.i_item_id, i.i_category_id;
  • Disabled on UPDATE or DELETE for Base Tables: The materialized view will be automatically disabled if an UPDATE or DELETE operation occurs on any of the referenced base tables. This restriction does not apply to INSERT operations. To re-enable a disabled materialized view, you need to execute ALTER MATERIALIZED VIEW with the REBUILD option.
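A sketch of finding disabled materialized views and re-enabling mv_test2 follows. It assumes, as in SQL Server, that sys.indexes exposes an is_disabled column for the view's clustered columnstore index; joining sys.views to sys.indexes returns only views that have an index, which here means materialized views.

```sql
-- List materialized views and whether each one is currently disabled.
-- Assumption: sys.indexes.is_disabled reflects the view's state, as in SQL Server.
SELECT v.name AS materialized_view,
       i.is_disabled
FROM sys.views AS v
JOIN sys.indexes AS i
    ON v.object_id = i.object_id
   AND i.index_id < 2;

-- Re-enable mv_test2 after an UPDATE or DELETE on syntheticworkload.item.
ALTER MATERIALIZED VIEW mv_test2 REBUILD;
```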

Remarks

A materialized view in Azure Synapse Analytics shares similarities with an indexed view in SQL Server. It adheres to almost the same set of restrictions as indexed views (refer to Create Indexed Views for comprehensive details), with the key distinction being that materialized views support aggregate functions.

Note:

Although CREATE MATERIALIZED VIEW does not support DISTINCT, COUNT(DISTINCT expression), or COUNT_BIG(DISTINCT expression), queries that use these functions can still benefit from materialized views. The Synapse SQL optimizer can automatically rewrite these aggregations in user queries to match existing materialized views. See Example A for a demonstration.
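As a sketch of how such a rewrite can work, the view below stores one row per distinct (a, b) pair by using GROUP BY instead of COUNT(DISTINCT). The table dbo.t(a, b, c) and the view name are hypothetical. Whether the optimizer actually substitutes the view is its own cost-based decision, so check the plan with EXPLAIN.

```sql
-- Hypothetical base table dbo.t(a int, b int, c int).
-- GROUP BY a, b captures the distinct (a, b) pairs without COUNT(DISTINCT).
CREATE MATERIALIZED VIEW dbo.mv_t_ab
WITH (DISTRIBUTION = HASH(a))
AS
SELECT a, b, COUNT_BIG(*) AS cnt
FROM dbo.t
GROUP BY a, b;

-- Unchanged user query: the (a, b) grouping step can be answered from
-- dbo.mv_t_ab, leaving only the final count of distinct b per a.
SELECT a, COUNT_BIG(DISTINCT b) AS distinct_b
FROM dbo.t
GROUP BY a;
```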

APPROX_COUNT_DISTINCT is not a supported aggregate function within CREATE MATERIALIZED VIEW AS SELECT.

Only CLUSTERED COLUMNSTORE INDEX is supported for materialized views.

A materialized view cannot be built upon other views; it must directly reference base tables.

Creating a materialized view on a table with dynamic data masking (DDM) is not permitted, even if the DDM-protected column is not part of the materialized view definition. Conversely, if a table column is part of an active or disabled materialized view, you cannot add DDM to that column.

Similarly, materialized views cannot be created on tables that have row-level security enabled.

Materialized views can be created on partitioned tables. Partition SPLIT and MERGE operations are supported on the base tables of materialized views, but partition SWITCH is not: ALTER TABLE SWITCH can't be used on a table that a materialized view references. Disable or drop the dependent materialized views before running ALTER TABLE SWITCH.
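A minimal sketch of the disable, switch, rebuild sequence follows. All object names are hypothetical: dbo.mv_sales_by_store references dbo.sales, and dbo.sales_stage is assumed to have the same schema and partitioning as dbo.sales.

```sql
-- 1. Disable the dependent materialized view.
ALTER MATERIALIZED VIEW dbo.mv_sales_by_store DISABLE;

-- 2. Switch the partition (the target partition must be empty).
ALTER TABLE dbo.sales_stage SWITCH PARTITION 2 TO dbo.sales PARTITION 2;

-- 3. Re-enable the view; REBUILD recomputes it from the base tables.
ALTER MATERIALIZED VIEW dbo.mv_sales_by_store REBUILD;
```

Dropping and re-creating the view also works, but disabling keeps the view definition in place so that only a REBUILD is needed afterward.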

In certain scenarios, the materialized view creation process automatically adds new columns to the materialized view definition:

  • Scenario: COUNT_BIG() is absent from the SELECT list of the materialized view definition. New column: COUNT_BIG(*). Added automatically during materialized view creation; no user action is needed.
  • Scenario: SUM(a) is specified in the SELECT list and ‘a’ is a nullable expression. New column: COUNT_BIG(a). Users must include the expression ‘a’ in the materialized view definition manually.
  • Scenario: AVG(a) is specified in the SELECT list, where ‘a’ is an expression. New columns: SUM(a), COUNT_BIG(a). Added automatically during materialized view creation; no user action is needed.
  • Scenario: STDEV(a) is specified in the SELECT list, where ‘a’ is an expression. New columns: SUM(a), COUNT_BIG(a), SUM(square(a)). Added automatically during materialized view creation; no user action is needed.
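To illustrate the automatic additions, here is a sketch that uses a hypothetical table dbo.sales with a NOT NULL amount column. Only AVG and STDEV are written in the definition. The comments list the columns that the scenarios above say are added during creation.

```sql
-- Hypothetical table dbo.sales(store_id int, amount decimal(18, 2) NOT NULL, ...).
CREATE MATERIALIZED VIEW dbo.mv_store_amount_stats
WITH (DISTRIBUTION = HASH(store_id))
AS
SELECT store_id,
       AVG(amount)   AS avg_amount,
       STDEV(amount) AS stdev_amount
FROM dbo.sales
GROUP BY store_id;

-- Per the scenarios above, creation also adds:
--   COUNT_BIG(*)                          (absent from the SELECT list)
--   SUM(amount), COUNT_BIG(amount)        (for AVG and STDEV)
--   SUM(square(amount))                   (for STDEV)
```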

Once created, materialized views become visible within SQL Server Management Studio under the “Views” folder of your Azure Synapse Analytics instance.

You can utilize SP_SPACEUSED and DBCC PDW_SHOWSPACEUSED to assess the storage space consumed by a materialized view. Furthermore, Dynamic Management Views (DMVs) offer more customizable queries for detailed insights into space and row consumption. For more information, consult Table size queries.
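A short sketch of both space commands against a hypothetical materialized view named dbo.mv_sales_by_store:

```sql
-- Total rows and space reserved and used by the materialized view.
EXEC sp_spaceused N'dbo.mv_sales_by_store';

-- Rows and space per distribution; uneven row counts indicate data skew.
DBCC PDW_SHOWSPACEUSED('dbo.mv_sales_by_store');
```

The per-distribution output from DBCC PDW_SHOWSPACEUSED is useful for judging whether the chosen distribution option spreads the view's rows evenly.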

Materialized views can be removed using DROP VIEW. You can also use ALTER MATERIALIZED VIEW to disable or rebuild an existing materialized view.

Materialized views function as an automatic query optimization tool. Users do not need to directly query a materialized view. When a user query is submitted, the engine performs a permission check against the objects involved in the query (tables and regular views). If the user lacks the necessary permissions, the query will fail without execution. If permissions are validated, the optimizer automatically considers and utilizes any suitable materialized views to accelerate query execution. Crucially, users receive the same data results whether the query is processed using base tables or a materialized view.

The EXPLAIN plan and the graphical Estimated Execution Plan in SQL Server Management Studio can reveal whether the query optimizer is considering a materialized view for query execution.

To determine if a specific SQL statement could benefit from a new materialized view, execute the EXPLAIN command with the WITH_RECOMMENDATIONS option. For comprehensive details, refer to EXPLAIN (Transact-SQL).
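The sketch below runs both forms of EXPLAIN on the same query. The table dbo.sales and its columns are hypothetical. EXPLAIN returns the distributed query plan as XML. If a materialized view name appears in that plan, the optimizer chose the view.

```sql
-- Show the plan; look for a materialized view name in the returned XML.
EXPLAIN
SELECT store_id, SUM(amount) AS total_amount
FROM dbo.sales
GROUP BY store_id;

-- Also ask for materialized view recommendations for this statement.
EXPLAIN WITH_RECOMMENDATIONS
SELECT store_id, SUM(amount) AS total_amount
FROM dbo.sales
GROUP BY store_id;
```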

Ownership

  • A materialized view can only be created if the owners of the underlying base tables and the materialized view being created are identical.
  • Materialized views and their base tables are permitted to reside in different schemas. Upon creation, the schema owner of the materialized view automatically becomes the owner of the materialized view itself, and this ownership cannot be subsequently altered.

Permissions

To create a materialized view, a user must possess the following permissions, in addition to meeting the object ownership requirements:

  1. CREATE VIEW permission within the database.
  2. SELECT permission on all base tables referenced by the materialized view.
  3. REFERENCES permission on the schema containing the base tables.
  4. ALTER permission on the schema where the materialized view is to be created.

Example

A. Demonstrating Automatic Materialized View Usage for Query Optimization

This example showcases how the Synapse SQL optimizer automatically leverages materialized views to enhance query performance, even when the query utilizes functions not directly supported in CREATE MATERIALIZED VIEW, such as COUNT(DISTINCT expression). A query that initially took several seconds to execute now completes in under a second without any modifications to the original user query.

-- Create a table with ~536 million rows (2^29 = 536,870,912)
create table t(a int not null, b int not null, c int not null)
with (distribution=hash(a), clustered columnstore index);

insert into t values(1,1,1);

-- Each iteration doubles the row count; 29 iterations grow 1 row to 2^29 rows.
declare @p int = 1;
while (@p < 30)
begin
    insert into t select a+1, b+2, c+3 from t;
    set @p += 1;
end;

-- Create a materialized view with a SUM aggregate, grouped by a
create materialized view v1 with(distribution=hash(a)) as select a, sum(b) as sb from t group by a;

-- Query using COUNT(DISTINCT expression), which benefits from the materialized view.
-- Prefix the query with EXPLAIN to check whether the plan uses v1.
select count(distinct a) from t;

B. Materialized View Creation with Different User and Schema Ownership

In this example, User2 creates a materialized view within SchemaY, based on tables owned by User1 in SchemaX. The materialized view itself will be owned by User1 (the schema owner of SchemaY).

/****************************************************************
Setup:
SchemaX owner = DBO
SchemaX.T1 owner = User1
SchemaX.T2 owner = User1
SchemaY owner = User1
*****************************************************************/
CREATE USER User1 WITHOUT LOGIN;
CREATE USER User2 WITHOUT LOGIN;
GO
CREATE SCHEMA SchemaX;
GO
CREATE SCHEMA SchemaY AUTHORIZATION User1;
GO
CREATE TABLE [SchemaX].[T1] (
    [vendorID] [varchar](255) Not NULL,
    [totalAmount] [float] Not NULL,
    [puYear] [int] NULL
);
CREATE TABLE [SchemaX].[T2] (
    [vendorID] [varchar](255) Not NULL,
    [totalAmount] [float] Not NULL,
    [puYear] [int] NULL
);
GO
ALTER AUTHORIZATION ON OBJECT::SchemaX.[T1] TO User1;
ALTER AUTHORIZATION ON OBJECT::SchemaX.[T2] TO User1;

/******************************************************************************
For User2 to create a materialized view in SchemaY on SchemaX.T1 and SchemaX.T2,
User2 needs:
1. CREATE VIEW permission in the database
2. REFERENCES permission on SchemaX
3. SELECT permission on base tables T1 and T2
4. ALTER permission on SchemaY
******************************************************************************/
GRANT CREATE VIEW to User2;
GRANT REFERENCES ON SCHEMA::SchemaX to User2;
GRANT SELECT ON OBJECT::SchemaX.T1 to User2;
GRANT SELECT ON OBJECT::SchemaX.T2 to User2;
GRANT ALTER ON SCHEMA::SchemaY to User2;
GO

EXECUTE AS USER = 'User2';
GO
CREATE materialized VIEW [SchemaY].MV_by_User2
with(distribution=round_robin)
as
select A.vendorID, sum(A.totalamount) as S, Count_Big(*) as T
from [SchemaX].[T1] A
inner join [SchemaX].[T2] B
on A.vendorID = B.vendorID
group by A.vendorID;
GO
revert;
GO
