Mastering SQL Server Common Table Expressions (CTEs): A Comprehensive Guide

Common Table Expressions (CTEs) in SQL Server simplify complex queries and improve readability. A CTE is a temporary named result set, derived from a simple query, that exists only within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or MERGE statement. A CTE can also be used in a CREATE VIEW statement as part of the view’s defining SELECT statement. A CTE can reference itself, forming a recursive CTE, which is particularly useful for hierarchical data.

Understanding CTE Syntax in SQL Server

The foundation of using CTEs lies in understanding their syntax. Here’s the basic structure:

[ WITH <common_table_expression> [ ,...n ] ]
<common_table_expression>::=
    expression_name [ ( column_name [ ,...n ] ) ]
    AS
    ( CTE_query_definition )

Let’s break down the components:

  • WITH <common_table_expression>: This clause initiates the definition of one or more CTEs. Multiple CTEs can be defined in a single WITH clause, separated by commas.
  • <common_table_expression>::=: This defines the structure of a single CTE.
  • expression_name: This is the identifier or name you assign to your CTE. It must be unique within the WITH clause but can reuse names of existing base tables or views; in such cases, the CTE takes precedence within the query’s scope.
  • ( column_name [ ,...n ] ): Optionally, you can specify column names for the CTE. If omitted, column names are derived from the CTE_query_definition. Ensure the number of specified column names matches the columns returned by the CTE query. Duplicate column names within a CTE definition are not allowed.
  • AS ( CTE_query_definition ): This is the SELECT statement whose result set populates the CTE. It must meet the same requirements as the SELECT statement of a view, except that a CTE cannot define another CTE. If a single CTE contains more than one query definition, the definitions must be combined using one of the set operators UNION ALL, UNION, EXCEPT, or INTERSECT.
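
The syntax components above can be seen together in a minimal sketch. The CTE name OrderTotals, its column names, and the sample rows are hypothetical; the VALUES constructor stands in for a real table so the statement runs in any database.

```sql
-- Hypothetical data: three order rows for two customers.
WITH OrderTotals (CustomerName, OrderCount) AS
(
    -- COUNT(*) has no alias here; the column list above supplies its name.
    SELECT v.CustomerName, COUNT(*)
    FROM (VALUES (N'Ana'), (N'Ana'), (N'Ben')) AS v (CustomerName)
    GROUP BY v.CustomerName
)
SELECT CustomerName, OrderCount
FROM OrderTotals
ORDER BY CustomerName;
```

In this sketch the column list is doing real work: without it, the unnamed COUNT(*) column would need an alias inside the query definition, because every CTE column must have a name.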

Guidelines for Non-Recursive CTEs

Non-recursive CTEs are foundational for simplifying queries. Here are key guidelines for their creation and usage:

  • Execution Scope: A CTE must be followed by a single SELECT, INSERT, UPDATE, DELETE, or MERGE statement that references some or all of the CTE’s columns. A CTE can also be specified in a CREATE VIEW statement as part of the view’s defining SELECT statement.
  • Multiple Definitions: A single WITH clause can define several CTEs, separated by commas. Within one non-recursive CTE, multiple query definitions are also allowed, provided they are combined with UNION ALL, UNION, INTERSECT, or EXCEPT.
  • Referencing: CTEs can reference themselves and previously defined CTEs within the same WITH clause, but forward referencing is not permitted.
  • Single WITH Clause: Nested WITH clauses are not allowed. If a CTE_query_definition contains a subquery, that subquery cannot include another WITH clause defining a nested CTE.
  • Restricted Clauses: The CTE_query_definition cannot contain:
    • ORDER BY (unless used with TOP)
    • INTO
    • OPTION clause with query hints
    • FOR BROWSE
  • Semicolon Requirement: When using CTEs in a batch, ensure the statement preceding the WITH clause is terminated with a semicolon (;).
  • Cursor Definition: Queries using CTEs can define cursors.
  • Remote Tables: CTEs can reference tables located on remote servers.
  • Hint Conflicts: Hints that reference a CTE can conflict with hints discovered when the CTE accesses its underlying tables, in the same way as hints that reference views. When this happens, the query returns an error.
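
Several of these guidelines can be illustrated in one short batch. This is a sketch under stated assumptions: the variable @MinQty, the CTE names Items and TopItems, and the sample rows are all hypothetical.

```sql
-- The statement before WITH must end with a semicolon.
DECLARE @MinQty int = 2;

WITH Items (ItemName, Qty) AS
(
    -- Hypothetical rows standing in for a real table.
    SELECT v.ItemName, v.Qty
    FROM (VALUES (N'Bolt', 5), (N'Nut', 1), (N'Washer', 3)) AS v (ItemName, Qty)
),
TopItems AS
(
    -- References the CTE defined before it (no forward references).
    -- ORDER BY is allowed here only because TOP is specified.
    SELECT TOP (2) ItemName, Qty
    FROM Items
    WHERE Qty >= @MinQty
    ORDER BY Qty DESC
)
SELECT ItemName, Qty
FROM TopItems
ORDER BY Qty DESC;
```

The final ORDER BY belongs to the outer query, so it is not subject to the CTE restriction. With the sample rows above, the query returns Bolt and Washer.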

Diving into Recursive CTEs

Recursive CTEs are designed for hierarchical or recursive data structures, such as organizational charts or bills of materials. The recursive member runs repeatedly, each time against the rows produced by the previous step, until it returns no rows.

Here are the specific guidelines for defining recursive CTEs:

  • Anchor and Recursive Members: A recursive CTE must have at least two query definitions: an anchor member and a recursive member. Anchor members (one or more) come first and establish the base case of the recursion. Recursive members then reference the CTE itself to iterate.
  • Set Operators: Anchor members are combined using UNION ALL, UNION, INTERSECT, or EXCEPT. Crucially, UNION ALL is the only allowed operator between the last anchor member and the first recursive member, and also between multiple recursive members.
  • Column Consistency: Anchor and recursive members must have the same number of columns.
  • Data Type Match: The data type of each column in the recursive member must be the same as the data type of the corresponding column in the anchor member. Use CAST or CONVERT where the types would otherwise differ.
  • Single Self-Reference: The FROM clause in a recursive member must reference the CTE expression_name only once.
  • Recursive Member Restrictions: The CTE_query_definition of a recursive member cannot contain:
    • SELECT DISTINCT
    • GROUP BY
    • PIVOT (Compatibility level 110+)
    • HAVING
    • Scalar aggregation functions
    • TOP
    • LEFT, RIGHT, OUTER JOIN (INNER JOIN is allowed)
    • Subqueries
    • Hints applied to the recursive CTE reference within its definition.
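
As a minimal sketch of these rules, the following recursive CTE generates the integers 1 through 10 without touching any table. The name Numbers and the column n are hypothetical.

```sql
WITH Numbers (n) AS
(
    -- Anchor member: the base row. The explicit CAST fixes the column type.
    SELECT CAST(1 AS int)
    UNION ALL
    -- Recursive member: references Numbers exactly once and returns the same type.
    SELECT n + 1
    FROM Numbers
    WHERE n < 10
)
SELECT n
FROM Numbers;
```

The WHERE clause in the recursive member is what ends the recursion: once n reaches 10, the member returns no rows and the statement completes.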

And here are guidelines for using recursive CTEs:

  • Nullability: All columns returned by a recursive CTE are nullable, irrespective of the nullability of columns from the constituent SELECT statements.
  • Infinite Loops: Incorrectly formed recursive CTEs can lead to infinite loops. This can occur if the recursive member query definition returns the same values for parent and child columns.
    • MAXRECURSION Hint: To prevent infinite loops, use the MAXRECURSION hint in the OPTION clause of the INSERT, UPDATE, DELETE, or SELECT statement following the CTE. Set a limit between 0 and 32,767 to control recursion levels and halt execution if needed. The server default is 100; 0 means no limit. Only one MAXRECURSION hint can be used per statement.
  • View Updates: Views containing recursive CTEs cannot be used for data updates.
  • Cursor Types: Cursors defined on queries using recursive CTEs are restricted to fast forward-only and static (snapshot) types. Other cursor types will be implicitly converted to static.
  • Remote Server References: Remote tables can be referenced in recursive CTEs. If a remote table is in the recursive member, a spool is created for local repeated access. In query plans, look for “Index Spool/Lazy Spools” with the WITH STACK predicate to confirm recursion.
  • Analytic and Aggregate Functions: Analytic and aggregate functions within the recursive part of a CTE operate on the current recursion level’s set, not the entire CTE result set. Functions like ROW_NUMBER() apply only to the data subset passed in the current recursion level.
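
The MAXRECURSION hint can be sketched as follows, reusing a hypothetical number-generating CTE named Numbers. This recursion needs 499 levels, so under the default limit of 100 the statement would stop with an error; the hint raises the limit for this statement only.

```sql
WITH Numbers (n) AS
(
    SELECT CAST(1 AS int)
    UNION ALL
    SELECT n + 1
    FROM Numbers
    WHERE n < 500
)
SELECT COUNT(*) AS RowsReturned
FROM Numbers
-- The hint goes in the OPTION clause of the statement that follows the CTE.
OPTION (MAXRECURSION 1000);
```

Choosing an explicit ceiling slightly above the expected depth, rather than 0 (no limit), keeps a guard in place if the data ever contains a cycle.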

Practical Examples of CTEs in SQL Server

Let’s explore some examples to illustrate the use of CTEs.

Example A: Basic Non-Recursive CTE for Sales Data

This example demonstrates a simple CTE to calculate the total sales orders per year for each sales representative.

-- Define the CTE expression name and column list.
WITH Sales_CTE (SalesPersonID, SalesOrderID, SalesYear) AS
(
    -- Define the CTE query.
    SELECT
        SalesPersonID,
        SalesOrderID,
        YEAR(OrderDate) AS SalesYear
    FROM
        Sales.SalesOrderHeader
    WHERE
        SalesPersonID IS NOT NULL
)
-- Define the outer query referencing the CTE name.
SELECT
    SalesPersonID,
    COUNT(SalesOrderID) AS TotalSales,
    SalesYear
FROM
    Sales_CTE
GROUP BY
    SalesYear,
    SalesPersonID
ORDER BY
    SalesPersonID,
    SalesYear;

Sales_CTE first selects the necessary sales data (SalesPersonID, SalesOrderID, SalesYear); the outer query then counts the orders for each salesperson and year.

Example B: CTE for Calculating Averages

Building on the previous example, this query uses a CTE to calculate the average number of sales orders per sales representative across all years.

WITH Sales_CTE (SalesPersonID, NumberOfOrders) AS
(
    SELECT
        SalesPersonID,
        COUNT(*)
    FROM
        Sales.SalesOrderHeader
    WHERE
        SalesPersonID IS NOT NULL
    GROUP BY
        SalesPersonID
)
SELECT
    AVG(NumberOfOrders) AS "Average Sales Per Person"
FROM
    Sales_CTE;

Here, Sales_CTE first determines the number of orders for each salesperson, and then the main query easily calculates the average from this CTE.
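
One detail worth checking in this example: NumberOfOrders comes from COUNT(*), which is an integer, and AVG over an integer column returns an integer. The variation below is a sketch that casts to decimal to keep the fractional part; the aliases IntegerAverage and DecimalAverage and the decimal(10, 2) precision are illustrative choices, not part of the original example.

```sql
WITH Sales_CTE (SalesPersonID, NumberOfOrders) AS
(
    SELECT
        SalesPersonID,
        COUNT(*)
    FROM
        Sales.SalesOrderHeader
    WHERE
        SalesPersonID IS NOT NULL
    GROUP BY
        SalesPersonID
)
SELECT
    -- Integer input gives an integer average (fraction dropped).
    AVG(NumberOfOrders) AS IntegerAverage,
    -- Casting first keeps the fractional part.
    AVG(CAST(NumberOfOrders AS decimal(10, 2))) AS DecimalAverage
FROM
    Sales_CTE;
```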

Example C: Multiple CTEs in a Single Query

This example shows how to use two CTEs within one query to compare total sales against sales quotas.

WITH Sales_CTE (SalesPersonID, TotalSales, SalesYear) AS
(
    -- Define the first CTE query.
    SELECT
        SalesPersonID,
        SUM(TotalDue) AS TotalSales,
        YEAR(OrderDate) AS SalesYear
    FROM
        Sales.SalesOrderHeader
    WHERE
        SalesPersonID IS NOT NULL
    GROUP BY
        SalesPersonID,
        YEAR(OrderDate)
)
, -- Use a comma to separate multiple CTE definitions.
Sales_Quota_CTE (BusinessEntityID, SalesQuota, SalesQuotaYear) AS
(
    -- Define the second CTE query, which returns sales quota data by year for each sales person.
    SELECT
        BusinessEntityID,
        SUM(SalesQuota) AS SalesQuota,
        YEAR(QuotaDate) AS SalesQuotaYear
    FROM
        Sales.SalesPersonQuotaHistory
    GROUP BY
        BusinessEntityID,
        YEAR(QuotaDate)
)
-- Define the outer query by referencing columns from both CTEs.
SELECT
    s.SalesPersonID,
    s.SalesYear,
    FORMAT(s.TotalSales, 'C', 'en-us') AS TotalSales,
    sq.SalesQuotaYear,
    FORMAT(sq.SalesQuota, 'C', 'en-us') AS SalesQuota,
    FORMAT(s.TotalSales - sq.SalesQuota, 'C', 'en-us') AS Amt_Above_or_Below_Quota
FROM
    Sales_CTE s
JOIN
    Sales_Quota_CTE sq ON sq.BusinessEntityID = s.SalesPersonID AND s.SalesYear = sq.SalesQuotaYear
ORDER BY
    s.SalesPersonID,
    s.SalesYear;

Partial result set showing sales performance against quota.

This example uses Sales_CTE to get sales totals and Sales_Quota_CTE for quota data, then joins them to produce a comparative report.

Example D: Recursive CTE for Hierarchical Data

This example demonstrates a recursive CTE to display an organizational hierarchy.

-- Create an Employee table.
CREATE TABLE dbo.MyEmployees (
    EmployeeID SMALLINT NOT NULL,
    FirstName NVARCHAR(30) NOT NULL,
    LastName NVARCHAR(40) NOT NULL,
    Title NVARCHAR(50) NOT NULL,
    DeptID SMALLINT NOT NULL,
    ManagerID SMALLINT NULL,
    CONSTRAINT PK_EmployeeID PRIMARY KEY CLUSTERED (EmployeeID ASC),
    CONSTRAINT FK_MyEmployees_ManagerID_EmployeeID FOREIGN KEY (ManagerID) REFERENCES dbo.MyEmployees (EmployeeID)
);
-- Populate the table with values.
INSERT INTO dbo.MyEmployees
VALUES
    (1, N'Ken', N'Sánchez', N'Chief Executive Officer',16, NULL)
    ,(273, N'Brian', N'Welcker', N'Vice President of Sales', 3, 1)
    ,(274, N'Stephen', N'Jiang', N'North American Sales Manager', 3, 273)
    ,(275, N'Michael', N'Blythe', N'Sales Representative', 3, 274)
    ,(276, N'Linda', N'Mitchell', N'Sales Representative', 3, 274)
    ,(285, N'Syed', N'Abbas', N'Pacific Sales Manager', 3, 273)
    ,(286, N'Lynn', N'Tsoflias', N'Sales Representative', 3, 285)
    ,(16, N'David', N'Bradley', N'Marketing Manager', 4, 273)
    ,(23, N'Mary', N'Gibson', N'Marketing Specialist', 4, 16);
WITH DirectReports(ManagerID, EmployeeID, Title, EmployeeLevel) AS
(
    -- Anchor member: Select top-level managers (no ManagerID).
    SELECT
        ManagerID,
        EmployeeID,
        Title,
        0 AS EmployeeLevel
    FROM
        dbo.MyEmployees
    WHERE
        ManagerID IS NULL
    UNION ALL
    -- Recursive member: Join employees to their managers in the CTE.
    SELECT
        e.ManagerID,
        e.EmployeeID,
        e.Title,
        EmployeeLevel + 1
    FROM
        dbo.MyEmployees AS e
    INNER JOIN
        DirectReports AS d ON e.ManagerID = d.EmployeeID
)
-- Select from the CTE to display the hierarchy.
SELECT
    ManagerID,
    EmployeeID,
    Title,
    EmployeeLevel
FROM
    DirectReports
ORDER BY
    ManagerID;

This recursive CTE, DirectReports, starts with top-level managers (anchor member) and recursively adds employees reporting to each manager (recursive member), building a hierarchical list.
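
A common variation on this pattern carries a text path down the hierarchy. The sketch below assumes the dbo.MyEmployees table created above; the CTE name OrgPath, the column ReportingPath, and the nvarchar(400) length are hypothetical choices. Both members cast the path to the same type because the anchor and recursive columns must match exactly.

```sql
WITH OrgPath (EmployeeID, LastName, EmployeeLevel, ReportingPath) AS
(
    -- Anchor member: the path starts with the top-level manager's name.
    SELECT
        EmployeeID,
        LastName,
        0,
        CAST(LastName AS nvarchar(400))
    FROM
        dbo.MyEmployees
    WHERE
        ManagerID IS NULL
    UNION ALL
    -- Recursive member: append each employee's name to the manager's path.
    SELECT
        e.EmployeeID,
        e.LastName,
        p.EmployeeLevel + 1,
        CAST(p.ReportingPath + N' > ' + e.LastName AS nvarchar(400))
    FROM
        dbo.MyEmployees AS e
    INNER JOIN
        OrgPath AS p ON e.ManagerID = p.EmployeeID
)
SELECT
    EmployeeID,
    EmployeeLevel,
    ReportingPath
FROM
    OrgPath
ORDER BY
    ReportingPath;
```

Sorting by the path lists each manager immediately before that manager's reports, which ordering by ManagerID alone does not guarantee.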

Example E: Recursive CTE for Bill of Materials

This example uses a recursive CTE to explore the components of a product assembly.

USE AdventureWorks2022;
GO
WITH Parts(AssemblyID, ComponentID, PerAssemblyQty, EndDate, ComponentLevel) AS
(
    -- Anchor member: Top-level components for ProductAssemblyID = 800.
    SELECT
        b.ProductAssemblyID,
        b.ComponentID,
        b.PerAssemblyQty,
        b.EndDate,
        0 AS ComponentLevel
    FROM
        Production.BillOfMaterials AS b
    WHERE
        b.ProductAssemblyID = 800
        AND b.EndDate IS NULL
    UNION ALL
    -- Recursive member: Find components of components.
    SELECT
        bom.ProductAssemblyID,
        bom.ComponentID,
        p.PerAssemblyQty,
        bom.EndDate,
        ComponentLevel + 1
    FROM
        Production.BillOfMaterials AS bom
    INNER JOIN
        Parts AS p ON bom.ProductAssemblyID = p.ComponentID
        AND bom.EndDate IS NULL
)
-- Select from the CTE to display the bill of materials.
SELECT
    AssemblyID,
    ComponentID,
    Name,
    PerAssemblyQty,
    EndDate,
    ComponentLevel
FROM
    Parts AS p
INNER JOIN
    Production.Product AS pr ON p.ComponentID = pr.ProductID
ORDER BY
    ComponentLevel,
    AssemblyID,
    ComponentID;

Hierarchical list of product assemblies and components.

The Parts CTE recursively navigates the BillOfMaterials table to list all parts needed for product assembly 800 and their sub-components.

Example F: Updating Data with a Recursive CTE

CTEs can also be used in UPDATE statements. This example doubles the PerAssemblyQty value for the direct (level 0) components of ‘Road-550-W Yellow, 44’ (ProductAssemblyID 800).

USE AdventureWorks2022;
GO
WITH Parts(AssemblyID, ComponentID, PerAssemblyQty, EndDate, ComponentLevel) AS
(
    SELECT
        b.ProductAssemblyID,
        b.ComponentID,
        b.PerAssemblyQty,
        b.EndDate,
        0 AS ComponentLevel
    FROM
        Production.BillOfMaterials AS b
    WHERE
        b.ProductAssemblyID = 800
        AND b.EndDate IS NULL
    UNION ALL
    SELECT
        bom.ProductAssemblyID,
        bom.ComponentID,
        p.PerAssemblyQty,
        bom.EndDate,
        ComponentLevel + 1
    FROM
        Production.BillOfMaterials AS bom
    INNER JOIN
        Parts AS p ON bom.ProductAssemblyID = p.ComponentID
        AND bom.EndDate IS NULL
)
UPDATE Production.BillOfMaterials
SET PerAssemblyQty = c.PerAssemblyQty * 2
FROM
    Production.BillOfMaterials AS c
JOIN
    Parts AS d ON c.ProductAssemblyID = d.AssemblyID
WHERE
    d.ComponentLevel = 0;

This example uses the Parts CTE to find the level-0 components of assembly 800, then doubles PerAssemblyQty for the Production.BillOfMaterials rows that join to them on ProductAssemblyID.

CTEs in Azure Synapse Analytics and Analytics Platform System (PDW)

In Azure Synapse Analytics and Analytics Platform System (PDW), CTEs have specific features and limitations:

  • Supported Statements: CTEs are supported in SELECT, CREATE VIEW, CREATE TABLE AS SELECT (CTAS), CREATE REMOTE TABLE AS SELECT (CRTAS), and CREATE EXTERNAL TABLE AS SELECT (CETAS) statements.
  • External and Remote Tables: CTEs can reference both remote and external tables.
  • Multiple CTE Definitions: Multiple CTE query definitions are allowed.
  • Usage in DML: CTEs can be followed by SELECT, INSERT, UPDATE, DELETE, or MERGE statements.
  • No Recursive CTEs: Recursive CTEs (CTEs that reference themselves) are not supported in Azure Synapse Analytics and PDW.
  • Single WITH Clause: Nested WITH clauses are disallowed. Subqueries within a CTE cannot contain nested WITH clauses.
  • ORDER BY Restriction: ORDER BY is not permitted in CTE_query_definition unless a TOP clause is also specified.
  • Semicolon Requirement: A semicolon must precede a WITH clause if the CTE is part of a batch and not the first statement.
  • Prepared Statements: CTEs behave like other SELECT statements when used with sp_prepare, but CETAS with CTEs prepared by sp_prepare might exhibit behavior differences compared to SQL Server due to binding implementation. Error detection for incorrect column references in CTEs within CETAS prepared statements might occur during sp_execute rather than sp_prepare.

Conclusion: Leveraging CTEs for Efficient SQL Server Queries

SQL Server Common Table Expressions are invaluable tools for writing cleaner, more understandable, and efficient SQL queries. They simplify complex logic, especially when dealing with hierarchical data or multi-step data transformations. Non-recursive CTEs improve query structure and readability, while recursive CTEs make it possible to traverse and manipulate hierarchical datasets. Understanding and using CTEs effectively is a crucial skill for any SQL Server developer or database professional who wants to write robust, maintainable queries.
