Mastering SQL Server Pivot and Unpivot for Data Transformation

The PIVOT and UNPIVOT relational operators in SQL Server are powerful tools for reshaping data within table-valued expressions. PIVOT transforms unique values from a column into distinct columns in the output, effectively rotating the table. It also performs aggregations on any remaining column values to produce a summarized result. Conversely, UNPIVOT performs the inverse operation, converting columns into rows of values.

For tasks like generating cross-tabulation reports or preparing data for specific applications, understanding and utilizing PIVOT and UNPIVOT can significantly simplify your SQL queries. Instead of resorting to complex SELECT...CASE statement series, these operators offer a more readable and efficient syntax.

Note: While PIVOT and UNPIVOT are incredibly useful, excessive use within a single T-SQL statement can impact query performance. It’s important to use them judiciously and consider alternative approaches for very complex transformations.

The examples in this article are based on the AdventureWorks2022 or AdventureWorksDW2022 sample databases, readily available for download from Microsoft SQL Server Samples.

Understanding the Syntax of PIVOT and UNPIVOT

Let’s break down the syntax for both PIVOT and UNPIVOT operators to understand their structure and components.

PIVOT Syntax:

SELECT [ <non-pivoted column> [ AS <column name> ] , ] ...
       [ <first column pivoted> [ AS <column name> ] ,
         [ <second column pivoted> [ AS <column name> ] , ] ...
         [ <last column pivoted> [ AS <column name> ] ] ]
FROM   ( <table-valued expression> ) AS <alias for the source query>
PIVOT  ( <aggregation function>(<value column>)
         FOR <pivot column>
         IN ( [ <first column> ], [ <second column> ],
              [ <last column> ] ) ) AS <alias for the pivot table>
[ ORDER BY <column name> ];

UNPIVOT Syntax:

SELECT [ <non-pivoted column> [ AS <column name> ] , ] ...
       <pivot column name> [ AS <column name> ],
       <value column name> [ AS <column name> ]
FROM   ( <table-valued expression> ) AS <alias for the source query>
UNPIVOT
       ( <value column name>
         FOR <pivot column name>
         IN ( [ <first pivoted column> ], [ <second pivoted column> ],
              [ <last pivoted column> ] ) ) AS <alias for the unpivot table>
[ ORDER BY <column name> ];

Key Considerations for UNPIVOT Column Identifiers:

It’s important to note that column identifiers within the UNPIVOT clause adhere to the catalog collation.

  • In Azure SQL Database, the collation is consistently SQL_Latin1_General_CP1_CI_AS.
  • For SQL Server partially contained databases, the collation is always Latin1_General_100_CI_AS_KS_WS_SC.

When combining these columns with others, you might need to use a COLLATE DATABASE_DEFAULT clause to prevent collation conflicts and ensure consistent behavior.

Also, in Microsoft Fabric and Azure Synapse Analytics pools, using PIVOT with a GROUP BY clause on the nonpivot column output might lead to query failures. A workaround is to remove the redundant GROUP BY clause, as it often doesn’t affect the query results.

Basic PIVOT Operation: Transforming Rows to Columns

Let’s start with a simple example to illustrate how PIVOT works. Consider the Production.Product table in the AdventureWorks2022 database. We want to find the average standard cost for products based on their DaysToManufacture.

First, let’s execute a basic GROUP BY query to see the initial data:

USE AdventureWorks2022;
GO
SELECT DaysToManufacture, AVG(StandardCost) AS AverageCost
FROM Production.Product
GROUP BY DaysToManufacture;

This query yields a table with two columns and four rows, showing the average cost for each distinct DaysToManufacture value.

DaysToManufacture AverageCost
------------------ ------------
0                  5.0885
1                  223.88
2                  359.1082
4                  949.4105

Notice that there are no products with DaysToManufacture equal to 3.

Now, let’s use PIVOT to transform the DaysToManufacture values into column headers.

-- Pivot table to transform DaysToManufacture values into columns
SELECT 'AverageCost' AS CostSortedByProductionDays,
    [0], [1], [2], [3], [4]
FROM
(
    SELECT DaysToManufacture, StandardCost
    FROM Production.Product
) AS SourceTable
PIVOT
(
    AVG(StandardCost)
    FOR DaysToManufacture
    IN ([0], [1], [2], [3], [4])
) AS PivotTable;

In this PIVOT query:

  • We select a descriptive column 'AverageCost' and specify the desired pivoted columns: [0], [1], [2], [3], [4], representing the DaysToManufacture values.
  • The SourceTable subquery retrieves the DaysToManufacture and StandardCost from Production.Product.
  • The PIVOT operator then aggregates the StandardCost using AVG() for each DaysToManufacture value, specified in the IN clause.

Here’s the pivoted result:

CostSortedByProductionDays 0         1         2         3         4
--------------------------- ----------- ----------- ----------- ----------- -----------
AverageCost               5.0885      223.88      359.1082    NULL      949.4105

As you can see, the DaysToManufacture values (0, 1, 2, 3, 4) are now column headers, and the corresponding average costs are displayed under each column. Even though there were no products with DaysToManufacture = 3, a column [3] is still included, with a NULL value indicating no data for that category.

Advanced PIVOT for Cross-Tabulation Reports

PIVOT is particularly useful for creating cross-tabulation reports that summarize data in a matrix format. Let’s consider a more complex scenario using the PurchaseOrderHeader table in AdventureWorks2022. Suppose we want to analyze the number of purchase orders placed by different employees, categorized by vendor.

The following query generates a report showing the count of purchase orders for specific employees, broken down by VendorID.

USE AdventureWorks2022;
GO
SELECT VendorID, [250] AS Emp1, [251] AS Emp2, [256] AS Emp3, [257] AS Emp4, [260] AS Emp5
FROM
(
    SELECT PurchaseOrderID, EmployeeID, VendorID
    FROM Purchasing.PurchaseOrderHeader
) AS PurchaseOrders
PIVOT
(
    COUNT(PurchaseOrderID)
    FOR EmployeeID
    IN ([250], [251], [256], [257], [260])
) AS PivotTable
ORDER BY PivotTable.VendorID;

In this query:

  • We aim to pivot the results based on EmployeeID to see purchase order counts per employee and vendor.
  • The subquery PurchaseOrders selects PurchaseOrderID, EmployeeID, and VendorID from Purchasing.PurchaseOrderHeader.
  • The PIVOT clause aggregates PurchaseOrderID using COUNT() for each EmployeeID listed in the IN clause ([250], [251], [256], [257], [260]).

Here’s a snippet of the resulting cross-tabulation report:

VendorID    Emp1        Emp2        Emp3        Emp4        Emp5
----------- ----------- ----------- ----------- ----------- -----------
1492        2           5           4           4           4
1494        2           5           4           5           4
1496        2           4           4           5           5
1498        2           5           4           4           4
1500        3           4           4           5           4
...

This output clearly shows the number of purchase orders (counts) placed by each employee (Emp1 to Emp5, corresponding to EmployeeIDs 250, 251, 256, 257, and 260) for each VendorID. This is a typical cross-tabulation, summarizing data across two dimensions.

Important: When using aggregate functions with PIVOT, any NULL values in the value column (in this case, PurchaseOrderID) are not considered in the aggregation. This is a crucial point to remember when interpreting pivoted results.

UNPIVOT: Transforming Columns Back to Rows

UNPIVOT is the reverse of PIVOT. It rotates columns back into rows. Let’s take the pivoted table pvt from the previous example and transform the employee columns (Emp1, Emp2, Emp3, Emp4, Emp5) back into rows. We’ll create a table pvt mimicking the pivoted output and then unpivot it.

-- Create a sample pivoted table
CREATE TABLE pvt (
    VendorID INT,
    Emp1 INT,
    Emp2 INT,
    Emp3 INT,
    Emp4 INT,
    Emp5 INT
);
GO
INSERT INTO pvt VALUES (1, 4, 3, 5, 4, 4);
INSERT INTO pvt VALUES (2, 4, 1, 5, 5, 5);
INSERT INTO pvt VALUES (3, 4, 3, 5, 4, 4);
INSERT INTO pvt VALUES (4, 4, 2, 5, 5, 4);
INSERT INTO pvt VALUES (5, 5, 1, 5, 5, 5);
GO

-- Unpivot the table
SELECT VendorID, Employee, Orders
FROM
(
    SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
    FROM pvt
) AS p
UNPIVOT
(
    Orders
    FOR Employee IN (Emp1, Emp2, Emp3, Emp4, Emp5)
) AS unpvt;
GO

In this UNPIVOT query:

  • We want to unpivot the columns Emp1, Emp2, Emp3, Emp4, and Emp5.
  • The UNPIVOT operator takes the values from these columns and puts them into a new column named Orders (the value_column).
  • It also creates a new column named Employee (the pivot_column) to store the original column names (Emp1, Emp2, etc.).

Here’s a partial result set after unpivoting:

VendorID    Employee    Orders
----------- ----------- ------
1           Emp1        4
1           Emp2        3
1           Emp3        5
1           Emp4        4
1           Emp5        4
2           Emp1        4
2           Emp2        1
2           Emp3        5
2           Emp4        5
2           Emp5        5
...

The result is a table where each employee column is transformed into rows, making it easier to analyze data in a normalized format.

Important: UNPIVOT is not a perfect reversal of PIVOT. PIVOT involves aggregation, potentially merging multiple rows into one. UNPIVOT cannot reconstruct the original table-valued expression exactly, especially if aggregations were involved. Also, NULL values in the input of UNPIVOT will disappear in the output.

Real-world Example: Sales Data Analysis

The Sales.vSalesPersonSalesByFiscalYears view in AdventureWorks2022 itself uses PIVOT to display sales data. It shows total sales for each salesperson across different fiscal years. You can examine the view’s script in SQL Server Management Studio to see a practical application of PIVOT in a reporting context. In Object Explorer, navigate to AdventureWorks2022 database, then Views, and find Sales.vSalesPersonSalesByFiscalYears. Right-click and select Script View as to inspect its definition.

Conclusion

PIVOT and UNPIVOT are essential SQL Server operators for data transformation. PIVOT allows you to rotate rows into columns, creating summarized, cross-tabular reports, while UNPIVOT does the opposite, converting columns into rows for normalized data views. Understanding and effectively using these operators can significantly enhance your ability to manipulate and analyze data in SQL Server. By leveraging these tools, you can write more concise, readable, and powerful SQL queries for various data transformation needs.

Related Content

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *