The MERGE
statement in SQL Server is a powerful Transact-SQL feature that allows you to perform insert, update, and delete operations on a target table based on the results of a join with a source table in a single statement. This capability is particularly useful for tasks such as synchronizing two tables, performing ETL (Extract, Transform, Load) operations, or implementing upsert logic (update if exists, insert if not).
This article provides an in-depth exploration of the MERGE
statement, covering its syntax, arguments, practical applications, performance considerations, and best practices for optimal usage in SQL Server, Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, and SQL database in Microsoft Fabric.
Understanding the MERGE Statement Syntax
The MERGE
statement offers a flexible syntax to handle various data manipulation scenarios. Here’s the general syntax for SQL Server and Azure SQL Database:
[ WITH <common_table_expression> [,...n] ]
MERGE [ TOP ( expression ) [ PERCENT ] ]
[ INTO ] <target_table> [ WITH ( <merge_hint> ) ] [ [ AS ] table_alias ]
USING <table_source> [ [ AS ] table_alias ]
ON <merge_search_condition>
[ WHEN MATCHED [ AND <clause_search_condition> ] THEN <merge_matched> ] [ ...n ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ] THEN <merge_not_matched> ]
[ WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ] THEN <merge_matched> ] [ ...n ]
[ <output_clause> ]
[ OPTION ( <query_hint> [ ,...n ] ) ]
;
<target_table> ::=
{
[ database_name . schema_name . | schema_name . ] [ [ AS ] target_table ]
| @variable [ [ AS ] target_table ]
| common_table_expression_name [ [ AS ] target_table ]
}
<merge_hint>::=
{
{ [ <table_hint_limited> [ ,...n ] ] [ [ , ] { INDEX ( index_val [ ,...n ] ) | INDEX = index_val } ] }
}
<merge_search_condition> ::= <search_condition>
<merge_matched>::=
{ UPDATE SET <set_clause> | DELETE }
<merge_not_matched>::=
{ INSERT [ ( column_list ) ] { VALUES ( values_list ) | DEFAULT VALUES } }
<clause_search_condition> ::= <search_condition>
For Azure Synapse Analytics, the syntax is slightly different:
[ WITH <common_table_expression> [,...n] ]
MERGE [ INTO ] <target_table> [ [ AS ] table_alias ]
USING <table_source> [ [ AS ] table_alias ]
ON <merge_search_condition>
[ WHEN MATCHED [ AND <clause_search_condition> ] THEN <merge_matched> ] [ ...n ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ] THEN <merge_not_matched> ]
[ WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ] THEN <merge_matched> ] [ ...n ]
[ OPTION ( <query_hint> [ ,...n ] ) ]
; -- The semi-colon is required, or the query will return a syntax error.
<target_table> ::=
{
[ database_name . schema_name . | schema_name . ] target_table
}
<merge_search_condition> ::= <search_condition>
<merge_matched>::=
{ UPDATE SET <set_clause> | DELETE }
<merge_not_matched>::=
{ INSERT [ ( column_list ) ] VALUES ( values_list ) }
<clause_search_condition> ::= <search_condition>
Key Arguments Explained
Let’s break down the essential components of the MERGE
statement:
-
WITH <common_table_expression>
: Allows you to define a temporary named result set (CTE) for use within theMERGE
statement, enhancing readability and query organization. -
TOP (expression) [PERCENT]
: Limits the number or percentage of rows affected by theMERGE
operation. It’s important to note thatTOP
is applied after the join and filtering, and the affected rows are unordered. -
INTO <target_table>
: Specifies the table to be modified. This is the table where inserts, updates, or deletes will be applied. The target table cannot be a remote table, a memory-optimized table or have rules defined on it. -
USING <table_source>
: Defines the data source used to match rows in the target table. This can be another table, a view, a derived table, or even a remote table. The table source provides the data to compare against and potentially insert or update into the target table. -
ON <merge_search_condition>
: Crucially defines the join condition between thetarget_table
andtable_source
. This condition determines how rows are matched for update or identified for insertion or deletion. Important: Only use columns from the target table in theON
clause for matching purposes. Avoid filtering target table rows within theON
clause as it can lead to incorrect results. -
WHEN MATCHED [ AND <clause_search_condition> ] THEN <merge_matched>
: Specifies the action to take when rows in thetarget_table
match rows from thetable_source
based on theON
condition and optional additional conditions (AND <clause_search_condition>
). The<merge_matched>
action can be eitherUPDATE SET <set_clause>
orDELETE
. You can have up to twoWHEN MATCHED
clauses, with the first requiring anAND
clause. If two are present, one must beUPDATE
and the otherDELETE
. An error occurs if multiple source rows match a single target row when usingUPDATE
. -
WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ] THEN <merge_not_matched>
: Defines the action when rows from thetable_source
do not have a match in thetarget_table
based on theON
condition and optional additional conditions. The<merge_not_matched>
action must beINSERT [ ( column_list ) ] { VALUES ( values_list ) | DEFAULT VALUES }
. Only oneWHEN NOT MATCHED [ BY TARGET ]
clause is allowed. -
WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ] THEN <merge_matched>
: Specifies the action for rows in thetarget_table
that do not have a corresponding match in thetable_source
according to theON
condition and optional additional conditions. The<merge_matched>
action can be eitherUPDATE SET <set_clause>
orDELETE
. Similar toWHEN MATCHED
, you can have at most twoWHEN NOT MATCHED BY SOURCE
clauses with specific conditions for two clauses. Important: Only columns from the target table can be referenced in the<clause_search_condition>
. Source table columns cannot be accessed in theWHEN NOT MATCHED BY SOURCE
clause. -
<merge_hint>
: Allows specifying table hints to optimize theMERGE
operation, such asTABLOCK
,INDEX
, orFORCESEEK
.NOLOCK
andREADUNCOMMITTED
hints are not permitted. -
<output_clause>
: Returns information about the rows affected by theMERGE
statement (inserted, updated, or deleted). The$action
column in theOUTPUT
clause indicates the type of operation performed on each row. -
OPTION (<query_hint>)
: Provides query optimizer hints to influence how the Database Engine processes theMERGE
statement.
Practical Applications of MERGE
The MERGE
statement is invaluable in various data management scenarios:
-
Data Synchronization: Keeping two tables in sync is a common requirement.
MERGE
excels at this, allowing you to efficiently update a target table to reflect changes in a source table, inserting new records and updating or deleting existing ones as needed. -
ETL Processes: In Extract, Transform, Load workflows,
MERGE
can streamline the process of updating data warehouses or data marts with new or changed data from source systems. It simplifies the loading of incremental data updates. -
Upsert Operations: Implementing “upsert” logic (update if a row exists, insert if it doesn’t) is straightforward with
MERGE
. This is particularly useful in applications where you receive data that may or may not already exist in your database. -
Historical Data Management: When tracking changes over time,
MERGE
can be used to update current records and insert new records representing changes, effectively maintaining a history of data modifications.
Performance Considerations and Best Practices
While MERGE
offers significant advantages, optimizing its performance is crucial, especially when dealing with large datasets. Consider these best practices:
-
Indexing: Proper indexing is paramount for
MERGE
performance.- Source Table Index: Create an index on the join columns of the source table, ideally unique and covering the join logic.
- Target Table Index: Create an index on the join columns of the target table, preferably a unique clustered index. These indexes facilitate efficient joins and data comparison.
-
Join Optimization:
- Precise
ON
Clause: Ensure theON
clause accurately reflects the matching criteria and only uses columns for comparison. - Filtering in
WHEN
Clauses: Filter rows using conditions within theWHEN MATCHED
,WHEN NOT MATCHED
, orWHEN NOT MATCHED BY SOURCE
clauses instead of theON
clause for better performance and correct results. - Avoid Hash Joins: In general, avoid using
HASH JOIN
query hints withMERGE
as they don’t leverage indexes effectively. Merge joins are often more efficient when indexes are in place.
- Precise
-
Concurrency: Be mindful of concurrency implications, especially in high-volume environments.
MERGE
uses different locking mechanisms compared to separateINSERT
,UPDATE
, andDELETE
statements. For very high concurrency scenarios, discrete DML statements might sometimes offer better performance and less blocking. Thorough testing under production-like load is recommended. -
Parameterization: Unlike individual
INSERT
,UPDATE
, orDELETE
statements,MERGE
does not automatically benefit from simple parameterization. For optimal plan reuse and performance, especially with literal values, consider parameterizing yourMERGE
statements. -
TOP
Clause Usage: UsingTOP
for batch processing large tables can impact I/O performance as it still scans the entire source and target tables in each batch. Ensure subsequent batches target new rows to avoid unintended consequences like duplicate inserts. -
Bulk Loading with
OPENROWSET(BULK...)
: For bulk loading data from files, useOPENROWSET(BULK...)
as the source table. Optimize bulk merge performance by:- Clustering index on target table join columns.
- Disabling non-unique non-clustered indexes during the bulk load and re-enabling them afterward.
- Using
ORDER
andUNIQUE
hints inOPENROWSET(BULK...)
to reflect the data file’s sort order and uniqueness, enabling the query optimizer to generate more efficient plans.
-
Columnstore Indexes: Avoid using tables with columnstore indexes as the target of
MERGE
statements. For tables with columnstore indexes, consider staging updates to a rowstore table and then performing batchedDELETE
andINSERT
operations for better performance.
Troubleshooting Common Issues
-
Error 10713: “MERGE statement requires a semicolon (;) as a statement terminator.” Ensure your
MERGE
statement ends with a semicolon. -
Error “CREATE TABLE failed because column in table exceeds the maximum of 1024 columns.”: This error can occur even if tables don’t have 1024 columns under specific conditions: multiple columns in
DELETE
,UPDATE SET
, orINSERT
; nonclustered indexes on join columns; and hash-distributed target tables (in Azure Synapse Analytics). Workarounds may involve simplifying theMERGE
statement or restructuring indexes. -
Azure Synapse Analytics Inconsistent State (Older Builds): In older Azure Synapse Analytics builds (prior to 10.0.17829.0), using
MERGE
on hash-distributed tables with secondary indexes or unique constraints, or updating distribution key columns, could lead to data being placed in incorrect distributions. Upgrade to version 10.0.17829.0 or later, or use ANSIUPDATE FROM ... JOIN
as a workaround on older versions. Scripts are provided in the original documentation to check for and repair affected tables.
Trigger Behavior with MERGE
MERGE
statements trigger AFTER
triggers defined on the target table for each insert, update, or delete action. The order in which triggers fire for different actions is not guaranteed, but triggers for the same action respect the specified order.
If INSTEAD OF
triggers are defined on the target table for any of the actions performed by MERGE
, then INSTEAD OF
triggers must be defined for all actions in the MERGE
statement. INSTEAD OF UPDATE
or INSTEAD OF DELETE
triggers will prevent the actual update or delete operations, firing the triggers instead. INSTEAD OF INSERT
triggers will similarly prevent the insert operation.
@@ROWCOUNT
within a trigger triggered by a MERGE
statement reflects the total number of rows affected by the MERGE
statement (inserts, updates, and deletes combined), not just the rows affected by the specific action that triggered the trigger.
Permissions Required
Executing a MERGE
statement requires:
SELECT
permission on thetable_source
.INSERT
,UPDATE
, orDELETE
permissions on thetarget_table
, depending on the actions specified in theWHEN
clauses.
Examples of MERGE in Action
Let’s explore practical examples demonstrating the versatility of MERGE
.
Example 1: Upsert Operation
This example demonstrates a common upsert scenario, updating a row if it exists or inserting it if it doesn’t.
-- Create a sample target table
CREATE TABLE dbo.TargetProducts (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(255),
Price DECIMAL(10, 2)
);
-- Create a sample source table (or use a table variable/temp table)
CREATE TABLE #SourceProducts (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(255),
Price DECIMAL(10, 2)
);
INSERT INTO #SourceProducts (ProductID, ProductName, Price) VALUES
(1, 'Product A', 19.99),
(2, 'Product B', 29.99),
(3, 'Product C', 9.99);
-- MERGE statement for upsert
MERGE dbo.TargetProducts AS target
USING #SourceProducts AS source
ON target.ProductID = source.ProductID
WHEN MATCHED THEN
UPDATE SET
target.ProductName = source.ProductName,
target.Price = source.Price
WHEN NOT MATCHED THEN
INSERT (ProductID, ProductName, Price)
VALUES (source.ProductID, source.ProductName, source.Price);
-- Verify the results
SELECT * FROM dbo.TargetProducts;
-- Cleanup
DROP TABLE dbo.TargetProducts;
DROP TABLE #SourceProducts;
Example 2: Data Synchronization with DELETE
This example synchronizes TargetInventory
with SourceInventory
, updating quantities and deleting products no longer present in the source.
-- Sample Target Inventory Table
CREATE TABLE dbo.TargetInventory (
ProductID INT PRIMARY KEY,
Quantity INT
);
INSERT INTO dbo.TargetInventory (ProductID, Quantity) VALUES
(1, 100),
(2, 50),
(3, 200),
(4, 75);
-- Sample Source Inventory Table
CREATE TABLE #SourceInventory (
ProductID INT PRIMARY KEY,
Quantity INT
);
INSERT INTO #SourceInventory (ProductID, Quantity) VALUES
(1, 120),
(2, 40),
(3, 200); -- Product 4 is missing in source, should be deleted
-- MERGE for synchronization with delete
MERGE dbo.TargetInventory AS target
USING #SourceInventory AS source
ON target.ProductID = source.ProductID
WHEN MATCHED THEN
UPDATE SET target.Quantity = source.Quantity
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
-- Check synchronized TargetInventory
SELECT * FROM dbo.TargetInventory;
-- Cleanup
DROP TABLE dbo.TargetInventory;
DROP TABLE #SourceInventory;
Example 3: Using OUTPUT Clause for Auditing
This example uses the OUTPUT
clause to track changes made by the MERGE
statement for auditing purposes.
-- Sample tables (using TargetProducts from Example 1)
CREATE TABLE dbo.TargetProducts (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(255),
Price DECIMAL(10, 2)
);
CREATE TABLE #SourceProducts (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(255),
Price DECIMAL(10, 2)
);
INSERT INTO #SourceProducts (ProductID, ProductName, Price) VALUES
(1, 'Product A - Updated', 24.99), -- Update existing product 1
(4, 'Product D', 14.99); -- Insert new product 4
-- Audit table to capture changes
CREATE TABLE dbo.ProductAudit (
AuditID INT IDENTITY(1,1) PRIMARY KEY,
ProductID INT,
ProductName VARCHAR(255),
Price DECIMAL(10, 2),
ActionType VARCHAR(20),
AuditDate DATETIME DEFAULT GETDATE()
);
-- MERGE with OUTPUT clause to capture changes
MERGE dbo.TargetProducts AS target
USING #SourceProducts AS source
ON target.ProductID = source.ProductID
WHEN MATCHED THEN
UPDATE SET
target.ProductName = source.ProductName,
target.Price = source.Price
OUTPUT
$action,
DELETED.ProductID,
DELETED.ProductName,
DELETED.Price
INTO dbo.ProductAudit (ActionType, ProductID, ProductName, Price)
WHEN NOT MATCHED THEN
INSERT (ProductID, ProductName, Price)
VALUES (source.ProductID, source.ProductName, source.Price)
OUTPUT
$action,
INSERTED.ProductID,
INSERTED.ProductName,
INSERTED.Price
INTO dbo.ProductAudit (ActionType, ProductID, ProductName, Price);
-- View the audit log and updated TargetProducts
SELECT * FROM dbo.ProductAudit ORDER BY AuditID DESC;
SELECT * FROM dbo.TargetProducts;
-- Cleanup
DROP TABLE dbo.TargetProducts;
DROP TABLE #SourceProducts;
DROP TABLE dbo.ProductAudit;
Conclusion
The SQL Server MERGE
statement is a robust and efficient tool for complex data manipulation tasks. By combining insert, update, and delete operations into a single statement, it simplifies code, enhances performance in many scenarios, and improves data synchronization processes. Understanding its syntax, arguments, performance considerations, and best practices is essential for leveraging its full potential in your SQL Server database solutions. By applying the guidelines and examples presented in this article, you can effectively utilize MERGE
to streamline your data management workflows and build more efficient and maintainable database applications.
For further in-depth information and the most up-to-date details, always refer to the official Microsoft SQL Server documentation.
This article is for informational purposes based on the provided documentation and best practices for SQL Server MERGE statement as of the current knowledge cut-off date.
Please note: While every effort has been made to ensure accuracy, always test and validate SQL code in a non-production environment before deploying to production systems.