Efficiently loading large datasets into SQL Server databases is a critical task for database administrators and developers alike. SQL Server provides a powerful command, BULK INSERT, designed for high-speed data imports. This article delves into the intricacies of BULK INSERT, exploring its syntax, options, performance considerations, and best practices for optimized data loading.
Understanding SQL Server Bulk Insert
BULK INSERT is a Transact-SQL statement that allows you to import data from a data file into a SQL Server table or view. It is optimized for speed and efficiency, making it ideal for loading large volumes of data quickly. Unlike row-by-row inserts, BULK INSERT minimizes logging and can bypass certain constraints, significantly accelerating the data loading process.
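For orientation, here is a minimal sketch of the statement in its simplest form. The table dbo.MyTable and the file C:\Data\mydata.txt are placeholder names, and the default terminators (tab between fields, \r\n between rows) are assumed.
-- Minimal sketch with placeholder names: load a tab-delimited text file
-- into an existing table using the default field and row terminators
BULK INSERT dbo.MyTable
FROM 'C:\Data\mydata.txt';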
This command is available in various SQL Server environments, including:
- SQL Server: On-premises installations.
- Azure SQL Database: Microsoft’s cloud-based database service.
- Azure SQL Managed Instance: A fully managed SQL Server instance in Azure.
- Fabric Data Warehouse: Microsoft Fabric’s data warehousing solution.
While the core functionality remains consistent, certain features and options of BULK INSERT may vary across these platforms, particularly concerning data source locations and authentication methods.
Syntax of BULK INSERT
The basic syntax of the BULK INSERT statement is as follows:
BULK INSERT
{ database_name.schema_name.table_or_view_name | schema_name.table_or_view_name | table_or_view_name }
FROM 'data_file'
[ WITH
(
[ [ , ] DATA_SOURCE = 'data_source_name' ]
[ [ , ] FORMAT = 'CSV' ]
[ [ , ] FIELDQUOTE = 'quote_characters']
[ [ , ] CODEPAGE = { 'RAW' | 'code_page' | 'ACP' | 'OEM' } ]
[ [ , ] DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' } ]
[ [ , ] ROWTERMINATOR = 'row_terminator' ]
[ [ , ] FIELDTERMINATOR = 'field_terminator' ]
[ [ , ] FIRSTROW = first_row ]
[ [ , ] LASTROW = last_row ]
[ [ , ] FORMATFILE = 'format_file_path' ]
[ [ , ] FORMATFILE_DATA_SOURCE = 'data_source_name' ]
[ [ , ] MAXERRORS = max_errors ]
[ [ , ] ERRORFILE = 'file_name' ]
[ [ , ] ERRORFILE_DATA_SOURCE = 'errorfile_data_source_name' ]
[ [ , ] KEEPIDENTITY ]
[ [ , ] KEEPNULLS ]
[ [ , ] FIRE_TRIGGERS ]
[ [ , ] CHECK_CONSTRAINTS ]
[ [ , ] TABLOCK ]
[ [ , ] ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ]
[ [ , ] ROWS_PER_BATCH = rows_per_batch ]
[ [ , ] KILOBYTES_PER_BATCH = kilobytes_per_batch ]
[ [ , ] BATCHSIZE = batch_size ]
)
]
Let’s break down the key arguments and options:
Target Table Specification
{ database_name.schema_name.table_or_view_name | schema_name.table_or_view_name | table_or_view_name }: Specifies the target table or view for the bulk import. You can optionally include the database and schema names if they differ from the current context. Note that views must be simple views in which all columns refer to the same base table.
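As an illustration of the single-base-table rule, the following sketch loads data through a view. The table dbo.Customers, the view, and the CSV file are hypothetical placeholders; the data file must supply exactly the columns the view exposes, and any base-table columns not covered by the view must be nullable or have defaults.
-- Hypothetical single-table view used as a BULK INSERT target
CREATE VIEW dbo.CustomerContacts
AS
SELECT CustomerID, CustomerName, Email
FROM dbo.Customers;
GO
-- The CSV file must match the view's three columns
BULK INSERT dbo.CustomerContacts
FROM 'C:\Data\customer_contacts.csv'
WITH (FORMAT = 'CSV', FIRSTROW = 2);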
Data Source
- FROM 'data_file': Defines the full path to the data file.
  - SQL Server: Can be a local path, a network path (UNC), or an Azure Blob Storage path (starting with SQL Server 2017).
  - Azure SQL Database & Managed Instance: Must be an Azure Blob Storage path.
  - Fabric Data Warehouse: Must be an Azure Blob Storage path, supporting wildcard characters in the path for flexible file selection.
-- Example: Local file path
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\data.csv';
-- Example: UNC path
BULK INSERT MyDatabase.dbo.MyTable FROM '\\ServerName\ShareName\Data\data.csv';
-- Example: Azure Blob Storage path (SQL Server 2017+)
BULK INSERT MyDatabase.dbo.MyTable FROM 'yourcontainer/data.csv' WITH (DATA_SOURCE = 'MyAzureDataSource');
-- Example: Azure Blob Storage path (Fabric Data Warehouse with wildcard)
BULK INSERT MyDatabase.dbo.MyTable FROM 'https://mystorageaccount.blob.core.windows.net/mycontainer/data*.csv';
File Format Options
These options control how BULK INSERT interprets the data file:
- FORMAT = 'CSV': Specifies that the data file is in Comma Separated Values (CSV) format, adhering to the RFC 4180 standard. Available from SQL Server 2017 onwards.
- FIELDQUOTE = 'quote_characters': Defines the quote character used in the CSV file. Defaults to the double quote (").
- CODEPAGE = { 'RAW' | 'code_page' | 'ACP' | 'OEM' }: Sets the code page for the data file, crucial for handling character encoding correctly, especially for char, varchar, and text data types.
  - 'ACP': ANSI/Windows code page (ISO 1252).
  - 'OEM': System OEM code page (default).
  - 'RAW': No code page conversion (fastest).
  - 'code_page': A specific code page number (e.g., '65001' for UTF-8).
- DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' }: Specifies the overall data format of the file.
  - 'char': Character format (default).
  - 'native': Native SQL Server data types.
  - 'widechar': Unicode character format.
  - 'widenative': Native types with Unicode for character data.
- ROWTERMINATOR = 'row_terminator': Specifies the row terminator. The default is '\r\n' (carriage return plus line feed). Common alternatives include '\n' (line feed) for UNIX-style files or custom terminators.
- FIELDTERMINATOR = 'field_terminator': Defines the field terminator. The default is '\t' (tab). For CSV files, this is typically ',' (comma) or ';' (semicolon).
- FIRSTROW = first_row: Specifies the first row to import. Useful for skipping header rows (e.g., FIRSTROW = 2 skips the first row).
- LASTROW = last_row: Specifies the last row to import.
-- Example: Importing a CSV file with semicolon delimiter and skipping the header row
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\data.csv'
WITH (
FORMAT = 'CSV',
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
FIRSTROW = 2
);
-- Example: Importing a UTF-8 encoded CSV file
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\utf8_data.csv'
WITH (
FORMAT = 'CSV',
CODEPAGE = '65001'
);
Format File
- FORMATFILE = 'format_file_path': Provides the path to a format file. Format files are XML or non-XML files that describe the structure of the data file, including data types, column order, and terminators. They are essential when the data file's structure does not match the target table exactly, or for complex data type mappings. Format files are typically created using the bcp utility.
- FORMATFILE_DATA_SOURCE = 'data_source_name': When the format file is stored in Azure Blob Storage, this option specifies the external data source name.
-- Example: Using a format file
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\data.dat'
WITH (
FORMATFILE = 'C:\FormatFiles\my_format_file.xml'
);
Error Handling Options
- MAXERRORS = max_errors: Sets the maximum number of errors allowed before the BULK INSERT operation is aborted. The default is 10.
- ERRORFILE = 'error_file_path': Specifies a file in which to record rows that could not be imported due to formatting or conversion errors.
- ERRORFILE_DATA_SOURCE = 'errorfile_data_source_name': When the error file is written to Azure Blob Storage, this option specifies the external data source name.
-- Example: Setting error handling options
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\data_with_potential_errors.csv'
WITH (
FORMAT = 'CSV',
MAXERRORS = 100,
ERRORFILE = 'C:\ErrorLogs\bulk_insert_errors.log'
);
Database Options
- KEEPIDENTITY: Specifies that identity values from the data file should be used for identity columns in the target table. By default, identity values in the file are ignored and SQL Server generates new values. Using this option requires ALTER TABLE permission.
- KEEPNULLS: Indicates that empty columns in the data file should be imported as NULL values. By default, if a column is nullable and the data file supplies no value, the column's default value (if any) is used instead.
- FIRE_TRIGGERS: Enables firing of the INSERT triggers defined on the target table for each batch of data inserted. By default, triggers are not fired, for performance reasons. (ALTER TABLE permission is required when the table has triggers and FIRE_TRIGGERS is not specified, because disabling trigger execution requires it.)
- CHECK_CONSTRAINTS: Forces checking of all CHECK and FOREIGN KEY constraints on the target table during the bulk import. By default, these constraints are not checked, and foreign key and check constraints are marked as not trusted after the operation. UNIQUE and PRIMARY KEY constraints are always enforced. (ALTER TABLE permission is required when constraints exist and CHECK_CONSTRAINTS is not specified.)
- TABLOCK: Acquires a table-level lock for the duration of the bulk insert operation. This can significantly improve performance, especially for large tables without indexes, by reducing lock contention, but it prevents concurrent access to the table. For columnstore indexes the locking behavior differs: data can be loaded into rowsets in parallel, although TABLOCK may limit concurrency.
-- Example: Using database options
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\data_with_identities_and_nulls.csv'
WITH (
FORMAT = 'CSV',
KEEPIDENTITY,
KEEPNULLS,
TABLOCK
);
Source Options
- ORDER ( { column [ ASC | DESC ] } [ ,...n ] ): Specifies the sort order of data in the data file. If the data is sorted according to the clustered index of the target table, bulk import performance can be improved. SQL Server validates the sorted order before applying the optimization.
- ROWS_PER_BATCH = rows_per_batch: Indicates the approximate number of rows in the data file. Providing this value can help the query optimizer plan the bulk import operation.
- KILOBYTES_PER_BATCH = kilobytes_per_batch: Specifies the approximate amount of data, in kilobytes, per batch.
- BATCHSIZE = batch_size: Defines the number of rows per batch. Each batch is processed as a separate transaction. Smaller batch sizes can reduce transaction log overhead but might increase the number of transactions. The default is to treat the entire data file as a single batch.
-- Example: Source options for performance tuning
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\SortedData\sorted_data.csv'
WITH (
FORMAT = 'CSV',
ORDER (CustomerID ASC),
ROWS_PER_BATCH = 10000,
TABLOCK
);
Data Types and Compatibility
BULK INSERT supports a wide range of SQL Server data types. When importing data, especially string data into decimal or numeric columns, it is important to be aware of the data type conversion rules. BULK INSERT uses the same conversion rules as the CONVERT function in Transact-SQL. Scientific notation in numeric strings may be treated as invalid unless handled with a format file that explicitly maps the data type.
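Because the same rules apply as for CONVERT, you can test a questionable value directly before a load. The snippet below is purely illustrative; it shows why scientific notation destined for a decimal column typically needs a format file that maps the field to a float type such as SQLFLT8.
-- Plain decimal text converts cleanly
SELECT CONVERT(decimal(10, 2), '123.45');                   -- returns 123.45
-- Scientific notation fails when converted straight to decimal,
-- mirroring how BULK INSERT treats such values without a format file
SELECT CONVERT(decimal(10, 2), '8.0E+2');                   -- raises a conversion error
-- Converting through float first succeeds
SELECT CONVERT(decimal(10, 2), CONVERT(float, '8.0E+2'));   -- returns 800.00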
For importing or exporting XML data, BULK INSERT can handle SQLXML documents using the SQLCHAR, SQLVARCHAR, SQLNCHAR, SQLNVARCHAR, SQLBINARY, or SQLVARBIN data types in the format file.
Performance Considerations
Optimizing BULK INSERT performance is crucial for large data loads. Key factors to consider:
- Minimal Logging: Bulk import operations can be minimally logged if certain prerequisites are met, significantly reducing transaction log overhead and speeding up the process (see the sketch after this list). Minimal logging is not supported in Azure SQL Database.
- Batch Size: Experiment with different BATCHSIZE, ROWS_PER_BATCH, and KILOBYTES_PER_BATCH values to find the optimal settings for your data and system. Smaller batches can reduce rollback overhead but might increase the transaction count.
- Table Locking (TABLOCK): Using TABLOCK can drastically improve performance by minimizing locking overhead, especially for tables without indexes. However, it blocks concurrent access.
- Sorted Data (ORDER): If the data is sorted according to the clustered index, use the ORDER clause for optimized imports.
- Data File Location: Accessing data files locally is generally faster than over a network. In cloud environments, ensure efficient connectivity to Azure Blob Storage.
- Resource Limits: In Azure SQL Database, consider temporarily scaling up the performance level of the database during large import operations.
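The sketch below shows one common minimal-logging recipe for an on-premises server. It assumes the database can temporarily run under the BULK_LOGGED recovery model, that the target table meets the minimal-logging prerequisites (for example, a heap loaded with TABLOCK), and that MyDatabase, dbo.MyTable, and the file path are placeholders; adjust and test before relying on it in production.
-- Assumption: temporarily switching recovery models is acceptable for this database.
-- Take a log backup before and after in production to preserve the log chain.
ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;
BULK INSERT MyDatabase.dbo.MyTable
FROM 'C:\Data\large_load.csv'
WITH (
    FORMAT = 'CSV',
    TABLOCK,                 -- a table-level lock is one of the minimal-logging prerequisites
    ROWS_PER_BATCH = 1000000
);
ALTER DATABASE MyDatabase SET RECOVERY FULL;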
Security Considerations
Security is paramount when using BULK INSERT:
- Permissions: Requires INSERT and ADMINISTER BULK OPERATIONS permissions. In Azure SQL Database, INSERT and ADMINISTER DATABASE BULK OPERATIONS are needed. The ADMINISTER BULK OPERATIONS permission and the bulkadmin role are not supported on SQL Server on Linux, where only sysadmin can perform bulk inserts. ALTER TABLE permission might also be required, depending on the table's constraints and triggers and on options such as KEEPIDENTITY, FIRE_TRIGGERS, and CHECK_CONSTRAINTS. A grant sketch follows this list.
- Security Account Delegation (Impersonation): When using SQL Server authentication, the security context of the SQL Server service account is used to access the data file, so ensure this account has the necessary permissions to read the file. Windows authentication uses the connecting user's credentials.
- Azure Blob Storage Security: For private Azure Blob Storage, use Shared Access Signatures (SAS) or managed identities for secure access. Create database-scoped credentials and external data sources to manage these securely.
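As a sketch of the grants involved (MyLoadLogin, MyLoadUser, and dbo.MyTable are placeholder names; adjust them to your environment):
-- Server-level permission, granted to the login (run in the master database)
GRANT ADMINISTER BULK OPERATIONS TO MyLoadLogin;
-- Table-level permission, granted to the database user in the target database
GRANT INSERT ON dbo.MyTable TO MyLoadUser;
-- Azure SQL Database uses a database-scoped permission instead:
-- GRANT ADMINISTER DATABASE BULK OPERATIONS TO MyLoadUser;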
Examples of BULK INSERT
Example 1: Basic Bulk Insert from a Pipe-Delimited File
BULK INSERT AdventureWorks2022.Sales.SalesOrderDetail
FROM 'f:\orders\lineitem.tbl'
WITH (
FIELDTERMINATOR = ' |',
ROWTERMINATOR = ' |\n'
);
Example 2: Bulk Insert with FIRE_TRIGGERS
BULK INSERT AdventureWorks2022.Sales.SalesOrderDetail
FROM 'f:\orders\lineitem.tbl'
WITH (
FIELDTERMINATOR = ' |',
ROWTERMINATOR = ':\n',
FIRE_TRIGGERS
);
Example 3: Bulk Insert with Line Feed Row Terminator (UNIX Files)
DECLARE @bulk_cmd VARCHAR(1000);
SET @bulk_cmd = 'BULK INSERT AdventureWorks2022.Sales.SalesOrderDetail
FROM ''<drive>:\<path>\<filename>''
WITH (ROWTERMINATOR = '''+CHAR(10)+''')';
EXEC(@bulk_cmd);
Example 4: Bulk Insert Specifying Code Page
BULK INSERT MyTable
FROM 'D:\data.csv'
WITH (
CODEPAGE = '65001', -- UTF-8 encoding
DATAFILETYPE = 'char',
FIELDTERMINATOR = ','
);
Example 5: Bulk Insert from a CSV File, Skipping Header
BULK INSERT Sales.Invoices
FROM '\\share\invoices\inv-2016-07-25.csv'
WITH (
FORMAT = 'CSV',
FIRSTROW = 2, -- Skip header row
FIELDQUOTE = '"',
FIELDTERMINATOR = ';',
ROWTERMINATOR = '0x0a' -- Line Feed
);
Example 6: Bulk Insert from Azure Blob Storage using SAS Key
-- Create Master Key (if not already exists)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'YourStrongPassword1';
GO
-- Create Database Scoped Credential for SAS Key
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '******srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z***************';
-- Create External Data Source
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://****************.blob.core.windows.net/invoices',
CREDENTIAL = MyAzureBlobStorageCredential
);
-- Bulk Insert from Azure Blob Storage
BULK INSERT Sales.Invoices
FROM 'inv-2017-12-08.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage');
Example 7: Bulk Insert from Azure Blob Storage with Error File
BULK INSERT Sales.Invoices
FROM 'inv-2017-12-08.csv'
WITH (
DATA_SOURCE = 'MyAzureInvoices',
FORMAT = 'CSV',
ERRORFILE = 'MyErrorFile',
ERRORFILE_DATA_SOURCE = 'MyAzureInvoices'
);
Conclusion
SQL Server's BULK INSERT is an indispensable tool for efficiently loading large volumes of data into your SQL Server databases. By understanding its syntax, options, and performance considerations, you can optimize your data loading processes, ensuring speed and reliability. Whether you're dealing with on-premises SQL Server, Azure SQL Database, or Fabric Data Warehouse, mastering BULK INSERT is a valuable skill for any data professional.