SQL Server Bulk Insert: High-Speed Data Loading Techniques

Efficiently loading large datasets into SQL Server databases is a critical task for database administrators and developers alike. SQL Server provides a powerful command, BULK INSERT, designed for high-speed data imports. This article delves into the intricacies of BULK INSERT, exploring its syntax, options, performance considerations, and best practices for optimized data loading.

Understanding SQL Server Bulk Insert

BULK INSERT is a Transact-SQL statement that allows you to import data from a data file into a SQL Server table or view. It’s optimized for speed and efficiency, making it ideal for loading large volumes of data quickly. Unlike row-by-row inserts, BULK INSERT minimizes logging and can bypass certain constraints, significantly accelerating the data loading process.
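
In its simplest form, the statement needs only a target table and a source file. The table and path below are hypothetical; with no options specified, BULK INSERT assumes tab-delimited fields and rows terminated by a carriage return plus line feed.

-- Minimal example: load a delimited file into an existing table using the defaults
BULK INSERT dbo.SalesStaging
FROM 'C:\Data\sales_export.txt';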

This command is available in various SQL Server environments, including:

  • SQL Server: On-premises installations.
  • Azure SQL Database: Microsoft’s cloud-based database service.
  • Azure SQL Managed Instance: A fully managed SQL Server instance in Azure.
  • Fabric Data Warehouse: Microsoft Fabric’s data warehousing solution.

While the core functionality remains consistent, certain features and options of BULK INSERT may vary across these platforms, particularly concerning data source locations and authentication methods.

Syntax of BULK INSERT

The basic syntax of the BULK INSERT statement is as follows:

BULK INSERT
{ database_name.schema_name.table_or_view_name | schema_name.table_or_view_name | table_or_view_name }
FROM 'data_file'
[ WITH
    (
        [ [ , ] DATA_SOURCE = 'data_source_name' ]
        [ [ , ] FORMAT = 'CSV' ]
        [ [ , ] FIELDQUOTE = 'quote_characters']
        [ [ , ] CODEPAGE = { 'RAW' | 'code_page' | 'ACP' | 'OEM' } ]
        [ [ , ] DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' } ]
        [ [ , ] ROWTERMINATOR = 'row_terminator' ]
        [ [ , ] FIELDTERMINATOR = 'field_terminator' ]
        [ [ , ] FIRSTROW = first_row ]
        [ [ , ] LASTROW = last_row ]
        [ [ , ] FORMATFILE = 'format_file_path' ]
        [ [ , ] FORMATFILE_DATA_SOURCE = 'data_source_name' ]
        [ [ , ] MAXERRORS = max_errors ]
        [ [ , ] ERRORFILE = 'file_name' ]
        [ [ , ] ERRORFILE_DATA_SOURCE = 'errorfile_data_source_name' ]
        [ [ , ] KEEPIDENTITY ]
        [ [ , ] KEEPNULLS ]
        [ [ , ] FIRE_TRIGGERS ]
        [ [ , ] CHECK_CONSTRAINTS ]
        [ [ , ] TABLOCK ]
        [ [ , ] ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ]
        [ [ , ] ROWS_PER_BATCH = rows_per_batch ]
        [ [ , ] KILOBYTES_PER_BATCH = kilobytes_per_batch ]
        [ [ , ] BATCHSIZE = batch_size ]
    )
]

Let’s break down the key arguments and options:

Target Table Specification

  • { database_name.schema_name.table_or_view_name | schema_name.table_or_view_name | table_or_view_name }: Specifies the target table or view for the bulk import. You can optionally include the database and schema names if they are different from the current context. Note that views must be simple views where all columns refer to a single base table.
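
To illustrate the view restriction, here is a minimal sketch using a hypothetical staging table and a single-table view over it; the load through the view succeeds because every view column maps to the same base table.

-- Hypothetical base table and a simple single-table view over it
CREATE TABLE dbo.StagingOrders (
    OrderID   INT            NOT NULL,
    OrderDate DATE           NOT NULL,
    Amount    DECIMAL(10, 2) NOT NULL
);
GO

CREATE VIEW dbo.vStagingOrders
AS
SELECT OrderID, OrderDate, Amount
FROM dbo.StagingOrders;
GO

-- Loading through the view works because all of its columns belong to one base table
BULK INSERT dbo.vStagingOrders
FROM 'C:\Data\orders.csv'
WITH (FORMAT = 'CSV', FIRSTROW = 2);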

Data Source

  • FROM 'data_file': Defines the full path to the data file.

    • SQL Server: Can be a local path, network path (UNC), or Azure Blob Storage path (starting with SQL Server 2017).
    • Azure SQL Database & Managed Instance: Must be an Azure Blob Storage path.
    • Fabric Data Warehouse: Must be an Azure Blob Storage path, supporting wildcard characters in the path for flexible file selection.
      -- Example: Local file path
      BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\data.csv';
    
      -- Example: UNC path
      BULK INSERT MyDatabase.dbo.MyTable FROM '\\ServerName\ShareName\Data\data.csv';
    
      -- Example: Azure Blob Storage path (SQL Server 2017+)
      BULK INSERT MyDatabase.dbo.MyTable FROM 'yourcontainer/data.csv'
      WITH (DATA_SOURCE = 'MyAzureDataSource');
    
       -- Example: Azure Blob Storage path (Fabric Data Warehouse with wildcard)
      BULK INSERT MyDatabase.dbo.MyTable FROM 'https://mystorageaccount.blob.core.windows.net/mycontainer/data*.csv';

File Format Options

These options control how BULK INSERT interprets the data file:

  • FORMAT = 'CSV': Specifies that the data file is in Comma Separated Values (CSV) format, adhering to RFC 4180 standards. This is available from SQL Server 2017 onwards.
  • FIELDQUOTE = 'quote_characters': Defines the quote character used in the CSV file. Defaults to double quote (").
  • CODEPAGE = { 'RAW' | 'code_page' | 'ACP' | 'OEM' }: Sets the code page for the data file, crucial for handling character encoding correctly, especially for char, varchar, and text data types.
    • 'ACP': ANSI/Windows code page (ISO 1252).
    • 'OEM': System OEM code page (default).
    • 'RAW': No code page conversion (fastest).
    • 'code_page': Specific code page number (e.g., ‘65001’ for UTF-8).
  • DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' }: Specifies the overall data format of the file.
    • 'char': Character format (default).
    • 'native': Native SQL Server data types.
    • 'widechar': Unicode character format.
    • 'widenative': Native types with Unicode for character data.
  • ROWTERMINATOR = 'row_terminator': Specifies the row terminator. Default is '\r\n' (carriage return plus line feed). Common alternatives include '\n' (line feed) for UNIX-style files or custom terminators.
  • FIELDTERMINATOR = 'field_terminator': Defines the field terminator. Default is '\t' (tab). For CSV files, this would typically be ',' (comma) or ';' (semicolon).
  • FIRSTROW = first_row: Specifies the starting row to import. Useful for skipping header rows (e.g., FIRSTROW = 2 to skip the first row).
  • LASTROW = last_row: Specifies the last row to import.
-- Example: Importing a CSV file with semicolon delimiter and skipping the header row
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\data.csv'
WITH (
    FORMAT = 'CSV',
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2
);

-- Example: Importing a UTF-8 encoded CSV file
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\utf8_data.csv'
WITH (
    FORMAT = 'CSV',
    CODEPAGE = '65001'
);

Format File

  • FORMATFILE = 'format_file_path': Provides the path to a format file. Format files are XML or non-XML files that describe the structure of the data file, including data types, column order, and terminators. They are essential when the data file’s structure doesn’t perfectly match the target table or for complex data type mappings. Format files are typically created using the bcp utility.
  • FORMATFILE_DATA_SOURCE = 'data_source_name': When using Azure Blob Storage for the format file, this option specifies the external data source name.
-- Example: Using a format file
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\data.dat'
WITH (
    FORMATFILE = 'C:\FormatFiles\my_format_file.xml'
);

Error Handling Options

  • MAXERRORS = max_errors: Sets the maximum number of errors allowed before the BULK INSERT operation is aborted. Default is 10.
  • ERRORFILE = 'error_file_path': Specifies a file to store rows that could not be imported due to formatting or conversion errors.
  • ERRORFILE_DATA_SOURCE = 'errorfile_data_source_name': When using Azure Blob Storage for the error file, this option specifies the external data source name.
-- Example: Setting error handling options
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\data_with_potential_errors.csv'
WITH (
    FORMAT = 'CSV',
    MAXERRORS = 100,
    ERRORFILE = 'C:\ErrorLogs\bulk_insert_errors.log'
);

Database Options

  • KEEPIDENTITY: Specifies that identity values from the data file should be used for identity columns in the target table. By default, identity values are ignored, and SQL Server generates new values. Requires ALTER TABLE permission.
  • KEEPNULLS: Indicates that empty columns in the data file should be imported as NULL values in the table. By default, if a column is nullable and the data file has no value, the column’s default value (if any) is used.
  • FIRE_TRIGGERS: Enables firing of INSERT triggers defined on the target table for each batch of data inserted. By default, triggers are not fired for performance reasons; if triggers exist and this option is omitted, disabling them for the load requires ALTER TABLE permission.
  • CHECK_CONSTRAINTS: Forces checks of all CHECK and FOREIGN KEY constraints on the target table during the bulk import. By default, these constraints are not checked and are marked as not-trusted after the operation; if constraints exist and this option is omitted, disabling them for the load requires ALTER TABLE permission. UNIQUE and PRIMARY KEY constraints are always enforced.
  • TABLOCK: Acquires a table-level lock for the duration of the bulk insert operation. This can significantly improve performance, especially for heaps (tables without indexes), by reducing lock contention; the bulk update lock still allows multiple clients to bulk load the same unindexed table concurrently, but it blocks other access to the table. For tables with a columnstore index, data is loaded in parallel into internal rowsets by default, and specifying TABLOCK instead takes an exclusive lock on the table, limiting concurrent load sessions.
-- Example: Using database options
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\data_with_identities_and_nulls.csv'
WITH (
    FORMAT = 'CSV',
    KEEPIDENTITY,
    KEEPNULLS,
    TABLOCK
);

Source Options

  • ORDER ( { column [ ASC | DESC ] } [ ,...n ] ): Specifies the sort order of data in the data file. If the data is sorted according to the clustered index of the target table, bulk import performance can be improved. SQL Server validates the sorted order for optimized imports.
  • ROWS_PER_BATCH = rows_per_batch: Indicates the approximate number of rows in the data file. Providing this value can help the query optimizer optimize the bulk import operation.
  • KILOBYTES_PER_BATCH = kilobytes_per_batch: Specifies the approximate data size in kilobytes per batch.
  • BATCHSIZE = batch_size: Defines the number of rows per batch. Each batch is processed as a separate transaction. Smaller batch sizes can reduce transaction log overhead but might increase the number of transactions. The default is to treat the entire data file as a single batch.
-- Example: Source options for performance tuning
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\SortedData\sorted_data.csv'
WITH (
    FORMAT = 'CSV',
    ORDER (CustomerID ASC),
    ROWS_PER_BATCH = 10000,
    TABLOCK
);
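
A variant of the example above uses BATCHSIZE instead of ROWS_PER_BATCH so that each batch commits as its own transaction; the table, path, and batch size below are hypothetical.

-- Example: committing in 50,000-row batches; a failure rolls back only the current batch
BULK INSERT MyDatabase.dbo.MyTable FROM 'C:\Data\large_extract.csv'
WITH (
    FORMAT = 'CSV',
    BATCHSIZE = 50000
);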

Data Types and Compatibility

BULK INSERT supports a wide range of SQL Server data types. When importing data, especially string data into decimal or numeric columns, it’s important to be aware of data type conversion rules. BULK INSERT uses the same conversion rules as the CONVERT function in Transact-SQL. Scientific notation in numeric strings might be treated as invalid unless handled with a format file that explicitly maps the data type.
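
The following sketch illustrates those CONVERT rules with hypothetical values: a plain numeric string converts cleanly to DECIMAL, while a scientific-notation string fails unless it is routed through FLOAT, which is effectively what mapping the field as SQLFLT8 in a format file achieves for the bulk load.

-- Plain numeric string: converts to DECIMAL without issue
SELECT CONVERT(DECIMAL(10, 2), '1234.50');                      -- 1234.50

-- Scientific notation: fails when converted directly to DECIMAL,
-- which is why such values in a data file can be rejected during import
-- SELECT CONVERT(DECIMAL(10, 2), '1.2345E+03');                -- conversion error

-- Routing the value through FLOAT first succeeds
SELECT CONVERT(DECIMAL(10, 2), CONVERT(FLOAT, '1.2345E+03'));   -- 1234.50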

For importing or exporting XML data, BULK INSERT can handle SQLXML documents using SQLCHAR, SQLVARCHAR, SQLNCHAR, SQLNVARCHAR, SQLBINARY, or SQLVARBIN data types in the format file.

Performance Considerations

Optimizing BULK INSERT performance is crucial for large data loads. Key factors to consider (a combined example follows the list):

  • Minimal Logging: Bulk import operations can be minimally logged if certain prerequisites are met, significantly reducing transaction log overhead and speeding up the process. Minimal logging is not supported in Azure SQL Database.
  • Batch Size: Experiment with different BATCHSIZE, ROWS_PER_BATCH, and KILOBYTES_PER_BATCH values to find the optimal settings for your data and system. Smaller batches can reduce rollback overhead but might increase transaction count.
  • Table Locking (TABLOCK): Using TABLOCK can drastically improve performance by minimizing locking overhead, especially for tables without indexes. However, it blocks concurrent access.
  • Sorted Data (ORDER): If data is sorted according to the clustered index, use the ORDER clause for optimized imports.
  • Data File Location: Accessing data files locally is generally faster than over a network. For cloud environments, ensure efficient connectivity to Azure Blob Storage.
  • Resource Limits: In Azure SQL Database, consider temporarily scaling up the performance level of the database during large import operations.
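
The sketch below combines several of these levers. The database, table, path, column, and sizes are hypothetical, and the minimal-logging prerequisites (non-FULL recovery model, TABLOCK, heap or empty target) should be verified against your environment before use.

-- Switch to bulk-logged recovery for the load window (assumes the backup/log chain is managed separately)
ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

BULK INSERT MyDatabase.dbo.FactSales
FROM 'D:\Loads\fact_sales.dat'
WITH (
    TABLOCK,                  -- table-level lock, a prerequisite for minimal logging
    ORDER (SaleID ASC),       -- data file pre-sorted to match the clustered index
    BATCHSIZE = 500000        -- commit in chunks to bound rollback and log growth
);

-- Restore the original recovery model after the load
ALTER DATABASE MyDatabase SET RECOVERY FULL;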

Security Considerations

Security is paramount when using BULK INSERT:

  • Permissions: Requires INSERT and ADMINISTER BULK OPERATIONS permissions. In Azure SQL Database, INSERT and ADMINISTER DATABASE BULK OPERATIONS are needed. ADMINISTER BULK OPERATIONS or the bulkadmin role is not supported on SQL Server on Linux (only sysadmin can perform bulk inserts). ALTER TABLE permission might also be required, for example when using KEEPIDENTITY, or when triggers or constraints exist and FIRE_TRIGGERS or CHECK_CONSTRAINTS are omitted. A GRANT sketch follows this list.
  • Security Account Delegation (Impersonation): When using SQL Server authentication, the security context of the SQL Server service account is used to access the data file. Ensure this account has the necessary permissions to read the data. Windows authentication uses the user’s credentials.
  • Azure Blob Storage Security: For private Azure Blob Storage, use Shared Access Signatures (SAS) or Managed Identities for secure access. Create database-scoped credentials and external data sources to manage these securely.
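
As a sketch of the points above, the statements below grant the minimum permissions to a hypothetical principal and set up Managed Identity access to Azure Blob Storage; they assume the server or service identity already has the Storage Blob Data Reader role on the storage account.

-- Database-level permission on the target table (hypothetical user)
GRANT INSERT ON OBJECT::Sales.Invoices TO [etl_loader];

-- Server-level permission, granted to the corresponding login
GRANT ADMINISTER BULK OPERATIONS TO [etl_loader];
-- In Azure SQL Database, grant the database-scoped variant instead:
-- GRANT ADMINISTER DATABASE BULK OPERATIONS TO [etl_loader];

-- Managed Identity access to Azure Blob Storage: no secret is stored in the database
CREATE DATABASE SCOPED CREDENTIAL MyManagedIdentityCredential
WITH IDENTITY = 'Managed Identity';

CREATE EXTERNAL DATA SOURCE MyBlobStorageMI
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://mystorageaccount.blob.core.windows.net/invoices',
    CREDENTIAL = MyManagedIdentityCredential
);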

Examples of BULK INSERT

Example 1: Basic Bulk Insert from a Pipe-Delimited File

BULK INSERT AdventureWorks2022.Sales.SalesOrderDetail
FROM 'f:\orders\lineitem.tbl'
WITH (
    FIELDTERMINATOR = ' |',
    ROWTERMINATOR = ' |\n'
);

Example 2: Bulk Insert with FIRE_TRIGGERS

BULK INSERT AdventureWorks2022.Sales.SalesOrderDetail
FROM 'f:\orders\lineitem.tbl'
WITH (
    FIELDTERMINATOR = ' |',
    ROWTERMINATOR = ':\n',
    FIRE_TRIGGERS
);

Example 3: Bulk Insert with Line Feed Row Terminator (UNIX Files)

DECLARE @bulk_cmd VARCHAR(1000);
SET @bulk_cmd = 'BULK INSERT AdventureWorks2022.Sales.SalesOrderDetail
                 FROM ''<drive>:\<path>\<filename>''
                 WITH (ROWTERMINATOR = '''+CHAR(10)+''')';
EXEC(@bulk_cmd);

Example 4: Bulk Insert Specifying Code Page

BULK INSERT MyTable
FROM 'D:\data.csv'
WITH (
    CODEPAGE = '65001', -- UTF-8 encoding
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = ','
);

Example 5: Bulk Insert from a CSV File, Skipping Header

BULK INSERT Sales.Invoices
FROM '\\share\invoices\inv-2016-07-25.csv'
WITH (
    FORMAT = 'CSV',
    FIRSTROW = 2,       -- Skip header row
    FIELDQUOTE = '"',
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0a' -- Line Feed
);

Example 6: Bulk Insert from Azure Blob Storage using SAS Key

-- Create Master Key (if not already exists)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'YourStrongPassword1';
GO

-- Create Database Scoped Credential for SAS Key
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '******srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z***************';

-- Create External Data Source
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://****************.blob.core.windows.net/invoices',
    CREDENTIAL = MyAzureBlobStorageCredential
);

-- Bulk Insert from Azure Blob Storage
BULK INSERT Sales.Invoices
FROM 'inv-2017-12-08.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage');

Example 7: Bulk Insert from Azure Blob Storage with Error File

BULK INSERT Sales.Invoices
FROM 'inv-2017-12-08.csv'
WITH (
    DATA_SOURCE = 'MyAzureInvoices',
    FORMAT = 'CSV',
    ERRORFILE = 'MyErrorFile',
    ERRORFILE_DATA_SOURCE = 'MyAzureInvoices'
);

Conclusion

SQL Server BULK INSERT is an indispensable tool for efficiently loading large volumes of data into your SQL Server databases. By understanding its syntax, options, and performance considerations, you can optimize your data loading processes, ensuring speed and reliability. Whether you’re dealing with on-premises SQL Server, Azure SQL Database, or Fabric Data Warehouse, mastering BULK INSERT is a valuable skill for any data professional.
