SQL SUBSTRING: How to Extract Parts of Strings in SQL Server

The SQL SUBSTRING function is a powerful tool in SQL Server that allows you to extract a specific portion of a string. Whether you need to get the first few characters, the last part, or any segment in between, SUBSTRING provides the flexibility to manipulate string data precisely. This article will delve into the details of the SUBSTRING function, covering its syntax, arguments, return types, and providing practical examples to illustrate its usage.

Understanding the Syntax of SQL SUBSTRING

The syntax for the SUBSTRING function in SQL Server is straightforward:

SUBSTRING ( expression, start, length )

Let’s break down each part of this syntax:

  • expression: This is the string from which you want to extract a substring. It can be a column of a character, binary, text, or image data type, or a literal string value.
  • start: This integer or bigint value specifies the starting position of the substring within the expression. Crucially, SQL Server uses 1-based indexing, meaning the first character of the string is at position 1. If start is less than 1, the substring will begin from the first character of the expression.
  • length: This positive integer or bigint value determines the number of characters to be extracted from the expression, starting from the start position. If length is negative, SQL Server will throw an error. If the sum of start and length exceeds the total length of the expression, SUBSTRING will return the portion of the string from the start position to the end of the expression.

Arguments Explained

To effectively use the SUBSTRING function, it’s important to understand the nuances of each argument:

expression Argument

The expression argument can be of various data types. Here’s a summary of the supported types:

  • Character data types: char, varchar, text, nchar, nvarchar, ntext. When using character data types, start and length are counted in characters.
  • Binary data types: binary, varbinary, image. For binary data types, start and length are specified in bytes.

For very large string or binary expressions, expression can also be varchar(max) or varbinary(max), especially when start or length values are greater than 2147483647.

start Argument

The start argument dictates where the substring extraction begins. Remember these key points:

  • 1-based Indexing: SQL Server’s SUBSTRING function uses 1-based indexing. The first character is at position 1, not 0, which is different from some other programming languages.
  • Values less than 1: If start is less than 1, SUBSTRING treats it as 1, starting the extraction from the beginning of the string.
  • Values greater than expression length: If start is greater than the length of the expression, SUBSTRING returns an empty string.

length Argument

The length argument controls how many characters or bytes are extracted. Consider these points:

  • Positive Value Required: length must be a positive integer. A negative value will result in an error.
  • Exceeding Expression Length: If start + length is greater than the total length of the expression, SUBSTRING will return the substring from the start position to the end of the expression.

Return Types of SQL SUBSTRING

The SUBSTRING function returns a string value. The data type of the returned string depends on the data type of the input expression:

Specified Expression Type Return Type
char, varchar, text varchar
nchar, nvarchar, ntext nvarchar
binary, varbinary, image varbinary

Important Remarks for SQL SUBSTRING

  • Character vs. Byte Counting: For character data types (ntext, char, varchar), start and length are interpreted as the number of characters. For binary data types (text, image, binary, varbinary), they are interpreted as the number of bytes.
  • Surrogate Pairs: When working with supplementary characters (Unicode characters represented by surrogate pairs), SUBSTRING correctly counts each surrogate pair as a single character. This is important for handling international character sets.

Examples of Using SQL SUBSTRING

Let’s explore practical examples to understand how to use the SUBSTRING function effectively.

Example A: Extracting Substrings from Character Strings

This example demonstrates how to extract parts of database names from the sys.databases system table.

SELECT name,
       SUBSTRING(name, 1, 1) AS Initial, -- Get the first character
       SUBSTRING(name, 3, 2) AS ThirdAndFourthCharacters -- Get the third and fourth characters
FROM sys.databases
WHERE database_id < 5; -- Limit to system databases for brevity

This query will return the names of system databases, along with their first initial and the third and fourth characters.

Here’s the result set:

name Initial ThirdAndFourthCharacters
master m st
tempdb t mp
model m de
msdb m db

To extract specific characters from a string constant, you can directly use SUBSTRING:

SELECT SUBSTRING('abcdef', 2, 3) AS ExtractedString;

This will return 'bcd' as the extracted substring.

ExtractedString
---------------
bcd

Example B: Using SUBSTRING with text, ntext, and image Data Types

This example illustrates how to use SUBSTRING with different data types like text and image. We’ll use the pub_info table from the pubs database.

USE pubs;
SELECT pub_id,
       SUBSTRING(logo, 1, 10) AS logo_substring, -- First 10 bytes of logo (image)
       SUBSTRING(pr_info, 1, 10) AS pr_info_substring -- First 10 characters of pr_info (text)
FROM pub_info
WHERE pub_id = '1756';

In this query, we extract the first 10 bytes from the logo column (which is of image type) and the first 10 characters from the pr_info column (which is of text type).

The result set shows the substring of both logo and pr_info:

pub_id logo_substring pr_info_substring
1756 0x4749463839 This is sa

The logo data is returned as varbinary, while pr_info data is returned as varchar.

Example C: SUBSTRING in Azure Synapse Analytics and Analytics Platform System (PDW)

This example demonstrates SUBSTRING usage in Azure Synapse Analytics. We’ll extract the initial of the first name from the dbo.DimEmployee table in the AdventureWorksDW2019 database (or a compatible database).

-- Uses AdventureWorksDW2019 database (or similar)
SELECT LastName,
       SUBSTRING(FirstName, 1, 1) AS Initial
FROM dbo.DimEmployee
WHERE LastName LIKE 'Bar%'
ORDER BY LastName;

This query retrieves last names starting with ‘Bar’ and their first initial from the first name.

The result set will look like this:

LastName Initial
Barbariol A
Barber D
Barreto de Mattos P

For string constants in Azure Synapse Analytics, the usage is similar:

SELECT TOP 1 SUBSTRING('abcdef', 2, 3) AS ExtractedString
FROM dbo.DimCustomer; -- Using DimCustomer just to execute the query

This will return 'bcd'.

ExtractedString
---------------
bcd

Conclusion

The SQL SUBSTRING function is an essential string manipulation tool in SQL Server. It provides a simple yet powerful way to extract portions of strings based on starting position and length. Understanding its syntax, arguments, return types, and the nuances of character versus byte counting is crucial for effectively using SUBSTRING in your SQL queries. By leveraging the examples provided, you can confidently apply SUBSTRING to solve various string manipulation tasks in SQL Server, Azure SQL Database, Azure Synapse Analytics, and other SQL Server environments.

See Also

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *