The SQL SUBSTRING
function is a powerful tool in SQL Server that allows you to extract a specific portion of a string. Whether you need to get the first few characters, the last part, or any segment in between, SUBSTRING
provides the flexibility to manipulate string data precisely. This article will delve into the details of the SUBSTRING
function, covering its syntax, arguments, return types, and providing practical examples to illustrate its usage.
Understanding the Syntax of SQL SUBSTRING
The syntax for the SUBSTRING
function in SQL Server is straightforward:
SUBSTRING ( expression, start, length )
Let’s break down each part of this syntax:
expression
: This is the string from which you want to extract a substring. It can be a column of a character, binary, text, or image data type, or a literal string value.start
: This integer or bigint value specifies the starting position of the substring within theexpression
. Crucially, SQL Server uses 1-based indexing, meaning the first character of the string is at position 1. Ifstart
is less than 1, the substring will begin from the first character of theexpression
.length
: This positive integer or bigint value determines the number of characters to be extracted from theexpression
, starting from thestart
position. Iflength
is negative, SQL Server will throw an error. If the sum ofstart
andlength
exceeds the total length of theexpression
,SUBSTRING
will return the portion of the string from thestart
position to the end of theexpression
.
Arguments Explained
To effectively use the SUBSTRING
function, it’s important to understand the nuances of each argument:
expression
Argument
The expression
argument can be of various data types. Here’s a summary of the supported types:
- Character data types:
char
,varchar
,text
,nchar
,nvarchar
,ntext
. When using character data types,start
andlength
are counted in characters. - Binary data types:
binary
,varbinary
,image
. For binary data types,start
andlength
are specified in bytes.
For very large string or binary expressions, expression
can also be varchar(max)
or varbinary(max)
, especially when start
or length
values are greater than 2147483647.
start
Argument
The start
argument dictates where the substring extraction begins. Remember these key points:
- 1-based Indexing: SQL Server’s
SUBSTRING
function uses 1-based indexing. The first character is at position 1, not 0, which is different from some other programming languages. - Values less than 1: If
start
is less than 1,SUBSTRING
treats it as 1, starting the extraction from the beginning of the string. - Values greater than expression length: If
start
is greater than the length of theexpression
,SUBSTRING
returns an empty string.
length
Argument
The length
argument controls how many characters or bytes are extracted. Consider these points:
- Positive Value Required:
length
must be a positive integer. A negative value will result in an error. - Exceeding Expression Length: If
start + length
is greater than the total length of theexpression
,SUBSTRING
will return the substring from thestart
position to the end of theexpression
.
Return Types of SQL SUBSTRING
The SUBSTRING
function returns a string value. The data type of the returned string depends on the data type of the input expression
:
Specified Expression Type | Return Type |
---|---|
char , varchar , text |
varchar |
nchar , nvarchar , ntext |
nvarchar |
binary , varbinary , image |
varbinary |
Important Remarks for SQL SUBSTRING
- Character vs. Byte Counting: For character data types (
ntext
,char
,varchar
),start
andlength
are interpreted as the number of characters. For binary data types (text
,image
,binary
,varbinary
), they are interpreted as the number of bytes. - Surrogate Pairs: When working with supplementary characters (Unicode characters represented by surrogate pairs),
SUBSTRING
correctly counts each surrogate pair as a single character. This is important for handling international character sets.
Examples of Using SQL SUBSTRING
Let’s explore practical examples to understand how to use the SUBSTRING
function effectively.
Example A: Extracting Substrings from Character Strings
This example demonstrates how to extract parts of database names from the sys.databases
system table.
SELECT name,
SUBSTRING(name, 1, 1) AS Initial, -- Get the first character
SUBSTRING(name, 3, 2) AS ThirdAndFourthCharacters -- Get the third and fourth characters
FROM sys.databases
WHERE database_id < 5; -- Limit to system databases for brevity
This query will return the names of system databases, along with their first initial and the third and fourth characters.
Here’s the result set:
name | Initial | ThirdAndFourthCharacters |
---|---|---|
master | m | st |
tempdb | t | mp |
model | m | de |
msdb | m | db |
To extract specific characters from a string constant, you can directly use SUBSTRING
:
SELECT SUBSTRING('abcdef', 2, 3) AS ExtractedString;
This will return 'bcd'
as the extracted substring.
ExtractedString
---------------
bcd
Example B: Using SUBSTRING with text
, ntext
, and image
Data Types
This example illustrates how to use SUBSTRING
with different data types like text
and image
. We’ll use the pub_info
table from the pubs
database.
USE pubs;
SELECT pub_id,
SUBSTRING(logo, 1, 10) AS logo_substring, -- First 10 bytes of logo (image)
SUBSTRING(pr_info, 1, 10) AS pr_info_substring -- First 10 characters of pr_info (text)
FROM pub_info
WHERE pub_id = '1756';
In this query, we extract the first 10 bytes from the logo
column (which is of image
type) and the first 10 characters from the pr_info
column (which is of text
type).
The result set shows the substring of both logo
and pr_info
:
pub_id | logo_substring | pr_info_substring |
---|---|---|
1756 | 0x4749463839 | This is sa |
The logo
data is returned as varbinary
, while pr_info
data is returned as varchar
.
Example C: SUBSTRING in Azure Synapse Analytics and Analytics Platform System (PDW)
This example demonstrates SUBSTRING
usage in Azure Synapse Analytics. We’ll extract the initial of the first name from the dbo.DimEmployee
table in the AdventureWorksDW2019
database (or a compatible database).
-- Uses AdventureWorksDW2019 database (or similar)
SELECT LastName,
SUBSTRING(FirstName, 1, 1) AS Initial
FROM dbo.DimEmployee
WHERE LastName LIKE 'Bar%'
ORDER BY LastName;
This query retrieves last names starting with ‘Bar’ and their first initial from the first name.
The result set will look like this:
LastName | Initial |
---|---|
Barbariol | A |
Barber | D |
Barreto de Mattos | P |
For string constants in Azure Synapse Analytics, the usage is similar:
SELECT TOP 1 SUBSTRING('abcdef', 2, 3) AS ExtractedString
FROM dbo.DimCustomer; -- Using DimCustomer just to execute the query
This will return 'bcd'
.
ExtractedString
---------------
bcd
Conclusion
The SQL SUBSTRING
function is an essential string manipulation tool in SQL Server. It provides a simple yet powerful way to extract portions of strings based on starting position and length. Understanding its syntax, arguments, return types, and the nuances of character versus byte counting is crucial for effectively using SUBSTRING
in your SQL queries. By leveraging the examples provided, you can confidently apply SUBSTRING
to solve various string manipulation tasks in SQL Server, Azure SQL Database, Azure Synapse Analytics, and other SQL Server environments.