SQL Server String Processing – Part 1

FIRST OF ALL

I strongly recommend that DO NOT process string in a database.

Database is designed to find and retrieve information from huge data rather than CPU intensive operations like text processing. For the best performance of a database, data should be structured and operation should specify a set of results instead of how to get data.

Text processing with SQL is a last resort.


Text processing with TSQL

In this article, I’ll show you string processing technique. Read string from field and split it into multiple smaller parts. After some processing of each string, merge them into a string to replace original string from field.

However, be careful. Database is not appropriate for CPU intensive operations like test processing as I mentioned in the first section.

If your database does CPU intensive processing instead of your application server, a database will suffer from heavy load because database utilize CPU not only your text processing but tasks such as compilation and processing of SQL statement. It makes a database hard to respond to requests quickly.

I used this technique for the last resort to prevent an online system from system failure caused by wrong string typed value. You can also use the same technique if you application retrieves data from a database via a stored procedure. It would be useless if your application retrieves data from a database by SQL statement because there is no chance to intercept the query result before an application get the query result.

This article consists of three topics.

  1. About how to split string
  2. About how to store temporary multiple variables
  3. About how to use a table-valued procedure with table
  4. Merge temporary multiple variable into a string

Spit string into multiple values


In SQL Server 2016 or above, I bet you STRING_SPLIT is the most efficient to split string.


In SQL Server, there is no string tokenizer function. So, you must write a string tokenizer function by yourself.

You can find useful codes to tokenize string from ‘TSQL split string (Stack overflow)’.
Passing a list/array to an SQL Server stored procedure’ shows a useful technique to pass multiple values into stored procedure and using those values in stored procedure by split into multiple values via CSV, comma-separated value, or XML.

Among them, using XML is only available with SQL server.

For your convenience, here is my code to split CSV string into multiple values.

CREATE FUNCTION dbo.splitstring (
@stringToSplit nvarchar(MAX)
)
RETURNS
@returnList TABLE ([name] [nvarchar] (500))
AS
BEGIN
DECLARE @name nvarchar(255)
DECLARE @pos int
WHILE CHARINDEX(',', @stringToSplit) > 0
BEGIN
SET @pos  = CHARINDEX(',', @stringToSplit)
SET @name = SUBSTRING(@stringToSplit, 1, @pos-1)
INSERT INTO @returnList VALUES (@name)
SET @stringToSplit = SUBSTRING(@stringToSplit,
@pos+1, LEN(@stringToSplit)-@pos)
END
INSERT INTO @returnList VALUES (@stringToSplit)
RETURN
END

In splitstring function, each values separated by comma should not exceed 255 bytes and are returned as a table data type.

I’ll explain about table value type in the next section.

Temporary Table and Table Type

Database has no primitive type containing multiple values. You can use a temporary table to pass multiple values into a procedure. A temporary table is meaningful during only a specific period and it’s data with not persistent, we called it a temporary table. A temporary table can not be passed into a procedure, rather it is referenced inside stored procedure (or batch program).

Besides a temporary table, you can use table value type from SQL server 2000. Information of a temporary table is managed by a database and it is a more expensive object than primitive type. Table value type is more light than a table and has some characteristics resembled a temporary table.

you can choose table value and temporary table by considering these.

  • How many data a table or table value containing?
  • Do you process complex statement with table value?

Characteristics of temporary table such as Index, statistics, and locking mechanism is the same with a normal table. However, a table value type is not a table. Because table data type does have neither index nor statistics, it isn’t as efficient as a table for searching and join operation.

Nonetheless, table value type needs less locking, resources, and recompilation of a stored procedure.

If your process requires very larger data set in the degree of millions of data, you should consider a temporary table. If you have a complex operation like join, it is better to use a temporary table, also.

How to Use splitstring

Now, we have stored procure to split string into multiple rows. To use this string function, you must place it to FROM clause.

See, below code.

SELECT name FROM splitstring('a,b,c,d')
name
——
a
b
c
d

How can use splitstring with a table?

In a relational database, you can use JOIN operator to merge two or more tables into one. Because SQL Server doesn’t permit to join table value parameter with table, you can not use splitstring with join operation. You should use cross apply to combine table and table value parameters.

Here is a sample table composite of two columns. The first column is a key column, and The second column is a string type column that contains multiple values separated by comma.

CREATE TABLE MyRecord
(
id int not null,
info nvarchar(512) not null,
constraint PK_MyRecord PRIMARY KEY (id)
)
GO
INSERT INTO MyRecord
VALUES (1, 'a,b,c,d')
GO
SELECT id, info from MyRecord
GO
id  info
------------
1   a,b,c,d
SELECT [name]
FROM MyRecord t cross apply splitstring(t.info)
name
----
a
b
c
d

CROSS APPLY merges table and table value parameter, but the result is yet multiple values, not a single string. To combine multiple rows into string, use FOR XML.

FOR XML statement

To combine multiple row values into a single string value, you can use XML statement in SQL Server.

XML statement creates an XML document from a table and each record of a table become XML elements. You can also specify the name of XML elements.

When blank, '' is set for element’ name, FOR XML generates no element tag. With the previous example, you can obtain a single string typed row from 4 rows via FOR XML .

SELECT (
SELECT ''  + s.[name]
FROM MyRecord t cross apply splitstring(t.info) as s
FOR XML PATH ('') ) v -- no name for element
v
---
abcd

You should involve an unnamed column name using string concatenation to produce no name for the XML element at line 2. Without this, XML statement will produce “ element from column name.

For more about FOR XML statement, you’d better to see FOR XML from Microsoft Docs

Conclusion

TSQL misses many string processing functions including string split function. It is not so bad considering the efficient performance of a database, but chances are that you need such functions to resolve some problems with your application.

In this article, I showed you a simple example of string split function and how to use it solely and join with a table. With FOR XML statement, you can combine multiple values returned from stringsplit into a single string type.
You can also make a good choice between temporary table and table value for processing multiple data.

Next article, I’ll discuss more string processing in SQL server and what you should care about return type of string function.

Comments are closed.

Website Built by WordPress.com.

Up ↑