String Manipulation -- REPLACE, PATINDEX and Regular Expressions

Shannon_Lowder
CERTIFIED EXPERT
Published:
In the previous SQL posts, I showed you SUBSTRING, then I showed you REPLACE. Now, with REPLACE, you had to know what characters you wanted to replace.  What if you only knew what characters you wanted to keep?  Well, let me introduce you to PATINDEX.  This is an advanced command that is easy to learn, difficult to master.  I’ll bring this tool out of my tool belt in future posts to provide you more ways to use it.

But first, I need to include my standard disclaimer:

I would like to point out that you have to be careful when and how often you use the techniques below.  A good rule to keep in mind is text manipulation is slow and painful to a server.  If you can leave the text manipulation to your middle-ware or front end, that would be better.  But we all have been stuck in a situation where we needed to alter a string before those two points, and so I bring you the following lesson.
The Problem
phone
605-555-2862
(561)555-2700
904-555-5680
N/A
580-555-5371
2815558368
(254)555-8430
336-555-2797
3365557233
592-555-3181/4951
96615551222 Ext. 249
+44 7930 555271

Looking at the examples in the column we know there characters in there other than 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0.  How can we easily remove the non numeric values from this column?  We can use regular expressions.  We can say, if it isn’t 0-9, remove it.

Now, the solution to this problem will use a WHILE loop.  This is something I’m not really going to go into too much detail on, but using a while loop can really eat up processor time and memory.  This is one of those times where this solution should only be used as a last resort, and only during non-peak server usage.  Before implementing this solution, I’d be sure to spend time putting together a fix on the front end, to scrub out these bad characters before they got into my database.  But, let’s come off the soapbox and discuss the solution, shall we?
The Solution

Let’s first look at the syntax for the PATINDEX function.

PATINDEX ('%pattern%', sourceString)

The PATINDEX function will search for a pattern in the SourceString, and return a bigint of where that pattern is found in the sourceString.  If it’s not found, then you get 0.  With this you can also provide a pattern '%[^0-9]%' and it will search for characters that are not numbers.

But that’s only part of the solution.  You then need to couple that with REPLACE, so you can remove anything that is not a number.

--variable to hold the current phone number being updated
DECLARE @phone VARCHAR(20)
--variable to hold the original value (so we can update it)
DECLARE @originalPhone VARCHAR(20)
--if there is a phone number with something other than a number in it
WHILE EXISTS (
        SELECT TOP 1
               phone
        FROM sourceTable (READCOMMITTED)
        WHERE
               PATINDEX('%[^0-9]%', phone) ) > 0)
BEGIN
        --store the original number
        SELECT TOP 1
               @originalPhone = phone
        FROM sourceTable (READCOMMITTED)
        WHERE
               PATINDEX('%[^0-9]%', phone)
        --then copy it to a temporary variable
        SET @phone = @originalPhone
        --spin through the phone number one character at a time
        -- if there is a character other than a number
        WHILE(PATINDEX('%[^0-9]%', @phone) ) > 0
        BEGIN
               --then remove that one character, then continue
         SET @phone = REPLACE(@phone
               , SUBSTRING(@phone, PATINDEX('%[^0-9]%', @phone), 1)
              , '')
        END
      --once all the non numeric characters are gone, update
        UPDATE sourceTable SET
               Phone = @phone
        WHERE
               Phone = @originalPhone
END

If you’re having difficulties with the WHILE loop, search my blog for the entry on WHILE loops.  They’re pretty easy once you get to know them.

If you wanted to get fancy you could make the replace portion of this code a scalar function, and call it in an update step.  It would get rid of the explicit outer loop.  In future versions of SQL this may improve, but for SQL 2000, this is the best I’ve come up with.

There are many more string manipulation techniques to cover, so stay tuned to this same bat-channel!
References

http://msdn.microsoft.com/en-us/library/ms188395.aspx

http://msdn.microsoft.com/en-us/library/ms178642.aspx
1
8,778 Views
Shannon_Lowder
CERTIFIED EXPERT

Comments (1)

Jim HornSQL Server Data Dude
CERTIFIED EXPERT
Most Valuable Expert 2013
Author of the Year 2015

Commented:
Very good tutorial on PATINDEX, which I'm sure if I read six months ago would have gotten me past the PATINDEX question on the 70-433 exam.  Voted Yes.

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.