String Manipulation -- SUBSTRING

Shannon_Lowder
CERTIFIED EXPERT
Published:
Before I go into how to go altering strings in SQL, I would like to point out that you have to be careful when and how often you use the techniques below.  A good rule to keep in mind is text manipulation is slow and painful to a server.  If you can leave the text manipulation to your middle-ware or front end, that would be better.  But we all have been stuck in a situation where we needed to alter a string before those two points, and so I bring you the following lesson.
The Problem
phone
                      605-555-2862
                      (561)555-2700
                      904-555-5680
                      N/A
                      580-555-5371
                      2815558368
                      (254)555-8430
                      336-555-2797
                      3365557233
                      592-555-3181/4951
                      96615551222 Ext. 249
                      +44 7930 555271

Open in new window


Let’s say you have a column in a database that collects phone numbers from a web form and that form doesn’t have any JavaScript parsing the code into the three standard parts of a phone number (area code, prefix, and telroot).  This data is being dumped into a leads file and the manager from the sales department comes down and asks you to help split up the leads by area code.

The first thing you need to do in a situation like this is become aware of what kind of data you’re looking at.  Select out the first 100 or so rows and get an idea of what shape the data is in.  In this case you can see there are several types of phone numbers entered.

I’ve limited the example above to just a few of the types of numbers you might find if you allowed users free form access to storing their phone numbers.

3,3,4
Parenthesis, dashes, and 3,3,4
Blank, null, or N/A
Parenthesis, dashes, spaces, and 3,3,4
10 numbers
Extensions
International numbers

The Solution

So how can you deal with all these situations?  Use SUBSTRING.

SUBSTRING (string, startPosition, length)

Using this function you can check out the first three characters of each phone number.

SELECT TOP 100
                         phone, SUBSTRING(phone, 1, 3) AS [area code]
                      FROM sourceTable 

Open in new window


When looking at the results, you will see that this only shows you the first three characters of the phone number.  While the results appear to be accurate in some cases, you should immediately see the exception cases that this doesn’t handle.  This solution cannot handle cases with parenthesis, international numbers, NULL, blank, or invalid numbers.

To proceed on this problem you’ll need a few more tools in your SQL tool belt.  Check out other articles I’ve published on string manipulation.  There you will discover how REPLACE, LTRIM, RTRIM, and more techniques for manipulating text strings can help solve this problem.

References

http://msdn.microsoft.com/en-us/library/ms187748.aspx

¿
1
2,877 Views
Shannon_Lowder
CERTIFIED EXPERT

Comments (1)

CERTIFIED EXPERT

Author

Commented:
I was hoping to add a few more articles to build out a tool kit (substring, patindex, left, right, etc) then tie them all together to show how to handle poorly formatted inputs.  If I combine them all into one article, would that be too much information for a new tsql programmer to handle?  I didn't want to overrun them with information.

Thank you for the feedback so far... I am really looking forward to improving my training skills.

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.