<

String Manipulation -- SUBSTRING

Published on
8,684 Points
2,584 Views
1 Endorsement
Last Modified:
Approved
Before I go into how to go altering strings in SQL, I would like to point out that you have to be careful when and how often you use the techniques below.  A good rule to keep in mind is text manipulation is slow and painful to a server.  If you can leave the text manipulation to your middle-ware or front end, that would be better.  But we all have been stuck in a situation where we needed to alter a string before those two points, and so I bring you the following lesson.
The Problem
phone
605-555-2862
(561)555-2700
904-555-5680
N/A
580-555-5371
2815558368
(254)555-8430
336-555-2797
3365557233
592-555-3181/4951
96615551222 Ext. 249
+44 7930 555271

Open in new window


Let’s say you have a column in a database that collects phone numbers from a web form and that form doesn’t have any JavaScript parsing the code into the three standard parts of a phone number (area code, prefix, and telroot).  This data is being dumped into a leads file and the manager from the sales department comes down and asks you to help split up the leads by area code.

The first thing you need to do in a situation like this is become aware of what kind of data you’re looking at.  Select out the first 100 or so rows and get an idea of what shape the data is in.  In this case you can see there are several types of phone numbers entered.

I’ve limited the example above to just a few of the types of numbers you might find if you allowed users free form access to storing their phone numbers.

3,3,4
Parenthesis, dashes, and 3,3,4
Blank, null, or N/A
Parenthesis, dashes, spaces, and 3,3,4
10 numbers
Extensions
International numbers

The Solution

So how can you deal with all these situations?  Use SUBSTRING.

SUBSTRING (string, startPosition, length)

Using this function you can check out the first three characters of each phone number.

SELECT TOP 100
   phone, SUBSTRING(phone, 1, 3) AS [area code]
FROM sourceTable 

Open in new window


When looking at the results, you will see that this only shows you the first three characters of the phone number.  While the results appear to be accurate in some cases, you should immediately see the exception cases that this doesn’t handle.  This solution cannot handle cases with parenthesis, international numbers, NULL, blank, or invalid numbers.

To proceed on this problem you’ll need a few more tools in your SQL tool belt.  Check out other articles I’ve published on string manipulation.  There you will discover how REPLACE, LTRIM, RTRIM, and more techniques for manipulating text strings can help solve this problem.

References

http://msdn.microsoft.com/en-us/library/ms187748.aspx

¿
1
Comment
1 Comment
LVL 9

Author Comment

by:Shannon_Lowder
I was hoping to add a few more articles to build out a tool kit (substring, patindex, left, right, etc) then tie them all together to show how to handle poorly formatted inputs.  If I combine them all into one article, would that be too much information for a new tsql programmer to handle?  I didn't want to overrun them with information.

Thank you for the feedback so far... I am really looking forward to improving my training skills.
0

Featured Post

Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

Join & Write a Comment

Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month