How do I remove punctuation to group by similar field values?

Posted on 2004-11-16
Last Modified: 2010-08-05

Hi Experts,

I thought this would be easy and would find an existing function to do this, but I'm having a hard time finding it.  Perhaps there is not an easy way to do what I need?  Anyway, I need to run a Select statement to group by a field:  Owner, and return the top 5 with the greatest value.  This would not be tough if all the owners were data entered the same way, but many are not.  For example, I need to group all of these together:

Sample Inc
Sample, Inc
Sample, Inc.
Sample,  Inc

I do not want to Include Sample1 Inc.

I tried to use Soundex to do this, but because many owner names may start off the same - they get the same Soundex value, but essentially they are different.  I then tried to do the same using a combination of soundex on the owner and billing address, but early analysis shows it won't work the way I need.

What I think would work +90% of the time with the data I have, is if I can group on the Owner Name with no punctuation or spaces, but I could not find a function to do this.  I found:  replace(Owner,',',''), but this will only replace a certain character - and I need to replace any characters that are not A-Z 0-9 with ''.  

I have come across some discussions that use user defined functions, or cursors that examine each character, but I am a novice with these things and don't really know how to use them, or understand if they could accomplish what I need to do.

I would really appreciate your advice on the most straightforward way to accomplish this task.

Question by:janmarini
    LVL 26

    Accepted Solution

    This function should do the trick

    create function ufn_remove_punctuation(@strIn varchar(2000))
    returns varchar(2000) as
    declare @strOut varchar(2000), @pos int
    select @pos=1, @strOut=''
    while patindex('%[0-9A-Z]%', @strIn) > 0
          select  @strOut      = @strOut + substring(@strIn, patindex('%[0-9A-Z]%', @strIn), 1),
                @pos      = patindex('%[0-9A-Z]%', @strIn),
                @strIn      = right(@strIn, datalength(@strIn)-@pos)
    return  @strOut

    -- don't omit the owner (dbo) prefix
    select dbo.ufn_remove_punctuation('Sample, Inc') as t1, dbo.ufn_remove_punctuation('Sample-Inc') as t2


    LVL 9

    Expert Comment

    Nice Hilaire!
    I think I'll keep that one!  Thanks! :)
    LVL 7

    Author Comment

    Worked perfectly.  Thanks!

    Featured Post

    How your wiki can always stay up-to-date

    Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
    - Increase transparency
    - Onboard new hires faster
    - Access from mobile/offline

    Join & Write a Comment

    I wrote this interesting script that really help me find jobs or procedures when working in a huge environment. I could I have written it as a Procedure but then I would have to have it on each machine or have a link to a server-related search that …
    Nowadays, some of developer are too much worried about data. Who is using data, who is updating it etc. etc. Because, data is more costlier in term of money and information. So security of data is focusing concern in days. Lets' understand the Au…
    Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
    Viewers will learn how the fundamental information of how to create a table.

    734 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    22 Experts available now in Live!

    Get 1:1 Help Now