How do I remove punctuation to group by similar field values?



Hi Experts,

I thought this would be easy and would find an existing function to do this, but I'm having a hard time finding it.  Perhaps there is not an easy way to do what I need?  Anyway, I need to run a Select statement to group by a field:  Owner, and return the top 5 with the greatest value.  This would not be tough if all the owners were data entered the same way, but many are not.  For example, I need to group all of these together:

Sample Inc
Sample, Inc
Sample, Inc.
Sample,  Inc
Sample-Inc

I do not want to Include Sample1 Inc.

I tried to use Soundex to do this, but because many owner names may start off the same - they get the same Soundex value, but essentially they are different.  I then tried to do the same using a combination of soundex on the owner and billing address, but early analysis shows it won't work the way I need.

What I think would work +90% of the time with the data I have, is if I can group on the Owner Name with no punctuation or spaces, but I could not find a function to do this.  I found:  replace(Owner,',',''), but this will only replace a certain character - and I need to replace any characters that are not A-Z 0-9 with ''.  

I have come across some discussions that use user defined functions, or cursors that examine each character, but I am a novice with these things and don't really know how to use them, or understand if they could accomplish what I need to do.

I would really appreciate your advice on the most straightforward way to accomplish this task.

Thanks,
Jan
LVL 7
janmariniAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

HilaireCommented:
This function should do the trick

create function ufn_remove_punctuation(@strIn varchar(2000))
returns varchar(2000) as
begin
declare @strOut varchar(2000), @pos int
select @pos=1, @strOut=''
while patindex('%[0-9A-Z]%', @strIn) > 0
      select  @strOut      = @strOut + substring(@strIn, patindex('%[0-9A-Z]%', @strIn), 1),
            @pos      = patindex('%[0-9A-Z]%', @strIn),
            @strIn      = right(@strIn, datalength(@strIn)-@pos)
return  @strOut
end

-- don't omit the owner (dbo) prefix
select dbo.ufn_remove_punctuation('Sample, Inc') as t1, dbo.ufn_remove_punctuation('Sample-Inc') as t2

HTH

Hilaire
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
SoftEng007Commented:
Nice Hilaire!
I think I'll keep that one!  Thanks! :)
0
janmariniAuthor Commented:
Worked perfectly.  Thanks!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.