[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 233
  • Last Modified:

How do I remove punctuation to group by similar field values?

Hi Experts,

I thought this would be easy and would find an existing function to do this, but I'm having a hard time finding it.  Perhaps there is not an easy way to do what I need?  Anyway, I need to run a Select statement to group by a field:  Owner, and return the top 5 with the greatest value.  This would not be tough if all the owners were data entered the same way, but many are not.  For example, I need to group all of these together:

Sample Inc
Sample, Inc
Sample, Inc.
Sample,  Inc

I do not want to Include Sample1 Inc.

I tried to use Soundex to do this, but because many owner names may start off the same - they get the same Soundex value, but essentially they are different.  I then tried to do the same using a combination of soundex on the owner and billing address, but early analysis shows it won't work the way I need.

What I think would work +90% of the time with the data I have, is if I can group on the Owner Name with no punctuation or spaces, but I could not find a function to do this.  I found:  replace(Owner,',',''), but this will only replace a certain character - and I need to replace any characters that are not A-Z 0-9 with ''.  

I have come across some discussions that use user defined functions, or cursors that examine each character, but I am a novice with these things and don't really know how to use them, or understand if they could accomplish what I need to do.

I would really appreciate your advice on the most straightforward way to accomplish this task.

1 Solution
This function should do the trick

create function ufn_remove_punctuation(@strIn varchar(2000))
returns varchar(2000) as
declare @strOut varchar(2000), @pos int
select @pos=1, @strOut=''
while patindex('%[0-9A-Z]%', @strIn) > 0
      select  @strOut      = @strOut + substring(@strIn, patindex('%[0-9A-Z]%', @strIn), 1),
            @pos      = patindex('%[0-9A-Z]%', @strIn),
            @strIn      = right(@strIn, datalength(@strIn)-@pos)
return  @strOut

-- don't omit the owner (dbo) prefix
select dbo.ufn_remove_punctuation('Sample, Inc') as t1, dbo.ufn_remove_punctuation('Sample-Inc') as t2


Nice Hilaire!
I think I'll keep that one!  Thanks! :)
janmariniAuthor Commented:
Worked perfectly.  Thanks!

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now