Solved

Using Soundex in SQL Server to match ONE word within string of MANY words.

Posted on 2009-04-10
6
1,452 Views
Last Modified: 2012-05-06
I need to build a query that will return records from a table where a text field, which often consists of a string of several words, contains within it a single word that is similar to some other supplied word.  I know that Soundex is limited, but it is probably good enough for my purposes.  Take for instance the following string:

"This string has several words"

The set of Soundex values for these five words looks like this:

"T200 S365 H200 S164 W632"

...but using the entire text string as the argument for the Soundex function returns only the value of the first word, T200, so I don't have a ready-made function to actually produce the above list of all five Soundex codes for searching.

Again, my goal is to return only those records where one or more words in the entire string contains the same Soundex value as the single word supplied for the search.  In this sample case, if supplied with the word "Haze", which has a Soundex value of H200, I would have a match because the third word, "has", is the same Soundex value.

I am working in SQL Server 2005.
0
Comment
Question by:AutomateMyOffice
  • 3
  • 3
6 Comments
 
LVL 142

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 500 total points
ID: 24116094
please find the function dbo.ParmsToList on EE, which I use all the time.

for your needs, it could be used like this:


select value,  soundex(value)
  from dbo.parmstolist('This string has several words', ' ') l


in a function, to reconcat the values:

create function dbo.TextToSoundEx(@text varchar(max)) returns varchar(max)
as 
begin 
declare @res varchar(max)
select @res = coalesce(@res + ' ', '') + soundex(value) 
  from dbo.parmstolist('This string has several words', ' ') l 
return @res
end 

or in a "simple" query:

Open in new window

0
 

Author Comment

by:AutomateMyOffice
ID: 24117160
Thanks, angelIII, for your quick response.  I am admittedly a non-expert in SQL Server (much more proficient in Access and working my way up the SQL learning curve) so I have had to scratch my head over your answer a bit, especially since the code snippet seems to end prematurely with a colon.  

The first hurdle was in finding a version of ParmsToList that contained the second argument (delimiter).  There is another version floating around that only has a single argument, and that, of course, did not work very well.

I have been able to run ParmsToList, supplying a string of words as the argument, and getting a table in return, but this does not get me where I want to go.  I will need a little more handholding to get the actual results I am after.  Specifically, I need to start with a table that contains two columns, RowID and RowText, as shown in the code snippet (to retain spacing).  

I need to know if it is possible to construct a query (view), such that if I know I am looking for Soundex matches for the word "Haze", the query will return three rows, each having only the RowID field.  The values returned would be A, B and D, since those rows contain words (has, hose and he's) whose Soundex code (H200) matches that of Haze.

RowID    RowText

-----    -------

A        This string has several words

B        The garden hose is too short

C        This will not match

D        I think he's right

Open in new window

0
 

Author Comment

by:AutomateMyOffice
ID: 24119177
OK.  After a few hours of head scratching I did finally figure out how to do this using your two functions.  Your TextToSoundEx function needed to be modified such that the @text parameter took the place of the string 'This string has several words' as the first argument of ParmsToList, and with that in place, I was able to get the desired output using:

SELECT    RowID
FROM         MyTable
WHERE     (dbo.TextToSoundEx(RowText) LIKE '%' + SOUNDEX('haze') + '%')

It is pretty slow, taking about 15 seconds to return 42 records out of about 14,000, but it works.  Any help on streamlining this would be appreciated, but I am glad to at least have a working system.  Thanks!
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 24119309
glad I could help.

for the efficiency part, I would create an additional column in the table, and on insert/update of the record, update the soundex value immediately ...
0
 

Author Comment

by:AutomateMyOffice
ID: 24120817
So you are suggesting a calculated third column in MyTable (say, RowSoundex?) that would contain the Soundex values for the words in RowText, storing "T200 S365 H200 S164 W632" in the same record as "This string has several words"?  I suppose I would add these values now for all existing records, just using an update query, and then create a trigger to manage the new and updated records in the future...  That will be my first trigger!  Any pointers on how to do that?  I'm sure it's covered in BOL, but maybe you have a link to a good beginner's tutorial or something?  Thanks again.
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 24121704
yes, exactly.

trigger is simply this:
create trigger trg_update_rowsoundindex

 on yourtable

 for insert, update

as

  update t  

    set RowSoundex = dbo.TextToSoundEx(i.RowText) 

   from yourtable t

   join inserted i

     on i.rowid = t.rowid

Open in new window

0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

Suggested Solutions

Let's review the features of new SQL Server 2012 (Denali CTP3). It listed as below: PERCENT_RANK(): PERCENT_RANK() function will returns the percentage value of rank of the values among its group. PERCENT_RANK() function value always in be…
Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed. Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve th…
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now