Solved

Using SOUNDEX to match individual words within Text fields

Posted on 2007-04-10
6
742 Views
Last Modified: 2008-01-09
I have a table which contains  a Text fields containing course information.

I am constructing a 'fuzzy search' stored procedure which will match words within this field based on what was entered into a web front end.

The part I am finding difficulty with is matching on mis-spelled words

I thought I could use SOUNDEX for this but it only appears to be finding matches where the first word of the Text field is the word I am looking for.

So, for example, my search value is 'compter course' (note it is mis-spelled)

There are records in my DB which have text field value of 'Computers for beginners' and 'An introduction to computers'

My SOUNDEX query partly does the job but only the first record is matched :(

Is there a way to use SOUNDEX across all words in the text field or is ther another approach I can look at?

I am using SQL 2005

Thanks
0
Comment
Question by:WNottsC
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
6 Comments
 
LVL 11

Expert Comment

by:Ved Prakash Agrawal
ID: 18881525
http://databases.about.com/od/development/l/aasoundex.htm

just try to look this may help you.. not sure it wil solve your problem or not..
I am looking your problem and give the feedback very soon...

0
 
LVL 143

Assisted Solution

by:Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3] earned 200 total points
ID: 18881529
you cannot use soundex efficiently for this.
you might consider using full-text search for this kind of query.
look up
CONTAINS
CONTAINSTABLE
FREETEXTTABLE
in the books online.
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 18881547
<<you cannot use soundex efficiently for this.>>
Agreed.  I would also ask: why use soundex at all.
0
Guide to Performance: Optimization & Monitoring

Nowadays, monitoring is a mixture of tools, systems, and codes—making it a very complex process. And with this complexity, comes variables for failure. Get DZone’s new Guide to Performance to learn how to proactively find these variables and solve them before a disruption occurs.

 
LVL 50

Accepted Solution

by:
Lowfatspread earned 300 total points
ID: 18881639
you can use soundex for some limited "misspelled word"  problems but i agree with angelliii and racimo its usage will not be easy ...

you need to filter for noise words and build soundex values for each remaining word and compare those...

full text searching is probably a better solution these days...

however your problem really seems  to stem from allowing a free format input in the firstplace... if your are really trying to allow for searching a course catalogue
why is a simple cross reference / category selection not the simplest approach...

ie type the course to an agreed set of course key words   COMPUTING / BASIC / PROGRAMMING / THEORY / ....

then just let the userr select from the defined set of Keywords...

0
 

Author Comment

by:WNottsC
ID: 18881815
Thanks

My SP uses all the other methods mentioned above already, none of them produce a match on 'compting' except soundex.  I already do some tidying up of noise words etc as well, the SP is growing into a bit of a monster.

I agree the design of the DB & application is not perfect but its what I've been given to work with
0
 

Author Comment

by:WNottsC
ID: 18896180
Update

Fortunately the data used here is only refreshed every few weeks so Ive created an overnight which splits the words in the search field into separate fields in a new table, then a soundex match is performed on each field in turn.

Hopefully this will catch enough typos to let us escape creating a thesaurus which I really, really want to avoid
0

Featured Post

Online Training Solution

Drastically shorten your training time with WalkMe's advanced online training solution that Guides your trainees to action. Forget about retraining and skyrocket knowledge retention rates.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Using rowversion for incremental load of datawarehouse. 12 36
rolling count by date, hour query 7 31
get count of orders by customer Sql Server table. 3 48
t-sql left join 2 34
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Via a live example, show how to shrink a transaction log file down to a reasonable size.

751 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question