"Fuzzy" string comparison
Posted on 1998-01-30
This query is more concerned with programming techniques than VB specifically.
I've recently had a large database dumped on me, and job one is to clean up misspelt and miskeyed entries. For instance, the database contains a field called "PC Model", and within this field "Compaq Deskpro" has been variously entered as "Coqmap Deksrops", "Compack Desqproo", etc.
I have coded a number of sort routines to compare actual entries to ideal entries on the basis of "like" comparisons, number and order of consonants, etc.
This works after a fashion, but what I really need is a robust algorithm or programming technique for matching the actual data against the ideal data. The ideal tool would produce output based on probabilities, e.g.
"Copmaq Diskrop" is most likely to be "Compaq Deskpro"... and least likely to be "IBM AT"
Does anyone know of any resource where I can review theories, flow-charts, metalanguage code, or anything that will allow me to code the kind of solution I am looking for?
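To make the kind of output described above concrete, here is a minimal sketch in Python rather than VB. It assumes the ideal entries are available as a short list, and it uses the standard library's difflib.SequenceMatcher as one possible similarity measure, not necessarily the best one for this data. The function name rank_candidates and the sample list of models are hypothetical. The scores are similarity ratios between 0 and 1, not true probabilities.

```python
from difflib import SequenceMatcher


def rank_candidates(entry, ideal_entries):
    """Return (ideal_entry, score) pairs, most similar first.

    The score is SequenceMatcher's ratio: 1.0 for identical strings,
    approaching 0.0 for strings with nothing in common. It is a
    similarity measure, not a probability.
    """
    entry_norm = entry.lower().strip()
    scored = []
    for ideal in ideal_entries:
        # Compare case-insensitively so "COMPAQ" and "Compaq" match.
        score = SequenceMatcher(None, entry_norm, ideal.lower().strip()).ratio()
        scored.append((ideal, score))
    # Sort by score, highest (most likely match) first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical list of "ideal" values for the "PC Model" field.
ideal_models = ["IBM AT", "Compaq Deskpro"]

for model, score in rank_candidates("Copmaq Diskrop", ideal_models):
    print(f"{model}: {score:.2f}")
```

With these two sample values, "Compaq Deskpro" is listed first and "IBM AT" last. In practice a cutoff score would still be needed to decide when the best match is good enough to accept without a manual check.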