I haven't really dived into this yet, but I figure the more techniques I can leverage here the better, so I'm looking for concepts.
I've got to do a text-to-text match between two lists, one of 3M records and one of 50,000, so clearly I want to automate it as much as possible. The records are company names, and thus long, which I figure eliminates Soundex as a tool. They are all US companies, so that excludes international designations like UK, GmbH, SA, etc.
Here are some of the techniques I figure will help:
Eliminate leading "The " and "A "
Convert Inc to Incorporated, Corp to Corporation, Ltd to Limited, PC to Professional Corporation
Convert & to "and"
Convert numbers to their English equivalents
Remove spaces
Remove punctuation
Flag matches
Compare the first 4 characters and remove any records with no matches
Then remove Incorporated, Limited, Corporation and rematch
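
To make the list above concrete, here is a rough Python sketch of how the steps might chain together. It is only an illustration under some assumptions: the function names (normalize, match_names, number_to_words) are made up for this post, the abbreviation table is just the handful mentioned above, and the number conversion only spells out 0-99 and leaves anything larger as digits.

```python
import re
from collections import defaultdict

# Illustrative abbreviation table; a real one would be much longer.
ABBREVIATIONS = {
    "inc": "incorporated",
    "corp": "corporation",
    "ltd": "limited",
    "pc": "professionalcorporation",
}
SUFFIXES = {"incorporated", "corporation", "limited"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out 0-99; larger numbers stay as digits (sketch only)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (ONES[ones] if ones else "")
    return str(n)


def normalize(name, drop_suffixes=False):
    """Apply the cleanup steps and return a key with no spaces or punctuation."""
    text = name.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9\s]", "", text)      # remove punctuation
    tokens = text.split()
    if len(tokens) > 1 and tokens[0] in ("the", "a"):
        tokens = tokens[1:]                       # drop leading "The " / "A "
    out = []
    for tok in tokens:
        tok = ABBREVIATIONS.get(tok, tok)         # Inc -> Incorporated, etc.
        if tok.isdigit():
            tok = number_to_words(int(tok))
        if drop_suffixes and tok in SUFFIXES:
            continue
        out.append(tok)
    return "".join(out)                           # joining also removes spaces


def match_names(small, large):
    """Flag exact matches, prune on the first 4 chars, then retry without suffixes."""
    exact = defaultdict(list)
    stripped = defaultdict(list)
    prefixes = set()
    for name in large:
        key = normalize(name)
        exact[key].append(name)
        stripped[normalize(name, drop_suffixes=True)].append(name)
        prefixes.add(key[:4])

    results = {}
    for name in small:
        key = normalize(name)
        if key in exact:
            results[name] = ("exact", exact[key])
        elif key[:4] not in prefixes:
            results[name] = ("no_candidates", [])
        else:
            loose = normalize(name, drop_suffixes=True)
            hits = stripped.get(loose, [])
            results[name] = ("suffix_stripped", hits) if hits else ("unmatched", [])
    return results
```

Building dictionaries keyed on the normalized names of the big list means each of the 50,000 records is a lookup rather than a scan of 3M rows. Anything that comes back as "unmatched" still shares a 4-character prefix with something in the big list, so those are the ones worth a closer look.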
This was just my knee-jerk list. Any other ideas out there?
Thanks