Sounds like you have your work cut out. I have a similar issue (fortunately, mine's a little smaller subset of data, and my results will all be numeric).
What I've done, is create a new column for the correct value. Then I started cleaning up everything I could and putting the correct values in the new column. You may be able to write a job to parse out the numeric portions of the field to at least give you that. If there's another column that identifies the vendor, you can set up a regex in a separate table to match each vendor's part number pattern.
The second column I created is used by using "Isnull(newcolumn, oldcolumn)" in my queries, to give me the good number, and if it doesn't exist, then the dirty number. That along with training and strict constraints.
Main Topics
Browse All Topics





by: Allister_ReidPosted on 2009-10-30 at 20:25:35ID: 25708227
Hi,
What makes you think Soundex will help here? Do the part numbers sound like each other?
It doesn't sound like this is going to be an option with so many suppliers - and presumably a highly heterogenous sampling of part numbers... is the "P/N" or "part #" regular to all records?
Regular expression may be a possible solution.
However, there probably is going to be no easy answer here :S
Regards