Solved

How to design and implement a search algorithm

Posted on 2006-11-30
Last Modified: 2013-12-04
I have a Postgres database with one table storing:
street number
pre_dir (such as north, south, etc.)
street name
post_dir
suffix (pulled from a list of options, such as beach, blvd, street, etc.)
unit #
city
state
zip
borrower first name
borrower last name

I would like to write something that catches possible duplicates, and I was thinking of using Levenshtein distance. Does anybody have suggestions on what else can be done?
Question by:tansofun
 
LVL 85

Accepted Solution

by:
ozo earned 2000 total points
ID: 18050540
You might want to start by normalizing the addresses, making sure any abbreviations are consistent.
Then you might check the addresses against the post office database:
http://www.usps.com/ncsc/addressservices/addressqualityservices/addresscorrection.htm
(I'm guessing, based on the state and zip fields, that these are USA addresses.)
Do you want to catch typographical errors and misspellings?
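
A minimal sketch of the normalization step in Python, assuming each row has been fetched as a dict. The key names (street_number, pre_dir, street_name, and so on) and the two small abbreviation tables are hypothetical stand-ins; a real table would need a much fuller list of directions and suffixes.

```python
import re

# Hypothetical, deliberately tiny abbreviation tables (assumption, not from the thread).
DIRECTIONS = {"north": "N", "south": "S", "east": "E", "west": "W"}
SUFFIXES = {"street": "ST", "st": "ST", "boulevard": "BLVD", "blvd": "BLVD",
            "avenue": "AVE", "ave": "AVE", "beach": "BCH", "bch": "BCH"}


def clean(value):
    """Drop punctuation, collapse whitespace, and uppercase; None becomes ''."""
    value = re.sub(r"[^\w\s]", "", value or "")
    return re.sub(r"\s+", " ", value).strip().upper()


def lookup(table, value):
    """Map a spelled-out or abbreviated form to one canonical abbreviation."""
    cleaned = clean(value)
    return table.get(cleaned.lower(), cleaned)


def normalize_address(row):
    """Return a canonical tuple that can serve as a duplicate-detection key."""
    return (
        clean(row.get("street_number")),
        lookup(DIRECTIONS, row.get("pre_dir")),
        clean(row.get("street_name")),
        lookup(SUFFIXES, row.get("suffix")),
        lookup(DIRECTIONS, row.get("post_dir")),
        clean(row.get("unit")),
        clean(row.get("city")),
        clean(row.get("state")),
        clean(row.get("zip"))[:5],  # compare on the 5-digit ZIP only
    )
```

Two rows whose normalized tuples are equal are exact duplicates after cleanup; anything that still differs is a candidate for the fuzzy comparison asked about in the question.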
 
LVL 2

Author Comment

by:tansofun
ID: 18050606
USPS is a great suggestion. It will work most of the time, but we also deal with addresses that are new, and USPS may not have them yet, oddly enough.

Do you have any suggestions on typos? I don't really want to build any big n-gram tables and things of that nature.
 
LVL 85

Expert Comment

by:ozo
ID: 18050764
Levenshtein distance basically counts typos, so it could be useful.
A more sophisticated model may give higher probabilities to substitutions of adjacent keys,
or of letters that sound alike.
If your last names are European, Soundex may be useful for matching them.
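
For illustration, here is a rough Python sketch of the two measures named above: plain Levenshtein distance (every insertion, deletion, and substitution costs 1, with no keyboard-adjacency weighting) and the common American Soundex coding. The function names are my own choice, and the sketch assumes the names are plain ASCII letters.

```python
def levenshtein(a, b):
    """Edit distance where insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Soundex digit for each consonant; vowels, H, W, and Y have no code.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name):
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous_code = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous_code:
            result += code
        previous_code = code  # a vowel resets this, allowing a repeat
    return (result + "000")[:4]
```

One way to use them together: treat two borrowers as possible duplicates when their last names share a Soundex code and the street names are within a small Levenshtein distance of each other, say 1 or 2. That threshold is a guess and would need tuning against real data.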
