
How to design and implement a search algorithm

Posted on 2006-11-30
Last Modified: 2013-12-04
I have a Postgres database with one table storing:
street number
pre_dir (directional, such as north, south, etc.)
street name
post_dir
suffix (picked from a list of options, such as beach, blvd, street, etc.)
unit #
city
state
zip
borrower first name
borrower last name

I would like to write something that catches possible duplicates, and I was thinking of using Levenshtein distance. Does anybody have suggestions on what else could be done?
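As a starting point, here is a minimal Python sketch of the Levenshtein idea. It assumes each row has been loaded into a dict keyed by hypothetical column names (street_number, pre_dir, ..., last_name), joins the fields into one lowercase key, and flags every pair of rows whose keys are within a chosen number of edits. If you would rather stay inside the database, PostgreSQL's fuzzystrmatch module provides a levenshtein() function.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute (or match)
        previous = current
    return previous[-1]


# Hypothetical column names mirroring the fields listed in the question.
FIELDS = ("street_number", "pre_dir", "street_name", "post_dir", "suffix",
          "unit", "city", "state", "zip", "first_name", "last_name")


def record_key(rec: dict) -> str:
    """Join all fields into one lowercase string; missing fields become ''."""
    return " ".join(str(rec.get(f) or "").strip().lower() for f in FIELDS)


def possible_dupes(records, max_distance=2):
    """Yield (i, j, distance) for every pair of records whose keys are
    within max_distance edits of each other. Compares all pairs: O(n^2)."""
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        d = levenshtein(record_key(r1), record_key(r2))
        if d <= max_distance:
            yield i, j, d
```

Comparing every pair is quadratic, so on a large table you would probably only compare rows that share a blocking key such as the zip code. The threshold of 2 is an arbitrary illustration, not a tuned value.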
Question by: tansofun

Accepted Solution

by: ozo (earned 500 total points)
ID: 18050540
You might want to start by normalizing the address, making sure any abbreviations are consistent.
Then you might check the address against the Post Office database:
http://www.usps.com/ncsc/addressservices/addressqualityservices/addresscorrection.htm
(I'm guessing, based on the state and zip fields, that these are USA addresses.)
Do you want to catch typographical errors and misspellings?
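A minimal normalization sketch in Python, assuming a small hand-picked abbreviation table in the style of USPS Publication 28 (the table here is illustrative and far from complete). It uppercases the text, turns punctuation into spaces and maps known directional and suffix words to one standard abbreviation, so "North Main Street" and "n. main st." compare equal before any fuzzy matching is attempted.

```python
import re

# Illustrative sample only: a few directionals and suffixes mapped to
# USPS-style abbreviations. A real table would be much longer.
ABBREVIATIONS = {
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW",
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD",
    "DRIVE": "DR", "LANE": "LN", "COURT": "CT", "BEACH": "BCH",
}


def normalize(text: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace,
    and swap any word found in ABBREVIATIONS for its abbreviation."""
    words = re.sub(r"[^\w\s]", " ", text.upper()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)
```

Since the table already keeps pre_dir, post_dir and suffix in separate columns, applying the mapping per column is probably safer than applying it to the whole address string, where a street literally named "North" would also be abbreviated.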

Author Comment

by: tansofun
ID: 18050606
USPS is a great suggestion, and it'll work most of the time. Oddly enough, though, we also deal with new addresses that USPS may not have yet.

Do you have any suggestions for handling typos? I don't really want to build big n-gram tables or anything of that nature.

Expert Comment

by:ozo
ID: 18050764
Levenshtein distance basically counts typos, so it could be useful.
A more sophisticated model might give higher probabilities to substitutions of adjacent keys,
or of letters that sound alike.
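One way to sketch the adjacent-key idea in Python is an edit distance where substituting a neighbouring QWERTY key costs less than an arbitrary substitution. This is an assumption-laden illustration: the keyboard layout, the rough neighbour rule for the staggered rows and the 0.5 cost are all hypothetical choices, and it does not model sound-alike letters.

```python
# Hypothetical model: lowercase QWERTY letter rows only.
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def _build_neighbors():
    """Map each letter to the letters physically next to it. Rows are
    staggered, so key k in a row touches keys k and k+1 in the row above
    and keys k-1 and k in the row below (a rough approximation)."""
    pos = {ch: (r, k) for r, row in enumerate(ROWS) for k, ch in enumerate(row)}
    neighbors = {ch: set() for ch in pos}
    for ch, (r, k) in pos.items():
        for dr, dk in ((0, -1), (0, 1), (-1, 0), (-1, 1), (1, -1), (1, 0)):
            rr, kk = r + dr, k + dk
            if 0 <= rr < len(ROWS) and 0 <= kk < len(ROWS[rr]):
                neighbors[ch].add(ROWS[rr][kk])
    return neighbors


NEIGHBORS = _build_neighbors()


def sub_cost(x: str, y: str, adjacent_cost: float) -> float:
    """Cost of substituting y for x: free if equal, cheaper if the keys touch."""
    if x == y:
        return 0.0
    if y in NEIGHBORS.get(x, ()):
        return adjacent_cost
    return 1.0


def weighted_levenshtein(a: str, b: str, adjacent_cost: float = 0.5) -> float:
    """Levenshtein distance with cheaper substitutions between adjacent keys.
    Insertions and deletions still cost 1."""
    a, b = a.lower(), b.lower()
    previous = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        current = [float(i)]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + sub_cost(ca, cb, adjacent_cost)))
        previous = current
    return previous[-1]
```

With these costs, "smith" vs "smitj" scores 0.5 (h and j are neighbours) while "smith" vs "smitp" scores 1.0, so plausible fat-finger typos rank as closer matches.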
If your last names are European, Soundex may be useful for matching them.
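For the Soundex suggestion, below is a Python sketch of the classic American Soundex code (first letter plus three digits); last names that share a code, such as Smith and Smyth, become candidate matches. PostgreSQL's fuzzystrmatch module includes a soundex() function as well, so this could also be done in SQL.

```python
# American Soundex digit groups; vowels, Y, H and W get no digit.
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
_CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                    # the first letter is kept as-is
    last_code = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                       # H and W do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != last_code:
            result += code                 # skip repeats of the previous digit
        last_code = code                   # a vowel resets this, so equal codes on either side of it both count
        if len(result) == 4:
            break
    return result.ljust(4, "0")            # pad short codes with zeros
```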