
Solved

How to design and implement a search algorithm

Posted on 2006-11-30
Last Modified: 2013-12-04
I have a postgres database with one table storing:
street number
pre_dir (such as north, south, etc)
a field for a street name
post_dir
suffix (pulled from a list of options, such as beach, blvd, street, etc.)
unit #
city
state
zip
borrower first name
and last name

I would like to write something that would catch possible dupes, and I was thinking of using Levenshtein distance. Does anybody have any suggestions on what else can be done?
Question by:tansofun
LVL 84

Accepted Solution

by:
ozo earned 2000 total points
ID: 18050540
You might want to start by normalizing the address, making sure any abbreviations are consistent.
Then you might check the address against the post office database:
http://www.usps.com/ncsc/addressservices/addressqualityservices/addresscorrection.htm
(I'm guessing from the state and zip fields that these are USA addresses.)
Do you want to catch typographical errors and misspellings?
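
For the normalization step, something along these lines might work as a starting point. This is only a sketch in Python: the abbreviation tables and the function name normalize_address are made up for illustration, and a real table would need many more entries. The idea is to reduce each row to a canonical tuple so that rows differing only in case, punctuation, or abbreviation compare equal.

```python
import re

# Hypothetical, deliberately tiny abbreviation tables (assumptions for
# illustration only; a production table would be far more complete).
DIRECTIONS = {
    "north": "N", "n": "N", "south": "S", "s": "S",
    "east": "E", "e": "E", "west": "W", "w": "W",
}
SUFFIXES = {
    "street": "ST", "st": "ST", "boulevard": "BLVD", "blvd": "BLVD",
    "avenue": "AVE", "ave": "AVE", "beach": "BCH", "bch": "BCH",
}


def _clean(value):
    """Drop punctuation, collapse whitespace, and lowercase; None becomes ''."""
    no_punct = re.sub(r"[^\w\s]", "", value or "")
    return re.sub(r"\s+", " ", no_punct).strip().lower()


def _abbreviate(value, table):
    """Map a cleaned value through a table, falling back to its uppercase form."""
    key = _clean(value)
    return table.get(key, key.upper())


def normalize_address(number, pre_dir, name, post_dir, suffix,
                      unit, city, state, zip_code):
    """Return a canonical tuple for one address row."""
    return (
        _clean(number).upper(),
        _abbreviate(pre_dir, DIRECTIONS),
        _clean(name).upper(),
        _abbreviate(post_dir, DIRECTIONS),
        _abbreviate(suffix, SUFFIXES),
        _clean(unit).upper(),
        _clean(city).upper(),
        _clean(state).upper(),
        re.sub(r"\D", "", zip_code or "")[:5],  # keep only the 5-digit zip
    )
```

Rows whose normalized tuples are identical are exact duplicates; the fuzzy comparison then only has to deal with genuine typos.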
LVL 2

Author Comment

by:tansofun
ID: 18050606
USPS is a great suggestion. It'll work most of the time, but we also deal with new addresses that USPS may not have yet, oddly enough.

Do you have any suggestions on typos? I don't really want to build any big n-gram tables and things of that nature.
LVL 84

Expert Comment

by:ozo
ID: 18050764
Levenshtein distance basically counts typos, so it could be useful.
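
As a rough illustration, here is a plain Levenshtein distance in Python using the usual two-row dynamic programming approach. The helper is_possible_dupe and its threshold of 2 are assumptions for the example, not a recommended setting. Since the data is already in Postgres, the fuzzystrmatch contrib module, which provides a levenshtein() function, may be an alternative worth checking.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes, or substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def is_possible_dupe(name_a, name_b, max_distance=2):
    """Hypothetical helper: flag two street names as a possible duplicate
    when they are within max_distance edits (threshold is an assumption)."""
    return levenshtein(name_a.upper(), name_b.upper()) <= max_distance
```

Note that a swapped pair of letters ("MAIN" vs. "MIAN") counts as two edits here, because plain Levenshtein has no transposition operation.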
A more sophisticated model might give higher probabilities to substitutions of adjacent keys,
or of letters that sound alike.
If your last names are European, Soundex may be useful for matching them.
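
For the last names, a minimal sketch of American Soundex in Python could look like the following. It is an illustration only: it ignores anything outside A-Z, and the function name soundex is just what I chose here. Postgres's fuzzystrmatch module also ships a soundex() function, which might save you from writing your own.

```python
# Letter groups that share a Soundex digit; vowels, H, W, and Y have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code of name, or '' if it has no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return (encoded + "000")[:4]
```

Two borrowers whose last names produce the same code (for example "Smith" and "Smyth") could then be treated as candidates for a closer comparison.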