Solved

how to design and impelemnt a search algorithm

Posted on 2006-11-30
5
180 Views
Last Modified: 2013-12-04
I have a postgres database with one table storing:
street number
pre_dir (such as north, south, etc)
a field for a street name
post_dir
suffix (pulled from a list of options.  such as beach, blvd, street etc)
unit #
city
state
zip
borrower first name
and last name

i would like to write something that would catch possible dupes, and i was thinking of using levenshtein distance.  does anybody have any suggestions on what else can be done?
0
Comment
Question by:tansofun
  • 2
5 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 18050540
You might want to start by normalizing the address, making sure any abbreviations are consistent
Then you might check it the address against the post office data base
http://www.usps.com/ncsc/addressservices/addressqualityservices/addresscorrection.htm
(I'm guessing based on state zip that these are USA addresses)
Do you want to catch typographical errors and misspellings?
0
 
LVL 2

Author Comment

by:tansofun
ID: 18050606
usps is a great suggestion.  it'll work most of the time;we also deal with addresses that are new and usps may not have them yet, oddly enough.

do you have any suggestions on typo's?  I don't really want to build any big n-gram tables and things of that nature.
0
 
LVL 84

Expert Comment

by:ozo
ID: 18050764
Levenshtein distance basically counts typos, so it could be useful.
A more sophisticated model may give higer probabilities to substititions of adjacent keys,
or letters that sound alike.
If your last names are European, soundex may be useful for matching them
0

Featured Post

Use Case: Protecting a Hybrid Cloud Infrastructure

Microsoft Azure is rapidly becoming the norm in dynamic IT environments. This document describes the challenges that organizations face when protecting data in a hybrid cloud IT environment and presents a use case to demonstrate how Acronis Backup protects all data.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article is meant to give a basic understanding of how to use R Sweave as a way to merge LaTeX and R code seamlessly into one presentable document.
This article will show, step by step, how to integrate R code into a R Sweave document
Viewers will learn how to properly install Eclipse with the necessary JDK, and will take a look at an introductory Java program. Download Eclipse installation zip file: Extract files from zip file: Download and install JDK 8: Open Eclipse and …
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question