Rating a Text Match in SQL

Steve Krile
Steve Krile used Ask the Experts™
on
Hey experts.  I'm working on a project that will consume a CSV.  I need to match the records in the CSV with a record in my Employee table.  I am using a "key" (provided by the client) to do the matching.  What I would like to do is, once I find the Key, inspect the First and Last name fields in the db and compare them to the first and last name fed from the csv.  I want a bit more than exact matches (or LIKE) that would return only things that match exactly.  I want the comparison to be be a bit flexible and then return a "confidence" index of how close the two match.  I've done this as a binary "It Matches" or "It doesn't match" approach like this:

DECLARE @FirstName nvarchar(50)
DECLARE @Lastname nvarchar(50)
DECLARE @EmployeeNumber nvarchar(50)

SET @Firstname = 'Joe'
SET @LastName = 'User'
SET @EmployeeNumber = '01343'

SELECT
      E.EmployeeID,
      E.LastName,
      E.FirstName,
      CASE WHEN E.Lastname = @LastName then 1
      ELSE 0
      END as [LastNameConfidence],
      CASE WHEN E.Firstname = @Firstname then 1
      ELSE 0
      END As [FirstNameConfidence]
FROM
      Employee as E
WHERE
      E.EmployeeNumber like @EmployeeNumber


Is there any way I can replace those last two columns of my select with a more grainular inspection of the text?  I don't need to search all records in the table (I am going to only do this inspection if the EmployeeNumber matches exactly).  I just want to know how close the names match.

For instance, if the CSV contained Joseph User rather than Joe User I want to be able to see that while not exact, the first name is pretty close (containing two consecutive letters to the target record in succession).
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Senior Software Developer
Commented:
You may want to look at the SOUNDEX() and DIFFERENCE() function ...

The snippet below returns ...

J000      J500      J210      2      2      3      3      3      3

The 2 suggests a not so good match.
The 3 suggests a good match.

The range is from 0 to 4.

SOUNDEX() : http://msdn.microsoft.com/en-us/library/ms187384.aspx
DIFFERENCE() : http://msdn.microsoft.com/en-us/library/ms188753.aspx
SELECT
	SOUNDEX('Joe User'),
	SOUNDEX('John User'),
	SOUNDEX('Joseph User'),
 
	-- These return a 2 - note that the order is important.
	DIFFERENCE('Joseph User','John User'),
	DIFFERENCE('Joseph User','Joe User'),
 
	-- These return a 3 - note that the order is important.
	DIFFERENCE('John User','Joseph User'),
	DIFFERENCE('Joe User','John User'),
	DIFFERENCE('John User','Joe User'),
	DIFFERENCE('Joe User','Joseph User')

Open in new window

Author

Commented:
Sweet....that's what I was looking for.  Here is what I will be using:

SELECT
      E.EmployeeNumber,
      E.Firstname,
      E.Lastname,
      Difference(E.Firstname, @FirstName) as FirstNameCompare,
      DIFFERENCE(E.LastName, @LastName) as LastNameCompare
FROM
      Employee as E
WHERE
      E.EmployeeNumber = @EmployeeNumber




---------EXAMPLE MATCH---------------
SET @Firstname = 'Steve'
SET @LastName = 'Krile'
SET @EmployeeNumber = '016781'

EmployeeNumber  Firstname  Lastname  FirstNameCompare LastNameCompare
016781                  Steve        Krile           4                             4



--------------EXAMPLE CLOSE MATCH----------------
SET @Firstname = 'Steven'
SET @LastName = 'Krile'
SET @EmployeeNumber = '016781'

EmployeeNumber  Firstname  Lastname  FirstNameCompare LastNameCompare
016781                  Steve        Krile           3                             4



----------------EXAMPLE NON MATCH--------------------------
SET @Firstname = 'Alex'
SET @LastName = 'Krile'
SET @EmployeeNumber = '016781'

EmployeeNumber  Firstname  Lastname  FirstNameCompare LastNameCompare
016781                  Steve        Krile           1                             4
Richard QuadlingSenior Software Developer

Commented:
Excellent. I've never used these myself. I needed to verify codes which are entered from a fax system.

The codes represent date/time/page#.

5512VTOP
559YIWB8
55IU5ZPR
55RPSZZ3
560LG07I
569H30GU
56ICQ1HO
56R8D2IR
570405XY
578ZN66E
57HVA6FL
57QQX6O1
57ZMK6XC
588I775S
58HDU87M
58Q9H8G2
58Z548PF
5980R8XV
59GWE96Z
59PS19GB
59YNO9OQ
5A7JB9Y2
5AGEYCJO
5APALCT0
5AY68D1H
5B71VDAT
5BFXIF40
5BOT5FDC
5BXOSFLT
5C6KFFV5
5CFG2GW3
5COBPHXW
5CX7CI6D
5D62ZIFP
5DEYMIOA
5DNU9IXM
5DWPWJ63
5E5LJJFF
5EEH6KG9
5ENCTLI2
5EW8GLQJ
5F543LZV
5FDZQM8C
5FMVDMHO
5FVR0PW0
5G4MNQ4G
5GDIAQDV
5GMDXQMB
5GV9KQVM
5H457R42
5HD0URD9
5HLWHRML
5HUS4RV0
5I3NRS4C
5ICJESCV

But they mistype the 1/I/l, 0/O, 2Z, 3E, 4A, 5S, 6G, 8B, 9Pg, etc.

Or double type something or just miss a letter out!

Users!

Author

Commented:
Actually, I've ammended my select statement to be something like this now - call it my High Confidence return:

SELECT
      E.EmployeeID,
      E.EmployeeNumber,
      E.Firstname,
      E.Lastname,
      Difference(E.Firstname, @FirstName) as FirstNameCompare,
      DIFFERENCE(E.LastName, @LastName) as LastNameCompare
FROM
      Employee as E
      INNER JOIN Facility as F on e.FacilityID = F.FacilityID
WHERE
      F.FacilityID = @FacilityID
      and (Difference(E.Firstname, @FirstName) > 2
      or DIFFERENCE(E.LastName, @LastName) > 2)
      and E.EmployeeNumber = @EmployeeNumber

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial