Solved

Name variation matching

Posted on 2009-05-15
3
485 Views
Last Modified: 2012-05-07
We have a number of jobs that perform matches across different tables. The initial match is based on the forename and surname of the person in question being equal.  If the forename and surname is the same and, a record is created in another table and further matching is performed to give a final match rating.

We have just purchased a database containing million of name variations.  These are in two tables (one for forenames and one for surnames) with the following columns:
"      Source_Name  This contains a list of Names that you can search on (E.g. Steve)
"      Name_Variation  This list the variation of the sources name. Each variation is on a new row. (I.e. there may be 2 rows with the source name Steve. One with the variation of Stephen and the other with  the variation Steven)
"      Score  This is a rating system with 100 being a close match and 75 being a loose match.

I am having trouble getting the matching script to return results in an efficient manner.  I believe that this is caused by the cross join that I have used in the below script.  Is there a more efficient way of doing it?

SELECT Distinct

       AVP.ID,

	   PER.PersonID,

	   1,

	   1

FROM PERSON P 

Cross JOIN C6..PERSON_SUBSET PER

WHERE P.Forename <> ''

       AND

       P.Surname <> ''

And (P.Forename = PER.EnglishForename  Or PER.PERSONID in (

			Select PersonID

			From C6..Person_Subset	

			Where EnglishForename In 

				(Select Name_Variation 

				from Names_Database..Forename   

				Where [Source_Name] = P.Forename and Score >= 95)))

AND (P.Surname = PER.EnglishSurname OR PER.PERSONID in (

			Select PersonID 

			From C6..Person_Subset

			Where EnglishSurname in

				(Select Name_Variation

				from Names_Database..Surname 

				Where [Source_Name] = P.Surname and Score >= 95)))

Open in new window

0
Comment
Question by:iis-it
  • 2
3 Comments
 
LVL 3

Expert Comment

by:GregTSmith
ID: 24401360
I've done my best to make sure the syntax is correct, but it's a little tough without actually being able to execute the query.  Something like the following should yield the same result set.
SELECT 

  P.ID, -- I changed AVP to P here... not sure what AVP was an alias to...?

  X.PersonID, 

  1, 

  1 

FROM 

  PERSON P 

  INNER JOIN (

    SELECT 

      PS.PersonID, 

      PS.EnglishForename, 

      PS.EnglishSurname, 

      F.Source_Name [AlternateForename], 

      S.Source_Name [AlternateSurname] 

    FROM   

      C6..PERSON_SUBSET PS 

      INNER JOIN Names_Database..Forename F ON 

        F.Name_Variation = PS.EnglishForename 

          AND F.Score >= 95 

      INNER JOIN Names_Database..Surname S ON 

        S.Name_Variation = PS.EnglishSurname 

          AND S.Score >= 95 

  ) X ON 

    LEN(P.Forename) != 0 

      AND LEN(P.Surname) != 0 

      AND (

        X.EnglishForename = P.Forename 

        OR X.AlternateForename = P.Forename 

      )

      AND (

        X.EnglishSurname = P.Surname 

        OR X.AlternateSurname = P.Surname 

      )

GROUP BY 

  P.ID, 

  X.PersonID 

Open in new window

0
 

Accepted Solution

by:
iis-it earned 0 total points
ID: 24412418
Thanks for your answer. A member of the SQL Server central forums (RBarryYoung) came up with the most efficient solution.
create proc spSurnameMatching_TempTAnswer

 

AS

--====== Create the 95% Forename table + all Person names

Create table #Forename95_U_P(

  

Source_Name varchar(20)

 

, Name_Variation varchar(20)

 

, Primary Key (Source_name, Name_Variation)

)

INSERT into #Forename95_U_P

 

Select Source_name, Name_variation

  

--From Names_Database..Forename

  

From Forename

  

Where Score >= 95

INSERT into #Forename95_U_P

 

Select distinct Forename, Forename

  

From PERSON P

  

Left Join #Forename95_U_P U

     

ON P.Forename = U.Source_name

        

And P.Forename = U.Name_variation

  

Where U.Source_name IS NULL

----====== Create the 95% Surname table + all Person names

Create table #Surname95_U_P(

   

Source_Name varchar(20)

 

, Name_Variation varchar(20)

 

, Primary Key (Source_name, Name_Variation)

)

INSERT into #Surname95_U_P

 

Select Source_name, Name_variation

  

--From Names_Database..Surname

  

From Surname

  

Where Score >= 95

INSERT into #Surname95_U_P

 

Select distinct Surname, Surname

  

From PERSON P

    

Left Join #Surname95_U_P U

      

ON P.Surname = U.Source_name

         

And P.Surname = U.Name_variation

  

Where U.Source_name IS NULL

--====== Find all Person-subsets that 95% match persons

;WITH ctePersToPersSubs as (

   

Select *

     

FROM PERSON P

    

INNER JOIN PERSON_SUBSET PER

      

ON PER.EnglishForename IN (

        

Select fn.Name_Variation

         

From #Forename95_U_P fn

         

Where fn.Source_Name = P.Forename

         

)

      

AND PER.EnglishSurname IN (

           

Select fn.Name_Variation

         

From #Surname95_U_P fn

          

Where fn.Source_Name = P.Surname

          

)

)

SELECT Distinct

   

P.ID,

   

P.PersonID,

   

1,

   

1

 

From ctePersToPersSubs P

 

WHERE P.Forename <> ''

  

AND  P.Surname <> ''

--====== clean-up temp tables

drop table #Forename95_U_P

drop table #Surname95_U_P

--==================

Open in new window

0
 
LVL 3

Expert Comment

by:GregTSmith
ID: 24413276
I'm surprised that the temp table solution is faster since it is both reading and writing the records, where my example only reads...  you still might get better result from the query I provided by adding indexes...
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Entity Framework is a powerful tool to help you interact with the DataBase but still doesn't help much when we have a Stored Procedure that returns more than one resultset. The solution takes some of out-of-the-box thinking; read on!
I annotated my article on ransomware somewhat extensively, but I keep adding new references and wanted to put a link to the reference library.  Despite all the reference tools I have on hand, it was not easy to find a way to do this easily. I finall…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now