Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

Name variation matching

Posted on 2009-05-15
3
Medium Priority
?
499 Views
Last Modified: 2012-05-07
We have a number of jobs that perform matches across different tables. The initial match is based on the forename and surname of the person in question being equal.  If the forename and surname is the same and, a record is created in another table and further matching is performed to give a final match rating.

We have just purchased a database containing million of name variations.  These are in two tables (one for forenames and one for surnames) with the following columns:
"      Source_Name  This contains a list of Names that you can search on (E.g. Steve)
"      Name_Variation  This list the variation of the sources name. Each variation is on a new row. (I.e. there may be 2 rows with the source name Steve. One with the variation of Stephen and the other with  the variation Steven)
"      Score  This is a rating system with 100 being a close match and 75 being a loose match.

I am having trouble getting the matching script to return results in an efficient manner.  I believe that this is caused by the cross join that I have used in the below script.  Is there a more efficient way of doing it?

SELECT Distinct
       AVP.ID,
	   PER.PersonID,
	   1,
	   1
FROM PERSON P 
Cross JOIN C6..PERSON_SUBSET PER
WHERE P.Forename <> ''
       AND
       P.Surname <> ''
And (P.Forename = PER.EnglishForename  Or PER.PERSONID in (
			Select PersonID
			From C6..Person_Subset	
			Where EnglishForename In 
				(Select Name_Variation 
				from Names_Database..Forename   
				Where [Source_Name] = P.Forename and Score >= 95)))
AND (P.Surname = PER.EnglishSurname OR PER.PERSONID in (
			Select PersonID 
			From C6..Person_Subset
			Where EnglishSurname in
				(Select Name_Variation
				from Names_Database..Surname 
				Where [Source_Name] = P.Surname and Score >= 95)))

Open in new window

0
Comment
Question by:iis-it
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
3 Comments
 
LVL 3

Expert Comment

by:GregTSmith
ID: 24401360
I've done my best to make sure the syntax is correct, but it's a little tough without actually being able to execute the query.  Something like the following should yield the same result set.
SELECT 
  P.ID, -- I changed AVP to P here... not sure what AVP was an alias to...?
  X.PersonID, 
  1, 
  1 
FROM 
  PERSON P 
  INNER JOIN (
    SELECT 
      PS.PersonID, 
      PS.EnglishForename, 
      PS.EnglishSurname, 
      F.Source_Name [AlternateForename], 
      S.Source_Name [AlternateSurname] 
    FROM   
      C6..PERSON_SUBSET PS 
      INNER JOIN Names_Database..Forename F ON 
        F.Name_Variation = PS.EnglishForename 
          AND F.Score >= 95 
      INNER JOIN Names_Database..Surname S ON 
        S.Name_Variation = PS.EnglishSurname 
          AND S.Score >= 95 
  ) X ON 
    LEN(P.Forename) != 0 
      AND LEN(P.Surname) != 0 
      AND (
        X.EnglishForename = P.Forename 
        OR X.AlternateForename = P.Forename 
      )
      AND (
        X.EnglishSurname = P.Surname 
        OR X.AlternateSurname = P.Surname 
      )
GROUP BY 
  P.ID, 
  X.PersonID 

Open in new window

0
 

Accepted Solution

by:
iis-it earned 0 total points
ID: 24412418
Thanks for your answer. A member of the SQL Server central forums (RBarryYoung) came up with the most efficient solution.
create proc spSurnameMatching_TempTAnswer
 
AS
--====== Create the 95% Forename table + all Person names
Create table #Forename95_U_P(
  
Source_Name varchar(20)
 
, Name_Variation varchar(20)
 
, Primary Key (Source_name, Name_Variation)
)
INSERT into #Forename95_U_P
 
Select Source_name, Name_variation
  
--From Names_Database..Forename
  
From Forename
  
Where Score >= 95
INSERT into #Forename95_U_P
 
Select distinct Forename, Forename
  
From PERSON P
  
Left Join #Forename95_U_P U
     
ON P.Forename = U.Source_name
        
And P.Forename = U.Name_variation
  
Where U.Source_name IS NULL
----====== Create the 95% Surname table + all Person names
Create table #Surname95_U_P(
   
Source_Name varchar(20)
 
, Name_Variation varchar(20)
 
, Primary Key (Source_name, Name_Variation)
)
INSERT into #Surname95_U_P
 
Select Source_name, Name_variation
  
--From Names_Database..Surname
  
From Surname
  
Where Score >= 95
INSERT into #Surname95_U_P
 
Select distinct Surname, Surname
  
From PERSON P
    
Left Join #Surname95_U_P U
      
ON P.Surname = U.Source_name
         
And P.Surname = U.Name_variation
  
Where U.Source_name IS NULL
--====== Find all Person-subsets that 95% match persons
;WITH ctePersToPersSubs as (
   
Select *
     
FROM PERSON P
    
INNER JOIN PERSON_SUBSET PER
      
ON PER.EnglishForename IN (
        
Select fn.Name_Variation
         
From #Forename95_U_P fn
         
Where fn.Source_Name = P.Forename
         
)
      
AND PER.EnglishSurname IN (
           
Select fn.Name_Variation
         
From #Surname95_U_P fn
          
Where fn.Source_Name = P.Surname
          
)
)
SELECT Distinct
   
P.ID,
   
P.PersonID,
   
1,
   
1
 
From ctePersToPersSubs P
 
WHERE P.Forename <> ''
  
AND  P.Surname <> ''
--====== clean-up temp tables
drop table #Forename95_U_P
drop table #Surname95_U_P
--==================

Open in new window

0
 
LVL 3

Expert Comment

by:GregTSmith
ID: 24413276
I'm surprised that the temp table solution is faster since it is both reading and writing the records, where my example only reads...  you still might get better result from the query I provided by adding indexes...
0

Featured Post

Ask an Anonymous Question!

Don't feel intimidated by what you don't know. Ask your question anonymously. It's easy! Learn more and upgrade.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This post contains step-by-step instructions for setting up alerting in Percona Monitoring and Management (PMM) using Grafana.
One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…

636 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question