Best Approach to Comparing Address With No Matching Account Numbers Using Temp Tables

I am trying to do a comparison to see if the same customers are in 2 tables for different sales databases. In most cases, there is a matching account number in both tables, if the number has been entered properly or not null.

When there is no matching account number I need to look at the address for similarities.

Right now I am using CTE's but I am not getting the correct results to match by address similarity when there are no account number matches.

Would it be better to use temp tables and if so how?

Here is my code:

--first cte pulls in all not null account and reformat account numbers to match formatting of account numbers in 2nd table 

WITH Table1_Reformat_AccountNum_CTE
AS
(
  Select 
		  t1.account_number
		 ,t1.account_name
		 ,t1.address_1 
		 ,t1.address_2
         ,t1.city
		 ,t1.state
         ,t1.zip
		 ,CASE
			WHEN t1.account_number LIKE '%xyz-%' THEN LTRIM(RTRIM(RIGHT(t1.account_number, LEN(t1.account_number)-9)))
			WHEN t1.account_number LIKE 'qrs:% -lmo' THEN REPLACE(LTRIM(RTRIM(RIGHT(t1.account_number, LEN(t1.account_number)-5))), '-lmo', '')
			ELSE REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(RIGHT(t1.account_number, LEN(t1.account_number)-5), '-', ''), ' ', ''), '.', ''), ',', ''), 'out', ''), '#', ''), '0ut', ''), '''', ''), '_', ''), '*', ''),'<',''),'>',''),'^','')
			END AS account_num_formatted
  From [database_1].[dbo].[sales_table_1] t1
  WHERE t1.account_number IS NOT NULL
       
         
)
-- 2nd cte matched reformatted account numbers to account numbers in table 2
,Table2_AccountNum_Matches_CTE
AS
(
SELECT 
          t1.account_num_formatted 
         ,t2.account_number
		 ,t2.account_name
		 ,t2.address_1 
		 ,t2.address_2
         ,t2.city
		 ,t2.state
         ,t2.zip
     
FROM Table1_Reformat_AccountNum_CTE ref
INNER JOIN [database_2].[dbo].[sales_table_2] t2
ON t2.account_number = t1.account_num_formatted 
)
-- 3rd cte finds all left over accounts in table 1 that have no match in table 2

,Table1_Non_Matches_Cte
AS
(
SELECT 
		  t1.account_num_formatted
		 ,t1.account_name
		 ,t1.address_1 
		 ,t1.address_2
         ,t1.city
		 ,t1.state
         ,t1.zip 
	FROM Table1_Reformat_AccountNum_CTE ref
	WHERE t1.account_num_formatted NOT IN (SELECT acm.account_num_formatted FROM Table2_AccountNum_Matches_CTE acm)
)
-- 3rd cte finds all left over accounts in table 2 that have no match in table 1

,Table2_Non_Matches_Cte
AS
(
SELECT  
          t2.account_number
		 ,t2.account_name
		 ,t2.address_1 
		 ,t2.address_2
         ,t2.city
		 ,t2.state
         ,t2.zip
	FROM [database_2].[dbo].[sales_table_2] t2
    WHERE t2.account_number NOT IN (SELECT acm.account_num_formatted FROM Table2_AccountNum_Matches_CTE acm)
)

-- attemtps to find all accounts in table 2 that have no match in account number in table 1 but match on address 
SELECT 
          t1.account_num_formatted
		 ,t1.account_name
		 ,t1.address_1 
		 ,t1.address_2
         ,t1.city
		 ,t1.state
         ,t1.zip 
	     ,t2.account_number
		 ,t2.account_name
		 ,t2.address_1 
		 ,t2.address_2
         ,t2.city
		 ,t2.state
         ,t2.zip
    FROM Table1_Reformat_AccountNum_CTE ref
	FULL OUTER JOIN [database_2].[dbo].[sales_table_2] t2
	ON t1.account_num_formatted = t2.account_number
	WHERE t2.account_number IN (SELECT  non.account_number FROM Table2_Non_Matches_Cte non )
	AND t1.address_1  LIKE '%'+ t2.address_1 +'%'
	AND t1.city  LIKE '%'+ t2.city+'%'
	AND t2.state LIKE '%'+ t2.state+'%'
	AND t2.zip  = '%'+t2.zip+'%'

Open in new window


 Sample Data:

Table 1
——————
Acc | Name | Add 1| Add 2 | City | State | Zip
————————————————————-
1       store1   113 st lane    mulberry tx   12344
Null   store2   114 st lane    mulberry tx   34254
6       store3   113 st lane    mulberry tx   98764
7       store4   115 st lane    mulberry tx   74631
8.1    store5   116 st lane    mulberry tx   12347

Open in new window


Table 2 

——————
Acc | Name | Add 1| Add 2 | City | State | Zip
————————————————————-
1       store1   113 st lane   mulberry tx   12344
Null   store2   114 st lane   mulberry tx   34254
9       store8   213 ha ave   juniper    ct   46738
5      store9   217 wa ave  juniper    ct  98376
8     store5   116 st lane    mulberry    tx   12347

Open in new window


Expected Results:
Acc | Name | Add 1| Add 2 | City | State | Zip
————————————————————-
1        store1   113 st lane    mulberry tx   12344
Null   store2   114 st lane    mulberry tx   34254
8       store5   116 st lane   mulberry tx   12347

Open in new window

TechIsaacIT Contracting Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Doug BishopDatabase DeveloperCommented:
I have a question in regard to your CASE expression for account_num_formatted, with respect to the second WHEN and the ELSE.
The first thing you are doing (innermost ()), is taking the rightmost 5 characters, THEN doing the replacement. Is this correct, or do you want to do the replacement first, then take the rightmost 5 characters?

or example (and I don't know the structure of your account numbers, so this may not even apply, but something to consider), the account_numer '<_345- #.^out' would return an empty string. Make sure your REPLACE() functions are performing as desired. Since we do't have both before and after examples of account numbers (from both tables) it is hard to say.

Also, be careful of using NOT IN () when there might be NULL values in the subquery (see here). If you use NOT IN () filter a WHERE clause:
WHERE t1.account_num_formatted NOT IN (SELECT acm.account_num_formatted FROM Table2_AccountNum_Matches_CTE acm WHERE acm.account_num_formatted IS NOT NULL)

Open in new window

TechIsaacIT Contracting Author Commented:
@Doug Bishop

Thank you, I will update my subquery with a 'IS NOT NULL'  in the where clause.

In regard to the case statement...

All of the account numbers have a prefix comprised of 3 characters a colon and a space which is 5 characters in total. For example: 'XYX: '.  I want to remove that prefix on every account number, then I remove any other random characters due to data entry errors.
Doug BishopDatabase DeveloperCommented:
btw, you show sample data and expected results, but what results are you actually getting? I'm working on an issue, so might be awhile before I can get back to you, so maybe someone else will jump in. It would help to have some full sample data (e.g. including original account_number from sales_table_1 before you cleanse it).
Price Your IT Services for Profit

Managed service contracts are great - when they're making you money. Yes, you’re getting paid monthly, but is it actually profitable? Learn to calculate your hourly overhead burden so you can master your IT services pricing strategy.

Doug BishopDatabase DeveloperCommented:
Also, some inconsistencies I've found in your code:
You eliminate NULL account_numbers from sales_table_1 in your first CTE, so you would never match on address any accounts in that table that don't have an account number (if that is a possiblity).
You've got a number of alias names of ref on a previous CTE that should be either t1 or t2. Example:
...
FROM Table1_Reformat_AccountNum_CTE ref
	FULL OUTER JOIN [database_2].[dbo].[sales_table_2] t2
	ON t1.account_num_formatted = t2.account_number

Open in new window

Either the alias needs to be t1 or the JOIN predicate needs to use ref as the alias.
Doug BishopDatabase DeveloperCommented:
Try this. After you make the above alias changes, change your final SELECT. Your JOIN predicate has
ON t1.account_num_formatted = t2.account_number
but you are doing this matching because there are no matching account numbers.
SELECT 
          t1.account_num_formatted
		 ,t1.account_name
		 ,t1.address_1 
		 ,t1.address_2
         ,t1.city
		 ,t1.state
         ,t1.zip 
	     ,t2.account_number
		 ,t2.account_name
		 ,t2.address_1 
		 ,t2.address_2
         ,t2.city
		 ,t2.state
         ,t2.zip
    FROM Table1_Reformat_AccountNum_CTE t1
	FULL OUTER JOIN #t2 t2
	ON t1.address_1  LIKE '%'+ t2.address_1 +'%'
	AND t1.city  LIKE '%'+ t2.city+'%'
	AND t2.state LIKE '%'+ t2.state+'%'
	AND t2.zip  = '%'+t2.zip+'%'
	WHERE t2.account_number IN (SELECT  non.account_number FROM Table2_Non_Matches_Cte non )

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Doug BishopDatabase DeveloperCommented:
Also, unless there is an absolute reason for have LIKE in the JOIN predicates (e.g. AND t1.city  LIKE '%'+ t2.city+'%), remove them and make them AND t1.city  = t2.city. I doubt if you've got the address columns covered in an index anyway, but you are making SQL Server do a whole lot more work in matching addresses.
TechIsaacIT Contracting Author Commented:
Thanks, I will implement the suggestions you gave and follow up. I eliminated nulls in the first query to trim down the record set because it's around 5 million or more at times.

I was thinking I could match the non-null accounts in one pass then load the null account numbers in a CTE to match on another pass.
TechIsaacIT Contracting Author Commented:
Thanks this worked out. It looks like I may have to use temp tables instead because I want to merge the results from each CTE.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Databases

From novice to tech pro — start learning today.