Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Find unmatched records between 2 datasets

Posted on 2015-01-27
19
Medium Priority
?
201 Views
Last Modified: 2015-04-09
Hello,

Am having a 'memory is full' moment.

I have 2 datasets with the exact same columns and data format, but one contains more records than the other. I am attempting to identify the records that are present in one but not the other.

My data is an output from Alteryx, but have brought it into Access to attempt to resolve this.

There are 32 columns in both tables. First one is Compare1 with 26364 records, the second is Compare 2 with 26260 records.

There is not a primary key, nor any unique set of columns to do a comparison on.

Your assistance is greatly appreciated.
0
Comment
Question by:SASnewbie
  • 10
  • 3
  • 3
  • +2
19 Comments
 
LVL 18

Accepted Solution

by:
Simon earned 2000 total points
ID: 40573845
Do a left join on all of the 32 colums then a right join.

This assumes you have a column that doesn't allow nulls that you can use to check for existence of a row in the tables.

SELECT * FROM COMPARE1 C1 LEFT JOIN COMPARE2 C2 ON C1.COL1=C2.COL1 AND C1.COL2=C2.COL2 (etc)
WHERE C2.COL1 IS NULL.

then

SELECT * FROM COMPARE1 C1 RIGHT JOIN COMPARE2 C2 ON C1.COL1=C2.COL1 AND C1.COL2=C2.COL2 (etc)
WHERE C1.COL1 IS NULL.
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 40573851
how many field does your table have?
0
 

Author Comment

by:SASnewbie
ID: 40573865
Hi and thank you for responding so quickly. There are 62 fields in each, not 32 as I listed before.
0
Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

 

Author Comment

by:SASnewbie
ID: 40573871
Also, I do have NULLS
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 40573878
can you pick / select (3 or more) the fields so that when joined together will comprise as a unique field?
0
 
LVL 75
ID: 40573882
"memory is full' moment."
LOL ... good one!

Humm .... maybe create a UNION query with UNION ALL ... which will include duplicates ... a and maybe from that you can deduce what is not duplicated, ie missing from the one recordset.

mx
0
 

Author Comment

by:SASnewbie
ID: 40573883
no.... but I am going to rename the fields col1-col62
0
 

Author Comment

by:SASnewbie
ID: 40573893
:) Hi mx. I'll try that. Thank you!
0
 

Author Comment

by:SASnewbie
ID: 40573901
mx, that returns the 26260 records. Worth a try!!

Sure seems like a method would be available by now to bump one table against another to find differences without having to identify unique fields.
0
 

Author Comment

by:SASnewbie
ID: 40573914
Sorry mx, I should have mentioned that all of the 26260 records are contained in the dataset with 26364 records.
0
 
LVL 40

Expert Comment

by:PatHartman
ID: 40573926
If you don't want to specify the fields to match on, how would you expect the magic process to figure it out?

Start with a select distinct on each table to see if there are duplicates that will be filtered away.

If that doesn't solve the problem, then you can write VBA code if you don't want to create a query.  Without a unique identifier, you are reduced to comparing every column in set a to every column in set b.  Rey's suggestion to choose a few of the most likely candidates for uniqueness is also a good start.

What do you hope to accomplish by renaming all the columns?

I was just reminded that FMS offers a product that compares tables.  You might start with that.  

http://www.fmsinc.com/microsoftaccess/difference/properties-fields/tables.asp
0
 

Author Comment

by:SASnewbie
ID: 40573969
by the order of the names and order of fields
0
 

Author Comment

by:SASnewbie
ID: 40573976
renaming the columns shortens and simplifies the left join and then right join sql that Rey provided:
SELECT * FROM COMPARE1 C1 LEFT JOIN COMPARE2 C2
ON C1.Col1=C2.Col1
AND C1.Col2=C2.Col2
AND C1.Col3=C2.Col3
AND C1.Col4=C2.Col4
AND C1.Col5=C2.Col5
AND C1.Col6=C2.Col6
AND C1.Col7=C2.Col7
AND C1.Col8=C2.Col8
AND C1.Col9=C2.Col9
AND C1.Col10=C2.Col10
AND C1.Col11=C2.Col11
AND C1.Col12=C2.Col12
AND C1.Col13=C2.Col13
AND C1.Col14=C2.Col14
AND C1.Col15=C2.Col15
AND C1.Col16=C2.Col16
AND C1.Col17=C2.Col17
AND C1.Col18=C2.Col18
AND C1.Col19=C2.Col19
AND C1.Col20=C2.Col20
AND C1.Col21=C2.Col21
AND C1.Col22=C2.Col22
AND C1.Col23=C2.Col23
AND C1.Col24=C2.Col24
AND C1.Col25=C2.Col25
AND C1.Col26=C2.Col26
AND C1.Col27=C2.Col27
AND C1.Col28=C2.Col28
AND C1.Col29=C2.Col29
AND C1.Col30=C2.Col30
AND C1.Col31=C2.Col31
AND C1.Col32=C2.Col32
AND C1.Col33=C2.Col33
AND C1.Col34=C2.Col34
AND C1.Col35=C2.Col35
AND C1.Col36=C2.Col36
AND C1.Col37=C2.Col37
AND C1.Col38=C2.Col38
AND C1.Col39=C2.Col39
AND C1.Col40=C2.Col40
AND C1.Col41=C2.Col41
AND C1.Col42=C2.Col42
AND C1.Col43=C2.Col43
AND C1.Col44=C2.Col44
AND C1.Col45=C2.Col45
AND C1.Col46=C2.Col46
AND C1.Col47=C2.Col47
AND C1.Col48=C2.Col48
AND C1.Col49=C2.Col49
AND C1.Col50=C2.Col50
AND C1.Col51=C2.Col51
AND C1.Col52=C2.Col52
AND C1.Col53=C2.Col53
AND C1.Col54=C2.Col54
AND C1.Col55=C2.Col55
AND C1.Col56=C2.Col56
AND C1.Col57=C2.Col57
AND C1.Col58=C2.Col58
AND C1.Col59=C2.Col59
AND C1.Col60=C2.Col60
AND C1.Col61=C2.Col61
AND C1.Col62=C2.Col62

WHERE C2.Col1 IS NULL
0
 
LVL 75
ID: 40574046
"mx, that returns the 26260 r"
Did you use UNION ALL

Here is an example
Table aTblA (odd and even numbers)
F1
1
2
3
4
5
6
7
8
9

Table aTblA  (odd numbers only)
F1
1
3
5
7
9


Union SQL
SELECT aTblA.*
FROM aTblA
UNION ALL SELECT aTblB.*
FROM aTblB

ORDER BY 1

Returns

F1
1
1
2  **
3
3
4  **
5
5
6 **
7
7
8  **
9
9

Notice that only the Even Numbers are NOT duplicated, so this shows  you what is not in the other table.

Save the UNION query as some name ( query83 in my example)

Then create this query

SELECT Query83.F1, Count(Query83.F1) AS CountOfF1
FROM Query83
GROUP BY Query83.F1
HAVING (((Count(Query83.F1))=1))
ORDER BY Query83.F1;

Result is records NOT in the other table.

F1      CountOfF1
2      1
4      1
6      1
8      1
0
 
LVL 18

Expert Comment

by:Simon
ID: 40574062
@SASNewbie, what result did you get from the left join on all columns?

You'd need to pick a column without nulls for this clause

WHERE C2.Col1 IS NULL

i.e. it could be C2.Col2 or C2.Col3...
0
 
LVL 75
ID: 40574086
Of course ... I forgot you have so MANY columns :-)
0
 

Author Comment

by:SASnewbie
ID: 40575033
Thanks everyone! I am working through your suggestions.

@mx I did use Union All, and am attempting your last example. Can you suggest how to determine the column to choose. I used Col1 and returned 4 records (which is a great start), but I know there are a total of 104 records that are different.

@simon, I used the date fieild, Col1, which contains no nulls.
0
 

Author Comment

by:SASnewbie
ID: 40575037
@mx, well it did return 4 the first time I ran it... hmm...
0
 
LVL 18

Assisted Solution

by:Simon
Simon earned 2000 total points
ID: 40575661
@SAS, I tested this in case NULLs messed it up.

SELECT tbl1.*
FROM tbl1 LEFT JOIN tbl2 ON (tbl1.col5 = tbl2.col5) AND (tbl1.col4 = tbl2.col4) AND (tbl1.col1 = tbl2.col1) AND (tbl1.col2 = tbl2.col2) AND (tbl1.col3 = tbl2.col3)
WHERE (((tbl2.col1) Is Null));

I can confirm that it does return every row from tbl1 that is not exactly duplicated in tbl2.

One other thought though is that if there are a number of exact duplicate rows within each table the above query will not show that one of the tables had 4 copies of that row whilst the other table only had 1 copy of it.
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

If you need a simple but flexible process for maintaining an audit trail of who created, edited, or deleted data from a table, or multiple tables, and you can do all of your work from within a form, this simple Audit Log will work for you.
Recursive SQL is one of the most fascinating and powerful and yet dangerous feature offered in many modern databases today using a Common Table Expression (CTE) first introduced in the ANSI SQL 99 standard. The first implementations of CTE began ap…
In Microsoft Access, learn different ways of passing a string value within a string argument. Also learn what a “Type Mis-match” error is about.
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…

886 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question