Solved

Find unmatched records between 2 datasets

Posted on 2015-01-27
19
171 Views
Last Modified: 2015-04-09
Hello,

Am having a 'memory is full' moment.

I have 2 datasets with the exact same columns and data format, but one contains more records than the other. I am attempting to identify the records that are present in one but not the other.

My data is an output from Alteryx, but have brought it into Access to attempt to resolve this.

There are 32 columns in both tables. First one is Compare1 with 26364 records, the second is Compare 2 with 26260 records.

There is not a primary key, nor any unique set of columns to do a comparison on.

Your assistance is greatly appreciated.
0
Comment
Question by:SASnewbie
  • 10
  • 3
  • 3
  • +2
19 Comments
 
LVL 18

Accepted Solution

by:
SimonAdept earned 500 total points
ID: 40573845
Do a left join on all of the 32 colums then a right join.

This assumes you have a column that doesn't allow nulls that you can use to check for existence of a row in the tables.

SELECT * FROM COMPARE1 C1 LEFT JOIN COMPARE2 C2 ON C1.COL1=C2.COL1 AND C1.COL2=C2.COL2 (etc)
WHERE C2.COL1 IS NULL.

then

SELECT * FROM COMPARE1 C1 RIGHT JOIN COMPARE2 C2 ON C1.COL1=C2.COL1 AND C1.COL2=C2.COL2 (etc)
WHERE C1.COL1 IS NULL.
0
 
LVL 119

Expert Comment

by:Rey Obrero
ID: 40573851
how many field does your table have?
0
 

Author Comment

by:SASnewbie
ID: 40573865
Hi and thank you for responding so quickly. There are 62 fields in each, not 32 as I listed before.
0
 

Author Comment

by:SASnewbie
ID: 40573871
Also, I do have NULLS
0
 
LVL 119

Expert Comment

by:Rey Obrero
ID: 40573878
can you pick / select (3 or more) the fields so that when joined together will comprise as a unique field?
0
 
LVL 75
ID: 40573882
"memory is full' moment."
LOL ... good one!

Humm .... maybe create a UNION query with UNION ALL ... which will include duplicates ... a and maybe from that you can deduce what is not duplicated, ie missing from the one recordset.

mx
0
 

Author Comment

by:SASnewbie
ID: 40573883
no.... but I am going to rename the fields col1-col62
0
 

Author Comment

by:SASnewbie
ID: 40573893
:) Hi mx. I'll try that. Thank you!
0
 

Author Comment

by:SASnewbie
ID: 40573901
mx, that returns the 26260 records. Worth a try!!

Sure seems like a method would be available by now to bump one table against another to find differences without having to identify unique fields.
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 

Author Comment

by:SASnewbie
ID: 40573914
Sorry mx, I should have mentioned that all of the 26260 records are contained in the dataset with 26364 records.
0
 
LVL 34

Expert Comment

by:PatHartman
ID: 40573926
If you don't want to specify the fields to match on, how would you expect the magic process to figure it out?

Start with a select distinct on each table to see if there are duplicates that will be filtered away.

If that doesn't solve the problem, then you can write VBA code if you don't want to create a query.  Without a unique identifier, you are reduced to comparing every column in set a to every column in set b.  Rey's suggestion to choose a few of the most likely candidates for uniqueness is also a good start.

What do you hope to accomplish by renaming all the columns?

I was just reminded that FMS offers a product that compares tables.  You might start with that.  

http://www.fmsinc.com/microsoftaccess/difference/properties-fields/tables.asp
0
 

Author Comment

by:SASnewbie
ID: 40573969
by the order of the names and order of fields
0
 

Author Comment

by:SASnewbie
ID: 40573976
renaming the columns shortens and simplifies the left join and then right join sql that Rey provided:
SELECT * FROM COMPARE1 C1 LEFT JOIN COMPARE2 C2
ON C1.Col1=C2.Col1
AND C1.Col2=C2.Col2
AND C1.Col3=C2.Col3
AND C1.Col4=C2.Col4
AND C1.Col5=C2.Col5
AND C1.Col6=C2.Col6
AND C1.Col7=C2.Col7
AND C1.Col8=C2.Col8
AND C1.Col9=C2.Col9
AND C1.Col10=C2.Col10
AND C1.Col11=C2.Col11
AND C1.Col12=C2.Col12
AND C1.Col13=C2.Col13
AND C1.Col14=C2.Col14
AND C1.Col15=C2.Col15
AND C1.Col16=C2.Col16
AND C1.Col17=C2.Col17
AND C1.Col18=C2.Col18
AND C1.Col19=C2.Col19
AND C1.Col20=C2.Col20
AND C1.Col21=C2.Col21
AND C1.Col22=C2.Col22
AND C1.Col23=C2.Col23
AND C1.Col24=C2.Col24
AND C1.Col25=C2.Col25
AND C1.Col26=C2.Col26
AND C1.Col27=C2.Col27
AND C1.Col28=C2.Col28
AND C1.Col29=C2.Col29
AND C1.Col30=C2.Col30
AND C1.Col31=C2.Col31
AND C1.Col32=C2.Col32
AND C1.Col33=C2.Col33
AND C1.Col34=C2.Col34
AND C1.Col35=C2.Col35
AND C1.Col36=C2.Col36
AND C1.Col37=C2.Col37
AND C1.Col38=C2.Col38
AND C1.Col39=C2.Col39
AND C1.Col40=C2.Col40
AND C1.Col41=C2.Col41
AND C1.Col42=C2.Col42
AND C1.Col43=C2.Col43
AND C1.Col44=C2.Col44
AND C1.Col45=C2.Col45
AND C1.Col46=C2.Col46
AND C1.Col47=C2.Col47
AND C1.Col48=C2.Col48
AND C1.Col49=C2.Col49
AND C1.Col50=C2.Col50
AND C1.Col51=C2.Col51
AND C1.Col52=C2.Col52
AND C1.Col53=C2.Col53
AND C1.Col54=C2.Col54
AND C1.Col55=C2.Col55
AND C1.Col56=C2.Col56
AND C1.Col57=C2.Col57
AND C1.Col58=C2.Col58
AND C1.Col59=C2.Col59
AND C1.Col60=C2.Col60
AND C1.Col61=C2.Col61
AND C1.Col62=C2.Col62

WHERE C2.Col1 IS NULL
0
 
LVL 75
ID: 40574046
"mx, that returns the 26260 r"
Did you use UNION ALL

Here is an example
Table aTblA (odd and even numbers)
F1
1
2
3
4
5
6
7
8
9

Table aTblA  (odd numbers only)
F1
1
3
5
7
9


Union SQL
SELECT aTblA.*
FROM aTblA
UNION ALL SELECT aTblB.*
FROM aTblB

ORDER BY 1

Returns

F1
1
1
2  **
3
3
4  **
5
5
6 **
7
7
8  **
9
9

Notice that only the Even Numbers are NOT duplicated, so this shows  you what is not in the other table.

Save the UNION query as some name ( query83 in my example)

Then create this query

SELECT Query83.F1, Count(Query83.F1) AS CountOfF1
FROM Query83
GROUP BY Query83.F1
HAVING (((Count(Query83.F1))=1))
ORDER BY Query83.F1;

Result is records NOT in the other table.

F1      CountOfF1
2      1
4      1
6      1
8      1
0
 
LVL 18

Expert Comment

by:SimonAdept
ID: 40574062
@SASNewbie, what result did you get from the left join on all columns?

You'd need to pick a column without nulls for this clause

WHERE C2.Col1 IS NULL

i.e. it could be C2.Col2 or C2.Col3...
0
 
LVL 75
ID: 40574086
Of course ... I forgot you have so MANY columns :-)
0
 

Author Comment

by:SASnewbie
ID: 40575033
Thanks everyone! I am working through your suggestions.

@mx I did use Union All, and am attempting your last example. Can you suggest how to determine the column to choose. I used Col1 and returned 4 records (which is a great start), but I know there are a total of 104 records that are different.

@simon, I used the date fieild, Col1, which contains no nulls.
0
 

Author Comment

by:SASnewbie
ID: 40575037
@mx, well it did return 4 the first time I ran it... hmm...
0
 
LVL 18

Assisted Solution

by:SimonAdept
SimonAdept earned 500 total points
ID: 40575661
@SAS, I tested this in case NULLs messed it up.

SELECT tbl1.*
FROM tbl1 LEFT JOIN tbl2 ON (tbl1.col5 = tbl2.col5) AND (tbl1.col4 = tbl2.col4) AND (tbl1.col1 = tbl2.col1) AND (tbl1.col2 = tbl2.col2) AND (tbl1.col3 = tbl2.col3)
WHERE (((tbl2.col1) Is Null));

I can confirm that it does return every row from tbl1 that is not exactly duplicated in tbl2.

One other thought though is that if there are a number of exact duplicate rows within each table the above query will not show that one of the tables had 4 copies of that row whilst the other table only had 1 copy of it.
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

'Between' is such a common word we rarely think about it but in SQL it has a very specific definition we should be aware of. While most database vendors will have their own unique phrases to describe it (see references at end) the concept in common …
This article describes how to use the timestamp of existing data in a database to allow Tableau to calculate the prior work day instead of relying on case statements or if statements to calculate the days of the week.
In Microsoft Access, learn how to “cascade” or have the displayed data of one combo control depend upon what’s entered in another. Base the dependent combo on a query for its row source: Add a reference to the first combo on the form as criteria i…
What’s inside an Access Desktop Database. Will look at the basic interface, Navigation Pane (Database Container), Tables, Queries, Forms, Report, Macro’s, and VBA code.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now