asked on

SQL Query Challenge

SQL Server 2012 -

I have a challenging query... trying to best optimize it. Would love some help!

I have two tables (called BEFORE and AFTER) with the same structure; two columns, a GROUP_ID and a VALUE.
I need to compare the VALUES for each GROUP_ID and if they are DIFFERENT, add the GROUP_ID to another table.

So given the BEFORE and AFTER tables below.
GROUP ID = 1 is different between the two tables because value 11 and 12 are not in AFTER
GROUP ID = 2 is different between the two tables because value 23 is not in BEFORE
GROUP ID = 3 is the same in both tables

Therefore, the RESULT table should contain GROUP ID with 1 and 2

BEFORE
Group ID Value
1 10
1 11
1 12
1 13
2 21
2 22
2 24
2 25
3 30
3 31
3 32

AFTER
Group ID Value
1 10
1 13
2 21
2 22
2 23
2 24
2 25
3 30
3 31
3 32

RESULT table
GROUP ID (unique key, no duplicates)
1
2

SO basically, it would have to be something like this...

INSERT INTO RESULTS (GROUP_ID)
select distinct GROUP_ID
from BEFORE
( join ) AFTER ON B.GROUP_ID = A.GROUP_ID
and B.VALUE = A.VALUE

WHERE the BEFORE and AFTER tables do not contain EXACTLY the same VALUES

So to recap, I need to compare the values of each GROUP_ID
If any thing is in BEFORE and not in AFTER or visa versa, I need to put the GROUP_ID into RESULTS
If they are exactly the same, the GROUP_ID does not go into RESULTS.

What's the best way to do this??

ASKER CERTIFIED SOLUTION

PortletPaul

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

gdemaria

ASKER

Thanks very much Paul!
This solution does work, so that's great!
Because my tables are big, it takes 45 seconds to run. I do have good indexes, so I was just wondering if you had an ideas for another format that might run faster?

What I'm thinking is that this needs to match every record and the reduce the record set with a distinct. Is there a way to just find a non-matching record and then stop processing using perhaps TOP 1 in a sub query or maybe with a CTE or something else?

Just wondering about improving performance; otherwise it's great!

PortletPaul

full outer joins are expensive
can you attach a .sqlplan file (execution plan) perhaps?

PortletPaul

You could try this alternative:

select
     group_id
from before b
where not exists (
  select null from after a
  where b.group_id = a.group_id and b.value = a.value 
  )
union
select
     group_id
from after a
where not exists (
  select null from before b
  where b.group_id = a.group_id and b.value = a.value 
  )

Open in new window

Scott Pletcher

I do have good indexes

How are the tables clustered? In particular, for this specific query, best would be clustered on ( GroupID, Value ).

gdemaria

ASKER

Hey Scott - thanks for joining. Right, clustered index on the group and then value

Scott Pletcher

Great, glad it helped.

Btw, if you have a non-clus index(es) that start with column "GroupID", you can get rid of it(them) now, because now the clus index starts with GroupID.