Solved

Large Marking task

Posted on 2011-02-18
6
229 Views
Last Modified: 2012-05-11
Hello experts,

I am comparing two tables that contain many records that have the same information but slightly different values in most cases.  In order to work out what is similar and what is not similar I have a script that compares the data in eight columns reports out the match type then deletes the values.

select a.col1 a1, b.col1 b1, a.col2 a2, b.col2 b2, a.col3 a3, b.col3 b3, a.col4 a4, b.col4 b4, a.col5  a5, b.col5 b5, a.col6 a6, b.col6 b6, a.col7 a7, b.col7 b7, a.col8 a8, b.col8 b8 into #EXACT
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 = b.col7 and
a.col8 = b.col8

delete from mytable1 where col1 in (select a1 from #EXACT)
delete from mytable2 where col1 in (select b1 from #EXACT)

What I then need to do is go through each degree of potential difference between the remaining rows. so work out where there is a difference in col8 where everything else is an match, col7 where everything else is a match and so on (except for col1 as this is always the same).  Then I need to match where all pairs of columns, don't match, all groups of three columns, all groups of fours, fives, sixes don't match

What I want is a stored procedure or a function to populate the matching criteria automatically rather than write out over 100 select statements into #DIFFS.

select a.col1 a1, b.col1 b1, a.col2 a2, b.col2 b2, a.col3 a3, b.col3 b3, a.col4 a4, b.col4 b4, a.col5  a5, b.col5 b5, a.col6 a6, b.col6 b6, a.col7 a7, b.col7 b7, a.col8 a8, b.col8 b8 into #Diffs
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 = b.col7 and
a.col8 <> b.col8

Insert into #diffs ( a1, b1, a2, b2, a3, b3, a4, b4, a5, b5, a6, b6, a7, b7, a.8, b8)
select a.col1, b.col1, a.col2, b.col2, a.col3, b.col3, a.col4, b.col4, a.col5, b.col5, a.col6, b.col6, a.col7, b.col7, a.col8, b.col8 into #diffs
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 <> b.col7 and
a.col8 = b.col8

Would this be possible or shall I get cutting and pasting ?
0
Comment
Question by:SMACD
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
  • 2
6 Comments
 
LVL 10

Expert Comment

by:John Claes
ID: 34924691
First remark : You are not keeping track of the differencedegree
See beneath for a Difference Calcualting Query :

select
      -- keeping values for Logging ;-)
      a.col1 as ACOL1,b.col1 as BCOL1,  
      a.col2 as ACOL2,b.col2 as BCOL2,  
      a.col3 as ACOL3,b.col3 as BCOL3,  
      a.col4 as ACOL4,b.col4 as BCOL4,  
      a.col5 as ACOL5,b.col5 as BCOL5,  
      a.col6 as ACOL6,b.col6 as BCOL6,  
      a.col7 as ACOL7,b.col7 as BCOL7,  
      a.col8 as ACOL8,b.col8 as BCOL8,
      getdate() as [DIFF_DATE],
      Case when a.col1 <> b.col1 then '1|' else '' end +
      Case when a.col2 <> b.col2 then '2|' else '' end +
      Case when a.col3 <> b.col3 then '3|' else '' end +
      Case when a.col4 <> b.col4 then '4|' else '' end +
      Case when a.col5 <> b.col5 then '5|' else '' end +
      Case when a.col6 <> b.col6 then '6|' else '' end +
      Case when a.col7 <> b.col7 then '7|' else '' end +
      Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8


try the Code beneath as example
create table mytable1
(
	col1 nvarchar(20),  
	col2 nvarchar(20),  
	col3 nvarchar(20),  
	col4 nvarchar(20),  
	col5 nvarchar(20),  
	col6 nvarchar(20),  
	col7 nvarchar(20),  
	col8 nvarchar(20), 
)

create table mytable2
(
	col1 nvarchar(20),  
	col2 nvarchar(20),  
	col3 nvarchar(20),  
	col4 nvarchar(20),  
	col5 nvarchar(20),  
	col6 nvarchar(20),  
	col7 nvarchar(20),  
	col8 nvarchar(20), 
)


insert into mytable1 select '1','2','3','4','5','6','7','8'
insert into mytable1 select '1','1','3','4','5','6','7','8'
insert into mytable1 select '1','1','2','4','5','6','7','8'
insert into mytable1 select '1','1','2','3','5','6','7','8'
insert into mytable1 select '1','1','2','3','4','6','7','8'
insert into mytable1 select '1','1','2','3','4','5','7','8'
insert into mytable1 select '1','1','2','3','4','5','6','8'
insert into mytable1 select '1','1','2','3','4','5','6','7'

insert into mytable2 select '1','2','3','4','5','6','7','8'
insert into mytable2 select '1','1','3','4','5','6','7','8'
insert into mytable2 select '1','1','2','4','5','6','7','8'
insert into mytable2 select '1','1','2','3','5','6','7','8'
insert into mytable2 select '1','1','2','3','4','6','7','8'
insert into mytable2 select '1','1','2','3','4','5','7','8'
insert into mytable2 select '1','1','2','3','4','5','6','8'
insert into mytable2 select '1','1','2','3','4','5','6','7'

select 
	-- keeping values for Logging ;-)
	a.col1 as ACOL1,b.col1 as BCOL1,  
	a.col2 as ACOL2,b.col2 as BCOL2,  
	a.col3 as ACOL3,b.col3 as BCOL3,  
	a.col4 as ACOL4,b.col4 as BCOL4,  
	a.col5 as ACOL5,b.col5 as BCOL5,  
	a.col6 as ACOL6,b.col6 as BCOL6,  
	a.col7 as ACOL7,b.col7 as BCOL7,  
	a.col8 as ACOL8,b.col8 as BCOL8, 
	getdate() as [DIFF_DATE],
	Case when a.col1 <> b.col1 then '1|' else '' end + 
	Case when a.col2 <> b.col2 then '2|' else '' end + 
	Case when a.col3 <> b.col3 then '3|' else '' end + 
	Case when a.col4 <> b.col4 then '4|' else '' end + 
	Case when a.col5 <> b.col5 then '5|' else '' end + 
	Case when a.col6 <> b.col6 then '6|' else '' end + 
	Case when a.col7 <> b.col7 then '7|' else '' end + 
	Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8

select * from #DIFF_TABLE

drop table mytable1
drop table mytable2
drop table #DIFF_TABLE

Open in new window

0
 

Author Comment

by:SMACD
ID: 34924843
poor beggar

Thanks for that

Noted I do want to capture the difference value, but your example (unless I've misread it) has no mismatches as all rows are the same in both tables and would generate a list matching everything against everything else.  I have member records in two tables, some of which have been updated in one table and not the other. I also have some records missing in one table as they have been added (this can happen on either table).   If I ran this against by 500,000+ record set I would be in trouble.

Can feel that cut and paste coming on

0
DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

 
LVL 18

Accepted Solution

by:
sventhan earned 250 total points
ID: 34925095
the code from the above link is posted here
CREATE TABLE TABLEA (primarykey int, Col2 int, Col3 int, Col4 int, Col5 varchar(50), Col6 varchar(50), Col7 varchar(50), Col8 datetime, Col9 datetime, Col10 datetime)
CREATE TABLE TABLEB (primarykey int, Col2 int, Col3 int, Col4 int, Col5 varchar(50), Col6 varchar(50), Col7 varchar(50), Col8 datetime, Col9 datetime, Col10 datetime)

INSERT INTO TABLEA VALUES (1, 1, 4, 3, 'a', 'b', 'c', '2010-05-05', '2010-02-13', '2010-07-21')
INSERT INTO TABLEA VALUES (2, 4, 5, 1, 'a', 'b', 'c', '2010-07-25', '2010-07-19', NULL)
INSERT INTO TABLEA VALUES (3, NULL, NULL, 2, 'a', 'b', 'c', NULL, '2010-03-03', '2010-04-25')
INSERT INTO TABLEA VALUES (4, 3, 3, 2, 'a', 'b', 'c', '2010-01-01', '2010-07-10', '2010-01-03')
INSERT INTO TABLEA VALUES (5, 2, 4, 5, 'a', 'b', 'c', '2010-05-27', '2010-02-25', '2010-06-05')
INSERT INTO TABLEA VALUES (6, 5, 3, 1, 'a', 'b', 'c', '2010-05-12', '2010-01-30', '2010-08-03')
INSERT INTO TABLEA VALUES (7, 2, 5, 2, 'a', 'b', 'c', '2010-08-04', '2010-06-19', '2010-06-20')
INSERT INTO TABLEA VALUES (8, 1, 2, 2, 'a', 'b', 'c', '2010-07-26', '2010-03-16', '2010-02-15')
INSERT INTO TABLEA VALUES (9, 4, 3, 4, 'a', 'b', 'c', '2010-03-05', '2010-03-26', '2010-08-15')
INSERT INTO TABLEA VALUES (10, 2, 3, 2, 'a', 'b', 'c', '2010-02-27', '2010-06-17', '2010-08-28')
INSERT INTO TABLEA VALUES (11, 1, 2, 2, 'a', 'b', 'c', '2010-06-11', '2010-04-22', '2010-03-11')
INSERT INTO TABLEA VALUES (12, 2, 3, 1, 'a', 'b', 'c', '2010-02-01', '2010-09-04', '2010-02-15')
INSERT INTO TABLEA VALUES (13, 4, 5, 5, 'a', 'b', 'c', '2010-06-02', '2010-05-04', '2010-03-25')
INSERT INTO TABLEA VALUES (14, 4, 2, 1, 'a', 'b', 'c', '2010-07-14', '2010-06-19', '2010-07-17')
INSERT INTO TABLEA VALUES (15, 2, 1, 5, 'a', 'b', 'c', '2010-07-05', '2010-08-16', '2010-07-10')

INSERT INTO TABLEB VALUES (1, 1, 4, 3, 'a', 'b', 'c', '2010-05-05', '2010-02-13', '2010-07-21')
INSERT INTO TABLEB VALUES (2, 4, 5, 1, 'a', 'b', 'c', '2010-07-25', '2010-07-19', NULL)
INSERT INTO TABLEB VALUES (3, 2, 4, 2, 'a', 'b', 'c', '2010-05-11', '2010-03-03', '2010-04-25')
INSERT INTO TABLEB VALUES (4, 3, 3, 2, 'a', 'b', 'c', '2010-05-30', '2010-07-10', '2010-01-03')
INSERT INTO TABLEB VALUES (5, 2, 4, 5, 'a', 'b', NULL, '2010-05-27', '2010-02-25', '2010-06-05')
INSERT INTO TABLEB VALUES (6, 5, 3, 1, 'a', 'b', 'c', '2010-05-12', '2010-01-30', '2010-08-03')
INSERT INTO TABLEB VALUES (7, 2, 5, 2, 'a', 'b', 'c', '2010-08-04', '2010-06-19', '2010-06-20')
INSERT INTO TABLEB VALUES (8, 1, 2, 2, 'a', 'b', 'c', '2010-07-26', '2010-03-16', '2010-02-15')
INSERT INTO TABLEB VALUES (9, 4, 3, 4, 'a', 'b', 'c', '2010-03-05', '2010-03-26', '2010-08-15')
INSERT INTO TABLEB VALUES (10, 2, 3, 2, 'a', 'b', 'c', '2010-02-27', '2010-06-17', '2010-08-28')
INSERT INTO TABLEB VALUES (11, 1, 2, 2, 'a', 'b', 'c', '2010-06-11', '2010-04-22', '2010-03-11')
INSERT INTO TABLEB VALUES (12, 2, 3, 1, 'a', 'b', 'c', '2010-02-01', '2010-09-04', '2010-02-15')
INSERT INTO TABLEB VALUES (13, 4, 5, 5, 'a', 'b', 'c', '2010-06-02', '2010-05-04', '2010-03-25')
INSERT INTO TABLEB VALUES (14, 4, 2, 1, 'a', 'b', 'c', '2010-07-14', '2010-06-19', '2010-07-17')
INSERT INTO TABLEB VALUES (15, 2, 1, 5, 'a', 'b', 'c', '2010-07-05', '2010-08-16', '2010-07-10')

/* First query, finding mismatches (ignoring where both are NULL) */

SELECT 'TABLEA' AS Source, a.primarykey, a.Col2, a.Col3, a.Col4, a.Col5, a.Col6, a.Col7, 
    a.Col8, a.Col9, a.Col10
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey
WHERE COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) OR
    COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) OR
    COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) OR
    COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') OR
    COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') OR
    COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') OR
    COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') OR
    COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') OR
    COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42')
UNION ALL
SELECT 'TABLEB' AS Source, b.primarykey, b.Col2, b.Col3, b.Col4, b.Col5, b.Col6, b.Col7, 
    b.Col8, b.Col9, b.Col10
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey
WHERE COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) OR
    COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) OR
    COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) OR
    COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') OR
    COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') OR
    COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') OR
    COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') OR
    COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') OR
    COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42')
ORDER BY 2, 1

/* returns:

Source primarykey Col2 Col3 Col4 Col5 Col6 Col7 Col8                    Col9                    Col10
-----------------------------------------------------------------------------------------------------------------------
TABLEA 3          NULL NULL 2    a    b    c    NULL                    2010-03-03 00:00:00.000 2010-04-25 00:00:00.000
TABLEB 3          2    4    2    a    b    c    2010-05-11 00:00:00.000 2010-03-03 00:00:00.000 2010-04-25 00:00:00.000
TABLEA 4          3    3    2    a    b    c    2010-01-01 00:00:00.000 2010-07-10 00:00:00.000 2010-01-03 00:00:00.000
TABLEB 4          3    3    2    a    b    c    2010-05-30 00:00:00.000 2010-07-10 00:00:00.000 2010-01-03 00:00:00.000
TABLEA 5          2    4    5    a    b    c    2010-05-27 00:00:00.000 2010-02-25 00:00:00.000 2010-06-05 00:00:00.000
TABLEB 5          2    4    5    a    b    NULL 2010-05-27 00:00:00.000 2010-02-25 00:00:00.000 2010-06-05 00:00:00.000

*/


/* 2nd query, counting diffs (ignoring where both are null) by column */

SELECT SUM(CASE WHEN COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) THEN 1 ELSE 0 END) AS Col2Count,
    SUM(CASE WHEN COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) THEN 1 ELSE 0 END) AS Col3Count,
    SUM(CASE WHEN COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) THEN 1 ELSE 0 END) AS Col4Count,
    SUM(CASE WHEN COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col5Count,
    SUM(CASE WHEN COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col6Count,
    SUM(CASE WHEN COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col7Count,
    SUM(CASE WHEN COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col8Count,
    SUM(CASE WHEN COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col9Count,
    SUM(CASE WHEN COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col10Count
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey

/* returns:

Col2Count Col3Count Col4Count Col5Count Col6Count Col7Count Col8Count Col9Count Col10Count
------------------------------------------------------------------------------------------
1         1         0         0         0         1         2         0         0

*/

Open in new window

0
 
LVL 10

Assisted Solution

by:John Claes
John Claes earned 250 total points
ID: 34925625
SMACD:

I've indeed not used any primary Key in my script
You must indeed add it in the select Query

select
      -- keeping values for Logging ;-)
      a.col1 as ACOL1,b.col1 as BCOL1,  
      a.col2 as ACOL2,b.col2 as BCOL2,  
      a.col3 as ACOL3,b.col3 as BCOL3,  
      a.col4 as ACOL4,b.col4 as BCOL4,  
      a.col5 as ACOL5,b.col5 as BCOL5,  
      a.col6 as ACOL6,b.col6 as BCOL6,  
      a.col7 as ACOL7,b.col7 as BCOL7,  
      a.col8 as ACOL8,b.col8 as BCOL8,
      getdate() as [DIFF_DATE],
      Case when a.col1 <> b.col1 then '1|' else '' end +
      Case when a.col2 <> b.col2 then '2|' else '' end +
      Case when a.col3 <> b.col3 then '3|' else '' end +
      Case when a.col4 <> b.col4 then '4|' else '' end +
      Case when a.col5 <> b.col5 then '5|' else '' end +
      Case when a.col6 <> b.col6 then '6|' else '' end +
      Case when a.col7 <> b.col7 then '7|' else '' end +
      Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.primarykey = b.primarykey and (
a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8)
0
 

Author Comment

by:SMACD
ID: 34950649
Thanks guys, this helped
0

Featured Post

Creating Instructional Tutorials  

For Any Use & On Any Platform

Contextual Guidance at the moment of need helps your employees/users adopt software o& achieve even the most complex tasks instantly. Boost knowledge retention, software adoption & employee engagement with easy solution.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article shows gives you an overview on SQL Server 2016 row level security. You will also get to know the usages of row-level-security and how it works
Recently we ran in to an issue while running some SQL jobs where we were trying to process the cubes.  We got an error saying failure stating 'NT SERVICE\SQLSERVERAGENT does not have access to Analysis Services. So this is a way to automate that wit…
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

724 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question