Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

Large Marking task

Posted on 2011-02-18
6
Medium Priority
?
244 Views
Last Modified: 2012-05-11
Hello experts,

I am comparing two tables that contain many records that have the same information but slightly different values in most cases.  In order to work out what is similar and what is not similar I have a script that compares the data in eight columns reports out the match type then deletes the values.

select a.col1 a1, b.col1 b1, a.col2 a2, b.col2 b2, a.col3 a3, b.col3 b3, a.col4 a4, b.col4 b4, a.col5  a5, b.col5 b5, a.col6 a6, b.col6 b6, a.col7 a7, b.col7 b7, a.col8 a8, b.col8 b8 into #EXACT
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 = b.col7 and
a.col8 = b.col8

delete from mytable1 where col1 in (select a1 from #EXACT)
delete from mytable2 where col1 in (select b1 from #EXACT)

What I then need to do is go through each degree of potential difference between the remaining rows. so work out where there is a difference in col8 where everything else is an match, col7 where everything else is a match and so on (except for col1 as this is always the same).  Then I need to match where all pairs of columns, don't match, all groups of three columns, all groups of fours, fives, sixes don't match

What I want is a stored procedure or a function to populate the matching criteria automatically rather than write out over 100 select statements into #DIFFS.

select a.col1 a1, b.col1 b1, a.col2 a2, b.col2 b2, a.col3 a3, b.col3 b3, a.col4 a4, b.col4 b4, a.col5  a5, b.col5 b5, a.col6 a6, b.col6 b6, a.col7 a7, b.col7 b7, a.col8 a8, b.col8 b8 into #Diffs
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 = b.col7 and
a.col8 <> b.col8

Insert into #diffs ( a1, b1, a2, b2, a3, b3, a4, b4, a5, b5, a6, b6, a7, b7, a.8, b8)
select a.col1, b.col1, a.col2, b.col2, a.col3, b.col3, a.col4, b.col4, a.col5, b.col5, a.col6, b.col6, a.col7, b.col7, a.col8, b.col8 into #diffs
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 <> b.col7 and
a.col8 = b.col8

Would this be possible or shall I get cutting and pasting ?
0
Comment
Question by:SMACD
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
  • 2
6 Comments
 
LVL 10

Expert Comment

by:John Claes
ID: 34924691
First remark : You are not keeping track of the differencedegree
See beneath for a Difference Calcualting Query :

select
      -- keeping values for Logging ;-)
      a.col1 as ACOL1,b.col1 as BCOL1,  
      a.col2 as ACOL2,b.col2 as BCOL2,  
      a.col3 as ACOL3,b.col3 as BCOL3,  
      a.col4 as ACOL4,b.col4 as BCOL4,  
      a.col5 as ACOL5,b.col5 as BCOL5,  
      a.col6 as ACOL6,b.col6 as BCOL6,  
      a.col7 as ACOL7,b.col7 as BCOL7,  
      a.col8 as ACOL8,b.col8 as BCOL8,
      getdate() as [DIFF_DATE],
      Case when a.col1 <> b.col1 then '1|' else '' end +
      Case when a.col2 <> b.col2 then '2|' else '' end +
      Case when a.col3 <> b.col3 then '3|' else '' end +
      Case when a.col4 <> b.col4 then '4|' else '' end +
      Case when a.col5 <> b.col5 then '5|' else '' end +
      Case when a.col6 <> b.col6 then '6|' else '' end +
      Case when a.col7 <> b.col7 then '7|' else '' end +
      Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8


try the Code beneath as example
create table mytable1
(
	col1 nvarchar(20),  
	col2 nvarchar(20),  
	col3 nvarchar(20),  
	col4 nvarchar(20),  
	col5 nvarchar(20),  
	col6 nvarchar(20),  
	col7 nvarchar(20),  
	col8 nvarchar(20), 
)

create table mytable2
(
	col1 nvarchar(20),  
	col2 nvarchar(20),  
	col3 nvarchar(20),  
	col4 nvarchar(20),  
	col5 nvarchar(20),  
	col6 nvarchar(20),  
	col7 nvarchar(20),  
	col8 nvarchar(20), 
)


insert into mytable1 select '1','2','3','4','5','6','7','8'
insert into mytable1 select '1','1','3','4','5','6','7','8'
insert into mytable1 select '1','1','2','4','5','6','7','8'
insert into mytable1 select '1','1','2','3','5','6','7','8'
insert into mytable1 select '1','1','2','3','4','6','7','8'
insert into mytable1 select '1','1','2','3','4','5','7','8'
insert into mytable1 select '1','1','2','3','4','5','6','8'
insert into mytable1 select '1','1','2','3','4','5','6','7'

insert into mytable2 select '1','2','3','4','5','6','7','8'
insert into mytable2 select '1','1','3','4','5','6','7','8'
insert into mytable2 select '1','1','2','4','5','6','7','8'
insert into mytable2 select '1','1','2','3','5','6','7','8'
insert into mytable2 select '1','1','2','3','4','6','7','8'
insert into mytable2 select '1','1','2','3','4','5','7','8'
insert into mytable2 select '1','1','2','3','4','5','6','8'
insert into mytable2 select '1','1','2','3','4','5','6','7'

select 
	-- keeping values for Logging ;-)
	a.col1 as ACOL1,b.col1 as BCOL1,  
	a.col2 as ACOL2,b.col2 as BCOL2,  
	a.col3 as ACOL3,b.col3 as BCOL3,  
	a.col4 as ACOL4,b.col4 as BCOL4,  
	a.col5 as ACOL5,b.col5 as BCOL5,  
	a.col6 as ACOL6,b.col6 as BCOL6,  
	a.col7 as ACOL7,b.col7 as BCOL7,  
	a.col8 as ACOL8,b.col8 as BCOL8, 
	getdate() as [DIFF_DATE],
	Case when a.col1 <> b.col1 then '1|' else '' end + 
	Case when a.col2 <> b.col2 then '2|' else '' end + 
	Case when a.col3 <> b.col3 then '3|' else '' end + 
	Case when a.col4 <> b.col4 then '4|' else '' end + 
	Case when a.col5 <> b.col5 then '5|' else '' end + 
	Case when a.col6 <> b.col6 then '6|' else '' end + 
	Case when a.col7 <> b.col7 then '7|' else '' end + 
	Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8

select * from #DIFF_TABLE

drop table mytable1
drop table mytable2
drop table #DIFF_TABLE

Open in new window

0
 

Author Comment

by:SMACD
ID: 34924843
poor beggar

Thanks for that

Noted I do want to capture the difference value, but your example (unless I've misread it) has no mismatches as all rows are the same in both tables and would generate a list matching everything against everything else.  I have member records in two tables, some of which have been updated in one table and not the other. I also have some records missing in one table as they have been added (this can happen on either table).   If I ran this against by 500,000+ record set I would be in trouble.

Can feel that cut and paste coming on

0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
LVL 18

Accepted Solution

by:
sventhan earned 1000 total points
ID: 34925095
the code from the above link is posted here
CREATE TABLE TABLEA (primarykey int, Col2 int, Col3 int, Col4 int, Col5 varchar(50), Col6 varchar(50), Col7 varchar(50), Col8 datetime, Col9 datetime, Col10 datetime)
CREATE TABLE TABLEB (primarykey int, Col2 int, Col3 int, Col4 int, Col5 varchar(50), Col6 varchar(50), Col7 varchar(50), Col8 datetime, Col9 datetime, Col10 datetime)

INSERT INTO TABLEA VALUES (1, 1, 4, 3, 'a', 'b', 'c', '2010-05-05', '2010-02-13', '2010-07-21')
INSERT INTO TABLEA VALUES (2, 4, 5, 1, 'a', 'b', 'c', '2010-07-25', '2010-07-19', NULL)
INSERT INTO TABLEA VALUES (3, NULL, NULL, 2, 'a', 'b', 'c', NULL, '2010-03-03', '2010-04-25')
INSERT INTO TABLEA VALUES (4, 3, 3, 2, 'a', 'b', 'c', '2010-01-01', '2010-07-10', '2010-01-03')
INSERT INTO TABLEA VALUES (5, 2, 4, 5, 'a', 'b', 'c', '2010-05-27', '2010-02-25', '2010-06-05')
INSERT INTO TABLEA VALUES (6, 5, 3, 1, 'a', 'b', 'c', '2010-05-12', '2010-01-30', '2010-08-03')
INSERT INTO TABLEA VALUES (7, 2, 5, 2, 'a', 'b', 'c', '2010-08-04', '2010-06-19', '2010-06-20')
INSERT INTO TABLEA VALUES (8, 1, 2, 2, 'a', 'b', 'c', '2010-07-26', '2010-03-16', '2010-02-15')
INSERT INTO TABLEA VALUES (9, 4, 3, 4, 'a', 'b', 'c', '2010-03-05', '2010-03-26', '2010-08-15')
INSERT INTO TABLEA VALUES (10, 2, 3, 2, 'a', 'b', 'c', '2010-02-27', '2010-06-17', '2010-08-28')
INSERT INTO TABLEA VALUES (11, 1, 2, 2, 'a', 'b', 'c', '2010-06-11', '2010-04-22', '2010-03-11')
INSERT INTO TABLEA VALUES (12, 2, 3, 1, 'a', 'b', 'c', '2010-02-01', '2010-09-04', '2010-02-15')
INSERT INTO TABLEA VALUES (13, 4, 5, 5, 'a', 'b', 'c', '2010-06-02', '2010-05-04', '2010-03-25')
INSERT INTO TABLEA VALUES (14, 4, 2, 1, 'a', 'b', 'c', '2010-07-14', '2010-06-19', '2010-07-17')
INSERT INTO TABLEA VALUES (15, 2, 1, 5, 'a', 'b', 'c', '2010-07-05', '2010-08-16', '2010-07-10')

INSERT INTO TABLEB VALUES (1, 1, 4, 3, 'a', 'b', 'c', '2010-05-05', '2010-02-13', '2010-07-21')
INSERT INTO TABLEB VALUES (2, 4, 5, 1, 'a', 'b', 'c', '2010-07-25', '2010-07-19', NULL)
INSERT INTO TABLEB VALUES (3, 2, 4, 2, 'a', 'b', 'c', '2010-05-11', '2010-03-03', '2010-04-25')
INSERT INTO TABLEB VALUES (4, 3, 3, 2, 'a', 'b', 'c', '2010-05-30', '2010-07-10', '2010-01-03')
INSERT INTO TABLEB VALUES (5, 2, 4, 5, 'a', 'b', NULL, '2010-05-27', '2010-02-25', '2010-06-05')
INSERT INTO TABLEB VALUES (6, 5, 3, 1, 'a', 'b', 'c', '2010-05-12', '2010-01-30', '2010-08-03')
INSERT INTO TABLEB VALUES (7, 2, 5, 2, 'a', 'b', 'c', '2010-08-04', '2010-06-19', '2010-06-20')
INSERT INTO TABLEB VALUES (8, 1, 2, 2, 'a', 'b', 'c', '2010-07-26', '2010-03-16', '2010-02-15')
INSERT INTO TABLEB VALUES (9, 4, 3, 4, 'a', 'b', 'c', '2010-03-05', '2010-03-26', '2010-08-15')
INSERT INTO TABLEB VALUES (10, 2, 3, 2, 'a', 'b', 'c', '2010-02-27', '2010-06-17', '2010-08-28')
INSERT INTO TABLEB VALUES (11, 1, 2, 2, 'a', 'b', 'c', '2010-06-11', '2010-04-22', '2010-03-11')
INSERT INTO TABLEB VALUES (12, 2, 3, 1, 'a', 'b', 'c', '2010-02-01', '2010-09-04', '2010-02-15')
INSERT INTO TABLEB VALUES (13, 4, 5, 5, 'a', 'b', 'c', '2010-06-02', '2010-05-04', '2010-03-25')
INSERT INTO TABLEB VALUES (14, 4, 2, 1, 'a', 'b', 'c', '2010-07-14', '2010-06-19', '2010-07-17')
INSERT INTO TABLEB VALUES (15, 2, 1, 5, 'a', 'b', 'c', '2010-07-05', '2010-08-16', '2010-07-10')

/* First query, finding mismatches (ignoring where both are NULL) */

SELECT 'TABLEA' AS Source, a.primarykey, a.Col2, a.Col3, a.Col4, a.Col5, a.Col6, a.Col7, 
    a.Col8, a.Col9, a.Col10
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey
WHERE COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) OR
    COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) OR
    COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) OR
    COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') OR
    COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') OR
    COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') OR
    COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') OR
    COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') OR
    COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42')
UNION ALL
SELECT 'TABLEB' AS Source, b.primarykey, b.Col2, b.Col3, b.Col4, b.Col5, b.Col6, b.Col7, 
    b.Col8, b.Col9, b.Col10
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey
WHERE COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) OR
    COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) OR
    COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) OR
    COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') OR
    COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') OR
    COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') OR
    COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') OR
    COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') OR
    COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42')
ORDER BY 2, 1

/* returns:

Source primarykey Col2 Col3 Col4 Col5 Col6 Col7 Col8                    Col9                    Col10
-----------------------------------------------------------------------------------------------------------------------
TABLEA 3          NULL NULL 2    a    b    c    NULL                    2010-03-03 00:00:00.000 2010-04-25 00:00:00.000
TABLEB 3          2    4    2    a    b    c    2010-05-11 00:00:00.000 2010-03-03 00:00:00.000 2010-04-25 00:00:00.000
TABLEA 4          3    3    2    a    b    c    2010-01-01 00:00:00.000 2010-07-10 00:00:00.000 2010-01-03 00:00:00.000
TABLEB 4          3    3    2    a    b    c    2010-05-30 00:00:00.000 2010-07-10 00:00:00.000 2010-01-03 00:00:00.000
TABLEA 5          2    4    5    a    b    c    2010-05-27 00:00:00.000 2010-02-25 00:00:00.000 2010-06-05 00:00:00.000
TABLEB 5          2    4    5    a    b    NULL 2010-05-27 00:00:00.000 2010-02-25 00:00:00.000 2010-06-05 00:00:00.000

*/


/* 2nd query, counting diffs (ignoring where both are null) by column */

SELECT SUM(CASE WHEN COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) THEN 1 ELSE 0 END) AS Col2Count,
    SUM(CASE WHEN COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) THEN 1 ELSE 0 END) AS Col3Count,
    SUM(CASE WHEN COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) THEN 1 ELSE 0 END) AS Col4Count,
    SUM(CASE WHEN COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col5Count,
    SUM(CASE WHEN COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col6Count,
    SUM(CASE WHEN COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col7Count,
    SUM(CASE WHEN COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col8Count,
    SUM(CASE WHEN COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col9Count,
    SUM(CASE WHEN COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col10Count
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey

/* returns:

Col2Count Col3Count Col4Count Col5Count Col6Count Col7Count Col8Count Col9Count Col10Count
------------------------------------------------------------------------------------------
1         1         0         0         0         1         2         0         0

*/

Open in new window

0
 
LVL 10

Assisted Solution

by:John Claes
John Claes earned 1000 total points
ID: 34925625
SMACD:

I've indeed not used any primary Key in my script
You must indeed add it in the select Query

select
      -- keeping values for Logging ;-)
      a.col1 as ACOL1,b.col1 as BCOL1,  
      a.col2 as ACOL2,b.col2 as BCOL2,  
      a.col3 as ACOL3,b.col3 as BCOL3,  
      a.col4 as ACOL4,b.col4 as BCOL4,  
      a.col5 as ACOL5,b.col5 as BCOL5,  
      a.col6 as ACOL6,b.col6 as BCOL6,  
      a.col7 as ACOL7,b.col7 as BCOL7,  
      a.col8 as ACOL8,b.col8 as BCOL8,
      getdate() as [DIFF_DATE],
      Case when a.col1 <> b.col1 then '1|' else '' end +
      Case when a.col2 <> b.col2 then '2|' else '' end +
      Case when a.col3 <> b.col3 then '3|' else '' end +
      Case when a.col4 <> b.col4 then '4|' else '' end +
      Case when a.col5 <> b.col5 then '5|' else '' end +
      Case when a.col6 <> b.col6 then '6|' else '' end +
      Case when a.col7 <> b.col7 then '7|' else '' end +
      Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.primarykey = b.primarykey and (
a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8)
0
 

Author Comment

by:SMACD
ID: 34950649
Thanks guys, this helped
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article we will learn how to fix  “Cannot install SQL Server 2014 Service Pack 2: Unable to install windows installer msi file” error ?
One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.

636 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question