Solved

Large Marking task

Posted on 2011-02-18
6
215 Views
Last Modified: 2012-05-11
Hello experts,

I am comparing two tables that contain many records that have the same information but slightly different values in most cases.  In order to work out what is similar and what is not similar I have a script that compares the data in eight columns reports out the match type then deletes the values.

select a.col1 a1, b.col1 b1, a.col2 a2, b.col2 b2, a.col3 a3, b.col3 b3, a.col4 a4, b.col4 b4, a.col5  a5, b.col5 b5, a.col6 a6, b.col6 b6, a.col7 a7, b.col7 b7, a.col8 a8, b.col8 b8 into #EXACT
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 = b.col7 and
a.col8 = b.col8

delete from mytable1 where col1 in (select a1 from #EXACT)
delete from mytable2 where col1 in (select b1 from #EXACT)

What I then need to do is go through each degree of potential difference between the remaining rows. so work out where there is a difference in col8 where everything else is an match, col7 where everything else is a match and so on (except for col1 as this is always the same).  Then I need to match where all pairs of columns, don't match, all groups of three columns, all groups of fours, fives, sixes don't match

What I want is a stored procedure or a function to populate the matching criteria automatically rather than write out over 100 select statements into #DIFFS.

select a.col1 a1, b.col1 b1, a.col2 a2, b.col2 b2, a.col3 a3, b.col3 b3, a.col4 a4, b.col4 b4, a.col5  a5, b.col5 b5, a.col6 a6, b.col6 b6, a.col7 a7, b.col7 b7, a.col8 a8, b.col8 b8 into #Diffs
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 = b.col7 and
a.col8 <> b.col8

Insert into #diffs ( a1, b1, a2, b2, a3, b3, a4, b4, a5, b5, a6, b6, a7, b7, a.8, b8)
select a.col1, b.col1, a.col2, b.col2, a.col3, b.col3, a.col4, b.col4, a.col5, b.col5, a.col6, b.col6, a.col7, b.col7, a.col8, b.col8 into #diffs
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 <> b.col7 and
a.col8 = b.col8

Would this be possible or shall I get cutting and pasting ?
0
Comment
Question by:SMACD
  • 2
  • 2
  • 2
6 Comments
 
LVL 10

Expert Comment

by:John Claes
Comment Utility
First remark : You are not keeping track of the differencedegree
See beneath for a Difference Calcualting Query :

select
      -- keeping values for Logging ;-)
      a.col1 as ACOL1,b.col1 as BCOL1,  
      a.col2 as ACOL2,b.col2 as BCOL2,  
      a.col3 as ACOL3,b.col3 as BCOL3,  
      a.col4 as ACOL4,b.col4 as BCOL4,  
      a.col5 as ACOL5,b.col5 as BCOL5,  
      a.col6 as ACOL6,b.col6 as BCOL6,  
      a.col7 as ACOL7,b.col7 as BCOL7,  
      a.col8 as ACOL8,b.col8 as BCOL8,
      getdate() as [DIFF_DATE],
      Case when a.col1 <> b.col1 then '1|' else '' end +
      Case when a.col2 <> b.col2 then '2|' else '' end +
      Case when a.col3 <> b.col3 then '3|' else '' end +
      Case when a.col4 <> b.col4 then '4|' else '' end +
      Case when a.col5 <> b.col5 then '5|' else '' end +
      Case when a.col6 <> b.col6 then '6|' else '' end +
      Case when a.col7 <> b.col7 then '7|' else '' end +
      Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8


try the Code beneath as example
create table mytable1
(
	col1 nvarchar(20),  
	col2 nvarchar(20),  
	col3 nvarchar(20),  
	col4 nvarchar(20),  
	col5 nvarchar(20),  
	col6 nvarchar(20),  
	col7 nvarchar(20),  
	col8 nvarchar(20), 
)

create table mytable2
(
	col1 nvarchar(20),  
	col2 nvarchar(20),  
	col3 nvarchar(20),  
	col4 nvarchar(20),  
	col5 nvarchar(20),  
	col6 nvarchar(20),  
	col7 nvarchar(20),  
	col8 nvarchar(20), 
)


insert into mytable1 select '1','2','3','4','5','6','7','8'
insert into mytable1 select '1','1','3','4','5','6','7','8'
insert into mytable1 select '1','1','2','4','5','6','7','8'
insert into mytable1 select '1','1','2','3','5','6','7','8'
insert into mytable1 select '1','1','2','3','4','6','7','8'
insert into mytable1 select '1','1','2','3','4','5','7','8'
insert into mytable1 select '1','1','2','3','4','5','6','8'
insert into mytable1 select '1','1','2','3','4','5','6','7'

insert into mytable2 select '1','2','3','4','5','6','7','8'
insert into mytable2 select '1','1','3','4','5','6','7','8'
insert into mytable2 select '1','1','2','4','5','6','7','8'
insert into mytable2 select '1','1','2','3','5','6','7','8'
insert into mytable2 select '1','1','2','3','4','6','7','8'
insert into mytable2 select '1','1','2','3','4','5','7','8'
insert into mytable2 select '1','1','2','3','4','5','6','8'
insert into mytable2 select '1','1','2','3','4','5','6','7'

select 
	-- keeping values for Logging ;-)
	a.col1 as ACOL1,b.col1 as BCOL1,  
	a.col2 as ACOL2,b.col2 as BCOL2,  
	a.col3 as ACOL3,b.col3 as BCOL3,  
	a.col4 as ACOL4,b.col4 as BCOL4,  
	a.col5 as ACOL5,b.col5 as BCOL5,  
	a.col6 as ACOL6,b.col6 as BCOL6,  
	a.col7 as ACOL7,b.col7 as BCOL7,  
	a.col8 as ACOL8,b.col8 as BCOL8, 
	getdate() as [DIFF_DATE],
	Case when a.col1 <> b.col1 then '1|' else '' end + 
	Case when a.col2 <> b.col2 then '2|' else '' end + 
	Case when a.col3 <> b.col3 then '3|' else '' end + 
	Case when a.col4 <> b.col4 then '4|' else '' end + 
	Case when a.col5 <> b.col5 then '5|' else '' end + 
	Case when a.col6 <> b.col6 then '6|' else '' end + 
	Case when a.col7 <> b.col7 then '7|' else '' end + 
	Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8

select * from #DIFF_TABLE

drop table mytable1
drop table mytable2
drop table #DIFF_TABLE

Open in new window

0
 

Author Comment

by:SMACD
Comment Utility
poor beggar

Thanks for that

Noted I do want to capture the difference value, but your example (unless I've misread it) has no mismatches as all rows are the same in both tables and would generate a list matching everything against everything else.  I have member records in two tables, some of which have been updated in one table and not the other. I also have some records missing in one table as they have been added (this can happen on either table).   If I ran this against by 500,000+ record set I would be in trouble.

Can feel that cut and paste coming on

0
 
LVL 18

Expert Comment

by:sventhan
Comment Utility
0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 
LVL 18

Accepted Solution

by:
sventhan earned 250 total points
Comment Utility
the code from the above link is posted here
CREATE TABLE TABLEA (primarykey int, Col2 int, Col3 int, Col4 int, Col5 varchar(50), Col6 varchar(50), Col7 varchar(50), Col8 datetime, Col9 datetime, Col10 datetime)
CREATE TABLE TABLEB (primarykey int, Col2 int, Col3 int, Col4 int, Col5 varchar(50), Col6 varchar(50), Col7 varchar(50), Col8 datetime, Col9 datetime, Col10 datetime)

INSERT INTO TABLEA VALUES (1, 1, 4, 3, 'a', 'b', 'c', '2010-05-05', '2010-02-13', '2010-07-21')
INSERT INTO TABLEA VALUES (2, 4, 5, 1, 'a', 'b', 'c', '2010-07-25', '2010-07-19', NULL)
INSERT INTO TABLEA VALUES (3, NULL, NULL, 2, 'a', 'b', 'c', NULL, '2010-03-03', '2010-04-25')
INSERT INTO TABLEA VALUES (4, 3, 3, 2, 'a', 'b', 'c', '2010-01-01', '2010-07-10', '2010-01-03')
INSERT INTO TABLEA VALUES (5, 2, 4, 5, 'a', 'b', 'c', '2010-05-27', '2010-02-25', '2010-06-05')
INSERT INTO TABLEA VALUES (6, 5, 3, 1, 'a', 'b', 'c', '2010-05-12', '2010-01-30', '2010-08-03')
INSERT INTO TABLEA VALUES (7, 2, 5, 2, 'a', 'b', 'c', '2010-08-04', '2010-06-19', '2010-06-20')
INSERT INTO TABLEA VALUES (8, 1, 2, 2, 'a', 'b', 'c', '2010-07-26', '2010-03-16', '2010-02-15')
INSERT INTO TABLEA VALUES (9, 4, 3, 4, 'a', 'b', 'c', '2010-03-05', '2010-03-26', '2010-08-15')
INSERT INTO TABLEA VALUES (10, 2, 3, 2, 'a', 'b', 'c', '2010-02-27', '2010-06-17', '2010-08-28')
INSERT INTO TABLEA VALUES (11, 1, 2, 2, 'a', 'b', 'c', '2010-06-11', '2010-04-22', '2010-03-11')
INSERT INTO TABLEA VALUES (12, 2, 3, 1, 'a', 'b', 'c', '2010-02-01', '2010-09-04', '2010-02-15')
INSERT INTO TABLEA VALUES (13, 4, 5, 5, 'a', 'b', 'c', '2010-06-02', '2010-05-04', '2010-03-25')
INSERT INTO TABLEA VALUES (14, 4, 2, 1, 'a', 'b', 'c', '2010-07-14', '2010-06-19', '2010-07-17')
INSERT INTO TABLEA VALUES (15, 2, 1, 5, 'a', 'b', 'c', '2010-07-05', '2010-08-16', '2010-07-10')

INSERT INTO TABLEB VALUES (1, 1, 4, 3, 'a', 'b', 'c', '2010-05-05', '2010-02-13', '2010-07-21')
INSERT INTO TABLEB VALUES (2, 4, 5, 1, 'a', 'b', 'c', '2010-07-25', '2010-07-19', NULL)
INSERT INTO TABLEB VALUES (3, 2, 4, 2, 'a', 'b', 'c', '2010-05-11', '2010-03-03', '2010-04-25')
INSERT INTO TABLEB VALUES (4, 3, 3, 2, 'a', 'b', 'c', '2010-05-30', '2010-07-10', '2010-01-03')
INSERT INTO TABLEB VALUES (5, 2, 4, 5, 'a', 'b', NULL, '2010-05-27', '2010-02-25', '2010-06-05')
INSERT INTO TABLEB VALUES (6, 5, 3, 1, 'a', 'b', 'c', '2010-05-12', '2010-01-30', '2010-08-03')
INSERT INTO TABLEB VALUES (7, 2, 5, 2, 'a', 'b', 'c', '2010-08-04', '2010-06-19', '2010-06-20')
INSERT INTO TABLEB VALUES (8, 1, 2, 2, 'a', 'b', 'c', '2010-07-26', '2010-03-16', '2010-02-15')
INSERT INTO TABLEB VALUES (9, 4, 3, 4, 'a', 'b', 'c', '2010-03-05', '2010-03-26', '2010-08-15')
INSERT INTO TABLEB VALUES (10, 2, 3, 2, 'a', 'b', 'c', '2010-02-27', '2010-06-17', '2010-08-28')
INSERT INTO TABLEB VALUES (11, 1, 2, 2, 'a', 'b', 'c', '2010-06-11', '2010-04-22', '2010-03-11')
INSERT INTO TABLEB VALUES (12, 2, 3, 1, 'a', 'b', 'c', '2010-02-01', '2010-09-04', '2010-02-15')
INSERT INTO TABLEB VALUES (13, 4, 5, 5, 'a', 'b', 'c', '2010-06-02', '2010-05-04', '2010-03-25')
INSERT INTO TABLEB VALUES (14, 4, 2, 1, 'a', 'b', 'c', '2010-07-14', '2010-06-19', '2010-07-17')
INSERT INTO TABLEB VALUES (15, 2, 1, 5, 'a', 'b', 'c', '2010-07-05', '2010-08-16', '2010-07-10')

/* First query, finding mismatches (ignoring where both are NULL) */

SELECT 'TABLEA' AS Source, a.primarykey, a.Col2, a.Col3, a.Col4, a.Col5, a.Col6, a.Col7, 
    a.Col8, a.Col9, a.Col10
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey
WHERE COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) OR
    COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) OR
    COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) OR
    COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') OR
    COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') OR
    COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') OR
    COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') OR
    COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') OR
    COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42')
UNION ALL
SELECT 'TABLEB' AS Source, b.primarykey, b.Col2, b.Col3, b.Col4, b.Col5, b.Col6, b.Col7, 
    b.Col8, b.Col9, b.Col10
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey
WHERE COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) OR
    COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) OR
    COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) OR
    COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') OR
    COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') OR
    COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') OR
    COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') OR
    COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') OR
    COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42')
ORDER BY 2, 1

/* returns:

Source primarykey Col2 Col3 Col4 Col5 Col6 Col7 Col8                    Col9                    Col10
-----------------------------------------------------------------------------------------------------------------------
TABLEA 3          NULL NULL 2    a    b    c    NULL                    2010-03-03 00:00:00.000 2010-04-25 00:00:00.000
TABLEB 3          2    4    2    a    b    c    2010-05-11 00:00:00.000 2010-03-03 00:00:00.000 2010-04-25 00:00:00.000
TABLEA 4          3    3    2    a    b    c    2010-01-01 00:00:00.000 2010-07-10 00:00:00.000 2010-01-03 00:00:00.000
TABLEB 4          3    3    2    a    b    c    2010-05-30 00:00:00.000 2010-07-10 00:00:00.000 2010-01-03 00:00:00.000
TABLEA 5          2    4    5    a    b    c    2010-05-27 00:00:00.000 2010-02-25 00:00:00.000 2010-06-05 00:00:00.000
TABLEB 5          2    4    5    a    b    NULL 2010-05-27 00:00:00.000 2010-02-25 00:00:00.000 2010-06-05 00:00:00.000

*/


/* 2nd query, counting diffs (ignoring where both are null) by column */

SELECT SUM(CASE WHEN COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) THEN 1 ELSE 0 END) AS Col2Count,
    SUM(CASE WHEN COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) THEN 1 ELSE 0 END) AS Col3Count,
    SUM(CASE WHEN COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) THEN 1 ELSE 0 END) AS Col4Count,
    SUM(CASE WHEN COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col5Count,
    SUM(CASE WHEN COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col6Count,
    SUM(CASE WHEN COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col7Count,
    SUM(CASE WHEN COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col8Count,
    SUM(CASE WHEN COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col9Count,
    SUM(CASE WHEN COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col10Count
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey

/* returns:

Col2Count Col3Count Col4Count Col5Count Col6Count Col7Count Col8Count Col9Count Col10Count
------------------------------------------------------------------------------------------
1         1         0         0         0         1         2         0         0

*/

Open in new window

0
 
LVL 10

Assisted Solution

by:John Claes
John Claes earned 250 total points
Comment Utility
SMACD:

I've indeed not used any primary Key in my script
You must indeed add it in the select Query

select
      -- keeping values for Logging ;-)
      a.col1 as ACOL1,b.col1 as BCOL1,  
      a.col2 as ACOL2,b.col2 as BCOL2,  
      a.col3 as ACOL3,b.col3 as BCOL3,  
      a.col4 as ACOL4,b.col4 as BCOL4,  
      a.col5 as ACOL5,b.col5 as BCOL5,  
      a.col6 as ACOL6,b.col6 as BCOL6,  
      a.col7 as ACOL7,b.col7 as BCOL7,  
      a.col8 as ACOL8,b.col8 as BCOL8,
      getdate() as [DIFF_DATE],
      Case when a.col1 <> b.col1 then '1|' else '' end +
      Case when a.col2 <> b.col2 then '2|' else '' end +
      Case when a.col3 <> b.col3 then '3|' else '' end +
      Case when a.col4 <> b.col4 then '4|' else '' end +
      Case when a.col5 <> b.col5 then '5|' else '' end +
      Case when a.col6 <> b.col6 then '6|' else '' end +
      Case when a.col7 <> b.col7 then '7|' else '' end +
      Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.primarykey = b.primarykey and (
a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8)
0
 

Author Comment

by:SMACD
Comment Utility
Thanks guys, this helped
0

Featured Post

Backup Your Microsoft Windows Server®

Backup all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

Join & Write a Comment

Suggested Solutions

In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now