Solved

Large Marking task

Posted on 2011-02-18
6
217 Views
Last Modified: 2012-05-11
Hello experts,

I am comparing two tables that contain many records that have the same information but slightly different values in most cases.  In order to work out what is similar and what is not similar I have a script that compares the data in eight columns reports out the match type then deletes the values.

select a.col1 a1, b.col1 b1, a.col2 a2, b.col2 b2, a.col3 a3, b.col3 b3, a.col4 a4, b.col4 b4, a.col5  a5, b.col5 b5, a.col6 a6, b.col6 b6, a.col7 a7, b.col7 b7, a.col8 a8, b.col8 b8 into #EXACT
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 = b.col7 and
a.col8 = b.col8

delete from mytable1 where col1 in (select a1 from #EXACT)
delete from mytable2 where col1 in (select b1 from #EXACT)

What I then need to do is go through each degree of potential difference between the remaining rows. so work out where there is a difference in col8 where everything else is an match, col7 where everything else is a match and so on (except for col1 as this is always the same).  Then I need to match where all pairs of columns, don't match, all groups of three columns, all groups of fours, fives, sixes don't match

What I want is a stored procedure or a function to populate the matching criteria automatically rather than write out over 100 select statements into #DIFFS.

select a.col1 a1, b.col1 b1, a.col2 a2, b.col2 b2, a.col3 a3, b.col3 b3, a.col4 a4, b.col4 b4, a.col5  a5, b.col5 b5, a.col6 a6, b.col6 b6, a.col7 a7, b.col7 b7, a.col8 a8, b.col8 b8 into #Diffs
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 = b.col7 and
a.col8 <> b.col8

Insert into #diffs ( a1, b1, a2, b2, a3, b3, a4, b4, a5, b5, a6, b6, a7, b7, a.8, b8)
select a.col1, b.col1, a.col2, b.col2, a.col3, b.col3, a.col4, b.col4, a.col5, b.col5, a.col6, b.col6, a.col7, b.col7, a.col8, b.col8 into #diffs
from mytable1 a, mytable2 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4  and
a.col5 = b.col5 and
a.col6 = b.col6 and
a.col7 <> b.col7 and
a.col8 = b.col8

Would this be possible or shall I get cutting and pasting ?
0
Comment
Question by:SMACD
  • 2
  • 2
  • 2
6 Comments
 
LVL 10

Expert Comment

by:John Claes
ID: 34924691
First remark : You are not keeping track of the differencedegree
See beneath for a Difference Calcualting Query :

select
      -- keeping values for Logging ;-)
      a.col1 as ACOL1,b.col1 as BCOL1,  
      a.col2 as ACOL2,b.col2 as BCOL2,  
      a.col3 as ACOL3,b.col3 as BCOL3,  
      a.col4 as ACOL4,b.col4 as BCOL4,  
      a.col5 as ACOL5,b.col5 as BCOL5,  
      a.col6 as ACOL6,b.col6 as BCOL6,  
      a.col7 as ACOL7,b.col7 as BCOL7,  
      a.col8 as ACOL8,b.col8 as BCOL8,
      getdate() as [DIFF_DATE],
      Case when a.col1 <> b.col1 then '1|' else '' end +
      Case when a.col2 <> b.col2 then '2|' else '' end +
      Case when a.col3 <> b.col3 then '3|' else '' end +
      Case when a.col4 <> b.col4 then '4|' else '' end +
      Case when a.col5 <> b.col5 then '5|' else '' end +
      Case when a.col6 <> b.col6 then '6|' else '' end +
      Case when a.col7 <> b.col7 then '7|' else '' end +
      Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8


try the Code beneath as example
create table mytable1
(
	col1 nvarchar(20),  
	col2 nvarchar(20),  
	col3 nvarchar(20),  
	col4 nvarchar(20),  
	col5 nvarchar(20),  
	col6 nvarchar(20),  
	col7 nvarchar(20),  
	col8 nvarchar(20), 
)

create table mytable2
(
	col1 nvarchar(20),  
	col2 nvarchar(20),  
	col3 nvarchar(20),  
	col4 nvarchar(20),  
	col5 nvarchar(20),  
	col6 nvarchar(20),  
	col7 nvarchar(20),  
	col8 nvarchar(20), 
)


insert into mytable1 select '1','2','3','4','5','6','7','8'
insert into mytable1 select '1','1','3','4','5','6','7','8'
insert into mytable1 select '1','1','2','4','5','6','7','8'
insert into mytable1 select '1','1','2','3','5','6','7','8'
insert into mytable1 select '1','1','2','3','4','6','7','8'
insert into mytable1 select '1','1','2','3','4','5','7','8'
insert into mytable1 select '1','1','2','3','4','5','6','8'
insert into mytable1 select '1','1','2','3','4','5','6','7'

insert into mytable2 select '1','2','3','4','5','6','7','8'
insert into mytable2 select '1','1','3','4','5','6','7','8'
insert into mytable2 select '1','1','2','4','5','6','7','8'
insert into mytable2 select '1','1','2','3','5','6','7','8'
insert into mytable2 select '1','1','2','3','4','6','7','8'
insert into mytable2 select '1','1','2','3','4','5','7','8'
insert into mytable2 select '1','1','2','3','4','5','6','8'
insert into mytable2 select '1','1','2','3','4','5','6','7'

select 
	-- keeping values for Logging ;-)
	a.col1 as ACOL1,b.col1 as BCOL1,  
	a.col2 as ACOL2,b.col2 as BCOL2,  
	a.col3 as ACOL3,b.col3 as BCOL3,  
	a.col4 as ACOL4,b.col4 as BCOL4,  
	a.col5 as ACOL5,b.col5 as BCOL5,  
	a.col6 as ACOL6,b.col6 as BCOL6,  
	a.col7 as ACOL7,b.col7 as BCOL7,  
	a.col8 as ACOL8,b.col8 as BCOL8, 
	getdate() as [DIFF_DATE],
	Case when a.col1 <> b.col1 then '1|' else '' end + 
	Case when a.col2 <> b.col2 then '2|' else '' end + 
	Case when a.col3 <> b.col3 then '3|' else '' end + 
	Case when a.col4 <> b.col4 then '4|' else '' end + 
	Case when a.col5 <> b.col5 then '5|' else '' end + 
	Case when a.col6 <> b.col6 then '6|' else '' end + 
	Case when a.col7 <> b.col7 then '7|' else '' end + 
	Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8

select * from #DIFF_TABLE

drop table mytable1
drop table mytable2
drop table #DIFF_TABLE

Open in new window

0
 

Author Comment

by:SMACD
ID: 34924843
poor beggar

Thanks for that

Noted I do want to capture the difference value, but your example (unless I've misread it) has no mismatches as all rows are the same in both tables and would generate a list matching everything against everything else.  I have member records in two tables, some of which have been updated in one table and not the other. I also have some records missing in one table as they have been added (this can happen on either table).   If I ran this against by 500,000+ record set I would be in trouble.

Can feel that cut and paste coming on

0
 
LVL 18

Expert Comment

by:sventhan
ID: 34925091
0
VMware Disaster Recovery and Data Protection

In this expert guide, you’ll learn about the components of a Modern Data Center. You will use cases for the value-added capabilities of Veeam®, including combining backup and replication for VMware disaster recovery and using replication for data center migration.

 
LVL 18

Accepted Solution

by:
sventhan earned 250 total points
ID: 34925095
the code from the above link is posted here
CREATE TABLE TABLEA (primarykey int, Col2 int, Col3 int, Col4 int, Col5 varchar(50), Col6 varchar(50), Col7 varchar(50), Col8 datetime, Col9 datetime, Col10 datetime)
CREATE TABLE TABLEB (primarykey int, Col2 int, Col3 int, Col4 int, Col5 varchar(50), Col6 varchar(50), Col7 varchar(50), Col8 datetime, Col9 datetime, Col10 datetime)

INSERT INTO TABLEA VALUES (1, 1, 4, 3, 'a', 'b', 'c', '2010-05-05', '2010-02-13', '2010-07-21')
INSERT INTO TABLEA VALUES (2, 4, 5, 1, 'a', 'b', 'c', '2010-07-25', '2010-07-19', NULL)
INSERT INTO TABLEA VALUES (3, NULL, NULL, 2, 'a', 'b', 'c', NULL, '2010-03-03', '2010-04-25')
INSERT INTO TABLEA VALUES (4, 3, 3, 2, 'a', 'b', 'c', '2010-01-01', '2010-07-10', '2010-01-03')
INSERT INTO TABLEA VALUES (5, 2, 4, 5, 'a', 'b', 'c', '2010-05-27', '2010-02-25', '2010-06-05')
INSERT INTO TABLEA VALUES (6, 5, 3, 1, 'a', 'b', 'c', '2010-05-12', '2010-01-30', '2010-08-03')
INSERT INTO TABLEA VALUES (7, 2, 5, 2, 'a', 'b', 'c', '2010-08-04', '2010-06-19', '2010-06-20')
INSERT INTO TABLEA VALUES (8, 1, 2, 2, 'a', 'b', 'c', '2010-07-26', '2010-03-16', '2010-02-15')
INSERT INTO TABLEA VALUES (9, 4, 3, 4, 'a', 'b', 'c', '2010-03-05', '2010-03-26', '2010-08-15')
INSERT INTO TABLEA VALUES (10, 2, 3, 2, 'a', 'b', 'c', '2010-02-27', '2010-06-17', '2010-08-28')
INSERT INTO TABLEA VALUES (11, 1, 2, 2, 'a', 'b', 'c', '2010-06-11', '2010-04-22', '2010-03-11')
INSERT INTO TABLEA VALUES (12, 2, 3, 1, 'a', 'b', 'c', '2010-02-01', '2010-09-04', '2010-02-15')
INSERT INTO TABLEA VALUES (13, 4, 5, 5, 'a', 'b', 'c', '2010-06-02', '2010-05-04', '2010-03-25')
INSERT INTO TABLEA VALUES (14, 4, 2, 1, 'a', 'b', 'c', '2010-07-14', '2010-06-19', '2010-07-17')
INSERT INTO TABLEA VALUES (15, 2, 1, 5, 'a', 'b', 'c', '2010-07-05', '2010-08-16', '2010-07-10')

INSERT INTO TABLEB VALUES (1, 1, 4, 3, 'a', 'b', 'c', '2010-05-05', '2010-02-13', '2010-07-21')
INSERT INTO TABLEB VALUES (2, 4, 5, 1, 'a', 'b', 'c', '2010-07-25', '2010-07-19', NULL)
INSERT INTO TABLEB VALUES (3, 2, 4, 2, 'a', 'b', 'c', '2010-05-11', '2010-03-03', '2010-04-25')
INSERT INTO TABLEB VALUES (4, 3, 3, 2, 'a', 'b', 'c', '2010-05-30', '2010-07-10', '2010-01-03')
INSERT INTO TABLEB VALUES (5, 2, 4, 5, 'a', 'b', NULL, '2010-05-27', '2010-02-25', '2010-06-05')
INSERT INTO TABLEB VALUES (6, 5, 3, 1, 'a', 'b', 'c', '2010-05-12', '2010-01-30', '2010-08-03')
INSERT INTO TABLEB VALUES (7, 2, 5, 2, 'a', 'b', 'c', '2010-08-04', '2010-06-19', '2010-06-20')
INSERT INTO TABLEB VALUES (8, 1, 2, 2, 'a', 'b', 'c', '2010-07-26', '2010-03-16', '2010-02-15')
INSERT INTO TABLEB VALUES (9, 4, 3, 4, 'a', 'b', 'c', '2010-03-05', '2010-03-26', '2010-08-15')
INSERT INTO TABLEB VALUES (10, 2, 3, 2, 'a', 'b', 'c', '2010-02-27', '2010-06-17', '2010-08-28')
INSERT INTO TABLEB VALUES (11, 1, 2, 2, 'a', 'b', 'c', '2010-06-11', '2010-04-22', '2010-03-11')
INSERT INTO TABLEB VALUES (12, 2, 3, 1, 'a', 'b', 'c', '2010-02-01', '2010-09-04', '2010-02-15')
INSERT INTO TABLEB VALUES (13, 4, 5, 5, 'a', 'b', 'c', '2010-06-02', '2010-05-04', '2010-03-25')
INSERT INTO TABLEB VALUES (14, 4, 2, 1, 'a', 'b', 'c', '2010-07-14', '2010-06-19', '2010-07-17')
INSERT INTO TABLEB VALUES (15, 2, 1, 5, 'a', 'b', 'c', '2010-07-05', '2010-08-16', '2010-07-10')

/* First query, finding mismatches (ignoring where both are NULL) */

SELECT 'TABLEA' AS Source, a.primarykey, a.Col2, a.Col3, a.Col4, a.Col5, a.Col6, a.Col7, 
    a.Col8, a.Col9, a.Col10
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey
WHERE COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) OR
    COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) OR
    COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) OR
    COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') OR
    COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') OR
    COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') OR
    COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') OR
    COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') OR
    COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42')
UNION ALL
SELECT 'TABLEB' AS Source, b.primarykey, b.Col2, b.Col3, b.Col4, b.Col5, b.Col6, b.Col7, 
    b.Col8, b.Col9, b.Col10
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey
WHERE COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) OR
    COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) OR
    COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) OR
    COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') OR
    COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') OR
    COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') OR
    COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') OR
    COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') OR
    COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42')
ORDER BY 2, 1

/* returns:

Source primarykey Col2 Col3 Col4 Col5 Col6 Col7 Col8                    Col9                    Col10
-----------------------------------------------------------------------------------------------------------------------
TABLEA 3          NULL NULL 2    a    b    c    NULL                    2010-03-03 00:00:00.000 2010-04-25 00:00:00.000
TABLEB 3          2    4    2    a    b    c    2010-05-11 00:00:00.000 2010-03-03 00:00:00.000 2010-04-25 00:00:00.000
TABLEA 4          3    3    2    a    b    c    2010-01-01 00:00:00.000 2010-07-10 00:00:00.000 2010-01-03 00:00:00.000
TABLEB 4          3    3    2    a    b    c    2010-05-30 00:00:00.000 2010-07-10 00:00:00.000 2010-01-03 00:00:00.000
TABLEA 5          2    4    5    a    b    c    2010-05-27 00:00:00.000 2010-02-25 00:00:00.000 2010-06-05 00:00:00.000
TABLEB 5          2    4    5    a    b    NULL 2010-05-27 00:00:00.000 2010-02-25 00:00:00.000 2010-06-05 00:00:00.000

*/


/* 2nd query, counting diffs (ignoring where both are null) by column */

SELECT SUM(CASE WHEN COALESCE(a.Col2, -123456) <> COALESCE(b.Col2, -123456) THEN 1 ELSE 0 END) AS Col2Count,
    SUM(CASE WHEN COALESCE(a.Col3, -123456) <> COALESCE(b.Col3, -123456) THEN 1 ELSE 0 END) AS Col3Count,
    SUM(CASE WHEN COALESCE(a.Col4, -123456) <> COALESCE(b.Col4, -123456) THEN 1 ELSE 0 END) AS Col4Count,
    SUM(CASE WHEN COALESCE(a.Col5, 'as3hdv7p@oip$') <> COALESCE(b.Col5, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col5Count,
    SUM(CASE WHEN COALESCE(a.Col6, 'as3hdv7p@oip$') <> COALESCE(b.Col6, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col6Count,
    SUM(CASE WHEN COALESCE(a.Col7, 'as3hdv7p@oip$') <> COALESCE(b.Col7, 'as3hdv7p@oip$') THEN 1 ELSE 0 END) AS Col7Count,
    SUM(CASE WHEN COALESCE(a.Col8, '1850-01-01 03:17:42') <> COALESCE(b.Col8, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col8Count,
    SUM(CASE WHEN COALESCE(a.Col9, '1850-01-01 03:17:42') <> COALESCE(b.Col9, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col9Count,
    SUM(CASE WHEN COALESCE(a.Col10, '1850-01-01 03:17:42') <> COALESCE(b.Col10, '1850-01-01 03:17:42') THEN 1 ELSE 0 END) AS Col10Count
FROM TABLEA a INNER JOIN
    TABLEB b ON a.primarykey = b.primarykey

/* returns:

Col2Count Col3Count Col4Count Col5Count Col6Count Col7Count Col8Count Col9Count Col10Count
------------------------------------------------------------------------------------------
1         1         0         0         0         1         2         0         0

*/

Open in new window

0
 
LVL 10

Assisted Solution

by:John Claes
John Claes earned 250 total points
ID: 34925625
SMACD:

I've indeed not used any primary Key in my script
You must indeed add it in the select Query

select
      -- keeping values for Logging ;-)
      a.col1 as ACOL1,b.col1 as BCOL1,  
      a.col2 as ACOL2,b.col2 as BCOL2,  
      a.col3 as ACOL3,b.col3 as BCOL3,  
      a.col4 as ACOL4,b.col4 as BCOL4,  
      a.col5 as ACOL5,b.col5 as BCOL5,  
      a.col6 as ACOL6,b.col6 as BCOL6,  
      a.col7 as ACOL7,b.col7 as BCOL7,  
      a.col8 as ACOL8,b.col8 as BCOL8,
      getdate() as [DIFF_DATE],
      Case when a.col1 <> b.col1 then '1|' else '' end +
      Case when a.col2 <> b.col2 then '2|' else '' end +
      Case when a.col3 <> b.col3 then '3|' else '' end +
      Case when a.col4 <> b.col4 then '4|' else '' end +
      Case when a.col5 <> b.col5 then '5|' else '' end +
      Case when a.col6 <> b.col6 then '6|' else '' end +
      Case when a.col7 <> b.col7 then '7|' else '' end +
      Case when a.col8 <> b.col8 then '8|' else '' end as  [DIFF_INDICATION]
into #DIFF_TABLE
from mytable1 a, mytable2 b
where a.primarykey = b.primarykey and (
a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col4 <> b.col4 or
a.col5 <> b.col5 or
a.col6 <> b.col6 or
a.col7 <> b.col7 or
a.col8 <> b.col8)
0
 

Author Comment

by:SMACD
ID: 34950649
Thanks guys, this helped
0

Featured Post

Use Case: Protecting a Hybrid Cloud Infrastructure

Microsoft Azure is rapidly becoming the norm in dynamic IT environments. This document describes the challenges that organizations face when protecting data in a hybrid cloud IT environment and presents a use case to demonstrate how Acronis Backup protects all data.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Addition to SQL for dynamic fields 6 38
SQL: launch actions one before the other 10 22
MS SQL with ODBC 5 35
Insert query into temp tables using Coldfusion 3 16
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question