Link to home
Start Free TrialLog in
Avatar of jana
janaFlag for United States of America

asked on

How to display the difference found between same-formated-tables of different Databases in Ms SQL 2005 and 2000

We have 4 databases.  Each database has the exact same formatted tables, just with different data.  

There is a specific table (CustTable) that within all 4 databases, has the same data; that is an AccountNumber and Description.  In all 4 DB there are the same AccountNumber.  We have found that within the Description column, there are differences, not many, but does exist.

We are trying to do is prepare a script to display only the differences in description between all database of the same Table.

See pic included for an example of our situation and what we are trying to do.

We want to compare the following manner:

Comp1 >> Comp2, Comp2 >> Comp3 and Comp3 >> Comp4.

The script we are using, upto now is the following
Avatar of jana
jana
Flag of United States of America image

ASKER

Keep forgetting to include attachments... please reference the included data...

SELECT 'DB' 'COMP1-COMP2',(rtrim(a.AccountNumberSegment2) + '-' + rtrim(a.AccountNumberSegment3)), a.actdescr, b.actdescr 
       FROM COMP1..CustTable a
       inner JOIN COMP2..gl00100 b ON a.AccountNumberSegment2=b.AccountNumberSegment2 AND a.AccountNumberSegment3=b.AccountNumberSegment3 
WHERE a.actdescr <> b.actdescr ORDER BY 1

Open in new window

DifferencesinDB-Table.jpg
Avatar of jana

ASKER

Forgot to mention that the AccountNumber is actually 3 segment number; that is 3 columns to represent an AccountNumber.
SOLUTION
Avatar of jogos
jogos
Flag of Belgium image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SELECT (rtrim(a.AccountNumberSegment2) + '-' + rtrim(a.AccountNumberSegment3)), a.actdescr as Desc1, b.actdescr as Desc2,  c.actdescr as Desc3, d.actdescr as Desc4

FROM
COMP1..CustTable a
inner JOIN COMP2..gl00100 b ON a.AccountNumberSegment2=b.AccountNumberSegment2 AND a.AccountNumberSegment3=b.AccountNumberSegment3
inner JOIN COMP3..gl00100 c ON
a.AccountNumberSegment2=c.AccountNumberSegment2 AND a.AccountNumberSegment3=c.AccountNumberSegment3
inner JOIN COMP4..gl00100 d ON
a.AccountNumberSegment2=d.AccountNumberSegment2 AND a.AccountNumberSegment3=d.AccountNumberSegment3
WHERE a.actdescr <> b.actdescr or b.actdescr <> c.actdescr or c.actdescr <> d.actdescr
Avatar of jana

ASKER

will try
Avatar of jana

ASKER

Ust tried it and some rows were not shown.

To compare I used the script inincluded in the question and compare from comp1 to comp2, comp1 to comp3, comp1 to comp4 and found rows not displayed by your script.
hmmm...a couple things come to mind.

First, you said Account Number is really a composite of three columns, but your query (and mine, which is modeled after it) only joins on two of the three columns.  

However, I expect that the real issue is that some of the tables have Account Number combinations that do not exist in others.  My query only returns results where the Account Number exists in all 4 tables.  What does these return:

Select count(*) from comp1 a  where not exists (select * from comp2  b where  a.AccountNumberSegment2=b.AccountNumberSegment2 AND a.AccountNumberSegment3=b.AccountNumberSegment3)


Select count(*) from comp1 a  where not exists (select * from comp3  b where  a.AccountNumberSegment2=b.AccountNumberSegment2 AND a.AccountNumberSegment3=b.AccountNumberSegment3)


Select count(*) from comp1 a  where not exists (select * from comp4  b where  a.AccountNumberSegment2=b.AccountNumberSegment2 AND a.AccountNumberSegment3=b.AccountNumberSegment3)
<<However, I expect that the real issue is that some of the tables have Account Number combinations that do not exist in others. >>
That's where the 'group by AccNbr,Actdescr  having count(*) < 4' sollution is identifying them.
Avatar of jana

ASKER

To try to simplify it is display the accounts that are found in all Comps and also that at least one of the Description within the 4 are not the same.

The script I provided in ID: 37720683, is just a search of One Comp "compare" to the rest  (Comp1 comparing to Comp2,Comp3, etc)

In reality, since we haven't come up with a single script, I have at least 20 of those script comparing the following manner (it is 5 database, not 4):

         Comp1 compare Comp2, Comp3, Comp4, Comp5
         Comp2 compare Comp3, Comp4, Comp5
         Comp3 compare Comp1, Comp2, Comp4, Comp5
         Comp4 compare Comp1, Comp2, Comp3, Comp5
         Comp5 compare Comp1, Comp2, Comp3, Comp4

Can a single script do this compare and display the result as indicated in  ID: 37720683?
Again comment  #a37720856 identifies the items that are not the same in all databases. Did you test this?

But it gives both the good count(*) = 3 and the bad count(*) = 1.
Problem is to identify the one and only good description , if you don't have certainty there is one database that must be the reference.

So 4 logical steps
1) list of not fully the same (my union ... having count(*) <)
2) (perfectly integratable in step 1)  match the good description with it (always that of COMP1?)
3) With row_number() over() http://msdn.microsoft.com/en-us/library/ms189461.aspx give for each db a rank on the 'different' records
4) join all different records together with a FULL JOIN on rank (step 3) to get exactly the result you want
http://msdn.microsoft.com/en-us/library/ms187518.aspx

Steps 3 and 4 are for having that formated as in your example
Avatar of jana

ASKER

actually, no.  Since we saw the next entry, we though it was your and discard your entry.  Will test tomorrow.

Also to comment that yes your are correct, we have to have a DB as references; this is Comp1
<<Comp1 >> Comp2, Comp2 >> Comp3 and Comp3 >> Comp4.>>
was not showing as comp1 was the reference, but with reference it's easier

Without testing (no sql here) i tried to build something going in the direction you want


Step 1, 2 and 3 combined in the cte-definition (http://msdn.microsoft.com/en-us/library/ms175972.aspx). Test that first, see if you get out of it what you want and see what the row_number has done.

Step 4 the allignement

;with cte_descr (AccNbr, actdescr,refactdescr, rownbr)
as 
(
select 'comp2' as db , (rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3))  as AccNbr, actdescr
, ROW_NUMBER() OVER(ORDER BY rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3) ) as rownbr
 from comp1 as t1
 left join comp2 as t2 
    on ( rtrim(t2.AccountNumberSegment2) + '-' + rtrim(t2.AccountNumberSegment3) 
       =rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3)
        and t2.actdescr <> t1.actdescr
        )
 union all 
select 'comp3' as db , (rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3))  as AccNbr, actdescr
, ROW_NUMBER() OVER(ORDER BY rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3) ) as rownbr
 from comp1 as t1
 left join comp3 as t2 
    on ( rtrim(t2.AccountNumberSegment2) + '-' + rtrim(t2.AccountNumberSegment3) 
       =rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3)
        and t2.actdescr <> t1.actdescr
        )
 union all 
select 'comp4' as db , (rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3))  as AccNbr, actdescr
, ROW_NUMBER() OVER(ORDER BY rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3) ) as rownbr
 from comp1 as t1
 left join comp4 as t2 
    on ( rtrim(t2.AccountNumberSegment2) + '-' + rtrim(t2.AccountNumberSegment3) 
       =rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3)
        and t2.actdescr <> t1.actdescr
        )
 union all 
select 'comp5' as db , (rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3))  as AccNbr, actdescr
, ROW_NUMBER() OVER(ORDER BY rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3) ) as rownbr
 from comp1 as t1
 left join comp5 as t2 
    on ( rtrim(t2.AccountNumberSegment2) + '-' + rtrim(t2.AccountNumberSegment3) 
       =rtrim(t1.AccountNumberSegment2) + '-' + rtrim(t1.AccountNumberSegment3)
        and t2.actdescr <> t1.actdescr
        )
 ) 
-- exploit the cte for your output-format (step 4)
select x2.acccdescr as c2descr ,x2.refaccescr as c2ref
,x3.acccdescr as c3descr ,x3.refaccescr as c3ref
,x4.acccdescr as c4descr ,x4.refaccescr as c4ref
,x5.acccdescr as c5descr ,x5.refaccescr as c5ref
from cte_descr  as x2
FULL join cte_descr as x3 on x3.rownbr = x2.rownbr and x3.db = 'comp3'
FULL join cte_descr as x4 on x4.rownbr = x2.rownbr and x4.db = 'comp4'
FULL join cte_descr as x5 on x4.rownbr = x2.rownbr and x5.db = 'comp5'
where isnull(x2.db,'comp2')='comp2'

Open in new window

Avatar of jana

ASKER

Will check today
Now that you've identified Comp1 as the reference table, try this:


SELECT (rtrim(a.AccountNumberSegment2) + '-' + rtrim(a.AccountNumberSegment3)), a.actdescr as Desc1, b.actdescr as Desc2,  c.actdescr as Desc3, d.actdescr as Desc4

FROM
COMP1..CustTable a
left JOIN COMP2..gl00100 b ON a.AccountNumberSegment2=b.AccountNumberSegment2 AND a.AccountNumberSegment3=b.AccountNumberSegment3
left JOIN COMP3..gl00100 c ON
a.AccountNumberSegment2=c.AccountNumberSegment2 AND a.AccountNumberSegment3=c.AccountNumberSegment3
left JOIN COMP4..gl00100 d ON
a.AccountNumberSegment2=d.AccountNumberSegment2 AND a.AccountNumberSegment3=d.AccountNumberSegment3
WHERE a.actdescr <> b.actdescr or a.actdescr <> c.actdescr or a.actdescr <> d.actdescr
or b.actdescr is null or c.actdescr is null or d.actdescr is null

ps.  Still don't understand why AccountNumberSegment1 is not part of the join expresssion????
Avatar of jana

ASKER

Becuase AccountNumberSegment1 is the same value in all Comps.
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of jana

ASKER

Just tried it.  It gave me a error since "AccountNumber" doesnt exist.  The column we are working with is "AccountNumber2" and "AccountNumber3".  Column "AccountNumber1" has the same value in all DB.

What should replace "AccountNumber" on the first select and group by and order by?
If you run the inner (union) query by itself, the first column it returns is named AccountNumber. That's what you should use in the outer select, group by, and order by.  

You may need to do this:

Select AccountNumber
...
WHERE a.actdescr <> b.actdesc
) as tmp             --table alias may be required                  
group by AccountNumber
order by AccountNumber
Avatar of jana

ASKER

You are totally correct.  Replaced it and gave us a result.  We will verify contents.... sorry about that...
Avatar of jana

ASKER

We still testing but its five databases (ID 37723518).

I have fixed the script and added the additional Union All and Select, but how can add the column for the additional DB?
Avatar of jana

ASKER

Ok... figured it out.
Avatar of jana

ASKER

We have incorporating your script in ID: 37725684, and validating at least 20% of the rows and has worked excellently.

Thank you very much!!!!
And the way you wanted to format it in the output?
Avatar of jana

ASKER

Thanx.