Link to home
Start Free TrialLog in
Avatar of mwells08
mwells08Flag for United States of America

asked on

Need ideas on how to better approach the comparison of two different databases tables and show all the differences in a report.

Hello experts, I need your help,

I have two different databases: Oracle and Pervasive; each database have a table that it "supposed" to have the same information.
Both tables have the same columns (with diff. col. names). I need to pull out rows from the pervasive and compare each of the columns/rows with the oracle table; checking for differences in the data; data with no differences; and new data.
I need to create a report (using crystal rep 2013) that will display all the rows from the oracle table that were compared, highlighting all the differences found in the pervasive table, and at the end all new rows in pervasive not found in the oracle table.
I also need to create a table that will keep track of the rows from the pervasive table already compared with the oracle table, so next time the user run the application looking for differences, it will filter out all those rows. I am using MS Access database to help me with the comparison and to keep track of rows already compared.

All ideas are appreciated, Thank you!
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

How large are the tables?

If they are small enough I would probably look at pulling both tables into .Net and use LINQ to compare them.  You should be able to take the LINQ output and use that as the source for Crystal.

I'm far from a .Net/LINQ Expert and know very little about Crystal but there are many examples out there on using LINQ to compare data sets.

This one appeared to be the best one I looked at:
http://www.codeproject.com/Articles/30102/Comparing-DataSets-using-LINQ
Doesn't seem THAT hard in the specific case -0- but building a general case is always the hard part.  Psuedocode solution would be:
- Iterate over all records in Database A Table in order by Unique Key
   - Read corresponding record from Database B Table by Unique Key
   - Set EqualFlag to 1
   - Iterate over all field names
      - Compare each field from Table A Record to Table B Record, Set EqualFlag to 0 if Different
        (This can be as simple or complicated as you need, due to data type mismatches and the like)
   - Write results to Difference Table (Unique Key Value, EqualFlag)

Once you've done the comparison, you can read the Difference table, pull the raw data from each database, and generate your Crystal Report as needed.

As for the last part, you ALWAYS have a chance that data on either side is changing, but as long as the unique keys are indeed unique and cannot be changed, then you can use the same process, but with a change to the top line:
  - Iterate over all records in Database A Table That are NOT in the Difference table in order by Unique Key
(This lookup can be done with a simple INNER JOIN  "WHERE DifferenceTable.ID IS NULL".)

All is straightforward enough, but again, the devils are in the details.......
Is there a KEY field that can link the 2 tables?

mlmcc
Avatar of mwells08

ASKER

Thank you both for the comments!
There are 30 columns on each table having 3 unique keys and one foreign. The problem here is that I can find differences in the unique keys as well because somebody entered the information misspelling one or two numbers or it can be truly new records.
I need to display the information in a datagridview showing which rows have differences, and allowing the user to select which row  to print.

Bill, could you please elaborate on "Set EqualFlag to 1" ? do you mean create a procedure that will raise a flag if different?
The identifying key values can be wrong?  Yikes!  That makes it impossible to find the error -- it will look like one side has a record and the other does not.  If you cannot rely on identifiers, then all bets are off.  

My original thought was a simple Powershell script.  If you know Powershell, you'll realize that this logic is almost the entire script!  Your analysis is accurate -- I was defining a variable called EqualFlag, and setting it so that the records are assumed to be equal.  Then, as you start comparing fields (which, again, can be horribly complicated), you simply set the flag to 0 when you detect a difference, and log that fact in the Difference table.  Of course, further optimizations will make it run faster (i.e. immediately terminate looking at individual fields once you find the first difference).  At the same time, you could even save the results of your comparisons into the Difference table, too, something like this:
  -  Compare each field from Table A Record to Table B Record.  If Different, set EqualFlag to 0 and Append FieldName to ThisDifference
Then, you write out ThisDifference to the file, too.  Or, if you know you have 30 fields, you could create 30 different fields, each holding either a flag indicating the presence of a difference, or perhaps a string showing the two values themselves.  Doing the latter is a lot more work, but your Crystal Report now can get away with ONLY querying the Difference table.
I suppose all this depends on your expected outputs.  If you are looking for similar keys, the problem gets more difficult.

With the first column being the unique key,
source 1 has:
12,A

Source 2 has:
21,A

Was that a transposed ID column 12 for 21 or are those two different rows?

How would you want to display that difference in the report?

There is the .Net EXCEPT that might help as well.

In the pure database world, I would do two MINUS queries to get all the differences and missing rows between the tables.  How that is displayed in a report is more problematic.

If you can provide some sample data an expected results it will likely help us with your exact requirements.
why use a 3rd database ?
you have oracle and pervasive.
surely one of them has space enough to hold the comparison table
Thank you all for you comments.

Unfortunately I am not familiar with  Powershell scripts and I have to use vb.net for this project since is going to be an option added to an already existent application;

Both databases have a combination of key fields that makes a record unique;  for Oracle table
the key fields are: aircraft,log, and leg. There are fields on each table that does not exists in the other, but for my purposes I just care for the fields that "supposed" to have the same data. I am not allowed to modify any of the two databases so I had to use a third one.

This is what I have so far:
Created a sql to query pervasive table User generated image.
 by grouping into aircraft # (since this is the only field that seems to have a 95% accuracy);  I can iterate the records from pervasive table comparing the other fields with the records in the oracle table. Flagging the record as "F" if found in both tables (as in having the 3 key fields identical) and "H" if any of 3 key fields are different (because the entry was wrong  or because there are new records for oracle db), and saving it and a temp table, in other table save only the records with differences and marking each field with a 0 if different and -1 if is ok. I can't stop the comparison if I found the first  difference because I need to show all the differences within a record.
I created the report joining the tables with a left outer joinUser generated image but only the records with differences are showing up, the records that are identical are not  displayed.User generated image
WHat filtering are you doing on the table for the report?

mlmcc
ASKER CERTIFIED SOLUTION
Avatar of Bill Bach
Bill Bach
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thank you for all your comments!

thanks Bill,
Here is a portion of my code. User generated image

thanks mlmcc,
After iterate through all the records comparing table A and B, I dump all the data in a table C which I bind to a datagridview. Every row in the datagridview has a checkbox which If checked it will be saved in table P. The report has two tables Table D (rows with differences) and P (rows to print), on the report I just join the tables and display the information.  I hope I answered your question.

What I don't know is why only the rows that are in table P and D are showing in the report if I joined the tables as "left outer join"; rows in table P that are not in table D is showing as blank space. I wonder if is something I am doing wrong.

thanks slightwv,
due to company policy I can't post an example of the real data, but I made up something similar that will example the data on the tables A and B.User generated image.
Can you post that sample data as TEXT so we can easily copy/paste it to create out own test cases?

>> similar that will example the data on the tables A and B.

Based on that sample data, what are the expected results?

In the original post you stated the tables had the same column data just different column names.  I'm not seeing that in the sample data.

It might help when posting the expected results also posting what columns map to what columns.
slightwv,
Here is the sample of the data and the report. Book1.xlsx  The columns names are different in the example, I am not sure why you say they are not.
Thanks
>>The columns names are different in the example, I am not sure why you say they are not.

In the original question you stated:
have a table that it "supposed" to have the same information.
 Both tables have the same columns (with diff. col. names).

I now see the Oracle columns store the times in separate columns (I would fire the developer/dba that designed that).

From your sample data and expected results:  In Oracle how do you decide AM or PM?

In the results, can you explain why A918 leg 2 is highlighted?  Is it only because of the arrival airport pervasive has 2AT and Oracle has FUG?

In the results, I also don't understand where the DEPARTURE, ARRIVAL data is coming from.
Oops! I knew I was missing something...  Oracle table uses 24 hr. system,  In my code where it says "Formatting Time" is where I convert the time from 12 to 24 to compare it to Table B. I fixed my mistake in the sample data:  Book1.xlsx

The A918 leg 2 is highlighted because one of the 3 key field is different (Table A = "A9181"  vs Table B = "A918"). In the logic I have to assume that Table A is missing the record.

Almost all Airports have two Id codes: ICAO and the other IATA:User generated image
Departure and Arrival are coming from the four columns of Table A: DEP STA ICAO (departure), ARR STA ICAO (arrival), DEP STA IATA(departure), DEP STA IATA  (arrival)
>>The A918 leg 2 is highlighted because one of the 3 key field is different (Table A = "A9181"  vs Table B = "A918").

But that isn't shown in the result.  I think you would want to show all the differences and let the human decide which value of which row was 'valid'.

More of a, there is something 'different' here, you make the call:
...
PervasiveRow,A9181,2,9/2/2016,8:28:00 AM,8:31:00 AM,10:13:00 AM,10:18:00 AM,1,ZSJH,ZSOF,JUH,2AT
OracleRow,A918,2,9/2/2016,8:28:00 AM,8:31:00 AM,10:13:00 AM,10:18:00 AM,1,ZSJH,ZSOF,JUH,2AT
...


Now looking back at your posted code:  Thing to fix #1:  NEVER use string concatenation to generate a SQL statement.  Do a quick Google on "SQL Injection".

Change it to use Bind Variables.

I would make the data returned from the Oracle query match the Pervasive data.  That should make comparison much easier.

I have an example test case based on your data that I'll post below.

I would still look at LINQ and probably start with the EXCEPT I posted above.

I'm swamped right now but if you cannot get there, I can try to work on an example later this week or early next.

Here is my Oracle query I would use (along with the test table setup):
/*
drop table tab1 purge;
create table tab1(ACTAILNO char(2), FLOG varchar2(5), FLEG number, FDATE date, OUTHR number, OUTMIN number, OFFHR number, OFFMIN number, ONHR number, ONMIN number, INHR number, INMIN number, CYCLES number, DEPARTURE char(3), ARRIVAL char(3));

insert into tab1 values('A1','A918',1,to_date('9/2/2016','mm/dd/yyyy'),5,46,5,53,7,45,7,48,1,'AQG','JUH');
insert into tab1 values('A1','A918',2,to_date('9/2/2016','mm/dd/yyyy'),8,28,8,31,10,13,10,18,1,'JUH','FUG');
insert into tab1 values('A1','A918',4,to_date('9/2/2016','mm/dd/yyyy'),14,36,14,43,15,10,15,25,1,'FUG','TXN');
insert into tab1 values('A1','A919',1,to_date('9/3/2016','mm/dd/yyyy'),5,55,6,2,6,30,6,45,1,'TXN','PEK');
insert into tab1 values('A1','A919',2,to_date('9/3/2016','mm/dd/yyyy'),23,45,23,59,0,19,0,25,1,'PEK','JIQ');
commit;
*/


select actailno,
	flog,
	fleg,
	to_char(fdate,'mm/dd/yyyy') fdate,
	to_char(to_date(outhr||':'||outmin,'HH24:MI'),'HH:MI:SS AM') out_time,
	to_char(to_date(offhr||':'||offmin,'HH24:MI'),'HH:MI:SS AM') off_time,
	to_char(to_date(onhr||':'||onmin,'HH24:MI'),'HH:MI:SS AM') on_time,
	to_char(to_date(inhr||':'||inmin,'HH24:MI'),'HH:MI:SS AM') in_time,
	cycles,
	departure,
	arrival
from tab1;

Open in new window



The results of that query:
A1 A918           1 09/02/2016 05:46:00 AM 05:53:00 AM 07:45:00 AM 07:48:00 AM 1 AQG JUH
A1 A918           2 09/02/2016 08:28:00 AM 08:31:00 AM 10:13:00 AM 10:18:00 AM 1 JUH FUG
A1 A918           4 09/02/2016 02:36:00 PM 02:43:00 PM 03:10:00 PM 03:25:00 PM 1 FUG TXN
A1 A919           1 09/03/2016 05:55:00 AM 06:02:00 AM 06:30:00 AM 06:45:00 AM 1 TXN PEK
A1 A919           2 09/03/2016 11:45:00 PM 11:59:00 PM 12:19:00 AM 12:25:00 AM 1 PEK JIQ

Open in new window

Thank you Bill and slighwv. I will look into LINQ Except.