Solved

Need ideas on how to better approach the comparison of two different databases tables and show all the differences in a report.

Posted on 2016-10-07
18
80 Views
Last Modified: 2016-10-13
Hello experts, I need your help,

I have two different databases: Oracle and Pervasive; each database have a table that it "supposed" to have the same information.
Both tables have the same columns (with diff. col. names). I need to pull out rows from the pervasive and compare each of the columns/rows with the oracle table; checking for differences in the data; data with no differences; and new data.
I need to create a report (using crystal rep 2013) that will display all the rows from the oracle table that were compared, highlighting all the differences found in the pervasive table, and at the end all new rows in pervasive not found in the oracle table.
I also need to create a table that will keep track of the rows from the pervasive table already compared with the oracle table, so next time the user run the application looking for differences, it will filter out all those rows. I am using MS Access database to help me with the comparison and to keep track of rows already compared.

All ideas are appreciated, Thank you!
0
Comment
Question by:mwells08
  • 6
  • 6
  • 3
  • +2
18 Comments
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 41834375
How large are the tables?

If they are small enough I would probably look at pulling both tables into .Net and use LINQ to compare them.  You should be able to take the LINQ output and use that as the source for Crystal.

I'm far from a .Net/LINQ Expert and know very little about Crystal but there are many examples out there on using LINQ to compare data sets.

This one appeared to be the best one I looked at:
http://www.codeproject.com/Articles/30102/Comparing-DataSets-using-LINQ
0
 
LVL 28

Expert Comment

by:Bill Bach
ID: 41834390
Doesn't seem THAT hard in the specific case -0- but building a general case is always the hard part.  Psuedocode solution would be:
- Iterate over all records in Database A Table in order by Unique Key
   - Read corresponding record from Database B Table by Unique Key
   - Set EqualFlag to 1
   - Iterate over all field names
      - Compare each field from Table A Record to Table B Record, Set EqualFlag to 0 if Different
        (This can be as simple or complicated as you need, due to data type mismatches and the like)
   - Write results to Difference Table (Unique Key Value, EqualFlag)

Once you've done the comparison, you can read the Difference table, pull the raw data from each database, and generate your Crystal Report as needed.

As for the last part, you ALWAYS have a chance that data on either side is changing, but as long as the unique keys are indeed unique and cannot be changed, then you can use the same process, but with a change to the top line:
  - Iterate over all records in Database A Table That are NOT in the Difference table in order by Unique Key
(This lookup can be done with a simple INNER JOIN  "WHERE DifferenceTable.ID IS NULL".)

All is straightforward enough, but again, the devils are in the details.......
0
 
LVL 100

Expert Comment

by:mlmcc
ID: 41834422
Is there a KEY field that can link the 2 tables?

mlmcc
0
 

Author Comment

by:mwells08
ID: 41834438
Thank you both for the comments!
There are 30 columns on each table having 3 unique keys and one foreign. The problem here is that I can find differences in the unique keys as well because somebody entered the information misspelling one or two numbers or it can be truly new records.
I need to display the information in a datagridview showing which rows have differences, and allowing the user to select which row  to print.

Bill, could you please elaborate on "Set EqualFlag to 1" ? do you mean create a procedure that will raise a flag if different?
0
 
LVL 28

Expert Comment

by:Bill Bach
ID: 41834449
The identifying key values can be wrong?  Yikes!  That makes it impossible to find the error -- it will look like one side has a record and the other does not.  If you cannot rely on identifiers, then all bets are off.  

My original thought was a simple Powershell script.  If you know Powershell, you'll realize that this logic is almost the entire script!  Your analysis is accurate -- I was defining a variable called EqualFlag, and setting it so that the records are assumed to be equal.  Then, as you start comparing fields (which, again, can be horribly complicated), you simply set the flag to 0 when you detect a difference, and log that fact in the Difference table.  Of course, further optimizations will make it run faster (i.e. immediately terminate looking at individual fields once you find the first difference).  At the same time, you could even save the results of your comparisons into the Difference table, too, something like this:
  -  Compare each field from Table A Record to Table B Record.  If Different, set EqualFlag to 0 and Append FieldName to ThisDifference
Then, you write out ThisDifference to the file, too.  Or, if you know you have 30 fields, you could create 30 different fields, each holding either a flag indicating the presence of a difference, or perhaps a string showing the two values themselves.  Doing the latter is a lot more work, but your Crystal Report now can get away with ONLY querying the Difference table.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 41834474
I suppose all this depends on your expected outputs.  If you are looking for similar keys, the problem gets more difficult.

With the first column being the unique key,
source 1 has:
12,A

Source 2 has:
21,A

Was that a transposed ID column 12 for 21 or are those two different rows?

How would you want to display that difference in the report?

There is the .Net EXCEPT that might help as well.

In the pure database world, I would do two MINUS queries to get all the differences and missing rows between the tables.  How that is displayed in a report is more problematic.

If you can provide some sample data an expected results it will likely help us with your exact requirements.
0
 
LVL 36

Expert Comment

by:Geert Gruwez
ID: 41837942
why use a 3rd database ?
you have oracle and pervasive.
surely one of them has space enough to hold the comparison table
0
 

Author Comment

by:mwells08
ID: 41838843
Thank you all for you comments.

Unfortunately I am not familiar with  Powershell scripts and I have to use vb.net for this project since is going to be an option added to an already existent application;

Both databases have a combination of key fields that makes a record unique;  for Oracle table
the key fields are: aircraft,log, and leg. There are fields on each table that does not exists in the other, but for my purposes I just care for the fields that "supposed" to have the same data. I am not allowed to modify any of the two databases so I had to use a third one.

This is what I have so far:
Created a sql to query pervasive table Querying Pervasive.
 by grouping into aircraft # (since this is the only field that seems to have a 95% accuracy);  I can iterate the records from pervasive table comparing the other fields with the records in the oracle table. Flagging the record as "F" if found in both tables (as in having the 3 key fields identical) and "H" if any of 3 key fields are different (because the entry was wrong  or because there are new records for oracle db), and saving it and a temp table, in other table save only the records with differences and marking each field with a 0 if different and -1 if is ok. I can't stop the comparison if I found the first  difference because I need to show all the differences within a record.
I created the report joining the tables with a left outer joinjoin-in-crystal-rep.JPG but only the records with differences are showing up, the records that are identical are not  displayed.REPORT.JPG
0
 
LVL 100

Expert Comment

by:mlmcc
ID: 41839264
WHat filtering are you doing on the table for the report?

mlmcc
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 28

Accepted Solution

by:
Bill Bach earned 300 total points
ID: 41839275
I'm not sure how you posted the code fragment, but it seems that only part of it displayed.  However, the original core still works.  It will change since you are only comparing certain fields, but it doesn't change by much:

- Iterate over all records in PSQL Table in order by ACTail,Log,Leg
   - Parse core fields as needed
   - Build Differences record with initial values (ACTail,Log,Leg)
   - SELECT corresponding record from Oracle Table Where TailField = ACTail AND LogField = Log AND LegField = Leg
   - If result set is not null, Set Differences.Flag = "F", Else Differences.Flag = "H"
   - If PSQL.field1 = Oracle.field1, then Differences.FieldFlag1 = -1, Else Differences.FieldFlag1 = 0
   - If PSQL.field2 = Oracle.field1, then Differences.FieldFlag2 = -1, Else Differences.FieldFlag1 = 0
   - If PSQL.field3 = Oracle.field1, then Differences.FieldFlag3 = -1, Else Differences.FieldFlag1 = 0
   - If PSQL.field4 = Oracle.field1, then Differences.FieldFlag4 = -1, Else Differences.FieldFlag1 = 0
    - etc.
   - Write results to Difference Table
- Loop until all records processed
0
 
LVL 76

Assisted Solution

by:slightwv (䄆 Netminder)
slightwv (䄆 Netminder) earned 200 total points
ID: 41839341
Seems like you want two minus queries.  I'm sure that can be easily pulled off with .Net.

Better if you can use LINQ.  It appears to have the EXCEPT method as well:
https://msdn.microsoft.com/en-us/library/bb908822(v=vs.90).aspx

Again, please post sample data and expected results.  Then and only then can we post functional working examples.
0
 

Author Comment

by:mwells08
ID: 41840937
Thank you for all your comments!

thanks Bill,
Here is a portion of my code. code-part1.JPG

thanks mlmcc,
After iterate through all the records comparing table A and B, I dump all the data in a table C which I bind to a datagridview. Every row in the datagridview has a checkbox which If checked it will be saved in table P. The report has two tables Table D (rows with differences) and P (rows to print), on the report I just join the tables and display the information.  I hope I answered your question.

What I don't know is why only the rows that are in table P and D are showing in the report if I joined the tables as "left outer join"; rows in table P that are not in table D is showing as blank space. I wonder if is something I am doing wrong.

thanks slightwv,
due to company policy I can't post an example of the real data, but I made up something similar that will example the data on the tables A and B.Data-example.JPG.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 41840948
Can you post that sample data as TEXT so we can easily copy/paste it to create out own test cases?

>> similar that will example the data on the tables A and B.

Based on that sample data, what are the expected results?

In the original post you stated the tables had the same column data just different column names.  I'm not seeing that in the sample data.

It might help when posting the expected results also posting what columns map to what columns.
0
 

Author Comment

by:mwells08
ID: 41841930
slightwv,
Here is the sample of the data and the report. Book1.xlsx  The columns names are different in the example, I am not sure why you say they are not.
Thanks
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 41841944
>>The columns names are different in the example, I am not sure why you say they are not.

In the original question you stated:
have a table that it "supposed" to have the same information.
 Both tables have the same columns (with diff. col. names).

I now see the Oracle columns store the times in separate columns (I would fire the developer/dba that designed that).

From your sample data and expected results:  In Oracle how do you decide AM or PM?

In the results, can you explain why A918 leg 2 is highlighted?  Is it only because of the arrival airport pervasive has 2AT and Oracle has FUG?

In the results, I also don't understand where the DEPARTURE, ARRIVAL data is coming from.
0
 

Author Comment

by:mwells08
ID: 41842005
Oops! I knew I was missing something...  Oracle table uses 24 hr. system,  In my code where it says "Formatting Time" is where I convert the time from 12 to 24 to compare it to Table B. I fixed my mistake in the sample data:  Book1.xlsx

The A918 leg 2 is highlighted because one of the 3 key field is different (Table A = "A9181"  vs Table B = "A918"). In the logic I have to assume that Table A is missing the record.

Almost all Airports have two Id codes: ICAO and the other IATA:icao-and-iata.JPG
Departure and Arrival are coming from the four columns of Table A: DEP STA ICAO (departure), ARR STA ICAO (arrival), DEP STA IATA(departure), DEP STA IATA  (arrival)
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 41842093
>>The A918 leg 2 is highlighted because one of the 3 key field is different (Table A = "A9181"  vs Table B = "A918").

But that isn't shown in the result.  I think you would want to show all the differences and let the human decide which value of which row was 'valid'.

More of a, there is something 'different' here, you make the call:
...
PervasiveRow,A9181,2,9/2/2016,8:28:00 AM,8:31:00 AM,10:13:00 AM,10:18:00 AM,1,ZSJH,ZSOF,JUH,2AT
OracleRow,A918,2,9/2/2016,8:28:00 AM,8:31:00 AM,10:13:00 AM,10:18:00 AM,1,ZSJH,ZSOF,JUH,2AT
...


Now looking back at your posted code:  Thing to fix #1:  NEVER use string concatenation to generate a SQL statement.  Do a quick Google on "SQL Injection".

Change it to use Bind Variables.

I would make the data returned from the Oracle query match the Pervasive data.  That should make comparison much easier.

I have an example test case based on your data that I'll post below.

I would still look at LINQ and probably start with the EXCEPT I posted above.

I'm swamped right now but if you cannot get there, I can try to work on an example later this week or early next.

Here is my Oracle query I would use (along with the test table setup):
/*
drop table tab1 purge;
create table tab1(ACTAILNO char(2), FLOG varchar2(5), FLEG number, FDATE date, OUTHR number, OUTMIN number, OFFHR number, OFFMIN number, ONHR number, ONMIN number, INHR number, INMIN number, CYCLES number, DEPARTURE char(3), ARRIVAL char(3));

insert into tab1 values('A1','A918',1,to_date('9/2/2016','mm/dd/yyyy'),5,46,5,53,7,45,7,48,1,'AQG','JUH');
insert into tab1 values('A1','A918',2,to_date('9/2/2016','mm/dd/yyyy'),8,28,8,31,10,13,10,18,1,'JUH','FUG');
insert into tab1 values('A1','A918',4,to_date('9/2/2016','mm/dd/yyyy'),14,36,14,43,15,10,15,25,1,'FUG','TXN');
insert into tab1 values('A1','A919',1,to_date('9/3/2016','mm/dd/yyyy'),5,55,6,2,6,30,6,45,1,'TXN','PEK');
insert into tab1 values('A1','A919',2,to_date('9/3/2016','mm/dd/yyyy'),23,45,23,59,0,19,0,25,1,'PEK','JIQ');
commit;
*/


select actailno,
	flog,
	fleg,
	to_char(fdate,'mm/dd/yyyy') fdate,
	to_char(to_date(outhr||':'||outmin,'HH24:MI'),'HH:MI:SS AM') out_time,
	to_char(to_date(offhr||':'||offmin,'HH24:MI'),'HH:MI:SS AM') off_time,
	to_char(to_date(onhr||':'||onmin,'HH24:MI'),'HH:MI:SS AM') on_time,
	to_char(to_date(inhr||':'||inmin,'HH24:MI'),'HH:MI:SS AM') in_time,
	cycles,
	departure,
	arrival
from tab1;

Open in new window



The results of that query:
A1 A918           1 09/02/2016 05:46:00 AM 05:53:00 AM 07:45:00 AM 07:48:00 AM 1 AQG JUH
A1 A918           2 09/02/2016 08:28:00 AM 08:31:00 AM 10:13:00 AM 10:18:00 AM 1 JUH FUG
A1 A918           4 09/02/2016 02:36:00 PM 02:43:00 PM 03:10:00 PM 03:25:00 PM 1 FUG TXN
A1 A919           1 09/03/2016 05:55:00 AM 06:02:00 AM 06:30:00 AM 06:45:00 AM 1 TXN PEK
A1 A919           2 09/03/2016 11:45:00 PM 11:59:00 PM 12:19:00 AM 12:25:00 AM 1 PEK JIQ

Open in new window

0
 

Author Closing Comment

by:mwells08
ID: 41842745
Thank you Bill and slighwv. I will look into LINQ Except.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Overview: This article:       (a) explains one principle method to cross-reference invoice items in Quickbooks®       (b) explores the reasons one might need to cross-reference invoice items       (c) provides a sample process for creating a M…
Background In several of the companies I have worked for, I noticed that corporate reporting is off loaded from the production database and done mainly on a clone database which needs to be kept up to date daily by various means, be it a logical…
Basics of query design. Shows you how to construct a simple query by adding tables, perform joins, defining output columns, perform sorting, and apply criteria.
In Microsoft Access, when working with VBA, learn some techniques for writing readable and easily maintained code.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now