I'm developing a VB.Net application, and within that app, I have a daily processing involving 2 datatables which is surprisingly slow, hence my request for help.
The first data table is pulled out of a AS/400 DB2 table which has only about 5,000 rows but a lot of columns. I load all the table, i.e. all the columns, although I only need less than 20 columns. The Load over the dataadapter is quick (les than 10 seconds). The source table has a unique key consisting of 2 small fixed length strings (7 and 3 characters respectively). The data is about financial instruments, so each instrument has this unique key and about 20 columns of interest, and about 40 which I ignore
1st question: is there a big performance hit AFTER the load when processing the table if the source table has, say, 60 columns, and I only need 20 ?
The second data table, which is also pulled from an AS/400 DB2 DB, has about 100,000 rows and is actually having partially the same data like the first one, it features the same unique instrument key like the first datatable, but instead it has 1 row for each combination of key and field, i.e. it has say 20 rows for each instrument, for 1 row of the same instrument in the first datatable.
I.e. if Instrument X (with key 1234567-001) in datatable 1 has fields A, B and C, then datable 2 has 3 rows for instrument X:
- row 1 with key 1234567-001 and field name of field A and field value of field A
- row 2 with key 1234567-001 and field name of field B and field value of field B
- row 3 with key 1234567-001 and field name of field C and field value of field C
Both loads are done like this:
Dim adapterN As New OdbcDataAdapter(selectCmd, connDB2)
adapterN.SelectCommand.CommandText = selectCmd
where N is 1 or 2
The processing is about comparing the data from datatable 1, which is always up to date, with the data in datable 2, which has to be updated to be snychronised with datatable 1. I.e. the processing:
- reads each row from datatable 1
- loops on each of the 20 relevant fields of the row
- compares the field values in datatable 2 in each of the corresponding datatable 2 rows with the value in datatable 1, and if they differ:
- updates the field value in datatable 2 if it finds an entry for that field in datatable 2
- inserts a new row in datatable 2 if it does not find an entry for that field in datatable 2
Both these cases (update and insert) are pretty rare, because the financial instrument data does not change much from one day to the other, as it is mainly "base" or static data such as name and issuer and issue date and size and stuff like that. So I was expecting a pretty fast process, but it took HOURS when I ran that synchronization process the first time. So it seems that just the compare process, which involves selecting the right entry in the second dadatable based on the unique key and the field name in the first datatable, is very slow, and I can't really see why, as everything is happening in memory with datatables so i'd hoped it would be lightning fast.
Can you see reasons why such a compare process should be slow ? How can it be made faster ?
Thanks for help, hints and tips...