asked on

Looking for a better way to compare 3 datasets

I have 3 datasets that i need to compare to each other and find all of the columns that are different. This is actually for a Pos system so that datasets basically should be exact duplicates of each other. However, i can't just copy the datasets to each other because the correct values could be in any of the 3 locations. So i am placing the differing items in another database to be manually sorted through later on.

The problem that i have is that this script takes forever to run, since each dataset has about 15,000 items in it. There may not be a better way to do this however, i just want to put the question out there incase anyone else has come across a similar problem.

Code:
Dim dsColl As New Collection()

dsColl.Add(DataSet21.Tables("MenuItem"))
dsColl.Add(DataSet31.Tables("MenuItem"))
dsColl.Add(DataSet41.Tables("MenuItem"))
stIndicator.Text = "Searching"
stIndicator.Update()
stopSearch = False
For r = 1 To dsColl.Count ' This will circulate through all Location Comparisons
If stopSearch = True Then
Exit For
End If
Dim rTmp = r - 1
Dim dbloca
Dim dblocb
If rTmp >= 2 Then
dbloca = dsColl(1)
dblocb = dsColl(3)
Else
dbloca = dsColl(rTmp + 1)
dblocb = dsColl(rTmp + 2)

End If
For i1 = 0 To dbloca.Rows.Count - 1 ' This Searchs loca for plu's
If stopSearch = True Then
Exit For
End If
stIndicator.Text = "Searching.." & dbloca.Rows(i1).Item(0)
stIndicator.Update()
For j = 0 To dblocb.Rows.Count - 1 'this searchs locb for plu's
If stopSearch = True Then
Exit For
End If
If dblocb.Rows(j).Item(0) = dbloca.Rows(i1).Item(0) Then 'this compares loca to locb's plu's
i2 = j
Exit For
End If
Next
For k = 0 To dbloca.Columns.Count - 1 'This searches each column
If k <> 1 Then
If CStr(dbloca.Rows(i1).Item(k) & "") <> CStr(dblocb.Rows(i2).Item(k) & "") Then

LstBx1.Items.Add("PLU (" & CStr(dbloca.Rows(i1).Item(0)) & _
") in Location 1 has a different (" & _
dbloca.Columns(k).ColumnName() & ":" & _
CStr(dbloca.Rows(i1).Item(k) & ")") & _
" then in Location 2 (" & _
dblocb.Columns(k).ColumnName() & _
CStr(dblocb.Rows(i2).Item(k) & ")"))

LstBx1.Update()
stIndicator.Text = "Found a Difference in.." & dbloca.Rows(i1).Item(0)
stIndicator.Update()

AddChangedData(CStr(dbloca.Rows(i1).Item(0) & ""), _
CStr(dbloca.Columns(k).ColumnName & ""), _
CStr(dbloca.Rows(i1).Item(k) & ""), _
CInt(i1), _
CInt(dbloca.Rows(i1).Item(1)), _
CStr(dblocb.Columns(k).ColumnName & ""), _
CStr(dblocb.Rows(i2).Item(k) & ""), _
CInt(i2), _
CInt(dblocb.Rows(i2).Item(1)))

End If
End If

Next
Next
Next

SOLUTION

mjwills

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

mjwills

<quote>
End 5 5

No more data in the first, so obviously it doesn't have 5. Second and third both have 5. Go to the next row for second and third:

No more data in the first, second or third. Job done.

The above approach would be how I would attack the problem.
</quote>

SHOULD READ:

<corrected>
End 5 5

No more data in the first, so obviously it doesn't have 5. Second and third both have 5. Go to the next row for second and third:

End End End

No more data in the first, second or third. Job done.

The above approach would be how I would attack the problem.
</corrected>

ethnarch

ASKER

Sounds simple enough, i am going to leave this Topic open though as possible there are some other ideas out there.

ethnarch

ASKER

The way you explained it doesn't seems to be really complicated, now that i am actually trying it.

basically you are assuming that 1 is in the first dataset what if one is in the second or third dataset.
then the code get's really complicated, because not only do you have to compare each dataset you also have to figure out which dataset actually has the lowest value or highest depending on sorting order.

just determing what to do with the items in the dataset is mindboggling never mind comparing the data. I seem to be building a ridiculous amount of If statements in order to accomplish this.

Am i approaching this wrong maybe i am missing something but this is what i am thinking i would have to do(this is short form not actual code)

If ds1.Plu <> ds2.PLU <> ds3.PLU then
if ds1.Plu < ds2.Plu and ds2.PLu = ds3.Plu then
if ds1.Plu < ds3.Plu and ds1.PLu = ds2.Plu then
if ds2.Plu < ds3.Plu and ds2.PLu = ds1.Plu then
end if

I can't seem to wrap my mind around this one for some reason it's really complicated That code is not even correct either
I definitly need some help.

ethnarch

ASKER

I tried drawing this out with flow charts but that's not helping either It seems the only real way to do this efficiently would be to make sure that all databases have a matching value in the sort by column.

ASKER CERTIFIED SOLUTION

Sancler

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ethnarch

ASKER

sancler that seems to add an extra layer to the actual search. because first i have to run the code through all three databases searching for matches, then i can run the code through again to find the columns that don't match. Maybe i am misunderstanding but that doesn't seem to really speed things up.

mjwills suggestion seems the best i just can't figure out how to put it into code it's really complicated. there must be someone out there willing to help.

mjwills

<quote>
basically you are assuming that 1 is in the first dataset what if one is in the second or third dataset.
then the code get's really complicated, because not only do you have to compare each dataset you also have to figure out which dataset actually has the lowest value or highest depending on sorting order.
</quote>

It doesn't matter where it is. The trick is to only go to the next row on the datasets which have the lowest value. As long as you do that everything will be sweet.

Imagine instead of my original example the example was:

<change to original example>
Dataset 1
2
3
4

Dataset 2
1
2
3
5

Dataset 3
1
2
3
4
5

First, get the first row of all 3 datasets. These will be as follows:

2 1 1

Since the data is ordered, we know that 1 is only in the second and third datasets. So we go to the next row in the second and third datasets:

2 2 2
</change to original example>

See - the same basic principle works regardless of whether the first dataset holds 1 or not.

Sancler

Ethnarch

This is pseudo-code, I am not up to coding it for real.

===========================================================================================

db4 is a new database (same structure as all the others) that holds single copy of matching records.
db5 is a new database (same structure as all the others) to hold all non-matching records.

with db1
      for each record in db1
            clear "WHERE" clause
            for each field in record
                  add fieldname = fieldvalue to "WHERE" clause
            next field
            apply "WHERE" clause as filter to db2
            if match then
                  apply WHERE clause as filter to db3
                  if match then
                        'db1 record = db2 record = db3 record
                        copy db1 record to db4
                        delete records in db1, db2, db3
                  end if
            end if
      next record
end with

copy all remaining (non-deleted) records from db1, db2 and db3 to db5.

===========================================================================================

It is not, of course, going to be quite that simple: e.g. I am not sure what effect the deletion of records from db1 would have on the for/next loop for that. But it illustrates the concept and why I think it might be quicker and - even though not so simple as my pseudo-code - a lot simpler to code than any other method. YOU don't have to code the run through of the records in db2 and db3 and, within those, the run through of each of their fields: the in-built WHERE clause processor does that.

But, as I said, I was only brainstorming

Roger

ethnarch

ASKER

Ok I honestly didn't completely understand either of your ideas but i mjwills helped out a lot, i combined the sort method with what i was already doing. I think it will work but there seems to be a problem with my code it seems to find all the plu's as not matching and at the same time seems to be finding what i wanted to find. Below i'll put a short post of my output and my current code.

OutPut:
      131      0      0
3892      "000000073950"      "PLU"      "000000073950"      132      1      "PLU"      "000000073950"      2      132      0      0
3893      "000000073950"      "PLU"      "000000073950"      132      1      "PLU"      "000000073950"      3      132      0      0
3894      "000000076753"      "PLU"      "000000076753"      133      1      "PLU"      "000000076753"      2      133      0      0
3895      "000000076753"      "PLU"      "000000076753"      133      1      "PLU"      "000000076753"      3      133      0      0
3896      "000000076760"      "PLU"      "000000076760"      134      1      "PLU"      "000000076760"      2      134      0      0
3897      "000000076760"      "PLU"      "000000076760"      134      1      "PLU"      "000000076760"      3      134      0      0
3898      "000000078139"      "PLU"      "000000078139"      135      1      "PLU"      "000000078139"      2      135      0      0
3899      "000000078139"      "PLU"      "000000078139"      135      1      "PLU"      "000000078139"      3      135      0      0
3900      "000000078306"      "PLU"      "000000078306"      136      1      "PLU"      "000000078306"      2      136      0      0
3901      "000000078306"      "Price"      "5.95"      136      1      "Price"      "5"      2      136      0      0
3902      "000000078306"      "PLU"      "000000078306"      136      1      "PLU"      "000000078306"      3      136      0      0
3903      "000000078313"      "PLU"      "000000078313"      137      1      "PLU"      "000000078313"      2      137      0      0
3904      "000000078313"      "Price"      "9.95"      137      1      "Price"      "11.95"      2      137      0      0
3905      "000000078313"      "PLU"      "000000078313"      137      1      "PLU"      "000000078313"      3      137      0      0
      0

Code:

Sub newSearchData() 'new way of searching data

Dim i1, i2, i3, r, k, j, l As Integer
i1 = i2 = i3 = 0

stIndicator.Text = "Loading Loc1"
stIndicator.Update()
SqlDataAdapter1.Fill(DataSet21)

stIndicator.Text = "Loading loc2"
stIndicator.Update()
SqlDataAdapter2.Fill(DataSet31)

stIndicator.Text = "Loading loc3"
stIndicator.Update()
SqlDataAdapter3.Fill(DataSet41)

Dim dsColl As New Collection()

dsColl.Add(DataSet21.Tables("MenuItem"))
dsColl.Add(DataSet31.Tables("MenuItem"))
dsColl.Add(DataSet41.Tables("MenuItem"))

stIndicator.Text = "Searching"
stIndicator.Update()
stopSearch = False

For i1 = 100 To 150 'dsColl(1).Rows.Count - 1 ' This Searchs loc 1 for plu's
If stopSearch = True Then
Exit For
End If
stIndicator.Text = "Searching.." & dsColl(1).Rows(i1).Item(0)
stIndicator.Update()
For j = i2 To dsColl(2).Rows.Count - 1 'Plu Match Comparison AB
If stopSearch = True Then
Exit For
End If
If dsColl(2).Rows(j).Item(0) = dsColl(1).Rows(i1).Item(0) Then

For k = 0 To dsColl(1).Columns.Count - 1 'Column comparison AC
If k <> 1 Then
If CStr(dsColl(1).Rows(i1).Item(k) & "") <> CStr(dsColl(2).Rows(i2).Item(k) & "") Then
i2 = j
LstBx1.Items.Add("PLU (" & CStr(dsColl(1).Rows(i1).Item(0)) & _
") in Location 1 has a different (" & _
dsColl(1).Columns(k).ColumnName() & ":" & _
CStr(dsColl(1).Rows(i1).Item(k) & ")") & _
" then in Location 2 (" & _
dsColl(2).Columns(k).ColumnName() & _
CStr(dsColl(2).Rows(i2).Item(k) & ")"))

LstBx1.Update()
stIndicator.Text = "Found a Difference in.." & dsColl(1).Rows(i1).Item(0)
stIndicator.Update()

AddChangedData(CStr(dsColl(1).Rows(i1).Item(0) & ""), _
CStr(dsColl(1).Columns(k).ColumnName & ""), _
CStr(dsColl(1).Rows(i1).Item(k) & ""), _
CInt(i1), _
CInt(dsColl(1).Rows(i1).Item(1)), _
CStr(dsColl(2).Columns(k).ColumnName & ""), _
CStr(dsColl(2).Rows(i2).Item(k) & ""), _
CInt(i2), _
CInt(dsColl(2).Rows(i2).Item(1)))
End If
End If
Next
Exit For
End If
Next

For j = i3 To dsColl(3).Rows.Count - 1 'Plu Match Comparison AC
If stopSearch = True Then
Exit For
End If
If dsColl(3).Rows(j).Item(0) = dsColl(1).Rows(i1).Item(0) Then
For k = 0 To dsColl(1).Columns.Count - 1 'Column comparison AC
If k <> 1 Then
If CStr(dsColl(1).Rows(i1).Item(k) & "") <> CStr(dsColl(3).Rows(i3).Item(k) & "") Then
i3 = j
LstBx1.Items.Add("PLU (" & CStr(dsColl(1).Rows(i1).Item(0)) & _
") in Location 1 has a different (" & _
dsColl(1).Columns(k).ColumnName() & ":" & _
CStr(dsColl(1).Rows(i1).Item(k) & ")") & _
" then in Location 3 (" & _
dsColl(3).Columns(k).ColumnName() & _
CStr(dsColl(3).Rows(i3).Item(k) & ")"))

LstBx1.Update()
stIndicator.Text = "Found a Difference in.." & dsColl(1).Rows(i1).Item(0)
stIndicator.Update()

AddChangedData(CStr(dsColl(1).Rows(i1).Item(0) & ""), _
CStr(dsColl(1).Columns(k).ColumnName & ""), _
CStr(dsColl(1).Rows(i1).Item(k) & ""), _
CInt(i1), _
CInt(dsColl(1).Rows(i1).Item(1)), _
CStr(dsColl(3).Columns(k).ColumnName & ""), _
CStr(dsColl(3).Rows(i3).Item(k) & ""), _
CInt(i2), _
CInt(dsColl(3).Rows(i3).Item(1)))

End If
End If

Next
Exit For
End If
Next
Next

stIndicator.Text = "Done"
stIndicator.Update()
End Sub

ethnarch

ASKER

oh yeah i should also let you know i did the sorting in the SQL query this is the code for that

SQL QUERY CODE:
SELECT PLU, LocationNo, Item_no, Dept, Description, Ext_Desc, Sizes, Price
FROM MenuItem
WHERE (LocationNo = 1)
ORDER BY PLU
--------

Basically each dataAdapter looks like the above except the locationNo is different

ethnarch

ASKER

I figured it out ehh it was another one of those little mistakes that takes you hours to find I accidently put i2 where i should have put j in this line where it reads:

If CStr(dsColl(1).Rows(i1).Item(k) & "") <> CStr(dsColl(2).Rows(i2).Item(k) & "") Then

it should read:
If CStr(dsColl(1).Rows(i1).Item(k) & "") <> CStr(dsColl(2).Rows(j).Item(k) & "") Then

ethnarch

ASKER

anyway thanks for the help on this one I think the speed is definitly much better

Sancler

Thanks for the points, but I don't really think my answer was the "accepted" one, nor that I really "assisted".

If you want to get Community Support to alter things so mjwills gets them all, I'd be happy.

Glad you got it sorted to your satisfaction.

Roger