• Status: Solved

Looking for a better way to compare 3 datasets

I have 3 datasets that I need to compare to each other to find all of the columns that differ. This is for a POS system, so the datasets should basically be exact duplicates of each other. However, I can't just copy one dataset over the others, because the correct values could be in any of the 3 locations. So I am placing the differing items in another database to be sorted through manually later on.

The problem is that this script takes forever to run, since each dataset has about 15,000 items in it. There may not be a better way to do this; I just want to put the question out there in case anyone else has come across a similar problem.


Code:
Dim dsColl As New Collection()

        dsColl.Add(DataSet21.Tables("MenuItem"))
        dsColl.Add(DataSet31.Tables("MenuItem"))
        dsColl.Add(DataSet41.Tables("MenuItem"))
        stIndicator.Text = "Searching"
        stIndicator.Update()
        stopSearch = False
        For r = 1 To dsColl.Count ' This will circulate through all Location Comparisons
            If stopSearch = True Then
                Exit For
            End If
            Dim rTmp = r - 1
            Dim dbloca
            Dim dblocb
            If rTmp >= 2 Then
                dbloca = dsColl(1)
                dblocb = dsColl(3)
            Else
                dbloca = dsColl(rTmp + 1)
                dblocb = dsColl(rTmp + 2)

            End If
            For i1 = 0 To dbloca.Rows.Count - 1 ' This Searchs loca for plu's
                If stopSearch = True Then
                    Exit For
                End If
                stIndicator.Text = "Searching.." & dbloca.Rows(i1).Item(0)
                stIndicator.Update()
                For j = 0 To dblocb.Rows.Count - 1 'this searchs locb for plu's
                    If stopSearch = True Then
                        Exit For
                    End If
                    If dblocb.Rows(j).Item(0) = dbloca.Rows(i1).Item(0) Then 'this compares loca to locb's plu's
                        i2 = j
                        Exit For
                    End If
                Next
                For k = 0 To dbloca.Columns.Count - 1 'This searches each column
                    If k <> 1 Then
                        If CStr(dbloca.Rows(i1).Item(k) & "") <> CStr(dblocb.Rows(i2).Item(k) & "") Then

                            LstBx1.Items.Add("PLU (" & CStr(dbloca.Rows(i1).Item(0)) & _
                                                    ") in Location 1 has a different (" & _
                                                    dbloca.Columns(k).ColumnName() & ":" & _
                                                    CStr(dbloca.Rows(i1).Item(k) & ")") & _
                                                    " then in Location 2 (" & _
                                                    dblocb.Columns(k).ColumnName() & _
                                                    CStr(dblocb.Rows(i2).Item(k) & ")"))

                            LstBx1.Update()
                            stIndicator.Text = "Found a Difference in.." & dbloca.Rows(i1).Item(0)
                            stIndicator.Update()

                            AddChangedData(CStr(dbloca.Rows(i1).Item(0) & ""), _
                                CStr(dbloca.Columns(k).ColumnName & ""), _
                                CStr(dbloca.Rows(i1).Item(k) & ""), _
                                CInt(i1), _
                                CInt(dbloca.Rows(i1).Item(1)), _
                                CStr(dblocb.Columns(k).ColumnName & ""), _
                                CStr(dblocb.Rows(i2).Item(k) & ""), _
                                CInt(i2), _
                                CInt(dblocb.Rows(i2).Item(1)))



                        End If
                    End If

                Next
            Next
        Next
Asked:
ethnarch
 
mjwills Commented:
I am not 100% sure, but if your code looks through the second dataset once for every row in the first dataset (and so on), then it will be very slow. With about 15,000 rows in each, that is up to 225 million row comparisons per pair of datasets.

Instead, sort all of them, then compare. This means you need to go through each dataset only once.

Let's assume the data was something like:

Dataset 1
1
2
3
4

Dataset 2
2
3
5

Dataset 3
2
3
4
5

First, get the first row of all 3 datasets. These will be as follows:

1         2         2

Since the data is ordered, we know that 1 is only in the first dataset and not the others. So we go to the next row in the first dataset:

2         2         2

All of them have 2 - good. So go to the next row in all three:

3         3         3

All of them have 3 - good. So go to the next row in all three:

4         5         4

The first and third dataset have 4, but the second doesn't. So go to the next row for the first and third:

End     5          5

No more data in the first, so obviously it doesn't have 5. Second and third both have 5. Go to the next row for second and third:

No more data in the first, second or third. Job done.

The above approach would be how I would attack the problem.
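
The walk described above can be sketched in VB.NET roughly as follows. This is only an illustration under assumptions: the keys are plain integers held in lists that are already sorted ascending, and the names MergeKeys, pos and holders are made up for the example. Real code would read the PLU column of each sorted DataTable instead.

```vb
Imports System
Imports System.Collections.Generic

Module ThreeWayMergeSketch

    ' Walks several ascending key lists in one pass and reports, for each key,
    ' which of the lists contain it. All names here are illustrative.
    Function MergeKeys(lists As List(Of Integer)()) As List(Of String)
        Dim pos(lists.Length - 1) As Integer      ' one cursor per list, all start at 0
        Dim report As New List(Of String)

        Do
            ' Find the lowest key among the lists that still have rows left.
            Dim haveKey As Boolean = False
            Dim lowest As Integer = 0
            For n As Integer = 0 To lists.Length - 1
                If pos(n) < lists(n).Count Then
                    If Not haveKey OrElse lists(n)(pos(n)) < lowest Then
                        lowest = lists(n)(pos(n))
                        haveKey = True
                    End If
                End If
            Next
            If Not haveKey Then Exit Do           ' every list is at "End"

            ' Note which lists hold that key, and advance only those cursors.
            Dim holders As New List(Of String)
            For n As Integer = 0 To lists.Length - 1
                If pos(n) < lists(n).Count AndAlso lists(n)(pos(n)) = lowest Then
                    holders.Add((n + 1).ToString())
                    pos(n) += 1
                End If
            Next
            report.Add(lowest.ToString() & ": " & String.Join(",", holders.ToArray()))
        Loop

        Return report
    End Function

    Sub Main()
        ' The three example datasets from the post above.
        Dim lists As List(Of Integer)() = {
            New List(Of Integer) From {1, 2, 3, 4},
            New List(Of Integer) From {2, 3, 5},
            New List(Of Integer) From {2, 3, 4, 5}
        }
        For Each line As String In MergeKeys(lists)
            Console.WriteLine(line)
        Next
    End Sub

End Module
```

For the example data this reports 1 in dataset 1 only, 2 and 3 in all three, 4 in datasets 1 and 3, and 5 in datasets 2 and 3. Each list is read once, so the work grows with the total number of rows rather than with the product of the row counts.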
 
mjwills Commented:
<quote>
End     5          5

No more data in the first, so obviously it doesn't have 5. Second and third both have 5. Go to the next row for second and third:

No more data in the first, second or third. Job done.

The above approach would be how I would attack the problem.
</quote>

SHOULD READ:

<corrected>
End     5          5

No more data in the first, so obviously it doesn't have 5. Second and third both have 5. Go to the next row for second and third:

End     End       End

No more data in the first, second or third. Job done.

The above approach would be how I would attack the problem.
</corrected>
 
ethnarch (Author) Commented:
Sounds simple enough. I am going to leave this topic open, though, as there may be some other ideas out there.
 
ethnarch (Author) Commented:
The way you explained it didn't seem complicated, but now that I am actually trying it, it is.

Basically, you are assuming that 1 is in the first dataset. What if 1 is in the second or third dataset?
Then the code gets really complicated, because not only do you have to compare each dataset, you also have to figure out which dataset actually has the lowest value (or highest, depending on the sort order).

Just determining what to do with the items in the dataset is mind-boggling, never mind comparing the data. I seem to be building a ridiculous number of If statements in order to accomplish this.

Am I approaching this wrong? Maybe I am missing something, but this is what I think I would have to do (this is short form, not actual code):

If ds1.Plu <> ds2.PLU <> ds3.PLU then
  if ds1.Plu < ds2.Plu and ds2.PLu = ds3.Plu then
  if ds1.Plu < ds3.Plu and ds1.PLu = ds2.Plu  then
  if ds2.Plu < ds3.Plu and ds2.PLu = ds1.Plu  then
end if

I can't seem to wrap my mind around this one; for some reason it's really complicated. That code is not even correct, either.
I definitely need some help.
 
ethnarch (Author) Commented:
I tried drawing this out with flow charts, but that's not helping either. It seems the only really efficient way to do this would be to make sure that all databases have a matching value in the sort-by column.
 
Sancler Commented:
This is just brainstorming, but an alternative approach might be to cycle through db1, using its data to construct a search string that is then applied to db2 and, if and only if there is a match in that, also applied to db3.  If a match is found in both db2 and db3, somehow mark the relevant records as matched and exclude marked records from further searches: also mark the db1 record as matched.  When the cycle is complete, those records that are not marked are the ones you want to extract for later manual analysis.

Conceptually, the change in approach is one of searching for matches rather than looking for differences.  But, provided the structure of the databases will support it, it means that comparisons would be based on in-built methods which I would expect to be quicker than self-coding and there would be far fewer comparisons to be made.

Roger
 
ethnarch (Author) Commented:
Sancler, that seems to add an extra layer to the actual search, because first I have to run the code through all three databases searching for matches, and then I have to run it through again to find the columns that don't match. Maybe I am misunderstanding, but that doesn't seem to really speed things up.

mjwills' suggestion seems the best; I just can't figure out how to put it into code, it's really complicated. There must be someone out there willing to help.

 
mjwills Commented:
<quote>
Basically, you are assuming that 1 is in the first dataset. What if 1 is in the second or third dataset?
Then the code gets really complicated, because not only do you have to compare each dataset, you also have to figure out which dataset actually has the lowest value (or highest, depending on the sort order).
</quote>

It doesn't matter where it is. The trick is to only go to the next row on the datasets which have the lowest value. As long as you do that everything will be sweet.

Imagine instead of my original example the example was:

<change to original example>
Dataset 1
2
3
4

Dataset 2
1
2
3
5

Dataset 3
1
2
3
4
5

First, get the first row of all 3 datasets. These will be as follows:

2         1         1

Since the data is ordered, we know that 1 is only in the second and third datasets. So we go to the next row in the second and third datasets:

2         2         2
</change to original example>

See - the same basic principle works regardless of whether the first dataset holds 1 or not.
 
Sancler Commented:
Ethnarch

This is pseudo-code, I am not up to coding it for real.

===========================================================================================

db4 is a new database (same structure as all the others) that holds a single copy of each matching record.
db5 is a new database (same structure as all the others) to hold all non-matching records.

with db1
      for each record in db1
            clear "WHERE" clause
            for each field in record
                  add fieldname = fieldvalue to "WHERE" clause
            next field
            apply "WHERE" clause as filter to db2
            if match then
                  apply WHERE clause as filter to db3
                  if match then
                        'db1 record = db2 record = db3 record
                        copy db1 record to db4
                        delete records in db1, db2, db3
                  end if
            end if
      next record
end with

copy all remaining (non-deleted) records from db1, db2 and db3 to db5.

===========================================================================================

It is not, of course, going to be quite that simple: e.g. I am not sure what effect the deletion of records from db1 would have on the for/next loop for that.  But it illustrates the concept and why I think it might be quicker and - even though not so simple as my pseudo-code - a lot simpler to code than any other method.  YOU don't have to code the run through of the records in db2 and db3 and, within those, the run through of each of their fields: the in-built WHERE clause processor does that.

But, as I said, I was only brainstorming.

Roger
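
For what it's worth, the filter idea in the pseudo-code above might look something like this with in-memory DataTables. It is a rough sketch and not a tested solution: BuildFilter and UnmatchedRows are hypothetical helper names, every column is assumed to be a non-null string, and the "3.50" price is made-up sample data. Instead of deleting matched records, it collects the db1 rows that lack an identical row in both other tables, which sidesteps the question of deleting from db1 while looping over it. Rows that exist only in db2 or db3 would still need a similar pass of their own.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data

Module FilterMatchSketch

    ' Builds a filter such as "[PLU] = '0001' AND [Price] = '5.95'" from every column.
    ' Assumes string columns with no nulls; other types need different quoting.
    Function BuildFilter(row As DataRow) As String
        Dim parts As New List(Of String)
        For Each col As DataColumn In row.Table.Columns
            ' Double any single quotes so the value stays inside its literal.
            Dim value As String = row(col).ToString().Replace("'", "''")
            parts.Add("[" & col.ColumnName & "] = '" & value & "'")
        Next
        Return String.Join(" AND ", parts.ToArray())
    End Function

    ' Returns the rows of db1 that do not have an identical row in both db2 and db3.
    Function UnmatchedRows(db1 As DataTable, db2 As DataTable, db3 As DataTable) As List(Of DataRow)
        Dim unmatched As New List(Of DataRow)
        For Each row As DataRow In db1.Rows
            Dim filter As String = BuildFilter(row)
            ' DataTable.Select does the field-by-field matching for us.
            If db2.Select(filter).Length = 0 OrElse db3.Select(filter).Length = 0 Then
                unmatched.Add(row)
            End If
        Next
        Return unmatched
    End Function

    ' Creates an empty two-column table for the demonstration.
    Function NewTable() As DataTable
        Dim t As New DataTable("MenuItem")
        t.Columns.Add("PLU", GetType(String))
        t.Columns.Add("Price", GetType(String))
        Return t
    End Function

    Sub Main()
        Dim db1 As DataTable = NewTable()
        Dim db2 As DataTable = NewTable()
        Dim db3 As DataTable = NewTable()

        db1.Rows.Add("000000073950", "3.50")
        db2.Rows.Add("000000073950", "3.50")
        db3.Rows.Add("000000073950", "3.50")

        db1.Rows.Add("000000078306", "5.95")
        db2.Rows.Add("000000078306", "5")      ' differs from db1 and db3
        db3.Rows.Add("000000078306", "5.95")

        For Each row As DataRow In UnmatchedRows(db1, db2, db3)
            Console.WriteLine(row("PLU").ToString())
        Next
    End Sub

End Module
```

With this sample data only PLU 000000078306 is reported, because its price in db2 differs. Whether this ends up quicker than a sorted walk probably depends on how Select performs against 15,000 rows, so it would be worth timing before relying on it.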
 
ethnarch (Author) Commented:
OK, I honestly didn't completely understand either of your ideas, but mjwills helped out a lot: I combined the sort method with what I was already doing. I think it will work, but there seems to be a problem with my code: it reports every PLU as not matching, while at the same time it does find the differences I wanted to find. Below I'll post a short sample of my output and my current code.

Output:
      131      0      0
3892      "000000073950"      "PLU"      "000000073950"      132      1      "PLU"      "000000073950"      2      132      0      0
3893      "000000073950"      "PLU"      "000000073950"      132      1      "PLU"      "000000073950"      3      132      0      0
3894      "000000076753"      "PLU"      "000000076753"      133      1      "PLU"      "000000076753"      2      133      0      0
3895      "000000076753"      "PLU"      "000000076753"      133      1      "PLU"      "000000076753"      3      133      0      0
3896      "000000076760"      "PLU"      "000000076760"      134      1      "PLU"      "000000076760"      2      134      0      0
3897      "000000076760"      "PLU"      "000000076760"      134      1      "PLU"      "000000076760"      3      134      0      0
3898      "000000078139"      "PLU"      "000000078139"      135      1      "PLU"      "000000078139"      2      135      0      0
3899      "000000078139"      "PLU"      "000000078139"      135      1      "PLU"      "000000078139"      3      135      0      0
3900      "000000078306"      "PLU"      "000000078306"      136      1      "PLU"      "000000078306"      2      136      0      0
3901      "000000078306"      "Price"      "5.95"      136      1      "Price"      "5"      2      136      0      0
3902      "000000078306"      "PLU"      "000000078306"      136      1      "PLU"      "000000078306"      3      136      0      0
3903      "000000078313"      "PLU"      "000000078313"      137      1      "PLU"      "000000078313"      2      137      0      0
3904      "000000078313"      "Price"      "9.95"      137      1      "Price"      "11.95"      2      137      0      0
3905      "000000078313"      "PLU"      "000000078313"      137      1      "PLU"      "000000078313"      3      137      0      0
      0





Code:

Sub newSearchData() 'new way of searching data

        Dim i1, i2, i3, r, k, j, l As Integer
        i1 = 0 : i2 = 0 : i3 = 0 ' in VB a chained "=" is a comparison, so each counter is set on its own

        stIndicator.Text = "Loading Loc1"
        stIndicator.Update()
        SqlDataAdapter1.Fill(DataSet21)

        stIndicator.Text = "Loading loc2"
        stIndicator.Update()
        SqlDataAdapter2.Fill(DataSet31)

        stIndicator.Text = "Loading loc3"
        stIndicator.Update()
        SqlDataAdapter3.Fill(DataSet41)

        Dim dsColl As New Collection()

        dsColl.Add(DataSet21.Tables("MenuItem"))
        dsColl.Add(DataSet31.Tables("MenuItem"))
        dsColl.Add(DataSet41.Tables("MenuItem"))

        stIndicator.Text = "Searching"
        stIndicator.Update()
        stopSearch = False

        For i1 = 100 To 150 'dsColl(1).Rows.Count - 1 ' This Searchs loc 1 for plu's
            If stopSearch = True Then
                Exit For
            End If
            stIndicator.Text = "Searching.." & dsColl(1).Rows(i1).Item(0)
            stIndicator.Update()
            For j = i2 To dsColl(2).Rows.Count - 1 'Plu Match Comparison AB
                If stopSearch = True Then
                    Exit For
                End If
                If dsColl(2).Rows(j).Item(0) = dsColl(1).Rows(i1).Item(0) Then

                    For k = 0 To dsColl(1).Columns.Count - 1 'Column comparison AC
                        If k <> 1 Then
                            If CStr(dsColl(1).Rows(i1).Item(k) & "") <> CStr(dsColl(2).Rows(i2).Item(k) & "") Then
                                i2 = j
                                LstBx1.Items.Add("PLU (" & CStr(dsColl(1).Rows(i1).Item(0)) & _
                                                        ") in Location 1 has a different (" & _
                                                        dsColl(1).Columns(k).ColumnName() & ":" & _
                                                        CStr(dsColl(1).Rows(i1).Item(k) & ")") & _
                                                        " then in Location 2 (" & _
                                                        dsColl(2).Columns(k).ColumnName() & _
                                                        CStr(dsColl(2).Rows(i2).Item(k) & ")"))

                                LstBx1.Update()
                                stIndicator.Text = "Found a Difference in.." & dsColl(1).Rows(i1).Item(0)
                                stIndicator.Update()

                                AddChangedData(CStr(dsColl(1).Rows(i1).Item(0) & ""), _
                                    CStr(dsColl(1).Columns(k).ColumnName & ""), _
                                    CStr(dsColl(1).Rows(i1).Item(k) & ""), _
                                    CInt(i1), _
                                    CInt(dsColl(1).Rows(i1).Item(1)), _
                                    CStr(dsColl(2).Columns(k).ColumnName & ""), _
                                    CStr(dsColl(2).Rows(i2).Item(k) & ""), _
                                    CInt(i2), _
                                    CInt(dsColl(2).Rows(i2).Item(1)))
                            End If
                        End If
                    Next
                    Exit For
                End If
            Next

            For j = i3 To dsColl(3).Rows.Count - 1 'Plu Match Comparison AC
                If stopSearch = True Then
                    Exit For
                End If
                If dsColl(3).Rows(j).Item(0) = dsColl(1).Rows(i1).Item(0) Then
                    For k = 0 To dsColl(1).Columns.Count - 1 'Column comparison AC
                        If k <> 1 Then
                            If CStr(dsColl(1).Rows(i1).Item(k) & "") <> CStr(dsColl(3).Rows(i3).Item(k) & "") Then
                                i3 = j
                                LstBx1.Items.Add("PLU (" & CStr(dsColl(1).Rows(i1).Item(0)) & _
                                                        ") in Location 1 has a different (" & _
                                                        dsColl(1).Columns(k).ColumnName() & ":" & _
                                                        CStr(dsColl(1).Rows(i1).Item(k) & ")") & _
                                                        " then in Location 3 (" & _
                                                        dsColl(3).Columns(k).ColumnName() & _
                                                        CStr(dsColl(3).Rows(i3).Item(k) & ")"))

                                LstBx1.Update()
                                stIndicator.Text = "Found a Difference in.." & dsColl(1).Rows(i1).Item(0)
                                stIndicator.Update()

                                AddChangedData(CStr(dsColl(1).Rows(i1).Item(0) & ""), _
                                    CStr(dsColl(1).Columns(k).ColumnName & ""), _
                                    CStr(dsColl(1).Rows(i1).Item(k) & ""), _
                                    CInt(i1), _
                                    CInt(dsColl(1).Rows(i1).Item(1)), _
                                    CStr(dsColl(3).Columns(k).ColumnName & ""), _
                                    CStr(dsColl(3).Rows(i3).Item(k) & ""), _
                                    CInt(i2), _
                                    CInt(dsColl(3).Rows(i3).Item(1)))



                            End If
                        End If

                    Next
                    Exit For
                End If
            Next
        Next


        stIndicator.Text = "Done"
        stIndicator.Update()
    End Sub
 
ethnarch (Author) Commented:
Oh yeah, I should also let you know that I did the sorting in the SQL query. This is the code for that:


SQL QUERY CODE:
SELECT PLU, LocationNo, Item_no, Dept, Description, Ext_Desc, Sizes, Price
FROM MenuItem
WHERE (LocationNo = 1)
ORDER BY PLU
--------

Basically, each DataAdapter looks like the above, except that the LocationNo is different.
 
ethnarch (Author) Commented:
I figured it out. It was another one of those little mistakes that take you hours to find: I accidentally put i2 where I should have put j. The line reads:

If CStr(dsColl(1).Rows(i1).Item(k) & "") <> CStr(dsColl(2).Rows(i2).Item(k) & "") Then

It should read:
If CStr(dsColl(1).Rows(i1).Item(k) & "") <> CStr(dsColl(2).Rows(j).Item(k) & "") Then
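
The Location 3 loop in the code posted earlier has the same shape (Rows(i3) inside the j loop), so it may need the same change. One possible way to make this kind of index mix-up harder is to pass the two matched rows to a small helper, so the comparison itself never touches a row index. The sketch below is only a suggestion: DifferingColumns is a hypothetical name, and the table is built by hand just to make the example self-contained.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data

Module RowDiffSketch

    ' Returns the names of the columns whose values differ between two rows.
    ' skipColumn is left out of the comparison (LocationNo in the thread's query).
    Function DifferingColumns(a As DataRow, b As DataRow, skipColumn As String) As List(Of String)
        Dim names As New List(Of String)
        For Each col As DataColumn In a.Table.Columns
            If col.ColumnName = skipColumn Then Continue For
            ' ToString() turns DBNull into "", much like the & "" idiom in the thread.
            If a(col.ColumnName).ToString() <> b(col.ColumnName).ToString() Then
                names.Add(col.ColumnName)
            End If
        Next
        Return names
    End Function

    Sub Main()
        ' Both rows live in one hand-built table here purely for brevity.
        Dim t As New DataTable("MenuItem")
        t.Columns.Add("PLU", GetType(String))
        t.Columns.Add("LocationNo", GetType(Integer))
        t.Columns.Add("Price", GetType(String))

        Dim loc1 As DataRow = t.Rows.Add("000000078313", 1, "9.95")
        Dim loc2 As DataRow = t.Rows.Add("000000078313", 2, "11.95")

        Console.WriteLine(String.Join(",", DifferingColumns(loc1, loc2, "LocationNo").ToArray()))
    End Sub

End Module
```

For the two sample rows this prints Price. In the search loop the call would presumably take dsColl(1).Rows(i1) and dsColl(2).Rows(j) once the PLUs match.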
 
ethnarch (Author) Commented:
Anyway, thanks for the help on this one. I think the speed is definitely much better.
 
Sancler Commented:
Thanks for the points, but I don't really think my answer was the "accepted" one, nor that I really "assisted".

If you want to get Community Support to alter things so mjwills gets them all, I'd be happy.

Glad you got it sorted to your satisfaction.

Roger