vlg
asked on
Best way to compare 2 lists(?) and prune one
Hello,
I'm thinking about creating 2 list-like data structures to manipulate. I don't really care what the data type is -- use whichever you think is best for this problem:
The first "list" will be populated with a bunch of integers (They represent id_numbers from a db). This is the reference list.
The second list will also have a bunch of integers. Neither list is too big, 50 numbers or so.
I'd like an algorithm which looks through the 2nd list, and if it finds a number NOT in the first list, removes it from the 2nd list, and also eliminates duplicates in the second list.
Put another way, no numbers are allowed in the 2nd list UNLESS they exist in the first list, and no duplicate numbers are allowed in the 2nd list, either.
There may be duplicates in either list to begin with, but in the end the 2nd list must have only unique values.
So, if the first (reference) list has 7 5 6 2 3 4 4 1 56
and the second list has 2 3 4 75 33 6 4 2
then, in the end, the second list can only contain (order doesn't matter)
2 3 4 6
Sorry this is only worth 120 points -- that's all I have left...
Thanks for any help
V
I'm thinking about creating 2 list-like data structures to manipulate. I don't really care what the data type is -- use whichever you think is best for this problem:
The first "list" will be populated with a bunch of integers (They represent id_numbers from a db). This is the reference list.
The second list will also have a bunch of integers. Neither list is too big, 50 numbers or so.
I'd like an algorithm which looks through the 2nd list, and if it finds a number NOT in the first list, removes it from the 2nd list, and also eliminates duplicates in the second list.
Put another way, no numbers are allowed in the 2nd list UNLESS they exist in the first list, and no duplicate numbers are allowed in the 2nd list, either.
There may be duplicates in either list to begin with, but in the end the 2nd list must have only unique values.
So, if the first (reference) list has 7 5 6 2 3 4 4 1 56
and the second list has 2 3 4 75 33 6 4 2
then, in the end, the second list can only contain (order doesn't matter)
2 3 4 6
Sorry this is only worth 120 points -- that's all I have left...
Thanks for any help
V
ASKER
Idle_Mind:
That looks great -- I'll try it right now. Can you tell me what's the deal with the newEnum() properties, hiding it, and giving it a Procedure number of -4? Why are all those things being done?
Thanks for the explanation,
V
That looks great -- I'll try it right now. Can you tell me what's the deal with the newEnum() properties, hiding it, and giving it a Procedure number of -4? Why are all those things being done?
Thanks for the explanation,
V
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
It makes use of collections to do a fast compare of the second list to the first. Duplicates in the second are automatically removed because you cannot have duplicate keys. The compare routine and the removal of items in the second collection will be EXTREMELY fast since we are doing lookups using the built in hash function in the collectiobn and removal of an item is simply changing the references in the linked list.
The limiting factor in this scheme is memory. You essentially have to duplicate both listboxes in memory. If both boxes are small it's not a problem. We are sacrificing memory for speed.
Anyway, here it is. Create a new project and on the main form and add two listboxes and a commandbutton. Add a class module and leave it named "Class1".
*** IMPORTANT! *** <<<-----------------------
Once you have pasted the code for the class module into Class1, you have to change the properties for the newEnum() function. Click on Tools --> Procedure Attributes and make sure newEnum is selected in the drop down box. Then click on Advanced and change the Procedure Id to -4. Under the attributes section, check the box that says "Hide this member" and hit OK.
Regards,
Idle_Mind
' Code for "Form1"
Option Explicit
Private Sub Form_Load()
' So, if the first (reference) list has 7 5 6 2 3 4 4 1 56
' and the second list has 2 3 4 75 33 6 4 2
' then, in the end, the second list can only contain (order doesn't matter)
' 2 3 4 6
List1.AddItem "7"
List1.AddItem "5"
List1.AddItem "6"
List1.AddItem "2"
List1.AddItem "3"
List1.AddItem "4"
List1.AddItem "4"
List1.AddItem "1"
List1.AddItem "56"
List2.AddItem "2"
List2.AddItem "3"
List2.AddItem "4"
List2.AddItem "75"
List2.AddItem "33"
List2.AddItem "6"
List2.AddItem "4"
List2.AddItem "2"
End Sub
Private Sub Command1_Click()
Dim colList1 As New Class1
Dim colList2 As New Class1
Dim a As Integer
Dim item As Variant
' Add items in first list box to first collection
For a = 0 To List1.ListCount - 1
colList1.addWithKey List1.List(a), List1.List(a)
Next a
' Add items in second list box to second collection
' Duplicates will not be added because we can't have duplicate keys
For a = 0 To List2.ListCount - 1
colList2.addWithKey List2.List(a), List2.List(a)
Next a
' Walk the second collection
' If the item from the second collection is not in the first collection,
' then remove it from second collection
For Each item In colList2
If Not colList1.exists(item) Then
colList2.removeByKey item
End If
Next item
' Empty the second list box and populate it with whatever is left in
' the second collection!
List2.clear
For Each item In colList2
List2.AddItem item
Next item
End Sub
<<<-----------------------
' Code for "Class1"
Option Explicit
Private myCollection As New Collection
Public Function newEnum() As IUnknown
On Error Resume Next
' Click on Tools --> Procedure Attributes --> Advanced
' This function should be a hidden member and the Procedure Id should be -4
Set newEnum = myCollection.[_NewEnum]
End Function
Public Function count() As Long
On Error Resume Next
count = myCollection.count
End Function
Private Sub Class_Terminate()
On Error Resume Next
Set myCollection = Nothing
End Sub
Public Sub add(ByVal item As Variant)
On Error Resume Next
myCollection.add item
Exit Sub
End Sub
Public Sub addWithKey(ByVal item As Variant, key As String)
On Error Resume Next
myCollection.add item, key
Exit Sub
End Sub
Public Sub removeByKey(ByVal key As String)
On Error Resume Next
myCollection.remove key
End Sub
Public Sub removeByIndex(ByVal index As Long)
On Error Resume Next
myCollection.remove index
End Sub
Public Sub clear()
On Error Resume Next
'Erase the contents of the collection
Set myCollection = Nothing
Set myCollection = New Collection
End Sub
Public Function exists(ByVal key As String) As Boolean
On Error GoTo exists_Error
myCollection.add key, key
exists = False
myCollection.remove key
Exit Function
exists_Error:
exists = True
End Function
Public Function itemByIndex(index As Long) As Variant
If index >= 1 And index <= myCollection.count Then
Set itemByIndex = myCollection.item(index)
Else
Set itemByIndex = Empty
End If
End Function
Public Function itemByKey(key As String) As Variant
If Me.exists(key) Then
Set itemByKey = myCollection.item(key)
Else
Set itemByKey = Empty
End If
End Function