# Removing Duplicate Entries in an Array

Posted on 2000-03-27
I am writing a VB app that will take an external file and read it into an array.
I then want to parse the array and remove any duplicate entries.  The entries are name/email address pairs, and I am only concerned with duplicate email addresses, which is why I am stepping through the arrays by 2. I have made 2 identical copies of the array and have been attempting to do it like this:

Dim myComp
For i = 2 To k Step 2
    For j = 4 To k Step 2
        If i = j Then
            j = j + 2
        End If
        If array1(i) = "" Then
            j = 30 'this is past the end of the array and needs to be more dynamic as the size of the array will be.
        End If
        myComp = StrComp(array1(i), array2(j), vbTextCompare)
        If myComp = 0 Then
            array2(j) = ""
            array1(j) = ""
        End If
    Next
Next

Unfortunately, that doesn't work, and I am really not sure why.  Doing it this way is not set in stone. If someone knows of a better algorithm to accomplish this, that is fine; or if you know how to modify what I have so that it removes the duplicates, that would be great.
Thanks,
Chris

Question by: churley
Expert Comment

Hi Churley,

This shorter code should do the job. You actually only need one array, and you had better use the LBound and UBound functions (LBound returns the lowest array index, often 0, and UBound returns the highest array index).

Also, you only need forward checking: once you have compared 2 with 6, you don't have to compare 6 with 2 anymore. The If array1(i) = "" check is not necessary either.

Dim myComp
For i = LBound(array1) + 1 To UBound(array1) Step 2
    For j = i + 2 To UBound(array1) Step 2
        myComp = StrComp(array1(i), array1(j), vbTextCompare)
        If myComp = 0 Then
            array1(j) = ""
        End If
    Next
Next

Hope this helps

Calacuccia
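
As a quick illustration of the two bounds functions mentioned above, here is a minimal sketch. The array names and sizes are made up for the example, and it assumes the module does not contain Option Base 1.

```vb
Sub ShowBounds()
    ' Hypothetical arrays, for illustration only.
    Dim zeroBased(6) As String       ' indexes 0..6 when Option Base is not set
    Dim oneBased(1 To 6) As String   ' indexes 1..6, regardless of Option Base

    Debug.Print LBound(zeroBased), UBound(zeroBased)   ' 0 and 6
    Debug.Print LBound(oneBased), UBound(oneBased)     ' 1 and 6

    ' If names sit at odd indexes and emails at even ones,
    ' LBound + 1 is index 1 (a name) for zeroBased
    ' but index 2 (an email) for oneBased.
End Sub
```

So whether LBound(array1) + 1 lands on an email slot depends on how the array was declared and filled.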
Author Comment

Actually, that throws it into an infinite loop in the 2nd For loop.  So I put a constant upper bound in; it no longer loops forever, but it doesn't pull out the duplicates either.
Thanks.
Expert Comment

Hi Chris

I don't quite understand; it certainly worked when I tested it...

Could you tell me how you output the array once it has been processed?

The 2nd loop should never be infinite, as it only runs from i + 2 (and i itself keeps going up) to the upper bound of the array. The only thing I can think of is that the very large size of your array makes it look infinite.

Are the duplicate records exact copies, and are you sure it's the 2nd item of each pair that you need to compare and not the first? That could depend on how your array is initialized.

Calacuccia
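
To put the array size in perspective: with n email slots, the nested loops perform about n(n-1)/2 comparisons, which is roughly 1.25 × 10^11 StrComp calls for 500,000 emails. One alternative, not proposed by anyone in this thread, is to remember the emails already seen in a Scripting.Dictionary so that each entry is examined only once. The sketch below assumes the Microsoft Scripting Runtime is available (Windows), that the array is declared As String, and that names sit at odd indexes and emails at even indexes; the procedure and parameter names are hypothetical.

```vb
' Blanks the email slot of every pair whose email was already seen.
' entries: name at odd indexes, email at even indexes (assumed layout).
' count:   index of the last filled slot (k in the thread's code).
Sub BlankDuplicateEmails(entries() As String, ByVal count As Long)
    Dim seen As Object
    Dim i As Long
    Dim key As String

    Set seen = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare    ' case-insensitive keys, like StrComp(..., vbTextCompare)

    For i = 2 To count Step 2
        key = entries(i)
        If Len(key) > 0 Then
            If seen.Exists(key) Then
                entries(i) = ""         ' duplicate: mark it the same way the thread does
            Else
                seen.Add key, True      ' first occurrence: remember it
            End If
        End If
    Next i
End Sub
```

A call would look like BlankDuplicateEmails array1, k. The trade-off is extra memory for the dictionary in exchange for a single pass over the data.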
Expert Comment

I am not sure if this will work in your situation but....

Have you tried sorting the items and then stepping through each one and checking it against the previous one?  If it matches the previous item, it is a duplicate and can be deleted.  The loop that deletes the duplicates would need to run from LBound + 1 to UBound.

hope this helps.

Good luck and best regards,

- Tom
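
A minimal sketch of the "compare with the previous item" pass described above. It assumes the emails have already been sorted case-insensitively into their own non-empty String array (VB6 has no built-in array sort, so that step is left out; in the thread's layout each name would also have to travel with its email during the sort). The procedure name is hypothetical, and duplicates are blanked rather than physically removed.

```vb
' emails must already be sorted (case-insensitively) before calling this.
Sub BlankSortedDuplicates(emails() As String)
    Dim i As Long
    Dim lastKept As String

    lastKept = emails(LBound(emails))           ' the first item is always kept
    For i = LBound(emails) + 1 To UBound(emails)
        If StrComp(emails(i), lastKept, vbTextCompare) = 0 Then
            emails(i) = ""                      ' same as the last kept item: duplicate
        Else
            lastKept = emails(i)                ' new value: remember it
        End If
    Next i
End Sub
```

The comparison is made against the last kept value rather than emails(i - 1), because the previous slot may already have been blanked when the same address occurs three or more times.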
Expert Comment

One more thought: while you are sorting the array, you could add another condition to the sorting algorithm you use, so that when two entries compare equal, one of them is deleted.

- Tom
Author Comment

calacuccia -
Yeah, I am sure it is the second and not the first; I used the debugger.  I think you are right, though: it is a very large array (1 million entries max), and that is probably what made it seem infinite.

I am outputting to an external file using the following:

Count1 = 1
Open "single.txt" For Output As FileNum
l = 0
Do Until l > k
    l = l + 2
    If array2(l) = "" Then
        l = l + 2
    Else
        Print #FileNum, array2(l - 1)
        Print #FileNum, array2(l)

        Count1 = Count1 + 1
    End If
Loop
Close FileNum
Author Comment

Actually, this may help: here is my entire code segment for the remove button, with yours added in and my old one commented out.  Maybe this will help make some sense of it.

Private Sub cmdRemove_Click()
    FileNum = FreeFile
    Open "entries.txt" For Input As FileNum
    Do Until EOF(FileNum)
        k = k + 1
        Line Input #FileNum, NextLine
        LinesFromFile = LinesFromFile + NextLine + Chr(13) '+ Chr(10)
        If LinesFromFile = "" Then
            EOF (FileNum)
        Else
            array1(k) = LinesFromFile
            array2(k) = LinesFromFile
            LinesFromFile = ""
        End If
    Loop
    i = 2
    j = 4

    Dim myComp
    For i = LBound(array1) + 1 To UBound(array1) Step 2
        For j = i + 2 To UBound(array1) Step 2
            myComp = StrComp(array1(i), array1(j), vbTextCompare)
            If myComp = 0 Then
                array1(j) = ""
            End If
        Next
    Next

'Dim myComp
'For i = 2 To k Step 2
'   For j = 4 To k Step 2
'       If i = j Then
'       j = j + 2
'       End If
'       If array1(i) = "" Then
'       j = 30
'       End If
'       myComp = StrComp(array1(i), array2(j), vbTextCompare)
'       If myComp = 0 Then
'       array2(j) = ""
'       array1(j) = ""
'       End If
'    Next
'Next

Close FileNum

    Count1 = 1
    Open "single.txt" For Output As FileNum
    l = 0
    Do Until l > k
        l = l + 2
        If array1(l) = "" Then
            l = l + 2
        Else
            Print #FileNum, array1(l - 1)
            Print #FileNum, array1(l)

            Count1 = Count1 + 1
        End If
    Loop
    Close FileNum
    cmdRemove.Enabled = False
    cmdChoose.Enabled = True

End Sub
Accepted Solution

Hi churley,

I would add a declaration for k at the start of your sub:

Dim k As Long

That automatically sets k to 0 each time the procedure runs. (Use Long rather than Integer: an Integer tops out at 32,767, and your array can hold up to a million entries.)

Then test again.

If that is not satisfactory, try altering the start of the first loop as follows:

For i = LBound(array1)  To UBound(array1) Step 2

or

For i = LBound(array1) + 2 To UBound(array1) Step 2

You are right that k could be used instead of UBound.

Short of that, I don't see anything.

Hope this helps

Calacuccia
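
For reference, here is a sketch of the comparison loop that uses k as the upper bound and starts at the first email slot. It assumes, as the loading loop in the question suggests, that the array is filled from index 1, so names end up at odd indexes and emails at even indexes; it also assumes the array is declared As String. The procedure name is hypothetical.

```vb
' entries: name at odd indexes, email at even indexes (assumed layout).
' k:       index of the last filled slot.
Sub BlankDuplicates(entries() As String, ByVal k As Long)
    Dim i As Long, j As Long

    For i = 2 To k - 2 Step 2                  ' every email slot except the last
        If Len(entries(i)) > 0 Then            ' skip slots that were already blanked
            For j = i + 2 To k Step 2          ' forward checking only
                If StrComp(entries(i), entries(j), vbTextCompare) = 0 Then
                    entries(j) = ""            ' keep the first occurrence
                End If
            Next j
        End If
    Next i
End Sub
```

Bounding the loops by k instead of UBound avoids scanning the unused tail of a large pre-sized array, which is one plausible reason the original run looked endless.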
Author Comment

Actually, I just figured it out. My problem wasn't where I thought it was; it was in the section that writes the output to the file:

Count1 = 1
Open "single.txt" For Output As FileNum
l = 0
Do Until l > k
    l = l + 2
    If array2(l) = "" Then
        l = l + 2
    Else
        Print #FileNum, array2(l - 1)
        Print #FileNum, array2(l)

        Count1 = Count1 + 1
    End If
Loop

In the If branch, where I increment l a second time, the Print statements were bypassed, so good entries were left out of the output.  I added:

Print #FileNum, array2(l - 1)
Print #FileNum, array2(l)
after the l = l + 2 statement inside the If branch, and that fixed it.  I am going to give you the points though, since you took the time to help and in all reality I asked the wrong question.
Thanks.
Chris
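
One way to sidestep the double increment altogether is to let a For loop own the counter, so each pair is visited exactly once and the only decision left is whether to print it. This is a sketch under assumptions, not the code the author ended up with: it assumes a String array with emails at even indexes up to k, and the procedure and parameter names are hypothetical.

```vb
' Writes every name/email pair whose email slot is not blank.
Sub WriteUniquePairs(entries() As String, ByVal k As Long, ByVal path As String)
    Dim fileNum As Integer
    Dim l As Long

    fileNum = FreeFile
    Open path For Output As #fileNum
    For l = 2 To k Step 2
        If Len(entries(l)) > 0 Then
            Print #fileNum, entries(l - 1)    ' name
            Print #fileNum, entries(l)        ' email
        End If
    Next l
    Close #fileNum
End Sub
```

A call such as WriteUniquePairs array1, k, "single.txt" would replace the Do loop. Because the counter only advances in one place, two blanked pairs sitting next to each other are both skipped.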
Expert Comment

Thanks in return.

And indeed, the l = l + 2 was applied twice and bypassed some of the good entries. Well spotted; I did not see it either.

Calacuccia
