Solved

Removing Duplicate Entries in an Array

Posted on 2000-03-27
Last Modified: 2010-05-02
I am writing a VB app that reads an external file into an array.
I then want to parse the array and remove any duplicate entries. The entries are name/email address pairs, and I am only concerned with duplicate email addresses, which is why I step through the arrays by 2. I have made two identical copies of the array and have been attempting to do it like this:

Dim myComp
For i = 2 To k Step 2
    For j = 4 To k Step 2
        If i = j Then
            j = j + 2
        End If
        If array1(i) = "" Then
            j = 30 'this is past the end of the array and needs to be more dynamic as the size of the array will be.
        End If
        myComp = StrComp(array1(i), array2(j), vbTextCompare)
        If myComp = 0 Then
            array2(j) = ""
            array1(j) = ""
        End If
    Next
Next


Unfortunately, that doesn't work, and I am really not sure why.  Doing it this way is not set in stone and if someone knows of a better algorithm to accomplish this that is fine, or if you know how to modify what I have to remove the duplicates that would be great.
Thanks,
Chris

Question by:churley

Expert Comment

by:calacuccia
Hi Churley,

This shorter code should do the job. You actually only need one array, and it is better to use the LBound and UBound functions (LBound returns the lowest array index, often 0; UBound returns the highest array index).

Also, you only need forward checking: once you have compared 2 with 6, you don't have to compare 6 with 2 again. The If array1(i) = "" check is not necessary either.

Dim myComp
For i = LBound(array1) + 1 To UBound(array1) Step 2
    For j = i + 2 To UBound(array1) Step 2
        myComp = StrComp(array1(i), array1(j), vbTextCompare)
        If myComp = 0 Then
            array1(j) = ""
        End If
    Next
Next



Hope this helps

Calacuccia

Author Comment

by:churley
Actually, that throws it into an infinite loop in the 2nd For loop.  So I put a constant upper bound in and it doesn't infinite loop but it doesn't pull out the duplicates either.
Thanks.

Expert Comment

by:calacuccia
Hi Chris

I don't quite understand; it certainly worked when I tested it...

Could you tell me how you output the array once it has been processed?

The 2nd loop should never be infinite, as it only runs from i + 2 (and i itself keeps increasing) to the upper bound of the array. The only thing I can think of is that the very large size of your array makes it look infinite.

Are the duplicate records exact copies, and are you sure it's the 2nd of the pair and not the first? It could depend on how your array is initialized.

Calacuccia

Expert Comment

by:tcornett
I am not sure if this will work in your situation but....

Have you tried sorting the items and then stepping through each one and checking it against the previous?  If it matches the previous, then it is a duplicate and can be deleted.  The loop to delete the duplicates would need to run from lbound+1 to ubound.
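
A rough sketch of the "compare against the previous" step, assuming the name/email pairs are already sorted by email (VB6 has no built-in array sort, so the sort itself is left out). The routine name BlankSortedDuplicates, the String array and the 1-based layout with emails at even indexes are assumptions for illustration, not something taken from the original code. It compares against the last pair that was kept rather than the immediately preceding one, so a run of three or more identical emails is still caught after the middle one has been blanked.

```vb
' Sketch only: assumes arr holds name/email pairs in elements 1..lastIndex
' (names at odd indexes, emails at even indexes) and that the pairs are
' ALREADY sorted by email. BlankSortedDuplicates is a hypothetical name.
Public Sub BlankSortedDuplicates(arr() As String, ByVal lastIndex As Long)
    Dim i As Long
    Dim lastKept As Long

    lastKept = 2                      ' the first email is always kept
    For i = 4 To lastIndex Step 2
        If StrComp(arr(i), arr(lastKept), vbTextCompare) = 0 Then
            arr(i) = ""               ' same email as the last kept pair: mark as duplicate
        Else
            lastKept = i              ' new email: remember it for later comparisons
        End If
    Next i
End Sub
```

After the sort this is a single pass over the array, which matters a lot once the array gets large.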

hope this helps.

Good luck and best regards,

- Tom

Expert Comment

by:tcornett
One more thought: while you are sorting the array, you could add another condition to the sorting algorithm so that when two entries compare equal, one of them is deleted.

- Tom

Author Comment

by:churley
calacuccia -
Yeah, I am sure it is the second and not the first; I used the debugger. I think you are right though: it is a very large array (1 million max), and that is probably what made it seem infinite.
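
For scale: 1 million elements is 500,000 name/email pairs, and a nested loop that compares every email with every later one makes roughly 500,000 x 499,999 / 2, or about 1.25 x 10^11, calls to StrComp. One possible way around that, sketched below, is to remember the emails already seen in a Scripting.Dictionary so each email is looked up once. This is a different technique from the nested loops in this thread. The routine name, the String array and the 1-based layout are assumptions, and it needs the Microsoft Scripting Runtime to be installed.

```vb
' Sketch only: BlankDuplicatesWithDictionary is a hypothetical name. Assumes
' name/email pairs in elements 1..lastIndex, emails at even indexes, and that
' the Microsoft Scripting Runtime (scrrun.dll) is available on the machine.
Public Sub BlankDuplicatesWithDictionary(arr() As String, ByVal lastIndex As Long)
    Dim seen As Object
    Dim i As Long

    Set seen = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare  ' case-insensitive keys, like StrComp(..., vbTextCompare)

    For i = 2 To lastIndex Step 2
        If seen.Exists(arr(i)) Then
            arr(i) = ""               ' email already seen: mark this pair as a duplicate
        Else
            seen.Add arr(i), True     ' first occurrence: remember it
        End If
    Next i
End Sub
```

CompareMode has to be set before the first key is added, which is why it comes right after CreateObject.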

I am outputting to an external file using the following:

Count1 = 1
Open "single.txt" For Output As FileNum
l = 0
Do Until l > k
    l = l + 2
    If array2(l) = "" Then
        l = l + 2
    Else
        Print #FileNum, array2(l - 1)
        Print #FileNum, array2(l)

        Count1 = Count1 + 1
    End If
Loop
Close FileNum

Author Comment

by:churley
Actually, this may help....here is my entire code segment for the remove button with yours added in and my old one commented out.  Maybe this will help make some sense of it.

Private Sub cmdRemove_Click()
FileNum = FreeFile
Open "entries.txt" For Input As FileNum
Do Until EOF(FileNum)
    k = k + 1
    Line Input #FileNum, NextLine
    LinesFromFile = LinesFromFile + NextLine + Chr(13) '+ Chr(10)
    If LinesFromFile = "" Then
    EOF (FileNum)
    Else
    array1(k) = LinesFromFile
    array2(k) = LinesFromFile
    LinesFromFile = ""
    End If
Loop
i = 2
j = 4


Dim myComp
For i = LBound(array1) + 1 To UBound(array1) Step 2
   For j = i + 2 To UBound(array1) Step 2
       myComp = StrComp(array1(i), array1(j), vbTextCompare)
       If myComp = 0 Then
       array1(j) = ""
       End If
    Next
Next





'Dim myComp
'For i = 2 To k Step 2
'   For j = 4 To k Step 2
'       If i = j Then
'       j = j + 2
'       End If
'       If array1(i) = "" Then
'       j = 30
'       End If
'       myComp = StrComp(array1(i), array2(j), vbTextCompare)
'       If myComp = 0 Then
'       array2(j) = ""
'       array1(j) = ""
'       End If
'    Next
'Next

Close FileNum
     
Count1 = 1
Open "single.txt" For Output As FileNum
l = 0
Do Until l > k
 l = l + 2
 If array1(l) = "" Then
 l = l + 2
 Else
 Print #FileNum, array1(l - 1)
 Print #FileNum, array1(l)
 
 Count1 = Count1 + 1
 End If
Loop
Close FileNum
cmdRemove.Enabled = False
cmdChoose.Enabled = True

End Sub

Accepted Solution

by:calacuccia (earned 100 total points)
Hi churley,

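I would add a declaration for k at the start of your sub:

Dim k As Long

That makes k a local variable, so it starts at 0 every time the sub runs. (Long rather than Integer, because a VB Integer only goes up to 32,767 and your array can hold up to 1 million entries.)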

and then test again.

If that is not satisfactory, try to alter the start of the loop (first loop) as follows:

For i = LBound(array1)  To UBound(array1) Step 2

or

For i = LBound(array1) + 2 To UBound(array1) Step 2
   
You are right that k could be used instead of UBound.
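
A minimal sketch of the loop bounded by k instead of UBound, under some assumptions: the thread never shows how array1 is declared, so the String array and the routine name BlankDuplicateEmails are made up for illustration. The read loop fills the array from index 1, which puts the emails at the even indexes 2, 4, ... k. If the array is declared larger than the data, UBound returns the declared size rather than the number of lines read, and if it is 0-based, LBound(array1) + 1 is 1, which is a name rather than an email. Starting at 2 and stopping at k sidesteps both.

```vb
' Sketch only: BlankDuplicateEmails is a hypothetical name. Assumes the data
' occupies elements 1..k (names at odd indexes, emails at even indexes).
Public Sub BlankDuplicateEmails(arr() As String, ByVal k As Long)
    Dim i As Long, j As Long

    For i = 2 To k - 2 Step 2
        If arr(i) <> "" Then                     ' skip entries already marked as duplicates
            For j = i + 2 To k Step 2
                If StrComp(arr(i), arr(j), vbTextCompare) = 0 Then
                    arr(j) = ""                  ' later copy of the same email: mark it
                End If
            Next j
        End If
    Next i
End Sub
```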

Short of that, I don't see anything.

Hope this helps

Calacuccia

Author Comment

by:churley
Actually, I just figured it out. My problem wasn't where I thought it was; it was in the section that writes the output file.

I had this:
Count1 = 1
Open "single.txt" For Output As FileNum
l = 0
Do Until l > k
 l = l + 2
 If array2(l) = "" Then
 l = l + 2
 Else
 Print #FileNum, array2(l - 1)
 Print #FileNum, array2(l)
 
 Count1 = Count1 + 1
 End If
Loop


In my If statement, where I increment l, I then bypassed my Print statements, and thus it omitted good instances of the array. I added:

Print #FileNum, array2(l - 1)
Print #FileNum, array2(l)
after the l = l + 2 statement and that fixed it. I am going to give you the points though, since you took the time to help and in all reality I asked the wrong question.
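
An alternative sketch of the output loop that drops the extra l = l + 2 altogether, so the loop counter advances in only one place. As far as I can tell, printing right after the second increment writes the next pair without checking whether it was blanked too, and Do Until l > k reads element k + 2 on its last pass. A plain For loop avoids both. WriteUniquePairs and its parameters are hypothetical names, and the String array is an assumption.

```vb
' Sketch only: WriteUniquePairs is a hypothetical name. Assumes name/email
' pairs in elements 1..k, with duplicate emails already set to "".
Public Sub WriteUniquePairs(arr() As String, ByVal k As Long, ByVal path As String)
    Dim fileNum As Integer
    Dim l As Long

    fileNum = FreeFile
    Open path For Output As #fileNum
    For l = 2 To k Step 2            ' the For loop is the only place l advances
        If arr(l) <> "" Then         ' blank email = duplicate, so skip the pair
            Print #fileNum, arr(l - 1)
            Print #fileNum, arr(l)
        End If
    Next l
    Close #fileNum
End Sub
```

It would be called as WriteUniquePairs array1, k, "single.txt" once the duplicates have been blanked.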
Thanks.
Chris

Expert Comment

by:calacuccia
Thanks in return.

And indeed, the l + 2 was applied twice and bypassed some of the good instances. Well spotted; I did not see it either.

Calacuccia