linuxrox asked:

read file into array and remove duplicates

Hello. I have a file with about 3000 names in it that I need to read into an array, removing the duplicate names. They are files that need to be deleted from a couple of servers, so I need to write a tiny app that reads the list of files, removes the duplicates, and then deletes them. Here is the code I'm using to read the file into an array; it seems to work OK:

Private Sub Form_Load()
    Dim sArray() As String
    Dim lCtr As Long   'declared explicitly so the code also compiles under Option Explicit

    FileToArray "C:\expiredcopyfiles.txt", sArray
    For lCtr = 0 To UBound(sArray)
        Debug.Print sArray(lCtr)
    Next
End Sub

Public Sub FileToArray(ByVal FileName As String, _
    ByRef TheArray As Variant)
'PURPOSE:    Puts all lines of file into a string array
'PARAMETERS: FileName = FullPath of File
'            TheArray = StringArray to which contents
'                       Of File will be added.
'Example
'  Dim sArray() as String
'  FileToArray "C:\MyTextFile.txt", sArray
'  For lCtr = 0 to Ubound(sArray)
'  Debug.Print sArray(lCtr)
'  Next

'NOTES:
'  --  Requires a reference to Microsoft Scripting Runtime
'      Library
'  --  You can write this method in a number of different ways
'      For instance, you can take advantage of VB 6's ability to
'      return an array.
' --   You can also read all the contents of the file and use the
'      Split function with vbCrlf as the delimiter, but I
'      wanted to illustrate use of the ReadLine
'      and AtEndOfStream methods.
'**********************************************************

  Dim oFSO As New FileSystemObject
  Dim oFSTR As Scripting.TextStream
  Dim ret As Long
  Dim lCtr As Long

  If Dir(FileName) = "" Then Exit Sub

'Check if string array was passed
'If you want to permit other type of arrays (e.g.,
'variant) remove or modify this line
'If VarType(TheArray) <> vbArray + vbString Then Exit Sub
 
  On Error GoTo ErrorHandler
     Set oFSTR = oFSO.OpenTextFile(FileName)
     
     Do While Not oFSTR.AtEndOfStream
            ReDim Preserve TheArray(lCtr) As String
            TheArray(lCtr) = oFSTR.ReadLine
            lCtr = lCtr + 1
            DoEvents 'optional but with large file
                     'program will appear to hang
                     'without it
    Loop
     oFSTR.Close
     
ErrorHandler:
     Set oFSTR = Nothing
End Sub
______________________________________________________________________________________
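
The notes in the code above mention a second way to load the file: read the whole contents at once and use Split with vbCrLf as the delimiter. A minimal sketch of that variant follows. FileToArraySplit is a hypothetical name, and the sketch assumes a plain ANSI text file with Windows line endings.

```vb
' Sketch only: FileToArraySplit is a hypothetical name, not part of the original code.
' Assumes an ANSI text file whose lines end in vbCrLf.
Public Function FileToArraySplit(ByVal FileName As String) As String()
    Dim iFile As Integer
    Dim sText As String

    ' If the file is missing, the function returns an unallocated array
    If Dir(FileName) = "" Then Exit Function

    iFile = FreeFile
    Open FileName For Binary Access Read As #iFile
    sText = Space$(LOF(iFile))   ' size the buffer to the file length
    Get #iFile, , sText          ' read the whole file in one call
    Close #iFile

    ' A trailing vbCrLf in the file produces one empty element at the end
    FileToArraySplit = Split(sText, vbCrLf)
End Function
```

Because a file that ends with a line break yields an empty last element, the caller would probably want to skip empty strings before adding names to anything.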

Now, the problem is that the array has duplicates in it that need to be removed. Then I could simply loop through the
new array, add a .WAV extension to each name, check whether the file exists, and if so delete it. Can someone help with removing duplicates from this array?

thanks!
VBRocks

Perhaps a simple approach would be to create a collection and add each name to the collection as both the "key" and the "item". Because a collection does not allow duplicate keys, adding a name a second time raises a run-time error, which you can catch and then handle accordingly.

'Create a module level collection variable:
Private cNames As New Collection

'Create a sub to add each name to the collection, and then catch any error that occurs
Private Sub AddName(ByVal sName As String)
    'This is the key part: catch the error raised when a duplicate key cannot be added to the collection
    On Error GoTo ErrorHandler

    'Try to add the name to the collection
    cNames.Add sName, sName

    'If no error occurs, the name has been added, so exit the sub
    Exit Sub

ErrorHandler:
    'If an error occurred, the name was already in the collection, so handle it according to your preference.

    'Clear the error
    Err.Clear

    '***  Perform your special processing here with the duplicate name when the error occurs
    '     delete the record, or whatever...

End Sub


'Then, just call the sub from your code as you loop through each item of the array:
For lCtr = 0 To UBound(sArray)
     AddName sArray(lCtr)

Next

Alternatively, if you want to handle the duplicate item in the calling code instead of in the sub, you can use a Boolean function instead of a sub, as follows:

Private Function AddName(ByVal sName As String) As Boolean

    On Error GoTo ErrorHandler

    'Try to add the name to the collection
    cNames.Add sName, sName

    'Return True because the name was added successfully
    AddName = True

    Exit Function

ErrorHandler:
    'Clear the error
    Err.Clear

    'Return False because an error occurred trying to add the name to the collection
    AddName = False

End Function


'Then, just call the Function from your code as you loop through each item of the array:
For lCtr = 0 To UBound(sArray)
     If AddName(sArray(lCtr)) = False Then
          '***  Perform your special processing here with the duplicate name when the error occurs
          '     delete the record, or whatever...
     End If

Next
linuxrox (ASKER)

Are you saying that in my project I should go to "Add / Module" and insert the code there instead of in the main form?
Oh no, just add the code to your main form.
Gotcha...
OK, I have that working, but what do I do with the collection?
For lCtr = 0 To UBound(sArray)
AddName sArray(lCtr)
'would like to add each item in the collection to a listbox or loop through
'the collection to perform functions on each item
Next
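
One way to work with the collection once it is filled is a For Each loop, run after the loop that calls AddName has finished. The sketch below is only an illustration: ProcessNames, lstNames (a ListBox on the form) and the sFolder path are assumptions, not names from this thread, and Kill deletes files permanently, so it is worth trying with Debug.Print in place of Kill first.

```vb
' Sketch only: ProcessNames, lstNames and sFolder are hypothetical names.
' Uses the module-level cNames collection filled by AddName.
Private Sub ProcessNames()
    Dim vName As Variant
    Dim sPath As String
    Const sFolder As String = "C:\SomeFolder\"   ' hypothetical location of the .WAV files

    For Each vName In cNames
        lstNames.AddItem vName                   ' show each unique name in a ListBox

        sPath = sFolder & vName & ".WAV"
        If Dir(sPath) <> "" Then Kill sPath      ' delete the file only if it exists
    Next vName
End Sub
```

If an index is needed instead, collection items are numbered from 1, so the equivalent loop runs from 1 To cNames.Count and reads cNames(i).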
SOLUTION
Rory Archibald
(This solution is only available to members.)

ASKER CERTIFIED SOLUTION
(This solution is only available to members.)

SOLUTION
(This solution is only available to members.)

thanks guys!!! PERFECT!!
One thing:
In my file there are 2892 lines; each line has a duplicate, such as:
C200YC
C200YC
C200YD
C200YD
C200YE
C200YE
C200YF
C200YF
C200YG
C200YG
C200YH
C200YH
C200YI
C200YI
C200YJ
C200YJ
C200YK
C200YK

VBRocks: when I run the code I would expect cNames.Count to be 1446, because 2892 divided by two equals 1446; however, I actually get a count of 1429!?
Any thoughts or ideas?
I could email you the zipped file if you'd like; it's very small zipped.
My email is xf at westky.com
Actually, I guess 1429 is correct, because I opened the file in EditPad Pro and removed duplicates, and there were 1429 lines. I just would have expected 1446 lines; perhaps my math and logic are wrong there.
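
A count of 1446 would only follow if every name appeared exactly twice. With 1429 unique names in 2892 lines, 2892 - 1429 = 1463 lines are repeats, so some names must appear more than twice (for example, 17 names listed four times each would give exactly 1429). A rough way to check is to count occurrences per name. The sketch below assumes the Microsoft Scripting Runtime reference already required by FileToArray; CountNamesSeenMoreThanTwice is a hypothetical helper name.

```vb
' Sketch only: CountNamesSeenMoreThanTwice is a hypothetical helper.
' Requires a reference to Microsoft Scripting Runtime (for Scripting.Dictionary).
Private Function CountNamesSeenMoreThanTwice(sArray() As String) As Long
    Dim oCounts As New Scripting.Dictionary
    Dim vKey As Variant
    Dim lCtr As Long

    ' Ignore case, matching how Collection keys are compared.
    ' CompareMode must be set while the dictionary is still empty.
    oCounts.CompareMode = vbTextCompare

    ' Tally each name; reading a missing key creates it as Empty, which counts as 0
    For lCtr = LBound(sArray) To UBound(sArray)
        oCounts(sArray(lCtr)) = oCounts(sArray(lCtr)) + 1
    Next

    ' Report every name that occurs more than twice
    For Each vKey In oCounts.Keys
        If oCounts(vKey) > 2 Then
            Debug.Print vKey, oCounts(vKey)
            CountNamesSeenMoreThanTwice = CountNamesSeenMoreThanTwice + 1
        End If
    Next

    Debug.Print "Unique names: "; oCounts.Count
End Function
```

Calling it with the array filled by FileToArray prints the names that account for the difference in the Immediate window.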