read file into array and remove duplicates

Hello.  I have a file with about 3000 names in it that I need to read into an array and then remove the duplicate names.  They are files that need to be deleted from a couple of servers, so I need to write a tiny app that reads the list of files, removes the duplicates and then deletes them.  Here is the code I'm using to read the file into an array; it seems to work OK:

Private Sub Form_Load()
    Dim sArray() As String
    Dim lCtr As Long

    FileToArray "C:\expiredcopyfiles.txt", sArray
    For lCtr = 0 To UBound(sArray)
        Debug.Print sArray(lCtr)
    Next
End Sub

Public Sub FileToArray(ByVal FileName As String, _
    ByRef TheArray As Variant)
'PURPOSE:    Puts all lines of file into a string array
'PARAMETERS: FileName = FullPath of File
'            TheArray = StringArray to which contents
'                       Of File will be added.
'Example
'  Dim sArray() as String
'  FileToArray "C:\MyTextFile.txt", sArray
'  For lCtr = 0 to Ubound(sArray)
'  Debug.Print sArray(lCtr)
'  Next

'NOTES:
'  --  Requires a reference to Microsoft Scripting Runtime
'      Library
'  --  You can write this method in a number of different ways
'      For instance, you can take advantage of VB 6's ability to
'      return an array.
' --   You can also read all the contents of the file and use the
'      Split function with vbCrlf as the delimiter, but I
'      wanted to illustrate use of the ReadLine
'      and AtEndOfStream methods.
'**********************************************************

  Dim oFSO As New FileSystemObject
  Dim oFSTR As Scripting.TextStream
  Dim ret As Long
  Dim lCtr As Long

  If Dir(FileName) = "" Then Exit Sub

'Check if string array was passed
'If you want to permit other type of arrays (e.g.,
'variant) remove or modify this line
'If VarType(TheArray) <> vbArray + vbString Then Exit Sub
 
  On Error GoTo ErrorHandler
  Set oFSTR = oFSO.OpenTextFile(FileName)

  Do While Not oFSTR.AtEndOfStream
    ReDim Preserve TheArray(lCtr) As String
    TheArray(lCtr) = oFSTR.ReadLine
    lCtr = lCtr + 1
    DoEvents 'optional but with large file
             'program will appear to hang
             'without it
  Loop
  oFSTR.Close

ErrorHandler:
  Set oFSTR = Nothing
End Sub
______________________________________________________________________________________

Now, the problem is that the array has duplicates in it that need to be removed.  Then I could simply loop through the new array, add a .WAV extension to each name, check whether the file exists and, if so, delete it.  Can someone help with removing duplicates from this array?

thanks!
linuxrox Asked:

VBRocks Commented:
Perhaps a simple approach would be to create a collection and add each name to it as both the "key" and the "item".  Because a collection does not allow duplicate keys, adding a name that is already there raises an error, which you can catch and then handle accordingly.

'Create a module level collection variable:
Private cNames as New Collection

'Create a sub to add each name to the collection, and then catch any error that occurs
Private Sub AddName(ByVal sName As String)
'This is part of the key:  catch the error when the duplicate item cannot be added to the collection
On Error GoTo ErrorHandler

     'Try to add the name to the collection
     cNames.Add sName, sName

'If no error occurs, the name will be added, then you will exit the sub
Exit Sub

ErrorHandler:
     'If an error occurred, the name was already in the collection, so delete it, or handle it according to your preference.

     'Clear the error
     Err.Clear

     '***  Perform your special processing here with the duplicate name when the error occurs
     '     delete the record, or whatever...

End Sub


'Then, just call the sub from your code as you loop through each item of the array:
For lCtr = 0 To UBound(sArray)
     AddName sArray(lCtr)

Next
VBRocks Commented:
An additional idea: if you want to handle the duplicate item in your calling code instead of in the sub, you can use a Boolean function instead of a sub, as follows:

Private Function AddName(ByVal sName As String) As Boolean

On Error GoTo ErrorHandler

     'Try to add the name to the collection
     cNames.Add sName, sName

     'Return 'True' because the name was added successfully
     AddName = True

Exit Function

ErrorHandler:
     'Clear the error
     Err.Clear

     'Return 'False' because an error occurred trying to add the name to the collection
     AddName = False

End Function


'Then, just call the Function from your code as you loop through each item of the array:
For lCtr = 0 To UBound(sArray)
     If AddName(sArray(lCtr)) = False Then
          '***  Perform your special processing here with the duplicate name when the error occurs
          '     delete the record, or whatever...
     End If

Next
linuxrox (Author) Commented:
Are you saying that in my project I should go to "Add / Module" and insert the code there instead of in the main form?
VBRocks Commented:
Oh no, just add the code to your main form.
linuxrox (Author) Commented:
gotchya...
OK, I have that working, but what do I do with the collection?
For lCtr = 0 To UBound(sArray)
AddName sArray(lCtr)
'would like to add each item in the collection to a listbox or loop through
'the collection to perform functions on each item
Next
Rory Archibald Commented:
You may want to check out this Scripting Guys article - it does exactly what you want using a Dictionary object:
http://www.microsoft.com/technet/scriptcenter/resources/qanda/apr05/hey0413.mspx
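
A minimal sketch of the Dictionary approach, written for this thread rather than copied from the linked article. It assumes the project already references Microsoft Scripting Runtime (FileToArray needs it too). The function name UniqueNames is made up for illustration, and the case-insensitive CompareMode is an assumption that suits Windows file names and matches how Collection keys behave.

```vb
'Hypothetical helper: returns a Dictionary whose keys are the unique names.
'Requires a reference to Microsoft Scripting Runtime.
'Assumes sArray has been filled; LBound/UBound raise an error on an
'array that was never dimensioned (for example, if the file was missing).
Private Function UniqueNames(ByRef sArray() As String) As Scripting.Dictionary
    Dim oDict As Scripting.Dictionary
    Dim lCtr As Long

    Set oDict = New Scripting.Dictionary

    'Assumption: "c200yc" and "C200YC" are the same name.
    'CompareMode must be set while the dictionary is still empty.
    oDict.CompareMode = vbTextCompare

    For lCtr = LBound(sArray) To UBound(sArray)
        'Exists tests for the key without raising an error
        If Not oDict.Exists(sArray(lCtr)) Then
            oDict.Add sArray(lCtr), True   'the item value is not used
        End If
    Next

    Set UniqueNames = oDict
End Function

'Example call, e.g. from Form_Load after FileToArray has run:
'  Dim oNames As Scripting.Dictionary
'  Dim vName As Variant
'  Set oNames = UniqueNames(sArray)
'  For Each vName In oNames.Keys
'      Debug.Print vName
'  Next
```

Unlike the Collection approach, this checks for the key up front instead of relying on a trapped error.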

HTH
Rory
VBRocks Commented:
Here is an example of what I mean.  Hopefully I didn't miss anything.  Let me know if you have any questions:

'Create a module level collection variable:
Private cNames as New Collection

Private Sub Form_Load()
Dim sArray() As String
FileToArray "C:\expiredcopyfiles.txt", sArray
'Then, just call the Function from your code as you loop through each item of the array:
For lCtr = 0 To UBound(sArray)
     If AddName(sArray(lCtr)) = False Then
          '***  Perform your special processing here with the duplicate name when the error occurs
          '     delete the record, or whatever...
     End If
Next

End Sub


Private Function AddName(ByVal sName As String) As Boolean

On Error GoTo ErrorHandler

     'Try to add the name to the collection
     cNames.Add sName, sName

     'Return 'True' because the name was added successfully
     AddName = True

Exit Function

ErrorHandler:
     'Clear the error
     Err.Clear

     'Return 'False' because an error occurred trying to add the name to the collection
     AddName = False

End Function


Public Sub FileToArray(ByVal FileName As String, _
    ByRef TheArray As Variant)

     'The code for your sub belongs here...

End Sub

VBRocks Commented:
Well, every item in the collection is unique:  in other words, the collection contains no duplicates.  So you can just loop through the items and handle each one any way you like, such as adding it to a listbox, or whatever.

Dim i As Integer
For i = 1 to cNames.Count
     'perform your action with:  cNames.Item(i)

Next i
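
As a rough sketch of the delete step described in the question (add a .WAV extension, check that the file exists, then delete it), the loop could look something like the following. DeleteExpiredFiles and its sFolder parameter are hypothetical: the thread never says where on the servers the files live, so the folder has to be supplied by the caller. It uses the module-level cNames collection from above.

```vb
'Hypothetical helper: sFolder is an assumed location, e.g. a share on one server
Private Sub DeleteExpiredFiles(ByVal sFolder As String)
    Dim vName As Variant
    Dim sPath As String

    'Make sure the folder ends with a backslash
    If Right$(sFolder, 1) <> "\" Then sFolder = sFolder & "\"

    'For Each walks the items of the collection in the order they were added
    For Each vName In cNames
        'Skip blank lines that may have been read from the file
        If Len(Trim$(vName)) > 0 Then
            sPath = sFolder & Trim$(vName) & ".WAV"

            'Dir returns "" when no file matches the path
            If Len(Dir(sPath)) > 0 Then
                On Error Resume Next    'keep going if a file is locked or read-only
                Kill sPath
                If Err.Number <> 0 Then
                    Debug.Print "Could not delete " & sPath & ": " & Err.Description
                    Err.Clear
                End If
                On Error GoTo 0
            End If
        End If
    Next
End Sub
```

Kill deletes permanently and does not use the Recycle Bin, so it may be worth replacing the Kill line with Debug.Print sPath for a first dry run.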
linuxrox (Author) Commented:
thanks guys!!! PERFECT!!
linuxrox (Author) Commented:
One thing:
In my file there are 2892 lines; each line has a duplicate, such as:
C200YC
C200YC
C200YD
C200YD
C200YE
C200YE
C200YF
C200YF
C200YG
C200YG
C200YH
C200YH
C200YI
C200YI
C200YJ
C200YJ
C200YK
C200YK

VBRocks: when I run the code I would expect cNames.Count to be 1446, because 2892 divided by two equals 1446. However, I actually get a count of 1429!?
Any thoughts or ideas?
linuxrox (Author) Commented:
I could email you the file zipped if you'd like.  It's very small zipped.
My email is xf at westky.com
linuxrox (Author) Commented:
Actually, I guess 1429 is correct, because I opened the file in EditPad Pro and removed the duplicates, and there were 1429 lines.  I just would have expected 1446 lines; perhaps my math and logic are wrong there.
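
On the arithmetic: 2892 / 2 = 1446 only holds if every name appears exactly twice. A count of 1429 is 17 short of that, which means at least some names must appear three or more times in the file. One way to see which ones is to tally the occurrences of each name. The sketch below is only an illustration: ReportRepeatCounts is a made-up name, it assumes the Microsoft Scripting Runtime reference that FileToArray already needs, and it assumes names should be compared without regard to case.

```vb
'Hypothetical diagnostic: prints every name that does not appear exactly twice.
'Requires a reference to Microsoft Scripting Runtime.
Private Sub ReportRepeatCounts(ByRef sArray() As String)
    Dim oCounts As Scripting.Dictionary
    Dim vName As Variant
    Dim lCtr As Long

    Set oCounts = New Scripting.Dictionary
    oCounts.CompareMode = vbTextCompare   'assumption: ignore case, as Collection keys do

    'Tally how many times each name appears.
    'Reading a missing key creates it as Empty, and Empty + 1 gives 1.
    For lCtr = LBound(sArray) To UBound(sArray)
        oCounts(sArray(lCtr)) = oCounts(sArray(lCtr)) + 1
    Next

    'List the names that break the "exactly twice" assumption
    For Each vName In oCounts.Keys
        If oCounts(vName) <> 2 Then
            Debug.Print vName & " appears " & oCounts(vName) & " time(s)"
        End If
    Next

    Debug.Print oCounts.Count & " unique names in " & _
                (UBound(sArray) - LBound(sArray) + 1) & " lines"
End Sub
```

Calling ReportRepeatCounts sArray right after FileToArray should show in the Immediate window which names account for the difference.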