Link to home
Start Free TrialLog in
Avatar of Travis Hydzik
Travis HydzikFlag for Australia

asked on

VB6 fastest way to get filenames in a folder

I am looking for the fastest method to read a large number of filenames in a folder.

doesn't matter how it achieves this, just needs to be fast.
Avatar of Ultra_Master
Ultra_Master

Avatar of Dana Seaman
Use API; Unlike most other samples this code also supports Unicode:


Option Explicit

Public Enum FileAttributes
   ReadOnly = &H1
   Hidden = &H2
   System = &H4
   Volume = &H8
   Directory = &H10
   Archive = &H20
   Alias = &H40 ' or Device [reserved]
   Normal = &H80
   Temporary = &H100
   SparseFile = &H200
   ReparsePoint = &H400
   Compressed = &H800
   Offline = &H1000
   NotContentIndexed = &H2000
   Encrypted = &H4000
   Attr_ALL = ReadOnly Or Hidden Or System Or Archive Or Normal
End Enum
#If False Then  'PreserveEnumCase
   Private ReadOnly, Hidden, System, Volume, Directory, Archive
   Private Alias, Normal, Temporary, SparseFile, ReparsePoint
   Private Compressed, Offline, NotContentIndexed, Encrypted, Attr_ALL
#End If

Private Declare Function FindFirstFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindNextFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindClose Lib "kernel32" (ByVal hFindFile As Long) As Long

Private Const MAX_PATH = 260

Private Type WIN32_FIND_DATA
   dwFileAttributes     As Long
   ftCreationTime       As Currency
   ftLastAccessTime     As Currency
   ftLastWriteTime      As Currency
   nFileSizeBig         As Currency
   dwReserved0          As Long
   dwReserved1          As Long
   cFileName            As String * MAX_PATH
   cShortFileName       As String * 14
End Type

Public Sub EnumFolders(ByVal sPath As String, _
   Optional ByVal sPattern As String = "*.*", _
   Optional ByVal lAttributeFilter As FileAttributes = Attr_ALL, _
   Optional ByVal bRecurse As Boolean = False)

   Dim lHandle          As Long
   Dim sFileName        As String
   Dim Lines            As Long
   Dim wFD              As WIN32_FIND_DATA

   On Error GoTo ProcedureError

   sPath = QualifyPath(sPath)

   lHandle = FindFirstFileW(StrPtr(sPath & sPattern), VarPtr(wFD))
   If lHandle > 0 Then
      Do
         With wFD
            If AscW(.cFileName) <> 46 Then  'skip . and .. entries
               sFileName = StripNull(.cFileName)
               If (.dwFileAttributes And Directory) Then
                  If bRecurse Then
                     EnumFolders sPath & sFileName, sPattern, lAttributeFilter, bRecurse
                  End If
               ElseIf (.dwFileAttributes And lAttributeFilter) Then
                  List1.AddItem sFileName
               End If
            End If
         End With
      Loop While FindNextFileW(lHandle, VarPtr(wFD)) > 0
   End If
   FindClose lHandle
   Exit Sub
ProcedureError:
   Debug.Print "Error " & Err.Number & " " & Err.Description & " of EnumFolders"

End Sub

Public Function StripNull(StrIn As String) As String
   Dim nul              As Long
   nul = InStr(StrIn, vbNullChar)
   If (nul) Then
      StripNull = Left$(StrIn, nul - 1)
   Else
      StripNull = Trim$(StrIn)
   End If
End Function

Public Function QualifyPath(ByVal Path As String) As String
   Dim Delimiter        As String   ' segmented path delimiter

   If InStr(Path, "://") > 0 Then      ' it's a URL path
      Delimiter = "/"                 ' use URL path delimiter
   Else                                ' it's a disk based path
      Delimiter = "\"                 ' use disk based path delimiter
   End If

   Select Case Right$(Path, 1)         ' whats last character in path?
      Case "/", "\"                       ' it's one of the valid delimiters
         QualifyPath = Path              ' use the supplied path
      Case Else                           ' needs a trailing path delimiter
         QualifyPath = Path & Delimiter  ' append it
   End Select
End Function

Private Sub Form_Load()
   List1.Clear
   EnumFolders App.Path
End Sub

Open in new window

@thydzik

1. Do you need to iterate the entire directory sub-tree or just the current/target directory?

2. Be aware that the fastest iteration would NOT involve adding items to a listbox (as was shown in these two code references)

3. What do you need to do with the file list?

4. Do you need a list of all the files or do you need to apply some filter to the list of files, returning the file names that match the filter?
If a list of files is a necessity, you can use a little trick. Make your listbox an array of two listboxes. Set List1(0).Visible = False, and List(1).Visible = True. Populate List1(0), which will happen quicker because there will be no screen/object refreshes to do. Upon completion of population, set List1(0).Visible = True. In other words, populate the listbox when it is invisible, then only show it when it is full. At this time you either set List1(1).Visible = False, or you can avoid that statement by using "Bring to front" (or use ZOrder) to make List1(0) appear on top of List1(1).
ASKER CERTIFIED SOLUTION
Avatar of HooKooDooKu
HooKooDooKu

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
In practice the code I provided should be added to a class with the filename and/or other info returned via an Event. Unlike using the Dir command (returns ANSI only), this code supports Unicode filenames.

Option Explicit

Private WithEvents cF  As cFiles

Private Sub cF_ItemDetails(ByVal strFileName As String)
   List1.AddItem strFileName
End Sub

Private Sub Form_Load()
   List1.Clear
   Set cF = New cFiles
   cF.EnumFolders App.Path
End Sub

 Add this code to a class named cFiles:


Option Explicit

Public Enum FileAttributes
   ReadOnly = &H1
   Hidden = &H2
   System = &H4
   Volume = &H8
   Directory = &H10
   Archive = &H20
   Alias = &H40 ' or Device [reserved]
   Normal = &H80
   Temporary = &H100
   SparseFile = &H200
   ReparsePoint = &H400
   Compressed = &H800
   Offline = &H1000
   NotContentIndexed = &H2000
   Encrypted = &H4000
   Attr_ALL = ReadOnly Or Hidden Or System Or Archive Or Normal
End Enum
#If False Then  'PreserveEnumCase
   Private ReadOnly, Hidden, System, Volume, Directory, Archive
   Private Alias, Normal, Temporary, SparseFile, ReparsePoint
   Private Compressed, Offline, NotContentIndexed, Encrypted, Attr_ALL
#End If

Private Declare Function FindFirstFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindNextFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindClose Lib "kernel32" (ByVal hFindFile As Long) As Long

Private Const MAX_PATH = 260

Private Type WIN32_FIND_DATA
   dwFileAttributes     As Long
   ftCreationTime       As Currency
   ftLastAccessTime     As Currency
   ftLastWriteTime      As Currency
   nFileSizeBig         As Currency
   dwReserved0          As Long
   dwReserved1          As Long
   cFileName            As String * MAX_PATH
   cShortFileName       As String * 14
End Type

Public Event ItemDetails(ByVal strFileName As String)

Public Sub EnumFolders(ByVal sPath As String, _
   Optional ByVal sPattern As String = "*.*", _
   Optional ByVal lAttributeFilter As FileAttributes = Attr_ALL, _
   Optional ByVal bRecurse As Boolean = False)

   Dim lHandle          As Long
   Dim sFileName        As String
   Dim Lines            As Long
   Dim wFD              As WIN32_FIND_DATA

   On Error GoTo ProcedureError

   sPath = QualifyPath(sPath)

   lHandle = FindFirstFileW(StrPtr(sPath & sPattern), VarPtr(wFD))
   If lHandle > 0 Then
      Do
         With wFD
            If AscW(.cFileName) <> 46 Then  'skip . and .. entries
               sFileName = StripNull(.cFileName)
               If (.dwFileAttributes And Directory) Then
                  If bRecurse Then
                     EnumFolders sPath & sFileName, sPattern, lAttributeFilter, bRecurse
                  End If
               ElseIf (.dwFileAttributes And lAttributeFilter) Then
                  RaiseEvent ItemDetails(sFileName)
               End If
            End If
         End With
      Loop While FindNextFileW(lHandle, VarPtr(wFD)) > 0
   End If
   FindClose lHandle
   Exit Sub
ProcedureError:
   Debug.Print "Error " & Err.Number & " " & Err.Description & " of EnumFolders"

End Sub

Private Function StripNull(StrIn As String) As String
   Dim nul              As Long
   nul = InStr(StrIn, vbNullChar)
   If (nul) Then
      StripNull = Left$(StrIn, nul - 1)
   Else
      StripNull = Trim$(StrIn)
   End If
End Function

Private Function QualifyPath(ByVal Path As String) As String
   Dim Delimiter        As String   ' segmented path delimiter

   If InStr(Path, "://") > 0 Then      ' it's a URL path
      Delimiter = "/"                 ' use URL path delimiter
   Else                                ' it's a disk based path
      Delimiter = "\"                 ' use disk based path delimiter
   End If

   Select Case Right$(Path, 1)         ' whats last character in path?
      Case "/", "\"                       ' it's one of the valid delimiters
         QualifyPath = Path              ' use the supplied path
      Case Else                           ' needs a trailing path delimiter
         QualifyPath = Path & Delimiter  ' append it
   End Select
End Function

Open in new window

Avatar of Travis Hydzik

ASKER

experts, thank you for all the replies. I am in the process of reviewing the solutions.

aikimark,
I do not need to iterate sub directories.

I do not need to add them to a list box, only iterate through the files. I will parse the file's content.

I need to apply a single filter, i.e. all .txt files.
Please try this function.  It returns a collection of file names.

Option Explicit

Public Function GetTextFileNames(ByVal parmPath As String) As Collection
  Dim strFilename As String
  Dim strPath As String
  Dim colFiles As New Collection
  If Len(Dir(parmPath, vbDirectory)) <> 0 Then
    strPath = parmPath
    If Right$(strPath, 1) <> "\" Then
      strPath = strPath & "\"
    End If
    strFilename = Dir(strPath & "*.txt")
    Do Until Len(strFilename) = 0
      colFiles.Add strFilename
      strFilename = Dir()
    Loop
  End If
  Set GetTextFileNames = colFiles
  Set colFiles = Nothing
End Function

Sub testget()
  'Drive the GetTextFilenames function
  Dim colTF As Collection
  Dim vFile As Variant
  Set colTF = GetTextFileNames("C:\Users\AikiMark\Downloads")
  Debug.Print colTF.Count
  For Each vFile In colTF
    Debug.Print vFile
  Next
End Sub

Open in new window

@thydzik

What is the context of this question?

Are you monitoring a directory for new files?   If so, a better architecture would be to have the OS notify your program when changes happen to the folder.  That way, you only look at the contents when you need to.
aikimark,
No, I just want to read a large amount of files as quick as possible.
okay, I have done two basic tests.
one using Ultra_Master reply of the API and one using the known Dir.
Dir seems to be 40% faster, does this sound right?

>>...read a large amount of files as quick as possible

Does that mean that you need to open the text files and read their content?

How often will this run?

Will these be local directories or directories on a file server?

========
The greater the number of file names returned, the greater the performance difference of using a collection object (or dictionary object) as the return data type.  String concatenation can be a real performance killer.

If your code has already performed some of the data validation steps prior to invocation, you can remove them from my code.
aikimark, I am opening the text files, but I am not including these in my tests.

these were my two test cases;

hFile = FindFirstFile(fold & "*.txt", WFD)
   If hFile <> INVALID_HANDLE_VALUE Then
      Do
         filestr = TrimNull(WFD.cFileName)
         i = i + 1
      Loop While FindNextFile(hFile, WFD)
   End If
   Call FindClose(hFile)

Open in new window


 
filestr = Dir(fold & "*.txt", vbNormal)
Do While LenB(filestr) > 0
    i = i + 1
    filestr = Dir$
Loop

Open in new window

@thydzik

>>does this sound right?

It is counter-intuitive, but is possible.  The danaseaman API code is more general and was designed to traverse a directory tree.  As such, it contains code that might not be as streamlined as possible.  Also, it is unicode friendly, which may add some overhead.

Which DIR() code did you test?

How are you measuring the performance?

How many times did you measure the performance?

Is the testing done against compiled code or in debug mode?

If compiled, was it optimized?

How many files are in the directory when you tested?

Are you replicating the production environment?
What is the TrimNull() function?
TrimNull is as attached, I removed the separate function and included it with the main test, but only improved it the speed by a small amount.

can the trimnull function bee improved?
Private Function TrimNull(startstr As String) As String

   TrimNull = Left$(startstr, lstrlen(StrPtr(startstr)))
   
End Function

Open in new window

Try this:

hFile = FindFirstFile(fold & "*.txt", WFD)
If hFile <> INVALID_HANDLE_VALUE Then
   Do
      filestr = Left$(WFD.cFileName, Instr(WFD.cFileName, vbNullChar) - 1)
      i = i + 1
   Loop While FindNextFile(hFile, WFD)
End If
Call FindClose(hFile)

Open in new window

okay, I have tried the above, changed it slightly but was still the same as previous;
hFile = FindFirstFile(fold & "*.txt", WFD)
    If hFile <> INVALID_HANDLE_VALUE Then
       Do
            tempStr = WFD.cFileName
            filestr = Left$(WFD.cFileName, InStr(4, WFD.cFileName, vbNullChar, vbBinaryCompare) - 1)
            i = i + 1
       Loop While FindNextFile(hFile, WFD)
    End If
   Call FindClose(hFile)

Open in new window

Are you testing compiled code?
yes, though I don't see any difference in speed between debug and compiled.
sorry, the previous code should have actually read;
If hFile <> INVALID_HANDLE_VALUE Then
       Do
            tempStr = WFD.cFileName
            filestr = Left$(tempStr, InStr(4, tempStr, vbNullChar, vbBinaryCompare) - 1)
            i = i + 1
       Loop While FindNextFile(hFile, WFD)
    End If

Open in new window

Please revisit some of the unanswered questions I posted in http:#34982328

We have entered the realm of testing methodology.  Results and valid comparisons between solutions really depend what and how we measure.

What happens to your API performance figures if you comment the line that trims the returned string?
aikimark,

I am testing this with 5000x 1mb txt files, and then perform this 1000 times. using GetTickCount as the timer.

removing the trim line improves it by 5%.
still around 25% greater than dir.

that is all for me tonight, I will follow up in the morning.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
One more thing that I'd like to know about your performance measurement is where you are placing your GetTick calls.

What operating systems does this need to run on?

========
In your production environment, what is the percentage of non-text files in the target directory?

In your production environment, how volatile is the directory? (activity of new, changed, and deleted files)

I know you said you are only interested in a list of the text files, but what are you doing with the file names once you get them?  If we can reduce the amount of work you have to do on the back end, it might improve the entire pipeline performance profile.
Since we can eliminate the allocation from our performance measurements, it would worth testing an FSO configuration.  The following code requires a reference to the Microsoft Scripting Runtime library.  
Note: It can (and should) use late binding for production.
 
Option Explicit

Public Function FilesCollection() As Collection
  Dim oFS As New Scripting.FileSystemObject
  Dim oDir As Folder
  Dim oFile As File
  Dim colFiles As New Collection
  Dim sngStart As Single
  Set oDir = oFS.GetFolder("C:\Users\AikiMark\Documents")
  sngStart = Timer
  For Each oFile In oDir.Files
    If Right$(oFile.Name, 3) = "txt" Then
      colFiles.Add oFile.Name
    End If
  Next
  Set FilesCollection = colFiles
  Debug.Print Timer - sngStart
End Function

Sub testit()
  Dim colThing As Collection
  Set colThing = FilesCollection
End Sub

Open in new window

aikimark, thanks. I have tried the above. it is significantly slower.

i believe for my application Dir() the winner.
FSO is always slower.
The reason that DIR() outperforms FindFirstFile API is probably that even though Vb6 internal functions also use FindFirstFile API, Vb6 internal functions are written in C++.
@thydzik

>>it is significantly slower

Both the WMI and FSO are expected to be slower.  However, there are some circumstances where they might be faster.  For instance, if you wanted to get a list of text files with certain characteristics (modified date, size, attribute), you could use WMI to return only those file names.  This kind of filtering would require secondary statements/functions.

========
Although you mentioned the number and size of the files, we don't know if the size of the files plays a part in the directory iteration.

However, the fastest possible method for getting this list is to instantiate a separate process/object that will
1. iterate the directory during the initialization process
2. monitor the directory for any changes (via system hook callback), updating the list to reflect the changes.

When your main process needs a list of files, it grabs the list.  This way, the iteration process takes place in the background and getting the list involves no delay.