Solved

VB6 fastest way to get filenames in a folder

Posted on 2011-02-24
Last Modified: 2012-05-11
I am looking for the fastest method to read a large number of filenames in a folder.

It doesn't matter how this is achieved; it just needs to be fast.
Question by:thydzik
29 Comments
 
LVL 5

Expert Comment

by:Ultra_Master
ID: 34972019
0
 
LVL 22

Expert Comment

by:danaseaman
ID: 34972225
Use the API. Unlike most other samples, this code also supports Unicode:


Option Explicit

Public Enum FileAttributes
   ReadOnly = &H1
   Hidden = &H2
   System = &H4
   Volume = &H8
   Directory = &H10
   Archive = &H20
   Alias = &H40 ' or Device [reserved]
   Normal = &H80
   Temporary = &H100
   SparseFile = &H200
   ReparsePoint = &H400
   Compressed = &H800
   Offline = &H1000
   NotContentIndexed = &H2000
   Encrypted = &H4000
   Attr_ALL = ReadOnly Or Hidden Or System Or Archive Or Normal
End Enum
#If False Then  'PreserveEnumCase
   Private ReadOnly, Hidden, System, Volume, Directory, Archive
   Private Alias, Normal, Temporary, SparseFile, ReparsePoint
   Private Compressed, Offline, NotContentIndexed, Encrypted, Attr_ALL
#End If

Private Declare Function FindFirstFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindNextFileW Lib "kernel32" (ByVal hFindFile As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindClose Lib "kernel32" (ByVal hFindFile As Long) As Long

Private Const MAX_PATH = 260

Private Type WIN32_FIND_DATA
   dwFileAttributes     As Long
   ftCreationTime       As Currency
   ftLastAccessTime     As Currency
   ftLastWriteTime      As Currency
   nFileSizeBig         As Currency
   dwReserved0          As Long
   dwReserved1          As Long
   cFileName            As String * MAX_PATH
   cShortFileName       As String * 14
End Type

Public Sub EnumFolders(ByVal sPath As String, _
   Optional ByVal sPattern As String = "*.*", _
   Optional ByVal lAttributeFilter As FileAttributes = Attr_ALL, _
   Optional ByVal bRecurse As Boolean = False)

   Dim lHandle          As Long
   Dim sFileName        As String
   Dim Lines            As Long
   Dim wFD              As WIN32_FIND_DATA

   On Error GoTo ProcedureError

   sPath = QualifyPath(sPath)

   lHandle = FindFirstFileW(StrPtr(sPath & sPattern), VarPtr(wFD))
   If lHandle <> -1 Then  '-1 = INVALID_HANDLE_VALUE
      Do
         With wFD
            sFileName = StripNull(.cFileName)
            'skip only the . and .. entries, not every name that starts with a dot
            If sFileName <> "." And sFileName <> ".." Then
               If (.dwFileAttributes And Directory) Then
                  If bRecurse Then
                     EnumFolders sPath & sFileName, sPattern, lAttributeFilter, bRecurse
                  End If
               ElseIf (.dwFileAttributes And lAttributeFilter) Then
                  List1.AddItem sFileName
               End If
            End If
         End With
      Loop While FindNextFileW(lHandle, VarPtr(wFD)) <> 0
      FindClose lHandle  'close only a handle that was actually opened
   End If
   Exit Sub
ProcedureError:
   Debug.Print "Error " & Err.Number & " " & Err.Description & " of EnumFolders"

End Sub

Public Function StripNull(StrIn As String) As String
   Dim nul              As Long
   nul = InStr(StrIn, vbNullChar)
   If (nul) Then
      StripNull = Left$(StrIn, nul - 1)
   Else
      StripNull = Trim$(StrIn)
   End If
End Function

Public Function QualifyPath(ByVal Path As String) As String
   Dim Delimiter        As String   ' segmented path delimiter

   If InStr(Path, "://") > 0 Then      ' it's a URL path
      Delimiter = "/"                 ' use URL path delimiter
   Else                                ' it's a disk based path
      Delimiter = "\"                 ' use disk based path delimiter
   End If

   Select Case Right$(Path, 1)         ' whats last character in path?
      Case "/", "\"                       ' it's one of the valid delimiters
         QualifyPath = Path              ' use the supplied path
      Case Else                           ' needs a trailing path delimiter
         QualifyPath = Path & Delimiter  ' append it
   End Select
End Function

Private Sub Form_Load()
   List1.Clear
   EnumFolders App.Path
End Sub

 
LVL 45

Expert Comment

by:aikimark
ID: 34973765
@thydzik

1. Do you need to iterate the entire directory sub-tree or just the current/target directory?

2. Be aware that the fastest iteration would NOT involve adding items to a listbox (as was shown in these two code references)

3. What do you need to do with the file list?

4. Do you need a list of all the files or do you need to apply some filter to the list of files, returning the file names that match the filter?
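Point 2 can be sketched without any UI control: collect the names into a string array that grows in blocks. This is only an illustration; GetFileNames and the growth step of 1024 are made-up names and values, not something posted earlier in this thread, and it uses Dir$ for brevity.

```vb
Option Explicit

' Hypothetical helper: fills sFiles() with the matching names and
' returns how many were found. No ListBox or other control is touched.
Public Function GetFileNames(ByVal sFolder As String, ByVal sPattern As String, _
                             ByRef sFiles() As String) As Long
   Const CHUNK As Long = 1024            ' assumed growth step, tune as needed
   Dim sName As String
   Dim lCount As Long

   If Right$(sFolder, 1) <> "\" Then sFolder = sFolder & "\"
   ReDim sFiles(0 To CHUNK - 1)

   sName = Dir$(sFolder & sPattern)
   Do While LenB(sName) > 0
      If lCount > UBound(sFiles) Then
         ' grow in blocks rather than once per item
         ReDim Preserve sFiles(0 To UBound(sFiles) + CHUNK)
      End If
      sFiles(lCount) = sName
      lCount = lCount + 1
      sName = Dir$
   Loop

   GetFileNames = lCount                 ' only sFiles(0 To lCount - 1) are valid
End Function
```

The caller declares a dynamic array (Dim sNames() As String), passes it in, and then loops from 0 to the returned count minus 1.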
 
LVL 14

Expert Comment

by:VBClassicGuy
ID: 34974083
If a list of files is a necessity, you can use a little trick. Make your listbox a control array of two listboxes. Set List1(0).Visible = False and List1(1).Visible = True. Populate List1(0), which will happen quicker because there are no screen/object refreshes to do. When population is complete, set List1(0).Visible = True. In other words, populate the listbox while it is invisible and show it only when it is full. At that point you either set List1(1).Visible = False, or you can avoid that statement by using "Bring to front" (or ZOrder) to make List1(0) appear on top of List1(1).
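A simplified sketch of the same idea with a single listbox instead of a control array: hide it, fill it, show it. It assumes a form with a ListBox named List1 and an array that has already been filled; FillListHidden is a hypothetical name.

```vb
' Assumes a form with a ListBox named List1 and an array
' sFiles(0 To lCount - 1) that already holds the file names.
Private Sub FillListHidden(sFiles() As String, ByVal lCount As Long)
   Dim i As Long

   List1.Visible = False        ' no repainting while items are added
   List1.Clear
   For i = 0 To lCount - 1
      List1.AddItem sFiles(i)
   Next i
   List1.Visible = True         ' paint once, when the list is complete
End Sub
```

With one listbox the control disappears briefly; the two-listbox arrangement described above avoids that by leaving the second one on screen.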
 
LVL 16

Accepted Solution

by:
HooKooDooKu earned 300 total points
ID: 34975351
While I suspect the API code is about as fast as you can go, I would expect the built-in VB function Dir to be a simple wrapper around the same API calls and to work just as well (IF you don't have to walk a directory structure).

To use Dir, simply call Dir with a filespec.  The function will return the 1st file that matches the file spec.  To find the next file, call Dir with no parameters.  When no further files are found, Dir will return a blank string.

FileName = Dir("*.txt")
Do While Len(FileName) > 0
  Call ProcessFile(FileName)
  FileName = Dir()
Loop
 
LVL 22

Expert Comment

by:danaseaman
ID: 34976089
In practice the code I provided should be added to a class with the filename and/or other info returned via an Event. Unlike using the Dir command (returns ANSI only), this code supports Unicode filenames.

Option Explicit

Private WithEvents cF  As cFiles

Private Sub cF_ItemDetails(ByVal strFileName As String)
   List1.AddItem strFileName
End Sub

Private Sub Form_Load()
   List1.Clear
   Set cF = New cFiles
   cF.EnumFolders App.Path
End Sub

 Add this code to a class named cFiles:


Option Explicit

Public Enum FileAttributes
   ReadOnly = &H1
   Hidden = &H2
   System = &H4
   Volume = &H8
   Directory = &H10
   Archive = &H20
   Alias = &H40 ' or Device [reserved]
   Normal = &H80
   Temporary = &H100
   SparseFile = &H200
   ReparsePoint = &H400
   Compressed = &H800
   Offline = &H1000
   NotContentIndexed = &H2000
   Encrypted = &H4000
   Attr_ALL = ReadOnly Or Hidden Or System Or Archive Or Normal
End Enum
#If False Then  'PreserveEnumCase
   Private ReadOnly, Hidden, System, Volume, Directory, Archive
   Private Alias, Normal, Temporary, SparseFile, ReparsePoint
   Private Compressed, Offline, NotContentIndexed, Encrypted, Attr_ALL
#End If

Private Declare Function FindFirstFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindNextFileW Lib "kernel32" (ByVal hFindFile As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindClose Lib "kernel32" (ByVal hFindFile As Long) As Long

Private Const MAX_PATH = 260

Private Type WIN32_FIND_DATA
   dwFileAttributes     As Long
   ftCreationTime       As Currency
   ftLastAccessTime     As Currency
   ftLastWriteTime      As Currency
   nFileSizeBig         As Currency
   dwReserved0          As Long
   dwReserved1          As Long
   cFileName            As String * MAX_PATH
   cShortFileName       As String * 14
End Type

Public Event ItemDetails(ByVal strFileName As String)

Public Sub EnumFolders(ByVal sPath As String, _
   Optional ByVal sPattern As String = "*.*", _
   Optional ByVal lAttributeFilter As FileAttributes = Attr_ALL, _
   Optional ByVal bRecurse As Boolean = False)

   Dim lHandle          As Long
   Dim sFileName        As String
   Dim Lines            As Long
   Dim wFD              As WIN32_FIND_DATA

   On Error GoTo ProcedureError

   sPath = QualifyPath(sPath)

   lHandle = FindFirstFileW(StrPtr(sPath & sPattern), VarPtr(wFD))
   If lHandle <> -1 Then  '-1 = INVALID_HANDLE_VALUE
      Do
         With wFD
            sFileName = StripNull(.cFileName)
            'skip only the . and .. entries, not every name that starts with a dot
            If sFileName <> "." And sFileName <> ".." Then
               If (.dwFileAttributes And Directory) Then
                  If bRecurse Then
                     EnumFolders sPath & sFileName, sPattern, lAttributeFilter, bRecurse
                  End If
               ElseIf (.dwFileAttributes And lAttributeFilter) Then
                  RaiseEvent ItemDetails(sFileName)
               End If
            End If
         End With
      Loop While FindNextFileW(lHandle, VarPtr(wFD)) <> 0
      FindClose lHandle  'close only a handle that was actually opened
   End If
   Exit Sub
ProcedureError:
   Debug.Print "Error " & Err.Number & " " & Err.Description & " of EnumFolders"

End Sub

Private Function StripNull(StrIn As String) As String
   Dim nul              As Long
   nul = InStr(StrIn, vbNullChar)
   If (nul) Then
      StripNull = Left$(StrIn, nul - 1)
   Else
      StripNull = Trim$(StrIn)
   End If
End Function

Private Function QualifyPath(ByVal Path As String) As String
   Dim Delimiter        As String   ' segmented path delimiter

   If InStr(Path, "://") > 0 Then      ' it's a URL path
      Delimiter = "/"                 ' use URL path delimiter
   Else                                ' it's a disk based path
      Delimiter = "\"                 ' use disk based path delimiter
   End If

   Select Case Right$(Path, 1)         ' whats last character in path?
      Case "/", "\"                       ' it's one of the valid delimiters
         QualifyPath = Path              ' use the supplied path
      Case Else                           ' needs a trailing path delimiter
         QualifyPath = Path & Delimiter  ' append it
   End Select
End Function

 
LVL 11

Author Comment

by:thydzik
ID: 34978748
experts, thank you for all the replies. I am in the process of reviewing the solutions.

aikimark,
I do not need to iterate sub directories.

I do not need to add them to a list box, only iterate through the files. I will parse the file's content.

I need to apply a single filter, i.e. all .txt files.
 
LVL 45

Expert Comment

by:aikimark
ID: 34979340
Please try this function.  It returns a collection of file names.

Option Explicit

Public Function GetTextFileNames(ByVal parmPath As String) As Collection
  Dim strFilename As String
  Dim strPath As String
  Dim colFiles As New Collection
  If Len(Dir(parmPath, vbDirectory)) <> 0 Then
    strPath = parmPath
    If Right$(strPath, 1) <> "\" Then
      strPath = strPath & "\"
    End If
    strFilename = Dir(strPath & "*.txt")
    Do Until Len(strFilename) = 0
      colFiles.Add strFilename
      strFilename = Dir()
    Loop
  End If
  Set GetTextFileNames = colFiles
  Set colFiles = Nothing
End Function

Sub testget()
  'Drive the GetTextFilenames function
  Dim colTF As Collection
  Dim vFile As Variant
  Set colTF = GetTextFileNames("C:\Users\AikiMark\Downloads")
  Debug.Print colTF.Count
  For Each vFile In colTF
    Debug.Print vFile
  Next
End Sub

 
LVL 45

Expert Comment

by:aikimark
ID: 34980873
@thydzik

What is the context of this question?

Are you monitoring a directory for new files?   If so, a better architecture would be to have the OS notify your program when changes happen to the folder.  That way, you only look at the contents when you need to.
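One way to get that kind of notification from VB6 is the Win32 change-notification API, polled from a Timer control. The sketch below assumes it lives in a standard module; StartWatching, FolderChanged and StopWatching are hypothetical names, and only file-name changes (create, delete, rename) in the top-level folder are watched.

```vb
Option Explicit

Private Declare Function FindFirstChangeNotification Lib "kernel32" _
   Alias "FindFirstChangeNotificationA" (ByVal lpPathName As String, _
   ByVal bWatchSubtree As Long, ByVal dwNotifyFilter As Long) As Long
Private Declare Function FindNextChangeNotification Lib "kernel32" _
   (ByVal hChangeHandle As Long) As Long
Private Declare Function FindCloseChangeNotification Lib "kernel32" _
   (ByVal hChangeHandle As Long) As Long
Private Declare Function WaitForSingleObject Lib "kernel32" _
   (ByVal hHandle As Long, ByVal dwMilliseconds As Long) As Long

Private Const FILE_NOTIFY_CHANGE_FILE_NAME As Long = &H1
Private Const INVALID_HANDLE_VALUE As Long = -1
Private Const WAIT_OBJECT_0 As Long = 0

Private m_hChange As Long

' Starts watching sFolder (no subfolders); returns False if the folder cannot be watched.
Public Function StartWatching(ByVal sFolder As String) As Boolean
   m_hChange = FindFirstChangeNotification(sFolder, 0, FILE_NOTIFY_CHANGE_FILE_NAME)
   StartWatching = (m_hChange <> INVALID_HANDLE_VALUE)
End Function

' Call periodically, e.g. from a Timer event. Returns True when a file
' was added, removed or renamed since the previous call.
Public Function FolderChanged() As Boolean
   If m_hChange = 0 Or m_hChange = INVALID_HANDLE_VALUE Then Exit Function
   If WaitForSingleObject(m_hChange, 0) = WAIT_OBJECT_0 Then   ' 0 ms = poll, never block
      FolderChanged = True
      FindNextChangeNotification m_hChange                     ' re-arm for the next change
   End If
End Function

Public Sub StopWatching()
   If m_hChange <> 0 And m_hChange <> INVALID_HANDLE_VALUE Then
      FindCloseChangeNotification m_hChange
   End If
   m_hChange = 0
End Sub
```

The notification only says that something changed, not what; the program would re-read the folder when FolderChanged returns True.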
 
LVL 11

Author Comment

by:thydzik
ID: 34981974
aikimark,
No, I just want to read a large number of files as quickly as possible.
 
LVL 11

Author Comment

by:thydzik
ID: 34982190
Okay, I have done two basic tests:
one using the API code from Ultra_Master's reply and one using the well-known Dir.
Dir seems to be 40% faster. Does this sound right?
 
LVL 45

Expert Comment

by:aikimark
ID: 34982197
>>...read a large amount of files as quick as possible

Does that mean that you need to open the text files and read their content?

How often will this run?

Will these be local directories or directories on a file server?

========
The greater the number of file names returned, the greater the performance difference of using a collection object (or dictionary object) as the return data type.  String concatenation can be a real performance killer.

If your code has already performed some of the data validation steps prior to invocation, you can remove them from my code.
 
LVL 11

Author Comment

by:thydzik
ID: 34982299
aikimark, I am opening the text files, but I am not including these in my tests.

These were my two test cases:

hFile = FindFirstFile(fold & "*.txt", WFD)
If hFile <> INVALID_HANDLE_VALUE Then
   Do
      filestr = TrimNull(WFD.cFileName)
      i = i + 1
   Loop While FindNextFile(hFile, WFD)
End If
Call FindClose(hFile)



 
filestr = Dir(fold & "*.txt", vbNormal)
Do While LenB(filestr) > 0
    i = i + 1
    filestr = Dir$
Loop

 
LVL 45

Expert Comment

by:aikimark
ID: 34982328
@thydzik

>>does this sound right?

It is counter-intuitive, but is possible.  The danaseaman API code is more general and was designed to traverse a directory tree.  As such, it contains code that might not be as streamlined as possible.  Also, it is unicode friendly, which may add some overhead.

Which DIR() code did you test?

How are you measuring the performance?

How many times did you measure the performance?

Is the testing done against compiled code or in debug mode?

If compiled, was it optimized?

How many files are in the directory when you tested?

Are you replicating the production environment?

 
LVL 45

Expert Comment

by:aikimark
ID: 34982474
What is the TrimNull() function?
 
LVL 11

Author Comment

by:thydzik
ID: 34982528
TrimNull is as attached. I removed the separate function and included its code in the main test, but that only improved the speed by a small amount.

Can the TrimNull function be improved?
Private Function TrimNull(startstr As String) As String

   TrimNull = Left$(startstr, lstrlen(StrPtr(startstr)))
   
End Function

 
LVL 45

Expert Comment

by:aikimark
ID: 34982593
Try this:

hFile = FindFirstFile(fold & "*.txt", WFD)
If hFile <> INVALID_HANDLE_VALUE Then
   Do
      filestr = Left$(WFD.cFileName, Instr(WFD.cFileName, vbNullChar) - 1)
      i = i + 1
   Loop While FindNextFile(hFile, WFD)
End If
Call FindClose(hFile)

 
LVL 11

Author Comment

by:thydzik
ID: 34982740
Okay, I have tried the above with a slight change, but the result was still the same as before:
hFile = FindFirstFile(fold & "*.txt", WFD)
If hFile <> INVALID_HANDLE_VALUE Then
   Do
      tempStr = WFD.cFileName
      filestr = Left$(WFD.cFileName, InStr(4, WFD.cFileName, vbNullChar, vbBinaryCompare) - 1)
      i = i + 1
   Loop While FindNextFile(hFile, WFD)
End If
Call FindClose(hFile)

 
LVL 45

Expert Comment

by:aikimark
ID: 34982765
Are you testing compiled code?
 
LVL 11

Author Comment

by:thydzik
ID: 34982802
yes, though I don't see any difference in speed between debug and compiled.
 
LVL 11

Author Comment

by:thydzik
ID: 34982810
Sorry, the previous code should actually have read:
If hFile <> INVALID_HANDLE_VALUE Then
   Do
      tempStr = WFD.cFileName
      filestr = Left$(tempStr, InStr(4, tempStr, vbNullChar, vbBinaryCompare) - 1)
      i = i + 1
   Loop While FindNextFile(hFile, WFD)
End If

 
LVL 45

Expert Comment

by:aikimark
ID: 34982843
Please revisit some of the unanswered questions I posted in comment ID 34982328.

We have entered the realm of testing methodology.  Results and valid comparisons between solutions really depend on what and how we measure.

What happens to your API performance figures if you comment the line that trims the returned string?
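For reference, a minimal timing harness along these lines might look like the sketch below. It is an assumption about how such a test could be structured, not the code actually used in this thread; TimeDirLoop is a hypothetical name, and sFolder is expected to end with a backslash. GetTickCount typically advances in steps of roughly 10 to 16 ms, which is why the whole set of runs is timed rather than a single pass.

```vb
Option Explicit

Private Declare Function GetTickCount Lib "kernel32" () As Long

' Times lRuns complete Dir$ enumerations of sFolder and returns the
' elapsed milliseconds for all runs together.
Public Function TimeDirLoop(ByVal sFolder As String, ByVal lRuns As Long) As Long
   Dim lStart As Long
   Dim lRun As Long
   Dim lFiles As Long
   Dim sName As String

   lStart = GetTickCount                 ' started once, outside the repeat loop
   For lRun = 1 To lRuns
      sName = Dir$(sFolder & "*.txt")
      Do While LenB(sName) > 0
         lFiles = lFiles + 1             ' stand-in for the real per-file work
         sName = Dir$
      Loop
   Next lRun
   TimeDirLoop = GetTickCount - lStart

   If lRuns > 0 Then Debug.Print (lFiles \ lRuns) & " files per pass"
End Function
```

Timing the API variant with the same skeleton, on the same folder and in the same compiled build, keeps the two figures comparable. Note that after the first pass the directory data is likely served from the file system cache.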
 
LVL 11

Author Comment

by:thydzik
ID: 34982884
aikimark,

I am testing this with 5,000 .txt files of 1 MB each, repeating the enumeration 1,000 times and using GetTickCount as the timer.

Removing the trim line improves the API version by 5%.
It is still around 25% slower than Dir.

that is all for me tonight, I will follow up in the morning.
 
LVL 45

Assisted Solution

by:aikimark
aikimark earned 200 total points
ID: 34983967
In addition to the API and Dir() methods, I know of four alternative methods of iterating the files in a directory.  One is the WMI example below.

The other three methods are:
* FileListBox control with the filter set to *.txt and path=the directory you are looking for.  Process the filelistbox LIST property (array)
* FileSystemObject, iterating the Files collection in a folder object
* Use a DIR /b *.txt command, directing the output to a file.  Your VB code opens the file containing the filenames.
Example:
Dir /b C:\Users\AikiMark\Documents\*.txt >  C:\Users\AikiMark\Documents\DirList.lst



The other three methods involve some overhead that would (intuitively) result in slower performance than you've been measuring.  However, the fact that you are seeing slower API performance than DIR leads me to think that I shouldn't trust my intuition.

The DIR command has the advantage that it can be run in advance, asynchronously to your VB code, so the directory list data is already available when you need it.
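Reading such a listing back into VB could look like this sketch. It assumes the list file was already written by a DIR /b command like the one in the example; ReadDirList is a hypothetical name. One caveat I would expect: the console writes the redirected listing in the OEM code page, so names with non-ASCII characters may not come through intact.

```vb
Option Explicit

' Reads a listing produced beforehand by "dir /b ... > DirList.lst"
' and returns the file names, one Collection item per line.
Public Function ReadDirList(ByVal sListFile As String) As Collection
   Dim colNames As New Collection
   Dim iFile As Integer
   Dim sLine As String

   iFile = FreeFile
   Open sListFile For Input As #iFile
   Do Until EOF(iFile)
      Line Input #iFile, sLine
      If LenB(sLine) > 0 Then colNames.Add sLine   ' skip empty lines
   Loop
   Close #iFile

   Set ReadDirList = colNames
End Function
```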

========
Note the WMI code that is posted below.  I've made some of the variables static, which eliminates the repeated object creation for multiple invocations.

I noticed that the code you are testing is looping through the file list but only  counting the number of files.  Do you actually need to process the file names?

Note that the WMI code only looks at the Filename property of the CIM_DataFile class.  The Filename property does not include the File Extension (File Type).  Since we are only looking for TXT files, this shouldn't be a problem.  This is also another performance tweak: limit the columns to only the ones you need.
 
Option Explicit

Public Sub WMItest()

  Dim strWMIQuery, colFiles
  Dim vFile
  Dim lngFile As Long
  Dim sngStart As Single
  Static boolSecondTime As Boolean
  Static wmi
                        'can also use CIM_LogicalFile
  strWMIQuery = "select Filename from CIM_DataFile where Drive='C:' and  Path='\\Users\\AikiMark\\Documents\\' and Extension='txt'"
  'Create the WMI service object on the first call only; Static keeps it for later calls
  If Not boolSecondTime Then
    Set wmi = GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\cimv2")
    boolSecondTime = True
  End If
  Set colFiles = wmi.ExecQuery(strWMIQuery)
  'Debug.Print colFiles.Count
  sngStart = Timer
  For Each vFile In colFiles
    lngFile = lngFile + 1
  Next
  Debug.Print Timer - sngStart

End Sub

 
LVL 45

Expert Comment

by:aikimark
ID: 34985398
One more thing that I'd like to know about your performance measurement is where you are placing your GetTickCount calls.

What operating systems does this need to run on?

========
In your production environment, what is the percentage of non-text files in the target directory?

In your production environment, how volatile is the directory? (activity of new, changed, and deleted files)

I know you said you are only interested in a list of the text files, but what are you doing with the file names once you get them?  If we can reduce the amount of work you have to do on the back end, it might improve the entire pipeline performance profile.
 
LVL 45

Expert Comment

by:aikimark
ID: 34988589
Since we can eliminate the allocation from our performance measurements, it would be worth testing an FSO configuration.  The following code requires a reference to the Microsoft Scripting Runtime library.  
Note: It can (and should) use late binding for production.
 
Option Explicit

Public Function FilesCollection() As Collection
  Dim oFS As New Scripting.FileSystemObject
  Dim oDir As Folder
  Dim oFile As File
  Dim colFiles As New Collection
  Dim sngStart As Single
  Set oDir = oFS.GetFolder("C:\Users\AikiMark\Documents")
  sngStart = Timer
  For Each oFile In oDir.Files
    If LCase$(Right$(oFile.Name, 4)) = ".txt" Then  'include the dot and ignore case
      colFiles.Add oFile.Name
    End If
  Next
  Set FilesCollection = colFiles
  Debug.Print Timer - sngStart
End Function

Sub testit()
  Dim colThing As Collection
  Set colThing = FilesCollection
End Sub
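A late-bound variant, as suggested in the note above, could be sketched like this. No project reference is needed because the object is created with CreateObject at run time. FilesCollectionLate and its sFolder parameter are hypothetical; the timing lines are left out for brevity.

```vb
Option Explicit

' Late-bound variant: works without a reference to Microsoft Scripting Runtime.
Public Function FilesCollectionLate(ByVal sFolder As String) As Collection
   Dim oFS As Object
   Dim oFile As Object
   Dim colFiles As New Collection

   Set oFS = CreateObject("Scripting.FileSystemObject")
   For Each oFile In oFS.GetFolder(sFolder).Files
      ' GetExtensionName returns the extension without the dot
      If LCase$(oFS.GetExtensionName(oFile.Name)) = "txt" Then
         colFiles.Add oFile.Name
      End If
   Next
   Set FilesCollectionLate = colFiles
End Function
```

Late-bound calls are resolved at run time, so this version would be expected to run somewhat slower than the early-bound one; what it buys is independence from the installed library version.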

 
LVL 11

Author Comment

by:thydzik
ID: 35007695
aikimark, thanks. I have tried the above. It is significantly slower.

I believe that for my application Dir() is the winner.
 
LVL 22

Expert Comment

by:danaseaman
ID: 35008026
FSO is always slower.
The reason that Dir() outperforms the FindFirstFile API is probably that, even though VB6 internal functions also use the FindFirstFile API, the VB6 internal functions are written in C++.
 
LVL 45

Expert Comment

by:aikimark
ID: 35008266
@thydzik

>>it is significantly slower

Both WMI and FSO are expected to be slower.  However, there are some circumstances where they might be faster.  For instance, if you wanted to get a list of text files with certain characteristics (modified date, size, attribute), you could use WMI to return only those file names.  With Dir() or the API, this kind of filtering would require secondary statements/functions.
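As an illustration of that kind of server-side filtering, the sketch below asks WMI for only the .txt files above a given size. BigTextFiles and its parameters are hypothetical; the path must be supplied in WQL form (no drive letter, doubled backslashes, trailing separator), as in the WMI example posted earlier. FileSize and Name are properties of CIM_DataFile; Name holds the full path.

```vb
Option Explicit

' Returns the full paths of .txt files in one folder that are larger than lMinBytes.
' sDrive is e.g. "C:"; sWqlPath is e.g. "\\Data\\Logs\\" (hypothetical folder).
Public Function BigTextFiles(ByVal sDrive As String, ByVal sWqlPath As String, _
                             ByVal lMinBytes As Long) As Collection
   Dim wmi As Object
   Dim colFiles As Object
   Dim oFile As Object
   Dim colNames As New Collection
   Dim sQuery As String

   ' The size condition is evaluated by WMI, so only matching files come back
   sQuery = "select Name from CIM_DataFile where Drive='" & sDrive & "'" & _
            " and Path='" & sWqlPath & "' and Extension='txt'" & _
            " and FileSize > " & CStr(lMinBytes)

   Set wmi = GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\cimv2")
   Set colFiles = wmi.ExecQuery(sQuery)
   For Each oFile In colFiles
      colNames.Add oFile.Name
   Next
   Set BigTextFiles = colNames
End Function
```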

========
Although you mentioned the number and size of the files, we don't know if the size of the files plays a part in the directory iteration.

However, the fastest possible method for getting this list is to instantiate a separate process/object that will
1. iterate the directory during the initialization process
2. monitor the directory for any changes (via system hook callback), updating the list to reflect the changes.

When your main process needs a list of files, it grabs the list.  This way, the iteration process takes place in the background and getting the list involves no delay.