Travis Hydzik
asked on
VB6 fastest way to get filenames in a folder
I am looking for the fastest method to read a large number of filenames in a folder.
doesn't matter how it achieves this, just needs to be fast.
doesn't matter how it achieves this, just needs to be fast.
Use API; Unlike most other samples this code also supports Unicode:
Option Explicit
Public Enum FileAttributes
ReadOnly = &H1
Hidden = &H2
System = &H4
Volume = &H8
Directory = &H10
Archive = &H20
Alias = &H40 ' or Device [reserved]
Normal = &H80
Temporary = &H100
SparseFile = &H200
ReparsePoint = &H400
Compressed = &H800
Offline = &H1000
NotContentIndexed = &H2000
Encrypted = &H4000
Attr_ALL = ReadOnly Or Hidden Or System Or Archive Or Normal
End Enum
#If False Then 'PreserveEnumCase
Private ReadOnly, Hidden, System, Volume, Directory, Archive
Private Alias, Normal, Temporary, SparseFile, ReparsePoint
Private Compressed, Offline, NotContentIndexed, Encrypted, Attr_ALL
#End If
Private Declare Function FindFirstFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindNextFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindClose Lib "kernel32" (ByVal hFindFile As Long) As Long
Private Const MAX_PATH = 260
Private Type WIN32_FIND_DATA
dwFileAttributes As Long
ftCreationTime As Currency
ftLastAccessTime As Currency
ftLastWriteTime As Currency
nFileSizeBig As Currency
dwReserved0 As Long
dwReserved1 As Long
cFileName As String * MAX_PATH
cShortFileName As String * 14
End Type
Public Sub EnumFolders(ByVal sPath As String, _
Optional ByVal sPattern As String = "*.*", _
Optional ByVal lAttributeFilter As FileAttributes = Attr_ALL, _
Optional ByVal bRecurse As Boolean = False)
Dim lHandle As Long
Dim sFileName As String
Dim Lines As Long
Dim wFD As WIN32_FIND_DATA
On Error GoTo ProcedureError
sPath = QualifyPath(sPath)
lHandle = FindFirstFileW(StrPtr(sPath & sPattern), VarPtr(wFD))
If lHandle > 0 Then
Do
With wFD
If AscW(.cFileName) <> 46 Then 'skip . and .. entries
sFileName = StripNull(.cFileName)
If (.dwFileAttributes And Directory) Then
If bRecurse Then
EnumFolders sPath & sFileName, sPattern, lAttributeFilter, bRecurse
End If
ElseIf (.dwFileAttributes And lAttributeFilter) Then
List1.AddItem sFileName
End If
End If
End With
Loop While FindNextFileW(lHandle, VarPtr(wFD)) > 0
End If
FindClose lHandle
Exit Sub
ProcedureError:
Debug.Print "Error " & Err.Number & " " & Err.Description & " of EnumFolders"
End Sub
Public Function StripNull(StrIn As String) As String
Dim nul As Long
nul = InStr(StrIn, vbNullChar)
If (nul) Then
StripNull = Left$(StrIn, nul - 1)
Else
StripNull = Trim$(StrIn)
End If
End Function
Public Function QualifyPath(ByVal Path As String) As String
Dim Delimiter As String ' segmented path delimiter
If InStr(Path, "://") > 0 Then ' it's a URL path
Delimiter = "/" ' use URL path delimiter
Else ' it's a disk based path
Delimiter = "\" ' use disk based path delimiter
End If
Select Case Right$(Path, 1) ' whats last character in path?
Case "/", "\" ' it's one of the valid delimiters
QualifyPath = Path ' use the supplied path
Case Else ' needs a trailing path delimiter
QualifyPath = Path & Delimiter ' append it
End Select
End Function
Private Sub Form_Load()
List1.Clear
EnumFolders App.Path
End Sub
@thydzik
1. Do you need to iterate the entire directory sub-tree or just the current/target directory?
2. Be aware that the fastest iteration would NOT involve adding items to a listbox (as was shown in these two code references)
3. What do you need to do with the file list?
4. Do you need a list of all the files or do you need to apply some filter to the list of files, returning the file names that match the filter?
1. Do you need to iterate the entire directory sub-tree or just the current/target directory?
2. Be aware that the fastest iteration would NOT involve adding items to a listbox (as was shown in these two code references)
3. What do you need to do with the file list?
4. Do you need a list of all the files or do you need to apply some filter to the list of files, returning the file names that match the filter?
If a list of files is a necessity, you can use a little trick. Make your listbox an array of two listboxes. Set List1(0).Visible = False, and List(1).Visible = True. Populate List1(0), which will happen quicker because there will be no screen/object refreshes to do. Upon completion of population, set List1(0).Visible = True. In other words, populate the listbox when it is invisible, then only show it when it is full. At this time you either set List1(1).Visible = False, or you can avoid that statement by using "Bring to front" (or use ZOrder) to make List1(0) appear on top of List1(1).
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
In practice the code I provided should be added to a class with the filename and/or other info returned via an Event. Unlike using the Dir command (returns ANSI only), this code supports Unicode filenames.
Option Explicit
Private WithEvents cF As cFiles
Private Sub cF_ItemDetails(ByVal strFileName As String)
List1.AddItem strFileName
End Sub
Private Sub Form_Load()
List1.Clear
Set cF = New cFiles
cF.EnumFolders App.Path
End Sub
Add this code to a class named cFiles:
Option Explicit
Private WithEvents cF As cFiles
Private Sub cF_ItemDetails(ByVal strFileName As String)
List1.AddItem strFileName
End Sub
Private Sub Form_Load()
List1.Clear
Set cF = New cFiles
cF.EnumFolders App.Path
End Sub
Add this code to a class named cFiles:
Option Explicit
Public Enum FileAttributes
ReadOnly = &H1
Hidden = &H2
System = &H4
Volume = &H8
Directory = &H10
Archive = &H20
Alias = &H40 ' or Device [reserved]
Normal = &H80
Temporary = &H100
SparseFile = &H200
ReparsePoint = &H400
Compressed = &H800
Offline = &H1000
NotContentIndexed = &H2000
Encrypted = &H4000
Attr_ALL = ReadOnly Or Hidden Or System Or Archive Or Normal
End Enum
#If False Then 'PreserveEnumCase
Private ReadOnly, Hidden, System, Volume, Directory, Archive
Private Alias, Normal, Temporary, SparseFile, ReparsePoint
Private Compressed, Offline, NotContentIndexed, Encrypted, Attr_ALL
#End If
Private Declare Function FindFirstFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindNextFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal lpFindFileData As Long) As Long
Private Declare Function FindClose Lib "kernel32" (ByVal hFindFile As Long) As Long
Private Const MAX_PATH = 260
Private Type WIN32_FIND_DATA
dwFileAttributes As Long
ftCreationTime As Currency
ftLastAccessTime As Currency
ftLastWriteTime As Currency
nFileSizeBig As Currency
dwReserved0 As Long
dwReserved1 As Long
cFileName As String * MAX_PATH
cShortFileName As String * 14
End Type
Public Event ItemDetails(ByVal strFileName As String)
Public Sub EnumFolders(ByVal sPath As String, _
Optional ByVal sPattern As String = "*.*", _
Optional ByVal lAttributeFilter As FileAttributes = Attr_ALL, _
Optional ByVal bRecurse As Boolean = False)
Dim lHandle As Long
Dim sFileName As String
Dim Lines As Long
Dim wFD As WIN32_FIND_DATA
On Error GoTo ProcedureError
sPath = QualifyPath(sPath)
lHandle = FindFirstFileW(StrPtr(sPath & sPattern), VarPtr(wFD))
If lHandle > 0 Then
Do
With wFD
If AscW(.cFileName) <> 46 Then 'skip . and .. entries
sFileName = StripNull(.cFileName)
If (.dwFileAttributes And Directory) Then
If bRecurse Then
EnumFolders sPath & sFileName, sPattern, lAttributeFilter, bRecurse
End If
ElseIf (.dwFileAttributes And lAttributeFilter) Then
RaiseEvent ItemDetails(sFileName)
End If
End If
End With
Loop While FindNextFileW(lHandle, VarPtr(wFD)) > 0
End If
FindClose lHandle
Exit Sub
ProcedureError:
Debug.Print "Error " & Err.Number & " " & Err.Description & " of EnumFolders"
End Sub
Private Function StripNull(StrIn As String) As String
Dim nul As Long
nul = InStr(StrIn, vbNullChar)
If (nul) Then
StripNull = Left$(StrIn, nul - 1)
Else
StripNull = Trim$(StrIn)
End If
End Function
Private Function QualifyPath(ByVal Path As String) As String
Dim Delimiter As String ' segmented path delimiter
If InStr(Path, "://") > 0 Then ' it's a URL path
Delimiter = "/" ' use URL path delimiter
Else ' it's a disk based path
Delimiter = "\" ' use disk based path delimiter
End If
Select Case Right$(Path, 1) ' whats last character in path?
Case "/", "\" ' it's one of the valid delimiters
QualifyPath = Path ' use the supplied path
Case Else ' needs a trailing path delimiter
QualifyPath = Path & Delimiter ' append it
End Select
End Function
ASKER
experts, thank you for all the replies. I am in the process of reviewing the solutions.
aikimark,
I do not need to iterate sub directories.
I do not need to add them to a list box, only iterate through the files. I will parse the file's content.
I need to apply a single filter, i.e. all .txt files.
aikimark,
I do not need to iterate sub directories.
I do not need to add them to a list box, only iterate through the files. I will parse the file's content.
I need to apply a single filter, i.e. all .txt files.
Please try this function. It returns a collection of file names.
Option Explicit
Public Function GetTextFileNames(ByVal parmPath As String) As Collection
Dim strFilename As String
Dim strPath As String
Dim colFiles As New Collection
If Len(Dir(parmPath, vbDirectory)) <> 0 Then
strPath = parmPath
If Right$(strPath, 1) <> "\" Then
strPath = strPath & "\"
End If
strFilename = Dir(strPath & "*.txt")
Do Until Len(strFilename) = 0
colFiles.Add strFilename
strFilename = Dir()
Loop
End If
Set GetTextFileNames = colFiles
Set colFiles = Nothing
End Function
Sub testget()
'Drive the GetTextFilenames function
Dim colTF As Collection
Dim vFile As Variant
Set colTF = GetTextFileNames("C:\Users\AikiMark\Downloads")
Debug.Print colTF.Count
For Each vFile In colTF
Debug.Print vFile
Next
End Sub
@thydzik
What is the context of this question?
Are you monitoring a directory for new files? If so, a better architecture would be to have the OS notify your program when changes happen to the folder. That way, you only look at the contents when you need to.
What is the context of this question?
Are you monitoring a directory for new files? If so, a better architecture would be to have the OS notify your program when changes happen to the folder. That way, you only look at the contents when you need to.
ASKER
aikimark,
No, I just want to read a large amount of files as quick as possible.
No, I just want to read a large amount of files as quick as possible.
ASKER
okay, I have done two basic tests.
one using Ultra_Master reply of the API and one using the known Dir.
Dir seems to be 40% faster, does this sound right?
one using Ultra_Master reply of the API and one using the known Dir.
Dir seems to be 40% faster, does this sound right?
>>...read a large amount of files as quick as possible
Does that mean that you need to open the text files and read their content?
How often will this run?
Will these be local directories or directories on a file server?
========
The greater the number of file names returned, the greater the performance difference of using a collection object (or dictionary object) as the return data type. String concatenation can be a real performance killer.
If your code has already performed some of the data validation steps prior to invocation, you can remove them from my code.
Does that mean that you need to open the text files and read their content?
How often will this run?
Will these be local directories or directories on a file server?
========
The greater the number of file names returned, the greater the performance difference of using a collection object (or dictionary object) as the return data type. String concatenation can be a real performance killer.
If your code has already performed some of the data validation steps prior to invocation, you can remove them from my code.
ASKER
aikimark, I am opening the text files, but I am not including these in my tests.
these were my two test cases;
these were my two test cases;
hFile = FindFirstFile(fold & "*.txt", WFD)
If hFile <> INVALID_HANDLE_VALUE Then
Do
filestr = TrimNull(WFD.cFileName)
i = i + 1
Loop While FindNextFile(hFile, WFD)
End If
Call FindClose(hFile)
filestr = Dir(fold & "*.txt", vbNormal)
Do While LenB(filestr) > 0
i = i + 1
filestr = Dir$
Loop
@thydzik
>>does this sound right?
It is counter-intuitive, but is possible. The danaseaman API code is more general and was designed to traverse a directory tree. As such, it contains code that might not be as streamlined as possible. Also, it is unicode friendly, which may add some overhead.
Which DIR() code did you test?
How are you measuring the performance?
How many times did you measure the performance?
Is the testing done against compiled code or in debug mode?
If compiled, was it optimized?
How many files are in the directory when you tested?
Are you replicating the production environment?
>>does this sound right?
It is counter-intuitive, but is possible. The danaseaman API code is more general and was designed to traverse a directory tree. As such, it contains code that might not be as streamlined as possible. Also, it is unicode friendly, which may add some overhead.
Which DIR() code did you test?
How are you measuring the performance?
How many times did you measure the performance?
Is the testing done against compiled code or in debug mode?
If compiled, was it optimized?
How many files are in the directory when you tested?
Are you replicating the production environment?
What is the TrimNull() function?
ASKER
TrimNull is as attached, I removed the separate function and included it with the main test, but only improved it the speed by a small amount.
can the trimnull function bee improved?
can the trimnull function bee improved?
Private Function TrimNull(startstr As String) As String
TrimNull = Left$(startstr, lstrlen(StrPtr(startstr)))
End Function
Try this:
hFile = FindFirstFile(fold & "*.txt", WFD)
If hFile <> INVALID_HANDLE_VALUE Then
Do
filestr = Left$(WFD.cFileName, Instr(WFD.cFileName, vbNullChar) - 1)
i = i + 1
Loop While FindNextFile(hFile, WFD)
End If
Call FindClose(hFile)
ASKER
okay, I have tried the above, changed it slightly but was still the same as previous;
hFile = FindFirstFile(fold & "*.txt", WFD)
If hFile <> INVALID_HANDLE_VALUE Then
Do
tempStr = WFD.cFileName
filestr = Left$(WFD.cFileName, InStr(4, WFD.cFileName, vbNullChar, vbBinaryCompare) - 1)
i = i + 1
Loop While FindNextFile(hFile, WFD)
End If
Call FindClose(hFile)
Are you testing compiled code?
ASKER
yes, though I don't see any difference in speed between debug and compiled.
ASKER
sorry, the previous code should have actually read;
If hFile <> INVALID_HANDLE_VALUE Then
Do
tempStr = WFD.cFileName
filestr = Left$(tempStr, InStr(4, tempStr, vbNullChar, vbBinaryCompare) - 1)
i = i + 1
Loop While FindNextFile(hFile, WFD)
End If
Please revisit some of the unanswered questions I posted in http:#34982328
We have entered the realm of testing methodology. Results and valid comparisons between solutions really depend what and how we measure.
What happens to your API performance figures if you comment the line that trims the returned string?
We have entered the realm of testing methodology. Results and valid comparisons between solutions really depend what and how we measure.
What happens to your API performance figures if you comment the line that trims the returned string?
ASKER
aikimark,
I am testing this with 5000x 1mb txt files, and then perform this 1000 times. using GetTickCount as the timer.
removing the trim line improves it by 5%.
still around 25% greater than dir.
that is all for me tonight, I will follow up in the morning.
I am testing this with 5000x 1mb txt files, and then perform this 1000 times. using GetTickCount as the timer.
removing the trim line improves it by 5%.
still around 25% greater than dir.
that is all for me tonight, I will follow up in the morning.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
One more thing that I'd like to know about your performance measurement is where you are placing your GetTick calls.
What operating systems does this need to run on?
========
In your production environment, what is the percentage of non-text files in the target directory?
In your production environment, how volatile is the directory? (activity of new, changed, and deleted files)
I know you said you are only interested in a list of the text files, but what are you doing with the file names once you get them? If we can reduce the amount of work you have to do on the back end, it might improve the entire pipeline performance profile.
What operating systems does this need to run on?
========
In your production environment, what is the percentage of non-text files in the target directory?
In your production environment, how volatile is the directory? (activity of new, changed, and deleted files)
I know you said you are only interested in a list of the text files, but what are you doing with the file names once you get them? If we can reduce the amount of work you have to do on the back end, it might improve the entire pipeline performance profile.
Since we can eliminate the allocation from our performance measurements, it would worth testing an FSO configuration. The following code requires a reference to the Microsoft Scripting Runtime library.
Note: It can (and should) use late binding for production.
Note: It can (and should) use late binding for production.
Option Explicit
Public Function FilesCollection() As Collection
Dim oFS As New Scripting.FileSystemObject
Dim oDir As Folder
Dim oFile As File
Dim colFiles As New Collection
Dim sngStart As Single
Set oDir = oFS.GetFolder("C:\Users\AikiMark\Documents")
sngStart = Timer
For Each oFile In oDir.Files
If Right$(oFile.Name, 3) = "txt" Then
colFiles.Add oFile.Name
End If
Next
Set FilesCollection = colFiles
Debug.Print Timer - sngStart
End Function
Sub testit()
Dim colThing As Collection
Set colThing = FilesCollection
End Sub
ASKER
aikimark, thanks. I have tried the above. it is significantly slower.
i believe for my application Dir() the winner.
i believe for my application Dir() the winner.
FSO is always slower.
The reason that DIR() outperforms FindFirstFile API is probably that even though Vb6 internal functions also use FindFirstFile API, Vb6 internal functions are written in C++.
The reason that DIR() outperforms FindFirstFile API is probably that even though Vb6 internal functions also use FindFirstFile API, Vb6 internal functions are written in C++.
@thydzik
>>it is significantly slower
Both the WMI and FSO are expected to be slower. However, there are some circumstances where they might be faster. For instance, if you wanted to get a list of text files with certain characteristics (modified date, size, attribute), you could use WMI to return only those file names. This kind of filtering would require secondary statements/functions.
========
Although you mentioned the number and size of the files, we don't know if the size of the files plays a part in the directory iteration.
However, the fastest possible method for getting this list is to instantiate a separate process/object that will
1. iterate the directory during the initialization process
2. monitor the directory for any changes (via system hook callback), updating the list to reflect the changes.
When your main process needs a list of files, it grabs the list. This way, the iteration process takes place in the background and getting the list involves no delay.
>>it is significantly slower
Both the WMI and FSO are expected to be slower. However, there are some circumstances where they might be faster. For instance, if you wanted to get a list of text files with certain characteristics (modified date, size, attribute), you could use WMI to return only those file names. This kind of filtering would require secondary statements/functions.
========
Although you mentioned the number and size of the files, we don't know if the size of the files plays a part in the directory iteration.
However, the fastest possible method for getting this list is to instantiate a separate process/object that will
1. iterate the directory during the initialization process
2. monitor the directory for any changes (via system hook callback), updating the list to reflect the changes.
When your main process needs a list of files, it grabs the list. This way, the iteration process takes place in the background and getting the list involves no delay.
http://vbnet.mvps.org/index.html?code/fileapi/recursivefiles_minimal_multiple.htm