RIAS
asked on
Getfiles in vb.net
Hello,
dir.GetFiles("*.doc")
gets files starting with $ as well. How can I avoid that?
Thanks
dir.EnumerateFiles("*.doc").Where(Function(x) Not x.StartsWith("$")).ToArray()
ASKER
Ark,
Thanks, but I got an error:
'StartsWith' is not a member of 'FileInfo'?
ASKER
Private Sub LoadReports()
    Dim dir As New IO.DirectoryInfo("C:\Users\admin.axv\Docnts\TTemplates")
    Dim fils As IO.FileInfo() = dir.GetFiles("*.doc")
    Dim f As IO.FileInfo
    Dim dt As New DataTable
    Dim col As New DataColumn() With {.ColumnName = "Select", .DataType = GetType(Boolean)}
    dt.Columns.Add(col)
    col = New DataColumn With {.ColumnName = "FileName", .DataType = GetType(String), .Caption = "File Name"}
    dt.Columns.Add(col)
    For Each f In fils
        Dim dtrn As DataRow = dt.NewRow
        dtrn("FileName") = f.Name
        dt.Rows.Add(dtrn)
    Next
    With ReportList
        .DataSource = dt
        .DisplayMember = "Filename"
        .ValueMember = "Filename"
    End With
End Sub
ASKER
Andy, thanks.
Trying your solution...
Dim dir As New IO.DirectoryInfo("C:\Users\admin.axv\Docnts\TTemplates")
Dim fils As IO.FileInfo() = dir.EnumerateFiles("*.doc").Where(Function(x) Not x.Name.StartsWith("$")).ToArray()
'or, to get an array of file names instead of FileInfo:
'Dim fils As String() = dir.EnumerateFiles("*.doc").Where(Function(x) Not x.Name.StartsWith("$")).Select(Function(x) x.Name).ToArray()
'Rest of your code
ASKER
Ark,
Thanks, I tried your solution but got the same error.
Cheers!
@Ark
>>The best approach in your case is
Would EnumerateFiles really be better than GetFiles in this case? You are processing the result set it returns to strip the 'false' matches out before creating the final array being used.
ASKER
Ark what is x?
@RIAS: x is the LINQ lambda variable. It stands for each FileInfo in turn, and any name will do (f in the code below):
Dim d = New System.IO.DirectoryInfo("E:\")
Dim c = d.EnumerateFiles("*.txt").Where(Function(f) Not f.Name.StartsWith("$")).ToList()
Dim g = c
@Andy:
The EnumerateFiles and GetFiles methods differ as follows:
When you use EnumerateFiles, you can start enumerating the collection of FileInfo objects before the whole collection is returned.
When you use GetFiles, you must wait for the whole array of FileInfo objects to be returned before you can access the array.
Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
So, if you need ALL files (matching the pattern), there is no difference; otherwise, EnumerateFiles is faster.
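For what it's worth, the difference quoted above can be sketched roughly as follows. This is only an illustration, not code from the thread: the scratch folder and the module name are made up, and what the thread calls "async" is really lazy (deferred) enumeration, where entries are pulled only as the loop asks for them.

```vbnet
Imports System
Imports System.IO
Imports System.Linq

Module LazyEnumerationSketch
    Sub Main()
        ' Hypothetical scratch directory, created only for this demonstration.
        Dim root = Path.Combine(Path.GetTempPath(), "enum_sketch_" & Guid.NewGuid().ToString("N"))
        Directory.CreateDirectory(root)
        Try
            For i = 1 To 20
                ' Every second file gets a "$" prefix so the filter has something to remove.
                Dim prefix = If(i Mod 2 = 0, "$", "")
                File.WriteAllText(Path.Combine(root, prefix & i.ToString() & ".txt"), "x")
            Next

            Dim dir As New DirectoryInfo(root)

            ' This only builds the query; entries are fetched as they are consumed.
            Dim query = dir.EnumerateFiles("*.txt").Where(Function(x) Not x.Name.StartsWith("$"))

            ' Take(3) stops pulling entries once three matches have been produced.
            For Each f In query.Take(3)
                Console.WriteLine(f.Name)
            Next

            ' GetFiles always builds the complete array before it returns.
            Dim allFiles As FileInfo() = dir.GetFiles("*.txt")
            Console.WriteLine(allFiles.Length)
        Finally
            ' Clean up the scratch directory and its files.
            Directory.Delete(root, True)
        End Try
    End Sub
End Module
```

The lazy version only pays off when the loop can stop early or process items as they arrive; once .ToArray() or .ToList() is appended, the whole directory is walked either way.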
@Ark - I know that, BUT you are feeding the async method into a synchronous method to get the final array. I doubt that is more efficient than just getting the array synchronously, which is why I wrote "in this case". Maybe you would want to test the two methods. It would be useful to know whether it actually is more efficient.
I just added .ToArray to get an array of strings. Of course, it is faster to continue with async.
Sorry, I have no time just now to check whether your suggestion is faster than mine - will check it tomorrow.
ASKER
Ark,
Now it worked. I have requested that the question be reopened so I can split the points.
Cheers
ASKER
Thanks!
@Andy, I checked both approaches. Unfortunately, it cannot be done directly on IO: Windows caches IO requests and I don't know exactly how. So a second call to the same request, either async=>sync or the reverse, is always much faster than the first. So I used the following code (though .NET caches LINQ queries as well):
Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    ' The file list is read once up front, so the loops below time only the LINQ filtering, not disk IO.
    ' Note: Directory.GetFiles returns full paths, so StartsWith("a") tests the path, not the file name.
    Dim files = IO.Directory.GetFiles("c:\windows\system").AsEnumerable
    Dim sw As New Stopwatch, ticks As Long
    For attempt = 1 To 10
        sw.Restart()
        For i = 1 To 1000
            Dim fils3 = files.Where(Function(x) Not x.StartsWith("a")).ToArray()
        Next
        ticks = sw.ElapsedTicks
        Debug.Print("async=>sync = " & ticks)
        sw.Restart()
        For i = 1 To 1000
            Dim fils2 = files.ToArray.Where(Function(x) Not x.StartsWith("a")).ToArray()
        Next
        ticks = sw.ElapsedTicks
        Debug.Print("sync=>async = " & ticks)
    Next
End Sub
which proves that async=>sync is faster.
That code is not comparing my suggestion with yours.
@Ark, thanks for actually testing things as I found it hard to believe the EnumerateFiles offered any advantage in this specific case.
A little warmup exercise for the day.
Create loads of files in a directory
Dim l As Long, s As String, file As System.IO.StreamWriter
For l = 1 To 100000
    s = "C:\zzTestzz1\" & l.ToString() & ".txt"
    file = My.Computer.FileSystem.OpenTextFileWriter(s, True)
    file.WriteLine("Here is the first string.")
    file.Close()
Next
I then copied the directory twice using windows explorer
I then ran the following code (I'm not supplying the code to MagicFn)
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim sTotal As String
    sTotal = "GetFiles took " & GetFilesFn("C:\zzTestzz2")
    sTotal = sTotal & ", EnumerateDir took " & EnumerateDirFn("C:\zzTestzz3")
    sTotal = sTotal & ", MagicFn took " & MagicFn("C:\zzTestzz1")
    MessageBox.Show(sTotal)
End Sub

Private Function EnumerateDirFn(sPath As String) As String
    Dim f As IO.FileInfo
    Dim l As Long
    Dim dir As New IO.DirectoryInfo(sPath)
    Dim dt As DateTime, ts As TimeSpan
    Dim fils As IO.FileInfo()
    l = 0
    dt = DateTime.Now
    fils = dir.EnumerateFiles("*.*").Where(Function(x) Not x.Name.StartsWith("$")).ToArray()
    For Each f In fils
        l = l + 1
    Next
    ts = DateTime.Now - dt
    ' TotalMilliseconds, not Milliseconds: the latter is only the 0-999 component of the time span.
    EnumerateDirFn = CLng(ts.TotalMilliseconds).ToString & " (for " & l.ToString & " files)"
End Function

Private Function GetFilesFn(sPath As String) As String
    Dim f As IO.FileInfo, l As Long
    Dim dir As New IO.DirectoryInfo(sPath)
    Dim dt As DateTime, ts As TimeSpan
    Dim fils As IO.FileInfo()
    l = 0
    dt = DateTime.Now
    fils = dir.GetFiles("*.*")
    For Each f In fils
        If Not f.Name.StartsWith("$") Then
            l = l + 1
        End If
    Next
    ts = DateTime.Now - dt
    GetFilesFn = CLng(ts.TotalMilliseconds).ToString & " (for " & l.ToString & " files)"
End Function
Which consistently gives results like this:
So, sorry Ark, your 'best approach' actually consistently gives the worst performance; my suggestion runs about 10% faster, and a further performance-related improvement is faster still. Now, speed isn't everything, but the question author didn't understand your code, whereas the simple change to the original code with the 'If' statement results in simple, easy-to-understand code.
Back to reality - consider what is to be done. In general EnumerateFiles is likely to be more useful, but don't just use it blindly.
PS. My highest performing code (MagicFn) could still be improved, but when one looks at the numbers of files involved the end user won't notice any difference. So go with the simplest code that is easy to maintain.
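Since the post with the 'If' change is not visible in this thread, here is a minimal sketch of what such a change to the asker's loop might look like. It is an assumption based on GetFilesFn above, not the original answer; BuildFileTable and templateFolder are made-up names.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module FilterWithIfSketch
    ' Builds the FileName table from the question, skipping names that start with "$".
    ' templateFolder is a placeholder parameter; pass whatever folder holds the .doc files.
    Function BuildFileTable(templateFolder As String) As DataTable
        Dim dt As New DataTable
        dt.Columns.Add(New DataColumn With {.ColumnName = "Select", .DataType = GetType(Boolean)})
        dt.Columns.Add(New DataColumn With {.ColumnName = "FileName", .DataType = GetType(String), .Caption = "File Name"})

        Dim dir As New DirectoryInfo(templateFolder)
        For Each f As FileInfo In dir.GetFiles("*.doc")
            ' Ordinal comparison: "$" is matched as a plain character, independent of culture.
            If Not f.Name.StartsWith("$", StringComparison.Ordinal) Then
                Dim row As DataRow = dt.NewRow()
                row("FileName") = f.Name
                dt.Rows.Add(row)
            End If
        Next
        Return dt
    End Function

    Sub Main()
        ' Example call against the current directory; prints how many rows survived the filter.
        Dim table = BuildFileTable(Directory.GetCurrentDirectory())
        Console.WriteLine(table.Rows.Count)
    End Sub
End Module
```

The returned table can then be assigned to ReportList.DataSource exactly as in LoadReports.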
Hmmm, it's really strange. I believed MSDN that EnumerateFiles is faster, but it was wrong. And IMHO this is NOT an issue of async=>sync calls. It seems EnumerateFiles performs some additional operations (probably marshaling structures to managed code). Slightly changed code that eliminates the structures:
Private Function EnumerateDirFn(sPath As String) As String
    Dim l As Long, dt As DateTime
    dt = DateTime.Now
    ' Note: IO.Directory.EnumerateFiles returns paths that include sPath, so this
    ' StartsWith test looks at the start of the path, not at the file name.
    Dim fils = IO.Directory.EnumerateFiles(sPath, "*.*").
        Where(Function(x) Not x.StartsWith("$", StringComparison.Ordinal)).ToArray()
    For Each f In fils
        l = l + 1
    Next
    Dim ts = DateTime.Now - dt
    ' TotalMilliseconds, not Milliseconds: the latter is only the 0-999 component of the time span.
    EnumerateDirFn = CLng(ts.TotalMilliseconds).ToString & " (for " & l.ToString & " files)"
End Function

Private Function GetFilesFn(sPath As String) As String
    Dim l As Long, dt As DateTime
    dt = DateTime.Now
    Dim fils = IO.Directory.GetFiles(sPath, "*.*")
    For Each f In fils
        If Not f.StartsWith("$", StringComparison.Ordinal) Then
            l = l + 1
        End If
    Next
    Dim ts = DateTime.Now - dt
    GetFilesFn = CLng(ts.TotalMilliseconds).ToString & " (for " & l.ToString & " files)"
End Function
produces the same results for both functions. BTW, EnumerateDir without calling .ToArray (i.e. continuing with async) gives results comparable with your "MagicFn".
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    For i = 1 To 10
        Dim sTotal As String
        sTotal = "GetFiles took " & GetFilesFn("C:\zzTestzz2")
        sTotal = sTotal & ", EnumerateDir took " & EnumerateDirFn("C:\zzTestzz3")
        Debug.Print(sTotal)
    Next
End Sub
GetFiles took 197 (for 100000 files), EnumerateDir took 201 (for 100000 files)
GetFiles took 202 (for 100000 files), EnumerateDir took 201 (for 100000 files)
GetFiles took 191 (for 100000 files), EnumerateDir took 206 (for 100000 files)
GetFiles took 197 (for 100000 files), EnumerateDir took 195 (for 100000 files)
GetFiles took 201 (for 100000 files), EnumerateDir took 194 (for 100000 files)
GetFiles took 205 (for 100000 files), EnumerateDir took 202 (for 100000 files)
GetFiles took 208 (for 100000 files), EnumerateDir took 215 (for 100000 files)
GetFiles took 194 (for 100000 files), EnumerateDir took 207 (for 100000 files)
GetFiles took 190 (for 100000 files), EnumerateDir took 206 (for 100000 files)
GetFiles took 197 (for 100000 files), EnumerateDir took 195 (for 100000 files)
50-50
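A side note on measuring: the tests in this thread take the difference of two DateTime.Now values. A Stopwatch-based variant could look something like the sketch below. TimeIt and KeepPath are hypothetical helpers, the folder is a placeholder, and the file name is extracted from the full path before the "$" test. No results are claimed for it.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.Linq

Module TimingSketch
    ' Hypothetical helper: runs the given work once and returns whole elapsed milliseconds.
    Function TimeIt(work As Action) As Long
        Dim sw = Stopwatch.StartNew()
        work()
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    ' True when the file-name part of a full path does not start with "$".
    Function KeepPath(fullPath As String) As Boolean
        Return Not Path.GetFileName(fullPath).StartsWith("$", StringComparison.Ordinal)
    End Function

    Sub Main()
        ' Placeholder folder; point it at a test directory such as C:\zzTestzz2 from the thread.
        Dim folder = Directory.GetCurrentDirectory()
        Dim countGet As Integer = 0
        Dim countEnum As Integer = 0

        ' GetFiles: build the whole array first, then filter with a plain If.
        Dim msGet = TimeIt(Sub()
                               For Each filePath In Directory.GetFiles(folder, "*.*")
                                   If KeepPath(filePath) Then countGet += 1
                               Next
                           End Sub)

        ' EnumerateFiles: filter while the entries are streamed, without building an array.
        Dim msEnum = TimeIt(Sub()
                                countEnum = Directory.EnumerateFiles(folder, "*.*").Count(Function(entry) KeepPath(entry))
                            End Sub)

        Console.WriteLine(String.Format("GetFiles took {0} ms (for {1} files)", msGet, countGet))
        Console.WriteLine(String.Format("EnumerateFiles took {0} ms (for {1} files)", msEnum, countEnum))
    End Sub
End Module
```

As noted earlier in the thread, Windows caches directory reads, so whichever call runs second on the same folder tends to look faster; using separate copies of the folder or alternating the order helps.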
ASKER
Cheers Ark and Andy for the efforts!!!
And one more thing: your test conditions are incorrect since NOT ONE file starts with "$", so both GetFiles and EnumerateFiles process ALL files. I've slightly modified your preparation sub:
For l = 1 To 100000
    s = String.Format("c:\zzTestzz1\{0}{1}.txt", If((l Mod 2) = 0, "$", ""), l)
    ' write files
Next
I also added an EnumerateNoMagic function, which is exactly the same as EnumerateDirFn except that .ToArray() is removed for fils:
fils = dir.EnumerateFiles("*.*").Where(Function(x) Not x.Name.StartsWith("$"))
Here are the results:
GetFiles took 319 (for 50000 files), EnumerateDir took 296 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 301 (for 50000 files), EnumerateDir took 296 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 312 (for 50000 files), EnumerateDir took 303 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 312 (for 50000 files), EnumerateDir took 288 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 312 (for 50000 files), EnumerateDir took 296 (for 50000 files), NoMagic took 272 (for 50000 files)
GetFiles took 296 (for 50000 files), EnumerateDir took 312 (for 50000 files), NoMagic took 285 (for 50000 files)
GetFiles took 312 (for 50000 files), EnumerateDir took 296 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 317 (for 50000 files), EnumerateDir took 280 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 319 (for 50000 files), EnumerateDir took 280 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 298 (for 50000 files), EnumerateDir took 296 (for 50000 files), NoMagic took 265 (for 50000 files)
And finally, for file names instead of FileInfo:
Private Function EnumerateDirFn2(sPath As String) As String
    Dim l As Long, dt As DateTime
    dt = DateTime.Now
    ' Directory.EnumerateFiles returns the full path
    Dim fils = IO.Directory.EnumerateFiles(sPath, "*.*").
        Where(Function(x) Not x.Contains("$")).ToArray()
    For Each f In fils
        l = l + 1
    Next
    Dim ts = DateTime.Now - dt
    ' TotalMilliseconds, not Milliseconds: the latter is only the 0-999 component of the time span.
    Return CLng(ts.TotalMilliseconds).ToString & " (for " & l.ToString & " files)"
End Function

Private Function GetFilesFn2(sPath As String) As String
    Dim l As Long, dt As DateTime
    dt = DateTime.Now
    Dim fils = IO.Directory.GetFiles(sPath, "*.*")
    For Each f In fils
        If Not f.Contains("$") Then l = l + 1
    Next
    Dim ts = DateTime.Now - dt
    Return CLng(ts.TotalMilliseconds).ToString & " (for " & l.ToString & " files)"
End Function
Result:
GetFiles: 182 (for 50000 files), EnumerateDir: 172 (for 50000 files), NoMagic: 173 (for 50000 files)
GetFiles: 191 (for 50000 files), EnumerateDir: 183 (for 50000 files), NoMagic: 173 (for 50000 files)
GetFiles: 197 (for 50000 files), EnumerateDir: 180 (for 50000 files), NoMagic: 176 (for 50000 files)
GetFiles: 191 (for 50000 files), EnumerateDir: 182 (for 50000 files), NoMagic: 174 (for 50000 files)
GetFiles: 197 (for 50000 files), EnumerateDir: 180 (for 50000 files), NoMagic: 175 (for 50000 files)
GetFiles: 192 (for 50000 files), EnumerateDir: 189 (for 50000 files), NoMagic: 174 (for 50000 files)
GetFiles: 194 (for 50000 files), EnumerateDir: 177 (for 50000 files), NoMagic: 173 (for 50000 files)
GetFiles: 196 (for 50000 files), EnumerateDir: 181 (for 50000 files), NoMagic: 178 (for 50000 files)
GetFiles: 190 (for 50000 files), EnumerateDir: 187 (for 50000 files), NoMagic: 172 (for 50000 files)
GetFiles: 193 (for 50000 files), EnumerateDir: 176 (for 50000 files), NoMagic: 178 (for 50000 files)
@Ark
>>I believed to MSDN that enumeratefiles is faster, but it was wrong
Correct and also wrong.
I believe it is taking just the same time as GetFiles. The difference is that GetFiles is sync (it completes before the next line of code runs), whereas EnumerateFiles is async (you can use the first files being returned before all files have been found). The problem with what you suggested is that you forced the app to wait until EnumerateFiles finished, which removed the performance advantage. Then using LINQ turned a simple operation into a more complex line of code to understand.
Agree. I added .ToArray to allow the result array to be used anywhere outside the sub. But anyway, using EnumerateFiles with inline filtering (.Where(...)) followed by synchronization (.ToArray()) is faster than GetFiles with further filtering, if there is something to filter :)
otherwise
dir.GetFiles("a*.doc")
dir.GetFiles("b*.doc")
...
dir.GetFiles("z*.doc")
Should they start with numbers or other characters, you will have to do the equivalent for those as well.