RIAS

Getfiles in vb.net

Hello,

 dir.GetFiles("*.doc")
gets files starting with $ as well. How can I avoid that?

Thanks
AndyAinscow

Well, you do ask it to return files with a $ (or any other character) at the start. Probably the simplest fix is to not process any files whose name starts with the $ character.

Otherwise:
 dir.GetFiles("a*.doc")
 dir.GetFiles("b*.doc")
...
 dir.GetFiles("z*.doc")

Should they start with numbers or other characters, you will have to do the equivalent for those as well.
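
The "just don't process them" idea above can be sketched as follows. This is only an illustration under assumptions: the scratch folder and the two file names are made up for the demo and are not from the question.

```vb
Imports System
Imports System.IO

Module SkipDollarFiles
    Sub Main()
        ' Hypothetical scratch folder so the sketch does not touch real documents.
        Dim folder As String = Path.Combine(Path.GetTempPath(), "GetFilesDemo")
        Directory.CreateDirectory(folder)
        File.WriteAllText(Path.Combine(folder, "report.doc"), "")
        File.WriteAllText(Path.Combine(folder, "$report.doc"), "")

        Dim dir As New DirectoryInfo(folder)
        For Each f As FileInfo In dir.GetFiles("*.doc")
            ' FileInfo.Name is the file name without the directory part.
            If f.Name.StartsWith("$", StringComparison.Ordinal) Then Continue For
            Console.WriteLine(f.Name)
        Next
    End Sub
End Module
```

The search pattern stays as it was; the unwanted names are simply skipped inside the loop.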
dir.EnumerateFiles("*.doc").Where(Function(x) Not x.StartsWith("$")).ToArray()


RIAS (ASKER)

Ark,

Thanks, but I got an error:
'StartsWith' is not a member of 'FileInfo'?
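
For anyone hitting the same error: DirectoryInfo.EnumerateFiles returns FileInfo objects, so the test has to go through the Name property, while the shared Directory.EnumerateFiles returns full paths as strings. A minimal sketch of both, assuming a made-up scratch folder and file names:

```vb
Imports System
Imports System.IO
Imports System.Linq

Module NameVersusPath
    Sub Main()
        ' Hypothetical scratch folder and files, created only for this demo.
        Dim folder As String = Path.Combine(Path.GetTempPath(), "GetFilesDemo")
        Directory.CreateDirectory(folder)
        File.WriteAllText(Path.Combine(folder, "$draft.doc"), "")
        File.WriteAllText(Path.Combine(folder, "final.doc"), "")

        ' DirectoryInfo.EnumerateFiles yields FileInfo objects: test the Name property.
        Dim infos As FileInfo() = New DirectoryInfo(folder).EnumerateFiles("*.doc").
            Where(Function(fi) Not fi.Name.StartsWith("$", StringComparison.Ordinal)).ToArray()

        ' Directory.EnumerateFiles yields full paths as strings: strip the folder part first.
        Dim paths As String() = Directory.EnumerateFiles(folder, "*.doc").
            Where(Function(p) Not Path.GetFileName(p).StartsWith("$", StringComparison.Ordinal)).ToArray()

        Console.WriteLine(infos.Length & " FileInfo results, " & paths.Length & " path results")
    End Sub
End Module
```

Calling StartsWith("$") directly on a full path would test the drive letter, not the file name, which is why the second query goes through Path.GetFileName.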
RIAS (ASKER)

  Private Sub LoadReports()
        Dim dir As New IO.DirectoryInfo("C:\Users\admin.axv\Docnts\TTemplates")
        Dim fils As IO.FileInfo() = dir.GetFiles("*.doc")


        Dim f As IO.FileInfo

        Dim dt As New DataTable
        Dim col As New DataColumn() With {.ColumnName = "Select", .DataType = GetType(Boolean)}
        dt.Columns.Add(col)
        col = New DataColumn With {.ColumnName = "FileName", .DataType = GetType(String), .Caption = "File Name"}
        dt.Columns.Add(col)

        For Each f In fils
            Dim dtrn As DataRow = dt.NewRow
            dtrn("FileName") = f.Name
            dt.Rows.Add(dtrn)
        Next
        With ReportList
            .DataSource = dt
            .DisplayMember = "Filename"
            .ValueMember = "Filename"

        End With

    End Sub


SOLUTION
AndyAinscow

[This solution is only available to Experts Exchange members.]
RIAS (ASKER)

Andy, thanks.
Trying your solution...
Dim dir As New IO.DirectoryInfo("C:\Users\admin.axv\Docnts\TTemplates")
Dim fils As IO.FileInfo() = dir.EnumerateFiles("*.doc").Where(Function(x) Not x.Name.StartsWith("$")).ToArray()
'or, to get an array of file names instead of FileInfo:
'Dim fils As String() = dir.EnumerateFiles("*.doc").Where(Function(x) Not x.Name.StartsWith("$")).Select(Function(x) x.Name).ToArray()
'Rest of your code


RIAS (ASKER)

Ark,
Thanks, I tried your solution but got the same error.

Cheers!
ASKER CERTIFIED SOLUTION

[This solution is only available to Experts Exchange members.]
@Ark
>>The best approach in your case is
Would EnumerateFiles be better than GetFiles in this case? You are processing the result set it returns to strip the 'false' matches out before creating the final array being used.
RIAS (ASKER)

Ark, what is x?
Dim d = New System.IO.DirectoryInfo("E:\")
        Dim c = d.EnumerateFiles("*.txt").Where(Function(f) Not f.Name.StartsWith("$")).ToList()
        Dim g = c


@Andy:
The EnumerateFiles and GetFiles methods differ as follows:
When you use EnumerateFiles, you can start enumerating the collection of FileInfo objects before the whole collection is returned.
When you use GetFiles, you must wait for the whole array of FileInfo objects to be returned before you can access the array.
Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
So, if you need ALL files (matching the pattern), there is no difference; otherwise EnumerateFiles is faster.
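
A small sketch of the case where the difference described above can matter: stopping at the first match. The temp folder is only an assumption so the code runs anywhere; any existing folder would do.

```vb
Imports System
Imports System.IO
Imports System.Linq

Module FirstMatchOnly
    Sub Main()
        ' Any existing folder works; the temp folder is just a convenient assumption.
        Dim dir As New DirectoryInfo(Path.GetTempPath())

        ' FirstOrDefault stops pulling from the enumeration at the first hit (Nothing if none).
        Dim firstLazy As FileInfo = dir.EnumerateFiles("*.*").
            FirstOrDefault(Function(fi) Not fi.Name.StartsWith("$", StringComparison.Ordinal))

        ' GetFiles has already built the complete array before the filter runs.
        Dim firstEager As FileInfo = dir.GetFiles("*.*").
            FirstOrDefault(Function(fi) Not fi.Name.StartsWith("$", StringComparison.Ordinal))

        Console.WriteLine(If(firstLazy Is Nothing, "no match", firstLazy.Name))
        Console.WriteLine(If(firstEager Is Nothing, "no match", firstEager.Name))
    End Sub
End Module
```

Once .ToArray() is appended to the EnumerateFiles query, the whole directory is walked anyway, so this early-exit benefit no longer applies.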
@RIAS: x is the LINQ lambda variable; it stands for each FileInfo in turn.
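
To make the "what is x" answer concrete, here is a sketch that writes the same filter twice: once as the lambda and once as an ordinary named function. IsNotDollarFile is a hypothetical name, and the temp folder is only an assumption so the code runs anywhere.

```vb
Imports System
Imports System.IO
Imports System.Linq

Module WhatIsX
    ' Named equivalent of Function(x) Not x.Name.StartsWith("$"):
    ' x is simply the parameter that receives each FileInfo in turn.
    Function IsNotDollarFile(x As FileInfo) As Boolean
        Return Not x.Name.StartsWith("$", StringComparison.Ordinal)
    End Function

    Sub Main()
        Dim dir As New DirectoryInfo(Path.GetTempPath())

        Dim viaLambda As FileInfo() = dir.EnumerateFiles("*.*").
            Where(Function(x) Not x.Name.StartsWith("$", StringComparison.Ordinal)).ToArray()
        Dim viaNamed As FileInfo() = dir.EnumerateFiles("*.*").
            Where(AddressOf IsNotDollarFile).ToArray()

        Console.WriteLine(viaLambda.Length & " via lambda, " & viaNamed.Length & " via named function")
    End Sub
End Module
```

The name x is arbitrary; f or fileInfo would work just as well.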
@Ark - I know that, BUT you are processing the async method with a synchronous method to get the final array. I doubt that is more efficient than just getting the array synchronously, which is why I said "in this case". Maybe you would want to test the two methods. It would be useful to know if it actually was more efficient.
I just added .ToArray to get an array of strings. Of course, it is faster to continue with async.
Sorry, I have no time just now to check whether your suggestion is faster than mine; I will check it tomorrow.
RIAS (ASKER)

Ark,
Now it worked. I have requested reopening the question so I can split the points.
Cheers
RIAS (ASKER)

Thanks!
@Andy, I checked both approaches. Unfortunately, it cannot be done directly on IO: Windows caches IO requests, and I don't know exactly how. So the second call to the same request, either async=>sync or the reverse, is always much faster than the first. So I used the following code (though .NET caches LINQ queries as well):
    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        Dim files = IO.Directory.GetFiles("c:\windows\system").AsEnumerable
        Dim sw As New Stopwatch, ticks As Long
        For attempt = 1 To 10
            sw.Restart()
            For i = 1 To 1000
                Dim fils3 = files.Where(Function(x) Not x.StartsWith("a")).ToArray() 
            Next
            ticks = sw.ElapsedTicks
            Debug.Print("async=>sync = " & ticks)
            sw.Restart()
            For i = 1 To 1000
                Dim fils2 = files.ToArray.Where(Function(x) Not x.StartsWith("a")).ToArray()
            Next
            ticks = sw.ElapsedTicks
            Debug.Print("sync=>async = " & ticks)
        Next
    End Sub


which proves that async=>sync is faster
That code is not comparing my suggestion with yours.
@Ark, thanks for actually testing things as I found it hard to believe the EnumerateFiles offered any advantage in this specific case.


A little warmup exercise for the day.

Create loads of files in a directory
        Dim l As Long, s As String, file As System.IO.StreamWriter
        For l = 1 To 100000
            s = "C:\zzTestzz1\" & l.ToString() & ".txt"
            file = My.Computer.FileSystem.OpenTextFileWriter(s, True)
            file.WriteLine("Here is the first string.")
            file.Close()
        Next



I then copied the directory twice using windows explorer


I then ran the following code (I'm not supplying the code to MagicFn)
   Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click

        Dim sTotal As String

        sTotal = "GetFiles took " & GetFilesFn("C:\zzTestzz2")
        sTotal = sTotal & ", EnumerateDir took " & EnumerateDirFn("C:\zzTestzz3")
        sTotal = sTotal & ", MagicFn took " & MagicFn("C:\zzTestzz1")
        MessageBox.Show(sTotal)
    End Sub

    Private Function EnumerateDirFn(sPath As String) As String
        Dim f As IO.FileInfo
        Dim l As Long
        Dim dir As New IO.DirectoryInfo(sPath)
        Dim dt As DateTime, ts As TimeSpan
        Dim fils As IO.FileInfo()
        l = 0

        dt = DateTime.Now
        fils = dir.EnumerateFiles("*.*").Where(Function(x) Not x.Name.StartsWith("$")).ToArray()

        For Each f In fils
            l = l + 1
        Next
        ts = DateTime.Now - dt
        EnumerateDirFn = ts.Milliseconds.ToString & " (for " & l.ToString & " files)"
    End Function

    Private Function GetFilesFn(sPath As String) As String
        Dim f As IO.FileInfo, l As Long
        Dim dir As New IO.DirectoryInfo(sPath)
        Dim dt As DateTime, ts As TimeSpan
        Dim fils As IO.FileInfo()
        l = 0

        dt = DateTime.Now
        fils = dir.GetFiles("*.*")
        For Each f In fils
            If Not f.Name.StartsWith("$") Then
                l = l + 1
            End If
        Next
        ts = DateTime.Now - dt
        GetFilesFn = ts.Milliseconds.ToString & " (for " & l.ToString & " files)"
    End Function



Which consistently gives results like this:
User generated image

So, sorry Ark, your 'best approach' actually consistently gives the worst performance; my suggestion runs about 10% faster, and a further performance-related improvement is faster still. Now, speed isn't everything, but the question author didn't understand your code, whereas the simple change to the original code with the 'if' statement results in simple, easy-to-understand code.

Back to reality - consider what is to be done and in general the EnumerateFiles is likely to be more useful but don't just use it blindly.

ps.  My highest performing code (MagicFn) could still be improved but when one looks at the numbers of files involved the end user won't notice any difference.  So go with the simplest code that is easy to maintain.
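
A side note on the timing code above, as a hedged sketch and not the code that produced the posted numbers: TimeSpan.Milliseconds is only the 0 to 999 millisecond component of the interval, so a run longer than one second would be misreported, and DateTime.Now has a fairly coarse resolution. A Stopwatch avoids both. TimeGetFiles is a hypothetical helper name, and the temp folder is only an assumption so the code runs anywhere.

```vb
Imports System
Imports System.Diagnostics
Imports System.IO

Module TimingSketch
    ' Hypothetical helper: counts files whose names do not start with "$"
    ' and reports the elapsed time.
    Function TimeGetFiles(sPath As String) As String
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim count As Long = 0
        For Each f As FileInfo In New DirectoryInfo(sPath).GetFiles("*.*")
            If Not f.Name.StartsWith("$", StringComparison.Ordinal) Then count += 1
        Next
        sw.Stop()
        ' ElapsedMilliseconds is the whole elapsed time, so it keeps counting past 999 ms.
        Return sw.ElapsedMilliseconds.ToString() & " ms (for " & count.ToString() & " files)"
    End Function

    Sub Main()
        Console.WriteLine("GetFiles took " & TimeGetFiles(Path.GetTempPath()))
    End Sub
End Module
```

With the DateTime-based version, ts.TotalMilliseconds would be the property that reports the full interval.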
Hmmm, it's really strange. I believed MSDN that EnumerateFiles is faster, but it was wrong. And IMHO this is NOT an issue of async=>sync calls. It seems EnumerateFiles performs some additional operations (probably marshaling structures to managed code). I changed the code a bit to eliminate the structures:
    Private Function EnumerateDirFn(sPath As String) As String
        Dim l As Long, dt As DateTime
        dt = DateTime.Now
        Dim fils = IO.Directory.EnumerateFiles(sPath, "*.*").
                                Where(Function(x) Not x.StartsWith("$", StringComparison.Ordinal)).ToArray()
        For Each f In fils
            l = l + 1
        Next
        Dim ts = DateTime.Now - dt
        EnumerateDirFn = ts.Milliseconds.ToString & " (for " & l.ToString & " files)"
    End Function

    Private Function GetFilesFn(sPath As String) As String
        Dim l As Long, dt As DateTime
        dt = DateTime.Now
        Dim fils = IO.Directory.GetFiles(sPath, "*.*")
        For Each f In fils
            If Not f.StartsWith("$", StringComparison.Ordinal) Then
                l = l + 1
            End If
        Next
        Dim ts = DateTime.Now - dt
        GetFilesFn = ts.Milliseconds.ToString & " (for " & l.ToString & " files)"
    End Function


produces the same results for both functions.
BTW, EnumerateDir without calling .ToArray (i.e. continuing with async) gives results comparable with your "MagicFn"
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    For i = 1 To 10
        Dim sTotal As String
        sTotal = "GetFiles took " & GetFilesFn("C:\zzTestzz2")
        sTotal = sTotal & ", EnumerateDir took " & EnumerateDirFn("C:\zzTestzz3")
        Debug.Print(sTotal)
    Next
End Sub


GetFiles took 197 (for 100000 files), EnumerateDir took 201 (for 100000 files)
GetFiles took 202 (for 100000 files), EnumerateDir took 201 (for 100000 files)
GetFiles took 191 (for 100000 files), EnumerateDir took 206 (for 100000 files)
GetFiles took 197 (for 100000 files), EnumerateDir took 195 (for 100000 files)
GetFiles took 201 (for 100000 files), EnumerateDir took 194 (for 100000 files)
GetFiles took 205 (for 100000 files), EnumerateDir took 202 (for 100000 files)
GetFiles took 208 (for 100000 files), EnumerateDir took 215 (for 100000 files)
GetFiles took 194 (for 100000 files), EnumerateDir took 207 (for 100000 files)
GetFiles took 190 (for 100000 files), EnumerateDir took 206 (for 100000 files)
GetFiles took 197 (for 100000 files), EnumerateDir took 195 (for 100000 files)
50-50
RIAS (ASKER)

Cheers Ark and Andy for the efforts!!!
And one more thing: your test conditions are incorrect, since NOT ONE file starts with "$", so both GetFiles and EnumerateFiles process ALL files. I've slightly modified your preparation sub:
For l = 1 To 100000
    s = String.Format("c:\zzTestzz1\{0}{1}.txt", If((l Mod 2) = 0, "$", ""), l)
' write files
Next


I also added an EnumerateNoMagic function, which is exactly the same as EnumerateDirFn except that .ToArray() is removed for fils:
fils = dir.EnumerateFiles("*.*").Where(Function(x) Not x.Name.StartsWith("$"))


Here are the results:
GetFiles took 319 (for 50000 files), EnumerateDir took 296 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 301 (for 50000 files), EnumerateDir took 296 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 312 (for 50000 files), EnumerateDir took 303 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 312 (for 50000 files), EnumerateDir took 288 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 312 (for 50000 files), EnumerateDir took 296 (for 50000 files), NoMagic took 272 (for 50000 files)
GetFiles took 296 (for 50000 files), EnumerateDir took 312 (for 50000 files), NoMagic took 285 (for 50000 files)
GetFiles took 312 (for 50000 files), EnumerateDir took 296 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 317 (for 50000 files), EnumerateDir took 280 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 319 (for 50000 files), EnumerateDir took 280 (for 50000 files), NoMagic took 265 (for 50000 files)
GetFiles took 298 (for 50000 files), EnumerateDir took 296 (for 50000 files), NoMagic took 265 (for 50000 files)
And finally for filenames instead of fileinfo:
    Private Function EnumerateDirFn2(sPath As String) As String
        Dim l As Long, dt As DateTime
        dt = DateTime.Now
        ' Directory.EnumerateFiles return full path
        Dim fils = IO.Directory.EnumerateFiles(sPath, "*.*").
                                Where(Function(x) Not x.Contains("$")).ToArray()
        For Each f In fils
            l = l + 1
        Next
        Dim ts = DateTime.Now - dt
        Return ts.Milliseconds.ToString & " (for " & l.ToString & " files)"
    End Function
    Private Function GetFilesFn2(sPath As String) As String
        Dim l As Long, dt As DateTime
        dt = DateTime.Now
        Dim fils = IO.Directory.GetFiles(sPath, "*.*")
        For Each f In fils
            If Not f.Contains("$") Then l = l + 1
        Next
        Dim ts = DateTime.Now - dt
        Return ts.Milliseconds.ToString & " (for " & l.ToString & " files)"
    End Function


Result:
GetFiles: 182 (for 50000 files), EnumerateDir: 172 (for 50000 files), NoMagic: 173 (for 50000 files)
GetFiles: 191 (for 50000 files), EnumerateDir: 183 (for 50000 files), NoMagic: 173 (for 50000 files)
GetFiles: 197 (for 50000 files), EnumerateDir: 180 (for 50000 files), NoMagic: 176 (for 50000 files)
GetFiles: 191 (for 50000 files), EnumerateDir: 182 (for 50000 files), NoMagic: 174 (for 50000 files)
GetFiles: 197 (for 50000 files), EnumerateDir: 180 (for 50000 files), NoMagic: 175 (for 50000 files)
GetFiles: 192 (for 50000 files), EnumerateDir: 189 (for 50000 files), NoMagic: 174 (for 50000 files)
GetFiles: 194 (for 50000 files), EnumerateDir: 177 (for 50000 files), NoMagic: 173 (for 50000 files)
GetFiles: 196 (for 50000 files), EnumerateDir: 181 (for 50000 files), NoMagic: 178 (for 50000 files)
GetFiles: 190 (for 50000 files), EnumerateDir: 187 (for 50000 files), NoMagic: 172 (for 50000 files)
GetFiles: 193 (for 50000 files), EnumerateDir: 176 (for 50000 files), NoMagic: 178 (for 50000 files)
@Ark
>>I believed MSDN that EnumerateFiles is faster, but it was wrong
Correct and also wrong.
I believe it is taking just the same time as GetFiles. The difference is that GetFiles is sync (it completes before the next line of code runs), whereas EnumerateFiles is async (you can use the first files being returned before all files have been found). The problem with what you suggested is that you forced the app to wait until EnumerateFiles finished, which removed the performance advantage. Then using LINQ turned a simple operation into a line of code that is more complex to understand.
Agreed. I added .ToArray to allow the result array to be used anywhere outside the sub. But anyway, using EnumerateFiles with inline filtering (.Where(...)) followed by synchronization (.ToArray()) is faster than GetFiles with further filtering, if there is something to filter :)