Find filenames in a text file

Hi

In my VB.NET 2008 app I get a filename from the end user. That file is a text file. It contains a lot of information, but it might also contain the names of one or more files. These files are all *.bin files.

Now I want to search through the file and build a list of filenames it contains.

I'm thinking that I somehow need to use regular expressions and do something recursive.


The file could look kind of like this:

blah blah blah file1.bin blah blah
blah blah
file2.bin
blah blah anotherfile.bin blah
blah blah

So I want my list to look like:
file1.bin
file2.bin
anotherfile.bin


What is the best and most elegant way for me to do it?
(I can use a C# example. It does not HAVE to be VB)

Thanks in advance
liversen
 
Pui_Yun commented:
Hi Liversen,
I think I know what you want to do: it looks like you want to search through files recursively, the same way a web crawler searches through web sites. I've included C# code for this, along with this regular expression:

(?i)\b\S*?\.bin\b

(?i) is the case-insensitive modifier, in case you encounter a BIN, Bin or biN.
\b is a word boundary; at the end it keeps the match from stopping in the middle of a longer word such as .binary.
\S matches any non-whitespace character.
*? makes the match non-greedy: \S*? takes as few characters as possible, so the match ends at the first .bin.
Here's a good tutorial website that I use for regular expressions:
http://www.codeproject.com/KB/dotnet/regextutorial.aspx

The list of files is stored in a generic list of strings (List<string>), which is cleared in CallingFunction and printed after ProcessFile has finished its recursive run. I think you can figure the rest out from this. :)

Enjoy, hope this helps.
P.
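// Needed at the top of the file:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// The members below belong inside a class.

// Names of all .bin files found so far, shared across the recursive calls.
List<string> lFileNames = new List<string>();

public void ProcessFile(string strFilePath)
{
    // Read the whole file into memory.
    string strFileContent;
    using (StreamReader srInput = new StreamReader(strFilePath))
    {
        strFileContent = srInput.ReadToEnd();
    }

    Regex myRegEx = new Regex(@"(?i)\b\S*?\.bin\b");
    foreach (Match myMatch in myRegEx.Matches(strFileContent))
    {
        // Skip names already collected (this also stops endless recursion)
        // and names that do not exist on disk. Relative names are resolved
        // against the current working directory.
        if (!lFileNames.Contains(myMatch.Value) && File.Exists(myMatch.Value))
        {
            lFileNames.Add(myMatch.Value);
            // The file that was found may itself list further .bin files.
            ProcessFile(myMatch.Value);
        }
    }
}

public void CallingFunction()
{
    lFileNames.Clear();
    ProcessFile(@"C:\Temp\myBinFile.bin");
    foreach (string strFileNameEncountered in lFileNames)
    {
        Console.WriteLine(strFileNameEncountered);
    }
}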
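
For the sample file in the question, where the listed names need not exist on disk, the same pattern can also be tried on its own, without the File.Exists check or the recursion. This is only a minimal sketch, assuming the text is already in a string; the names BinNameFinder and FindBinNames are made up for illustration and do not come from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BinNameFinder
{
    // Same pattern as in the answer above; RegexOptions.IgnoreCase
    // stands in for the inline (?i) modifier.
    private static readonly Regex BinName =
        new Regex(@"\b\S*?\.bin\b", RegexOptions.IgnoreCase);

    // Returns each distinct .bin name in order of first appearance.
    public static List<string> FindBinNames(string text)
    {
        List<string> names = new List<string>();
        foreach (Match m in BinName.Matches(text))
        {
            // Exact (case-sensitive) duplicate check, as in the original code.
            if (!names.Contains(m.Value))
            {
                names.Add(m.Value);
            }
        }
        return names;
    }
}
```

Called on the sample text from the question, FindBinNames should return file1.bin, file2.bin and anotherfile.bin, in that order.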

HainKurt (Sr. System Analyst) commented:
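The simplest way is:

open the file
loop until eof
 l = readline(file)
 p = l.indexof(".bin")              ' -1 if the line has no ".bin"
 if p >= 0 then
   ie = p + 4                       ' index just past ".bin"
   is = l.lastindexof(" ", p) + 1   ' name starts after the preceding space (0 if none)
   fn = l.substring(is, ie - is)
   processfile(fn)
 end if
end loop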

mostly pseudo code ;)
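
One way the pseudocode above might be written in C#. This is a sketch, not HainKurt's own code: the names LineScanner and ScanForBinNames are hypothetical, and it goes one step further than the pseudocode by continuing the search after each hit, so several names on the same line are all collected.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineScanner
{
    // Reads line by line and collects every space-delimited name that
    // contains ".bin" (compared case-insensitively).
    public static List<string> ScanForBinNames(TextReader reader)
    {
        List<string> names = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int searchFrom = 0;
            while (searchFrom < line.Length)
            {
                int dot = line.IndexOf(".bin", searchFrom, StringComparison.OrdinalIgnoreCase);
                if (dot < 0)
                {
                    break; // no further ".bin" on this line
                }
                int end = dot + 4; // index just past ".bin"
                // The name starts right after the preceding space, or at 0,
                // but never before the end of the previous hit.
                int start = Math.Max(line.LastIndexOf(' ', dot) + 1, searchFrom);
                names.Add(line.Substring(start, end - start));
                searchFrom = end;
            }
        }
        return names;
    }
}
```

Pass it a StreamReader for a file or a StringReader for a test string. Unlike the regular expression with \b, this version does not look at what follows ".bin", so a word like file.binary would be reported as file.bin.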
 
nordtorp commented:
This example checks a selected folder for files and lists them. It is not exactly what you want, but it could get you started:
http://www.ostrosoft.com/vb/projects/FindFileByExt/index.asp

Regular Expressions in C# System.Text.RegularExpressions
http://p2p.wrox.com/content/articles/regular-expressions-c-systemtextregularexpressions?page=0,1
 
liversen (author) commented:
Right now I have kind of ended up with this:

Regex myRegEx = new Regex(@"([A-Za-z0-9-#!\$%&'\(\)@\^_`{}~+,\.;=\[\]]+\.([bB][iI][nN]))");
MatchCollection myMatches = myRegEx.Matches(strFileContent);
for (int i = 0; i < myMatches.Count; i++)
{
    Console.WriteLine(myMatches[i]);
}

Please comment/improve :o)
 
Jon500 commented:
I think what you want to do is search the whole file in one fell swoop.

To do this, read the entire file into a string, then use Regex.IsMatch to see if your string is in the string representing the file contents. (Note that this approach is good only for reasonably small files).

Regards,
Jon500
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace searchTxtFile
{
    class searchTxt
    {
        static void Main()
        {
            string fileName= @" ";   //path to text file
            StreamReader srFile = new StreamReader(fileName);
            string allRead = srFile.ReadToEnd();//Reads the whole text file to the end
            srFile.Close(); //Closes the text file after it is fully read.
            string regMatch = " ";//string to search for inside of text file. It is case sensitive.
            if (Regex.IsMatch(allRead,regMatch))//If the match is found in allRead
            {
                Console.WriteLine("found\n");

            }
            else
            {
                Console.WriteLine("not found\n");
            }

        }
    }
}

Helen Feddema commented:
I was thinking along the same lines.  Here is some code using components of the FileSystemObject:
Public Sub GetFileNames()

   Dim strCurrentPath As String
   Dim fso As New Scripting.FileSystemObject
   Dim fil As Scripting.File
   Dim ts As Scripting.TextStream
   Dim strTextFile As String
   Dim strLine As String
   Dim strSQL As String
   Dim dbs As DAO.Database
   Dim rst As DAO.Recordset
   Dim strFileName As String
   Dim intUBound As Integer
   Dim strFileNames() As String
   Dim i As Integer
   
   'Create a table to hold file names
   strSQL = "CREATE TABLE " & "tblFileNames" & _
      "(FileName TEXT (100));"
   DoCmd.RunSQL strSQL
   Set dbs = CurrentDb
   Set rst = dbs.OpenRecordset("tblFileNames")
   
   strTextFile = "File Names.txt"
   strCurrentPath = Application.CurrentProject.Path
   strTextFile = strCurrentPath & "\" & strTextFile
   Set ts = fso.OpenTextFile(FileName:=strTextFile, _
      IOMode:=ForReading)
   Do While Not ts.AtEndOfStream
      strLine = ts.ReadLine
      
      'Set up array of file names from line
      strFileNames = Split(strLine, " ", -1, vbTextCompare)
      intUBound = UBound(strFileNames)
      
      For i = 0 To intUBound
         Debug.Print strFileNames(i)
         If LCase(Right(strFileNames(i), 4)) = ".bin" Then   'Match the extension, not any word ending in "bin"
            rst.AddNew
            rst![FileName] = strFileNames(i)
            rst.Update
         End If
      Next i
   Loop
   
End Sub

Helen Feddema commented:
And here is the resulting table:
[Attached image: tblFileNames.jpg]
 
Jon500 commented:
I'm sorry. I see the issue is your RegEx string--not how to read the file. Sorry. I may try again...

Jon500
 
Helen Feddema commented:
My code was VBA.
 
Helen Feddema commented:
If you have to use some other dialect, maybe it has a function similar to Split that you could use to extract the file names from the lines of text.
 
Helen Feddema commented:
In a real-life situation, you would probably want to check whether a file name is already in the table, before entering it.
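
The Split idea and the duplicate check from the comments above could look roughly like this in C#. It is a sketch under assumptions: names are taken to be separated by whitespace, the names SplitScanner and UniqueBinNames are hypothetical, and a HashSet (available from .NET 3.5) stands in for the table lookup.

```csharp
using System;
using System.Collections.Generic;

public static class SplitScanner
{
    // Splits the text on whitespace and keeps each token that ends in
    // ".bin", skipping names that were already seen (ignoring case).
    public static List<string> UniqueBinNames(string text)
    {
        HashSet<string> seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        List<string> names = new List<string>();
        char[] separators = { ' ', '\t', '\r', '\n' };

        foreach (string token in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // HashSet.Add returns false when the name is already present.
            if (token.EndsWith(".bin", StringComparison.OrdinalIgnoreCase) && seen.Add(token))
            {
                names.Add(token);
            }
        }
        return names;
    }
}
```

Because the test is on whole tokens, a name with punctuation attached (for example "file1.bin,") would be missed; the regular expression approach copes with that case better.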
 
Jon500 commented:
Can't the RegEx string simply be (see code block)?

Jon500
.*\.bin


Thus:
Regex myRegEx = new Regex(@"(.*\.bin)");
MatchCollection myMatches = myRegEx.Matches(strFileContent);
for (int i = 0; i < myMatches.Count; i++)
{
    Console.WriteLine(myMatches[i]);
}

Jon500 commented:
Another comment: I just realized that my example would work only if you read line-by-line and if a ".bin" file name is assured of being on a separate line. You can then use the approach described by HainKurt and use my RegEx expression.

Jon500
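
The limitation described in this comment comes from .* being greedy: it takes in everything on the line up to the last .bin. A small sketch to compare patterns side by side; the helper name FirstMatch is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Returns the first match of pattern in input, or "" if there is none.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }
}
```

For the line "blah blah blah file1.bin blah blah", FirstMatch with @".*\.bin" returns "blah blah blah file1.bin", while Pui_Yun's @"(?i)\b\S*?\.bin\b" returns just "file1.bin".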
 
liversen (author) commented:
Thanks guys  :o)

/liversen
 
Jon500 commented:
@Pui:

Thank you for your answer. Very very informative.

Jon500
Question has a verified solution.
