Solved

VB.net - Find filename with unique pattern using an xml file

Posted on 2013-05-29
Last Modified: 2013-06-03
Afternoon,

I’m not sure where to start with this as I am still very new to VB.Net.

I have a button on my Windows form that, when clicked, needs to search a directory for files whose names contain specific codes (the codes come from my XML file) and move only the files it finds to another directory.

The file names in the directory start with an account number, followed by a dash, followed by the date, followed by another dash and then a code. So a file name would look like this:

12345-20130529-FTO.pdf

In some cases the file will have something written after the code like:

12345-20130529-FTO B.pdf
12345-20130529-FTO Optimal CH.pdf
12345-20130529-FTO Cancel per request.pdf

The most important part of the file name which I want to go by is:

12345-20130529-FTO

Any file that does not start with the above part should not even be looked at. So, for example, the files below would be ignored by the pattern:

12345- 20130529-FTO
12345-20130529 FTO
12345 20130529 FTO
12345=20130529-FTO

So how would I have my code read the codes from the XML file, find file names that follow this pattern, and move each matching file to another location?

My XML structure looks like below:

<Settings>
  <ApplicationSettings>
    <code1> </code1>
    <docgroup1> </docgroup1>
    <doctype1> </doctype1>
    <docsubtype1> </docsubtype1>
    <code2> </code2>
    <docgroup2> </docgroup2>
    <doctype2> </doctype2>
    <docsubtype2> </docsubtype2>
      </ApplicationSettings>
</Settings>

I need the values of the code1 and code2 elements to determine what to search for.

All I have so far is the code below. I just don't know where to begin with finding this pattern.

Private Sub BackgroundWorker1_DoWork(sender As Object, e As ComponentModel.DoWorkEventArgs) Handles BackgroundWorker1.DoWork

    'create directory in input folder with timestamp as the directory name.
    Dim destdir As String = [String].Format("\\ServerA\ITDept\files\{0}", DateTime.Now.ToString("MMddyyyyhhmmss"))
    System.IO.Directory.CreateDirectory(destdir)

    'read directory and look for filename patterns that have code elements from xml file
    For Each fpattern As String In System.IO.Directory.GetFiles(TextBox1.Text)
        'this is where I am stuck: check fpattern against the codes and move the matches
    Next

End Sub



Can anyone provide me with some sample code to get me started?

Regards,
N
Question by:nobushi
18 Comments
 
LVL 42

Expert Comment

by:sedgwick
ID: 39206816
the xml doesn't contain any codes.
should there be something like FTO in the xml?
 
LVL 42

Expert Comment

by:sedgwick
ID: 39206824
To find all the files in a directory that follow the naming convention you posted, use a regex like this:
            Regex reg = new Regex(@"^(\d+)-(\d+)-FTO");
            var files = Directory.GetFiles(@"c:\temp", "*.pdf", SearchOption.AllDirectories).Where(path => reg.IsMatch(Path.GetFileNameWithoutExtension(path))); 



This code will go through all .pdf files under C:\temp (including subdirectories) and match their names against the naming convention you posted.
 
LVL 1

Author Comment

by:nobushi
ID: 39207558
The XML example I gave is indeed empty, as I was just showing the structure. This Windows form is used as the "settings area" of my application, so the information in the XML can be anything and not just FTO.

So the sample code you provided using regex won't work, because the code will be different each time depending on what the user sets it to and will not always be FTO. I want my application to look at the XML to determine what code to search for.

Any other ideas?
 
LVL 42

Expert Comment

by:sedgwick
ID: 39207569
But you didn't explain how the code in the XML is used to search for the files.
Can you give an example with actual codes in the XML?
 
LVL 1

Author Comment

by:nobushi
ID: 39207645
ah ok sorry.

So today we have the xml settings looking like below (tomorrow the user will change them to something else).

<Settings>
  <ApplicationSettings>
    <code1>FTO</code1>
    <docgroup1>Operations</docgroup1>
    <doctype1>Funds Transfer</doctype1>
    <docsubtype1>Out</docsubtype1>
    <code2>FTR</code2>
    <docgroup2>Operations</docgroup2>
    <doctype2>Funds Transfer</doctype2>
    <docsubtype2>Reversed</docsubtype2>
</ApplicationSettings>
</Settings>

So how would I tell my program, when I click a button, to look at the XML values for <code1> (FTO) and <code2> (FTR) and use them to identify the files in a directory that follow a specific pattern?

So for example, we have the files below:

C:\....

12345-20130529-FTO B.pdf
12345-20130529-FTR Optimal CH.pdf
12345-20130529-FTO Cancel per request.pdf
12345- 20130529-FTO.pdf
12345-20130529 FTO.pdf
12345 20130529 FTO.pdf
12345=20130529-FTO.pdf
12345-20130530-FTO   .pdf
12345-20130530-FTR Buy Ret.pdf

Reading the <code1> (FTO) and <code2> (FTR) values from the XML file and looking for the specific pattern in the file name (account number-YYYYMMDD-code; example 12345-20130529-FTO), the files below would be found, because they match the pattern exactly.

12345-20130529-FTO B.pdf
12345-20130529-FTR Optimal CH.pdf
12345-20130529-FTO Cancel per request.pdf
12345-20130530-FTO   .pdf
12345-20130530-FTR Buy Ret.pdf

Hope that's clear enough, please let me know if you need me to explain anything else.

N
 
LVL 42

Expert Comment

by:sedgwick
ID: 39207656
So basically, all I'm interested in from the XML is the code1/code2 elements? (And maybe code3 and so on, if available.)
 
LVL 1

Author Comment

by:nobushi
ID: 39207675
Exactly, only code1/code2.

There are 4 of these in total in the XML, in case the user wants to add more.
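Since some of the four code elements may be left blank (as in the empty structure posted in the question), it may be worth trimming the values and dropping empty ones before building a pattern from them. An empty alternative in a "|"-joined list matches any text, so a blank code would make every file with the right prefix look like a match. A minimal sketch, assuming the element names all start with "code"; the module name, the LoadCodes helper and the inline sample XML are made up for illustration:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module LoadCodesDemo
    ' Returns the trimmed, non-empty values of every <codeN> element
    ' under <ApplicationSettings>.
    Function LoadCodes(settings As XElement) As List(Of String)
        Return settings.Element("ApplicationSettings").Elements().
            Where(Function(el) el.Name.LocalName.StartsWith("code", StringComparison.OrdinalIgnoreCase)).
            Select(Function(el) el.Value.Trim()).
            Where(Function(code) code.Length > 0).
            ToList()
    End Function

    Sub Main()
        ' Hypothetical settings: two codes filled in, two left blank.
        Dim settings = XElement.Parse(
            "<Settings><ApplicationSettings>" &
            "<code1>FTO</code1><docgroup1>Operations</docgroup1>" &
            "<code2>FTR</code2><code3> </code3><code4></code4>" &
            "</ApplicationSettings></Settings>")
        Console.WriteLine(String.Join(", ", LoadCodes(settings)))   ' FTO, FTR
    End Sub
End Module
```

In the real application, XElement.Load with the path of the settings file would replace XElement.Parse.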
 
LVL 42

Expert Comment

by:sedgwick
ID: 39207839
change the xml path before running the code:
            Regex regElemName = new Regex(@"^code");
            var root = XElement.Load(@"c:\temp\1.xml");
            var codeElements = root.Element("ApplicationSettings").Elements().Where(xe => regElemName.IsMatch(xe.Name.LocalName)).Select(xe => xe.Value);
            var codes = string.Join("|", codeElements.ToArray());
            Regex regFileName = new Regex(string.Format(@"^(\d+)-(\d+)-{0}", codes));
            var files = Directory.GetFiles(@"\\ServerA\ITDept\files", "*.pdf", SearchOption.AllDirectories).Where(path => regFileName.IsMatch(Path.GetFileNameWithoutExtension(path)));
            foreach (var file in files)
            {
                Console.WriteLine(file);
            }

 
LVL 1

Author Comment

by:nobushi
ID: 39207935
Will give this a try in a bit.

Quick question: is this VB.Net or another language?
 
LVL 42

Expert Comment

by:sedgwick
ID: 39207948
It's C#. Do you want it in VB.NET?
 
LVL 1

Author Comment

by:nobushi
ID: 39207952
Yes please as my whole application is in VB.Net.
 
LVL 42

Accepted Solution

by:
sedgwick earned 500 total points
ID: 39207954
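' At the top of the file:
Imports System.IO
Imports System.Linq
Imports System.Text.RegularExpressions
Imports System.Xml.Linq
' Inside a method: collect the values of every element whose name starts with "code".
Dim regElemName As New Regex("^code")
Dim root = XElement.Load("c:\temp\1.xml")
Dim codeElements = root.Element("ApplicationSettings").Elements().Where(Function(xe) regElemName.IsMatch(xe.Name.LocalName)).Select(Function(xe) xe.Value)
Dim codes = String.Join("|", codeElements.ToArray())
Dim regFileName As New Regex(String.Format("^(\d+)-(\d+)-{0}", codes))
' The lambda parameter must not be called "path": VB is case-insensitive, so it would hide the Path class.
Dim files = Directory.GetFiles("\\ServerA\ITDept\files", "*.pdf", SearchOption.AllDirectories).Where(Function(filePath) regFileName.IsMatch(Path.GetFileNameWithoutExtension(filePath)))
For Each matchedFile In files
	Console.WriteLine(matchedFile)
Next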
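The snippet above only prints the matches, while the question asks for them to be moved. A minimal sketch of that step, not taken from the thread: MoveMatchingFiles, the demo folder and the demo file names are hypothetical. It searches the top directory only, on the assumption that the timestamped destination folder lives inside the source folder and should not be rescanned.

```vb
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module MoveMatchesDemo
    ' Moves every .pdf in sourceDir whose name (without extension) matches
    ' regFileName into destDir and returns how many files were moved.
    Function MoveMatchingFiles(sourceDir As String, destDir As String, regFileName As Regex) As Integer
        Directory.CreateDirectory(destDir)
        Dim moved As Integer = 0
        ' Top directory only, so files already sitting in a subfolder are left alone.
        For Each sourcePath In Directory.GetFiles(sourceDir, "*.pdf", SearchOption.TopDirectoryOnly)
            If regFileName.IsMatch(Path.GetFileNameWithoutExtension(sourcePath)) Then
                Dim targetPath = Path.Combine(destDir, Path.GetFileName(sourcePath))
                File.Move(sourcePath, targetPath)   ' throws IOException if targetPath already exists
                moved += 1
            End If
        Next
        Return moved
    End Function

    Sub Main()
        ' Hypothetical demo folder under the temp directory.
        Dim sourceDir = Path.Combine(Path.GetTempPath(), "move-demo-" & Guid.NewGuid().ToString("N"))
        Dim destDir = Path.Combine(sourceDir, "moved")
        Directory.CreateDirectory(sourceDir)
        File.WriteAllText(Path.Combine(sourceDir, "12345-20130529-FTO B.pdf"), "")
        File.WriteAllText(Path.Combine(sourceDir, "12345 20130529 FTO.pdf"), "")

        Dim regFileName As New Regex("^\d+-\d+-(?:FTO|FTR)")
        Console.WriteLine(MoveMatchingFiles(sourceDir, destDir, regFileName))   ' 1 with the two demo files above
    End Sub
End Module
```

In the BackgroundWorker handler from the question, destdir and the directory taken from TextBox1 would take the place of the demo folders.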

 
LVL 1

Author Comment

by:nobushi
ID: 39208858
Thank you sedgwick, this looks good, but the pattern is a little off.

Below is the list of files in the source directory which I ran through the application.

11111-20130529-FTO AMEX.pdf
12111-20130220-LNS INT PYMTpdf.pdf
12222-20130304- FTI-LN INT REC BSI.pdf
12345-20130304-FTI-PART PRIN PMT.pdf
21345-20130315- LNS PARTL PYMTS.pdf
22041 20130320 DCI.pdf
22041 20130320 INT.pdf
22042-20130514-FTO A.pdf
22042-20130514-FTO B.pdf
22043-20130524-FTO EUR.pdf
22043-20130524-FTO USD.pdf
22222-20130328-TBA.pdf
22433-20130204-ACL.pdf
32145-20130315-FTI PARTL LN PMT.pdf
33333-20130205-TBA.pdf
34214-20130320 PARTL INT PYMTpdf.pdf
44444-20130304-TBA a.pdf
54321-20130304-INT PART PRIN & INT PMT LN.pdf
54322-20130204-ACL.pdf
55555-20130315- ACL A.pdf
66666-20130402-TBA b.pdf
77777-20130402-TBA.pdf
88888-20130403- FTO B.pdf
99999=20130305-LN SCHEDULE CHG.pdf

The output that was written is below. The codes used from the XML file were "FTO", "ACL", "TBA" and "INT". Some files shouldn't have been picked up; I have marked those with a comment.

11111-20130529-FTO AMEX.pdf
12111-20130220-LNS INT PYMTpdf.pdf <-- Has an LNS code after the dash.
12222-20130304- FTI-LN INT REC BSI.pdf <-- Has a space after the - followed by the code
22041 20130320 INT.pdf <-- does not contain any dashes
22042-20130514-FTO A.pdf
22042-20130514-FTO B.pdf
22043-20130524-FTO EUR.pdf
22043-20130524-FTO USD.pdf
22222-20130328-TBA.pdf
22433-20130204-ACL.pdf
33333-20130205-TBA.pdf
34214-20130320 PARTL INT PYMTpdf.pdf <-- does not have a dash and PARTL is not a code
44444-20130304-TBA a.pdf
54321-20130304-INT PART PRIN & INT PMT LN.pdf
54322-20130204-ACL.pdf
55555-20130315- ACL A.pdf <-- Has a space after the dash
66666-20130402-TBA b.pdf
77777-20130402-TBA.pdf

The application must only look at file names that have the below format.

AccNumb-YYYYMMDD-CODE
Example: 12345-20130530-TBA.pdf

Any character after the code is fine so these would be accepted:

66666-20130402-TBA b.pdf
34214-20130320-FTO PARTL INT PYMTpdf.pdf
22043-20130524-FTO USD.pdf

Are you able to filter the pattern a little more?

N
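A likely explanation for the stray matches: with the codes joined as "FTO|ACL|TBA|INT", the pattern becomes "^(\d+)-(\d+)-FTO|ACL|TBA|INT". In a regex, "|" binds more loosely than everything around it, so this reads as "the full prefix followed by FTO" or "ACL anywhere" or "TBA anywhere" or "INT anywhere". Every unwanted file in the output above contains ACL or INT somewhere in its name. Wrapping the codes in a group keeps the prefix attached to every alternative. A minimal sketch; BuildFileNameRegex is a made-up helper, and Regex.Escape is added on the assumption that users may type characters that are special in a regex:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module GroupedCodesDemo
    ' Builds a pattern such as "^\d+-\d+-(?:FTO|ACL|TBA|INT)".
    ' The (?: ... ) group makes the digits-dash-digits-dash prefix apply to every code.
    Function BuildFileNameRegex(codes As IEnumerable(Of String)) As Regex
        Dim alternatives = String.Join("|", codes.Select(Function(c) Regex.Escape(c.Trim())))
        Return New Regex("^\d+-\d+-(?:" & alternatives & ")")
    End Function

    Sub Main()
        Dim regFileName = BuildFileNameRegex({"FTO", "ACL", "TBA", "INT"})
        ' File names without extension, taken from the list above.
        Dim fileNames = {
            "11111-20130529-FTO AMEX",
            "12111-20130220-LNS INT PYMTpdf",
            "55555-20130315- ACL A",
            "22041 20130320 INT",
            "66666-20130402-TBA b"}
        For Each fileName In fileNames
            Console.WriteLine("{0} -> {1}", fileName, regFileName.IsMatch(fileName))
        Next
        ' Only the first and the last name print True.
    End Sub
End Module
```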
 
LVL 42

Expert Comment

by:sedgwick
ID: 39209926
I'll examine the files and get back to you.
 
LVL 1

Author Comment

by:nobushi
ID: 39210576
Thanks Sedgwick.

I've been looking into this as well and found a piece of software called "RegexBuddy" (http://www.regexbuddy.com/)

To try it out I used the software to generate this code for the pattern:

\d{5}?[-]\d{8}?[-]

Dim regFileName As New Regex(String.Format("\d{5}?[-]\d{8}?[-]", codes))



But I am getting an error of "Additional information: Index (zero based) must be greater than or equal to zero and less than the size of the argument list." Don't really know what this means though.

I'll keep at it until I hear back from you.

Thanks,
N
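On the error message: String.Format treats anything in braces as a placeholder, so it reads the regex quantifiers {5} and {8} as "argument number 5" and "argument number 8". Only one argument (codes) was passed, hence the index error. Doubling the braces makes them literal. Note also that the pattern as posted has no {0} in it, so the codes would never end up in the regex. A minimal sketch, assuming account numbers really are always five digits, as the RegexBuddy pattern implies:

```vb
Imports System
Imports System.Text.RegularExpressions

Module FormatBracesDemo
    Sub Main()
        Dim codes As String = "FTO|FTR"
        ' {{ and }} come out as literal { and }; {0} is replaced by codes.
        Dim pattern = String.Format("^\d{{5}}-\d{{8}}-(?:{0})", codes)
        Console.WriteLine(pattern)   ' ^\d{5}-\d{8}-(?:FTO|FTR)

        Dim regFileName As New Regex(pattern)
        Console.WriteLine(regFileName.IsMatch("12345-20130529-FTO B"))   ' True
        Console.WriteLine(regFileName.IsMatch("12345- 20130529-FTO"))    ' False
    End Sub
End Module
```

Plain concatenation ("^\d{5}-\d{8}-(?:" & codes & ")") sidesteps the brace escaping altogether.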
 
LVL 42

Expert Comment

by:sedgwick
ID: 39213936
change the regex pattern to this:
"^(\d+)-(\d+)-[{0}]"

 
LVL 42

Expert Comment

by:sedgwick
ID: 39213944
change line 5 to:
Dim regFileName As New Regex(string.Format(@"^(\d+){0}-(\d+){1}-[{2}]", "{5}","{8}",codes))

 
LVL 1

Author Comment

by:nobushi
ID: 39216233
Thanks for your effort, Sedgwick, but that didn't seem to work.

As using that new adjustment gave me errors:

1. Expression expected
2. 'regFileName' is not declared. It may be inaccessible due to its protection level.

No worries though as I sorted myself out with this code:

Dim regFileName As New Regex(String.Format("^\d+\-(?<Year>(19|20)[0-9][0-9])(?<Month>0[1-9]|12|11|10)(?<Day>[12]\d|0[1-9]|3[01])\-{0}$", codes))
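Two notes for anyone reusing this. The "Expression expected" error most likely comes from the @ in front of the string in the previous suggestion: that is C# verbatim-string syntax, and VB string literals need neither it nor escaped backslashes. Also, if codes is still the "|"-joined list, the alternation in the pattern above is ungrouped, and the trailing $ would reject names that have text after the code. The following is an alternative sketch rather than the author's method: it keeps the grouped prefix match and swaps the hand-written date regex for DateTime.TryParseExact, which also rejects impossible dates such as 20130231. Unlike the pattern above, it does not limit years to 19xx/20xx. IsWantedFile is a hypothetical helper.

```vb
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module DateCheckedPatternDemo
    ' codes is assumed to be a "|"-joined list such as "FTO|ACL|TBA|INT".
    Function IsWantedFile(nameWithoutExtension As String, codes As String) As Boolean
        ' Digits, dash, eight digits captured as "date", dash, then one of the codes.
        Dim m = Regex.Match(nameWithoutExtension, "^\d+-(?<date>\d{8})-(?:" & codes & ")")
        If Not m.Success Then Return False

        ' Let the framework decide whether the eight digits are a real calendar date.
        Dim parsed As DateTime
        Return DateTime.TryParseExact(m.Groups("date").Value, "yyyyMMdd",
                                      CultureInfo.InvariantCulture, DateTimeStyles.None, parsed)
    End Function

    Sub Main()
        Dim codes = "FTO|ACL|TBA|INT"
        Console.WriteLine(IsWantedFile("22043-20130524-FTO USD", codes))   ' True
        Console.WriteLine(IsWantedFile("22043-20131324-FTO USD", codes))   ' False: month 13
        Console.WriteLine(IsWantedFile("55555-20130315- ACL A", codes))    ' False: space after the dash
    End Sub
End Module
```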
