  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 407

VB.NET - Find file names with a unique pattern using an XML file

Afternoon,

I’m not sure where to start with this, as I am still very new to VB.NET.

I have a button on my Windows form that, when clicked, needs to search for files whose names contain specific codes (read from my XML file) and move only the files it finds to another directory.

The file names in the directory start with an account number, followed by a dash, the date (YYYYMMDD), another dash and then a code, so they look like this:

12345-20130529-FTO.pdf

In some cases the file name has extra text after the code, like:

12345-20130529-FTO B.pdf
12345-20130529-FTO Optimal CH.pdf
12345-20130529-FTO Cancel per request.pdf

The most important part of the file name, which I want to match on, is:

12345-20130529-FTO

Any file that does not start with exactly this pattern should not even be looked at. For example, the files below should be ignored:

12345- 20130529-FTO
12345-20130529 FTO
12345 20130529 FTO
12345=20130529-FTO

So how would I have my code read the codes from the XML file, find file names that match this pattern, and then move (or otherwise process) the matching files to another location?

My XML structure looks like this:

<Settings>
  <ApplicationSettings>
    <code1> </code1>
    <docgroup1> </docgroup1>
    <doctype1> </doctype1>
    <docsubtype1> </docsubtype1>
    <code2> </code2>
    <docgroup2> </docgroup2>
    <doctype2> </doctype2>
    <docsubtype2> </docsubtype2>
  </ApplicationSettings>
</Settings>

I need the code1 and code2 elements to determine what to search for.

All I have so far is the code below; I just don’t know where to begin with finding this pattern.

Private Sub BackgroundWorker1_DoWork(sender As Object, e As ComponentModel.DoWorkEventArgs) Handles BackgroundWorker1.DoWork

        'create directory in input folder with timestamp as the directory name.
        '"HH" (24-hour clock) keeps e.g. 01:05:30 and 13:05:30 on the same day from producing the same name.
        Dim destdir As String = [String].Format("\\ServerA\ITDept\files\{0}", DateTime.Now.ToString("MMddyyyyHHmmss"))
        System.IO.Directory.CreateDirectory(destdir)

        'read directory and look for filename patterns that have code elements from xml file

        For Each fpattern As String In System.IO.Directory.GetFiles(TextBox1.Text)

        Next

End Sub


Can anyone provide me with some sample code to get me started?

Regards,
N

Asked by: nobushi
 
Meir Rivkin (Full stack Software Engineer) commented:
The XML doesn't contain any codes.
Should there be something like FTO in the XML?
 
Meir Rivkin (Full stack Software Engineer) commented:
To find all the files in a directory that follow the naming convention you posted, use a regex like this:
            Regex reg = new Regex(@"^(\d+)-(\d+)-FTO");
            var files = Directory.GetFiles(@"c:\temp", "*.pdf", SearchOption.AllDirectories).Where(path => reg.IsMatch(Path.GetFileNameWithoutExtension(path))); 

This code goes through all .pdf files under C:\temp (including subdirectories) and matches their names against the naming convention you posted.
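
Because the rest of the application is VB.NET, here is a rough VB.NET equivalent of the C# snippet above. It is a sketch, not the answerer's code: `FileNameFilter`, `MatchesConvention` and `FindMatchingPdfs` are hypothetical names, and the date part is tightened from `(\d+)` to exactly eight digits (`\d{8}`), on the assumption that dates are always YYYYMMDD.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Text.RegularExpressions

Public Module FileNameFilter
    ' Hypothetical pattern: account number, dash, 8-digit date, dash, then the literal code FTO.
    ' Only the start of the name is anchored, so "FTO B" or "FTO Cancel per request" still match.
    Private ReadOnly FtoPattern As New Regex("^\d+-\d{8}-FTO")

    ' Tests a file name (without its extension) against the naming convention.
    Public Function MatchesConvention(fileNameWithoutExtension As String) As Boolean
        Return FtoPattern.IsMatch(fileNameWithoutExtension)
    End Function

    ' Returns every .pdf under folder (including subfolders) whose name matches.
    Public Function FindMatchingPdfs(folder As String) As List(Of String)
        Dim allPdfs As String() = Directory.GetFiles(folder, "*.pdf", SearchOption.AllDirectories)
        Return allPdfs.Where(Function(filePath) MatchesConvention(Path.GetFileNameWithoutExtension(filePath))).ToList()
    End Function
End Module
```

For example, `MatchesConvention("12345-20130529-FTO Cancel per request")` returns True, while `MatchesConvention("12345- 20130529-FTO")` and `MatchesConvention("12345=20130529-FTO")` return False.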
 
nobushi (Author) commented:
The XML example I gave is indeed empty, as I was just showing the structure. This Windows form is the "settings area" of my application, so the values in the XML can be anything, not just FTO.

So the regex sample you provided won't work as-is: the code will differ depending on what the user sets it to and will not always be FTO. I want my application to read the XML to determine which codes to search for.

Any other ideas?
 
Meir Rivkin (Full stack Software Engineer) commented:
But you didn't explain how the codes in the XML are used to search for the files.
Can you give an example with actual codes in the XML?
 
nobushi (Author) commented:
Ah, OK, sorry.

So today the XML settings look like this (tomorrow the user may change them to something else):

<Settings>
  <ApplicationSettings>
    <code1>FTO</code1>
    <docgroup1>Operations</docgroup1>
    <doctype1>Funds Transfer</doctype1>
    <docsubtype1>Out</docsubtype1>
    <code2>FTR</code2>
    <docgroup2>Operations</docgroup2>
    <doctype2>Funds Transfer</doctype2>
    <docsubtype2>Reversed</docsubtype2>
  </ApplicationSettings>
</Settings>

So how can I tell my program, when I click a button, to read the XML values of <code1> (FTO) and <code2> (FTR) and use them to identify the files in a directory that match a specific pattern?

So for example, we have the files below:

C:\....

12345-20130529-FTO B.pdf
12345-20130529-FTR Optimal CH.pdf
12345-20130529-FTO Cancel per request.pdf
12345- 20130529-FTO.pdf
12345-20130529 FTO.pdf
12345 20130529 FTO.pdf
12345=20130529-FTO.pdf
12345-20130530-FTO   .pdf
12345-20130530-FTR Buy Ret.pdf

Reading the <code1> (FTO) and <code2> (FTR) values from the XML file and looking for the pattern account number-YYYYMMDD-code (for example, 12345-20130529-FTO) at the start of the file name, the files below would be found, because they match the pattern exactly:

12345-20130529-FTO B.pdf
12345-20130529-FTR Optimal CH.pdf
12345-20130529-FTO Cancel per request.pdf
12345-20130530-FTO   .pdf
12345-20130530-FTR Buy Ret.pdf

Hope that's clear enough; please let me know if you need me to explain anything else.

N
 
Meir Rivkin (Full stack Software Engineer) commented:
So basically, all I'm interested in from the XML are the code1/code2 elements? (And maybe code3 and so on, if available.)
 
nobushi (Author) commented:
Exactly, only code1/code2.

There are 4 of these in total in the XML, in case the user wants to add more.
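
Since the number of codeN elements can vary (up to four here), one option is to pick them up by element name rather than hard-coding code1 and code2. A minimal VB.NET sketch, assuming the XML layout shown above; `CodeSettings` and `ReadCodes` are hypothetical names. Blank values are dropped on purpose: if an empty code were later joined into a `|` list, the resulting empty alternative would match every file name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Public Module CodeSettings
    ' Hypothetical helper: returns the trimmed, non-empty values of the <code1>, <code2>, ...
    ' elements directly under <ApplicationSettings>, in document order.
    Public Function ReadCodes(root As XElement) As List(Of String)
        Dim appSettings As XElement = root.Element("ApplicationSettings")
        If appSettings Is Nothing Then Return New List(Of String)()

        ' Keep only elements whose local name starts with "code" (code1, code2, ...).
        Dim codeElements = appSettings.Elements().Where(Function(xe) xe.Name.LocalName.StartsWith("code", StringComparison.Ordinal))

        ' Drop blanks so an unused <code3> </code3> never turns into an empty search code.
        Return codeElements.Select(Function(xe) xe.Value.Trim()).Where(Function(v) v.Length > 0).ToList()
    End Function
End Module
```

Usage would look like `Dim codes = ReadCodes(XElement.Load("c:\temp\1.xml"))`, where `XElement.Load` returns the `<Settings>` root element.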
 
Meir Rivkin (Full stack Software Engineer) commented:
Change the XML path before running the code:
            Regex regElemName = new Regex(@"^code");
            var root = XElement.Load(@"c:\temp\1.xml");
            var codeElements = root.Element("ApplicationSettings").Elements().Where(xe => regElemName.IsMatch(xe.Name.LocalName)).Select(xe => xe.Value);
            var codes = string.Join("|", codeElements.ToArray());
            Regex regFileName = new Regex(string.Format(@"^(\d+)-(\d+)-{0}", codes));
            var files = Directory.GetFiles(@"\\ServerA\ITDept\files", "*.pdf", SearchOption.AllDirectories).Where(path => regFileName.IsMatch(Path.GetFileNameWithoutExtension(path)));
            foreach (var file in files)
            {
                Console.WriteLine(file);
            }

 
nobushi (Author) commented:
Will give this a try in a bit.

Quick question: is this VB.NET or another language?
 
Meir Rivkin (Full stack Software Engineer) commented:
It's C#. Do you want it in VB.NET?
 
nobushi (Author) commented:
Yes, please, as my whole application is in VB.NET.
 
Meir Rivkin (Full stack Software Engineer) commented:
Dim regElemName As New Regex("^code")  ' Needs: Imports System.IO, System.Linq, System.Text.RegularExpressions, System.Xml.Linq
Dim root = XElement.Load("c:\temp\1.xml")
Dim codeElements = root.Element("ApplicationSettings").Elements().Where(Function(xe) regElemName.IsMatch(xe.Name.LocalName)).Select(Function(xe) xe.Value)
Dim codes = String.Join("|", codeElements.ToArray())
Dim regFileName As New Regex(String.Format("^(\d+)-(\d+)-{0}", codes))
' VB is case-insensitive, so the lambda parameter must not be named "path" (it would hide the Path class).
Dim files = Directory.GetFiles("\\ServerA\ITDept\files", "*.pdf", SearchOption.AllDirectories).Where(Function(filePath) regFileName.IsMatch(Path.GetFileNameWithoutExtension(filePath)))
For Each file As String In files
	Console.WriteLine(file)
Next
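
The original question also asks for the matching files to be moved to another directory. A minimal VB.NET sketch of that step, assuming a file-name Regex has already been built from the XML codes and that moving (not copying) is wanted; `FileMover` and `MoveMatchingFiles` are hypothetical names. In the question's BackgroundWorker, `destdir` would be passed as the destination.

```vbnet
Imports System.IO
Imports System.Text.RegularExpressions

Public Module FileMover
    ' Hypothetical helper: moves each .pdf directly inside sourceDir whose name (without extension)
    ' matches fileNamePattern into destDir, and returns the number of files moved.
    Public Function MoveMatchingFiles(sourceDir As String, destDir As String, fileNamePattern As Regex) As Integer
        Directory.CreateDirectory(destDir) ' does nothing if the folder already exists

        Dim moved As Integer = 0
        ' TopDirectoryOnly: files moved into timestamped subfolders of sourceDir on an earlier run
        ' are not picked up again. GetFiles returns an array, so moving while looping is safe.
        For Each filePath As String In Directory.GetFiles(sourceDir, "*.pdf", SearchOption.TopDirectoryOnly)
            If fileNamePattern.IsMatch(Path.GetFileNameWithoutExtension(filePath)) Then
                ' File.Move throws an IOException if a file with that name already exists in destDir.
                File.Move(filePath, Path.Combine(destDir, Path.GetFileName(filePath)))
                moved += 1
            End If
        Next
        Return moved
    End Function
End Module
```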

 
nobushi (Author) commented:
Thank you, Sedgwick. This looks good, but the pattern is a little off.

Below is the list of files in the source directory that I ran through the application:

11111-20130529-FTO AMEX.pdf
12111-20130220-LNS INT PYMTpdf.pdf
12222-20130304- FTI-LN INT REC BSI.pdf
12345-20130304-FTI-PART PRIN PMT.pdf
21345-20130315- LNS PARTL PYMTS.pdf
22041 20130320 DCI.pdf
22041 20130320 INT.pdf
22042-20130514-FTO A.pdf
22042-20130514-FTO B.pdf
22043-20130524-FTO EUR.pdf
22043-20130524-FTO USD.pdf
22222-20130328-TBA.pdf
22433-20130204-ACL.pdf
32145-20130315-FTI PARTL LN PMT.pdf
33333-20130205-TBA.pdf
34214-20130320 PARTL INT PYMTpdf.pdf
44444-20130304-TBA a.pdf
54321-20130304-INT PART PRIN & INT PMT LN.pdf
54322-20130204-ACL.pdf
55555-20130315- ACL A.pdf
66666-20130402-TBA b.pdf
77777-20130402-TBA.pdf
88888-20130403- FTO B.pdf
99999=20130305-LN SCHEDULE CHG.pdf

The output is below. The codes read from the XML file were "FTO", "ACL", "TBA" and "INT". I have marked the files that shouldn't have been picked up with a comment.

11111-20130529-FTO AMEX.pdf
12111-20130220-LNS INT PYMTpdf.pdf <-- Has an LNS code after the dash.
12222-20130304- FTI-LN INT REC BSI.pdf <-- Has a space after the second dash, before the code
22041 20130320 INT.pdf <-- does not contain any dashes
22042-20130514-FTO A.pdf
22042-20130514-FTO B.pdf
22043-20130524-FTO EUR.pdf
22043-20130524-FTO USD.pdf
22222-20130328-TBA.pdf
22433-20130204-ACL.pdf
33333-20130205-TBA.pdf
34214-20130320 PARTL INT PYMTpdf.pdf <-- No dash after the date, and PARTL is not a code
44444-20130304-TBA a.pdf
54321-20130304-INT PART PRIN & INT PMT LN.pdf
54322-20130204-ACL.pdf
55555-20130315- ACL A.pdf <-- Has a space after the dash
66666-20130402-TBA b.pdf
77777-20130402-TBA.pdf

The application must only look at file names that have the format below:

AccNumb-YYYYMMDD-CODE
Example: 12345-20130530-TBA.pdf

Any characters after the code are fine, so these would be accepted:

66666-20130402-TBA b.pdf
34214-20130320-FTO PARTL INT PYMTpdf.pdf
22043-20130524-FTO USD.pdf

Are you able to filter the pattern a little more?

N
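
The false positives above come from how the pattern is assembled. `String.Join("|", ...)` turns the codes into `^(\d+)-(\d+)-FTO|ACL|TBA|INT`, and because `|` has the lowest precedence in a regular expression, that reads as "starts with digits-digits-FTO", or "contains ACL anywhere", or "contains TBA anywhere", or "contains INT anywhere". Every flagged file contains ACL or INT somewhere in its name. Wrapping the codes in a group keeps them tied to the anchored prefix. A VB.NET sketch; `PatternBuilder` and `BuildFileNamePattern` are hypothetical names, and the eight-digit date (`\d{8}`) is an assumption based on the YYYYMMDD format.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Public Module PatternBuilder
    ' Hypothetical helper: builds ^digits-8 digits-(?:CODE1|CODE2|...) from the codes read from the XML.
    Public Function BuildFileNamePattern(codes As IEnumerable(Of String)) As Regex
        ' Skip blank codes (an empty alternative would match every name) and escape the rest,
        ' so characters such as "." or "+" in a user-entered code are matched literally.
        Dim cleaned As String() = codes.Where(Function(c) Not String.IsNullOrWhiteSpace(c)).Select(Function(c) Regex.Escape(c.Trim())).ToArray()
        If cleaned.Length = 0 Then
            Throw New ArgumentException("At least one non-empty code is required.", "codes")
        End If

        ' The non-capturing group (?:...) keeps every code tied to the anchored account-date prefix.
        Return New Regex("^\d+-\d{8}-(?:" & String.Join("|", cleaned) & ")")
    End Function
End Module
```

With the codes FTO, ACL, TBA and INT, this accepts `66666-20130402-TBA b` but rejects `22041 20130320 INT` and `55555-20130315- ACL A`.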
 
Meir Rivkin (Full stack Software Engineer) commented:
I'll examine the files and get back to you.
 
nobushi (Author) commented:
Thanks, Sedgwick.

I've been looking into this as well and found a tool called RegexBuddy (http://www.regexbuddy.com/).

To try it out, I used it to generate this pattern and code:

\d{5}?[-]\d{8}?[-]

Dim regFileName As New Regex(String.Format("\d{5}?[-]\d{8}?[-]", codes))

But I am getting this error: "Additional information: Index (zero based) must be greater than or equal to zero and less than the size of the argument list." I don't really know what this means, though.

I'll keep at it until I hear back from you.

Thanks,
N
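
That error comes from `String.Format`, not from the regex engine: in a format string, `{5}` and `{8}` are read as placeholders for arguments 5 and 8, but only one argument (`codes`, index 0) is passed, so a `FormatException` is thrown. To get literal braces out of `String.Format`, double them. A small VB.NET sketch; `FormatBraces` and `BuildPattern` are hypothetical names. It keeps the RegexBuddy assumption of a five-digit account number, adds a `^` anchor, and inserts the codes through `{0}` inside a non-capturing group, which the RegexBuddy pattern did not include. The `?` after `{5}` and `{8}` only makes a fixed-count quantifier lazy, which changes nothing, and `[-]` is the same as `-`, so both are simplified.

```vbnet
Public Module FormatBraces
    ' In a String.Format format string, {0}, {1}, ... are placeholders; "{{" and "}}" produce literal braces.
    Public Function BuildPattern(codeAlternatives As String) As String
        ' "{{5}}" and "{{8}}" become the regex quantifiers {5} and {8}; "{0}" is replaced by the codes.
        Return String.Format("^\d{{5}}-\d{{8}}-(?:{0})", codeAlternatives)
    End Function
End Module
```

For example, `BuildPattern("FTO|TBA")` returns the string `^\d{5}-\d{8}-(?:FTO|TBA)`.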
 
Meir Rivkin (Full stack Software Engineer) commented:
Change the regex pattern to this:
"^(\d+)-(\d+)-[{0}]"

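A caution about the square brackets in this suggestion: in a regex, `[...]` is a character class, so once the codes are inserted the pattern becomes something like `^(\d+)-(\d+)-[FTO|ACL|TBA|INT]`, which matches any single character from that set after the second dash, not one of the whole codes. A VB.NET check contrasting the two forms (the module and field names are hypothetical, and only two codes are used for brevity):

```vbnet
Imports System.Text.RegularExpressions

Public Module CharacterClassCheck
    ' [FTO|TBA] is a character class: it matches exactly ONE character from F, T, O, |, B, A.
    Public ReadOnly CharClassPattern As New Regex("^(\d+)-(\d+)-[FTO|TBA]")

    ' (?:FTO|TBA) is a group: it matches the whole code FTO or the whole code TBA.
    Public ReadOnly GroupPattern As New Regex("^(\d+)-(\d+)-(?:FTO|TBA)")
End Module
```

With these, `CharClassPattern.IsMatch("12345-20130530-ABC")` is True because `A` is in the class, while `GroupPattern` rejects that name and still accepts `12345-20130530-TBA b`.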
 
Meir Rivkin (Full stack Software Engineer) commented:
Change line 5 (the Dim regFileName line) to:
Dim regFileName As New Regex(string.Format(@"^(\d+){0}-(\d+){1}-[{2}]", "{5}","{8}",codes))
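
Note that this line is C# syntax. The `@"..."` verbatim-string prefix does not exist in VB.NET, which would explain the "Expression expected" error reported in the next reply (and `regFileName` then never gets declared). In VB a backslash inside an ordinary string literal is already a literal character, so regex patterns need no prefix at all. A tiny VB.NET illustration (names are hypothetical):

```vbnet
Public Module VbStringLiterals
    ' No @ prefix in VB.NET: backslashes in a normal string literal are kept as-is,
    ' so this is the 11-character regex ^\d+-\d{8}-
    Public Const AccountDatePrefix As String = "^\d+-\d{8}-"

    ' The only escape inside a VB string literal is a doubled quote: "" stands for one ".
    Public Const QuoteExample As String = "say ""hi"""
End Module
```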

 
nobushi (Author) commented:
Thanks for your effort, Sedgwick, but that didn't seem to work.

Using that new adjustment gave me these errors:

1. Expression expected
2. 'regFileName' is not declared. It may be inaccessible due to its protection level.

No worries, though, as I sorted it out myself with this code:

Dim regFileName As New Regex(String.Format("^\d+\-(?<Year>(19|20)[0-9][0-9])(?<Month>0[1-9]|12|11|10)(?<Day>[12]\d|0[1-9]|3[01])\-{0}$", codes))
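
A caution about this final pattern: `{0}` is still inserted without a group, so with the codes FTO|ACL|TBA|INT the expression splits into `^...-FTO`, `ACL`, `TBA` and `INT$`, and names such as `55555-20130315- ACL A` (contains ACL) or `22041 20130320 INT` (ends in INT) still match. Simply adding a group is not enough either, because the trailing `$` would then reject names with text after the code, such as `66666-20130402-TBA b`, which were meant to be accepted. A VB.NET sketch that keeps the date validation from this post, groups the codes and drops the `$`; `FinalPattern` and `BuildDatedPattern` are hypothetical names, and `codes` is assumed to hold only non-empty values.

```vbnet
Imports System.Linq
Imports System.Text.RegularExpressions

Public Module FinalPattern
    ' Hypothetical helper: the date-validating pattern from the final post, with the codes wrapped
    ' in a non-capturing group and no trailing $, so any text after the code is allowed.
    Public Function BuildDatedPattern(codes As String()) As Regex
        ' Escape each code so regex metacharacters in user-entered codes are matched literally.
        Dim alternatives As String = String.Join("|", codes.Select(Function(c) Regex.Escape(c)).ToArray())
        Dim pattern As String = "^\d+\-(?<Year>(19|20)[0-9][0-9])(?<Month>0[1-9]|12|11|10)(?<Day>[12]\d|0[1-9]|3[01])\-(?:" & alternatives & ")"
        Return New Regex(pattern)
    End Function
End Module
```

As in the original, the regex would be applied to `Path.GetFileNameWithoutExtension(...)`; an invalid month such as `12345-20131345-FTO` is still rejected.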
