Solved

.NET regular expression to parse a file

Posted on 2015-01-07
Last Modified: 2015-01-08
I need to parse a configuration file similar to this:



snmp-server community "xxxx"
vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   ip address 192.160.2.3 255.255.254.0
   tagged 1-3,5-6,12-13,19,22,Trk1



I want to extract all the ports that are tagged to vlan 32.  Here is what I have so far, but it doesn't group all the ports.  I only get some of them.  
vlan 32\s\n\s*name "cheese".*\n\s*ip address.*\n(?:\n?\s*tagged (?:(?:(\d\d?-\d\d?)|(\d\d)|Trk\d|(\d)),?)*)

I am using powershell to read the file and match:

[IO.File]::ReadAllText($_.FullName) -imatch $regex

How do I create a group for each port so it will output these groups?
1-3
5-6
12-13
19
22
Trk1
Question by:gacus

Author Comment

by:gacus
ID: 40536847
It looks like I only get the last groups it matches, so in the case above I get 12-13 and 22. How do I tell it to return all the matches and not just the last two?

Expert Comment

by:Michael74
ID: 40536876
Instead of using a regex, why not write a parser that reads each entry into a collection of objects containing the vlan entries?

So you would have a vlan class or struct holding the data for each vlan entry, and then a class which holds a collection of vlan objects and the methods for parsing the config file.

This would provide more flexibility and a structure that would enable future work on this config file.

Accepted Solution

by:käµfm³d 👽
käµfm³d 👽 earned 425 total points
ID: 40536898
In terms of sustainability, I agree with Michael74; however, if this is just quick-and-dirty, then as a regex:

(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,]+


If you'd like a breakdown of what the pattern means let me know  = )


Expert Comment

by:Michael74
ID: 40537086
@kaufmed

I don't know about @gacus, but I would be interested in seeing a breakdown of this pattern. I can understand most of it, but there are parts I don't get. Great answer.

Just a thought on the subject and NOT for points. Using this pattern you would need a foreach loop to get to all the values. Depending on what you need, it could be easier to match the whole tagged entry and then use the split function to create an array. Doing this would also allow for the case where there is more than one "vlan 32" entry. Using kaufmed's response, the regex for this would be:

(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^\n]+
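A minimal C# sketch of the match-then-split idea, under a few assumptions: the class and method names are made up for illustration, the sample text is a shortened copy of the config from the question, and the final character class is widened from [^\n] to [^\r\n] (my tweak, not part of the suggestion) so that a carriage return from Windows line endings does not end up in the last port.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TaggedLineSplit
{
    // Returns the ports on the first "tagged" line that follows "vlan 32",
    // or an empty array when there is no such line.
    public static string[] GetTaggedPorts(string config)
    {
        // Same lookbehind as above; only the last class differs ([^\r\n] instead of [^\n]).
        string pattern = @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^\r\n]+";
        Match m = Regex.Match(config, pattern);
        return m.Success ? m.Value.Split(',') : new string[0];
    }

    public static void Main()
    {
        string config =
            "vlan 32\n" +
            "   name \"cheese\"\n" +
            "   ip address 192.160.2.3 255.255.254.0\n" +
            "   tagged 1-3,5-6,12-13,19,22,Trk1\n" +
            "   exit\n";

        // Prints one port or port range per line.
        foreach (string port in GetTaggedPorts(config))
            Console.WriteLine(port);
    }
}
```

The regex only has to find the line; String.Split does the per-port work, so no loop over matches is needed.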


Author Comment

by:gacus
ID: 40537099
I would like to see the breakdown as well.

I am also interested in an example of a parser for this.  I am very intrigued by that idea.

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 425 total points
ID: 40537108
Michael,

I used a lookbehind [ (?<=... ) ]. The lookbehind allows me to match some things without actually consuming them as a part of the normal match. Ordinarily, when you match with regex, the engine will inspect characters as it chugs along; all of the characters it inspects do not get visited again (not 100% true, but for this explanation...). The lookbehind circumvents this behavior in a sense.

The literal breakdown of the pattern:

(?<= ... )          - Positive lookbehind
vlan                - Literal text
 +                  - One or more ( + ) spaces
32                  - Literal text
(?: ... )+          - One or more ( + ) of the thing to the left, which is a non-capturing group [ (?: ... ) ]
[^t]|t(?!agged)     - Part of the non-capturing group; anything not ( ^ ) a "t" ( [^t] ) OR ( | ) a "t" that is not followed by the string "agged";
                      Negative lookahead used to check for "agged" without consuming it
tagged              - Literal text
 +                  - One or more ( + ) spaces
.*?                 - Zero or more ( * ) of any character ( . ), excluding newline; match only what is necessary/don't be greedy ( ? )
[^,]+               - One or more ( + ) of the thing to the left, which is anything not ( ^ ) a comma ( [^,] )


The net effect is that I still start matching at "vlan 32", but I don't actually consume that bit, so I can keep searching for that same sentinel text for each port that is found after it. Basically, for each port in the list (that I care about), I repeat the "vlan 32..." search; the dot-star at the end will intrinsically match the ports I previously found on each preceding run by the engine.

Note that this behavior only works in .NET (and I think the Boost regex library). Most regex engines do not support unbounded lookbehinds (i.e. you can't use dot-star/dot-plus in a lookbehind).
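As a rough illustration of the "one match per port" behaviour described above, here is a small C# sketch that runs the pattern with Regex.Matches. The class and method names are hypothetical and the sample config is abbreviated from the question. One deliberate change: the last class is [^,\r\n]+ rather than [^,]+, because a negated class also matches line breaks, so with [^,]+ the final port would run on into whatever follows it (for example "Trk1" plus the "exit" line).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Collects every port on the "tagged" line of vlan 32, one regex match per port.
    public static List<string> GetTaggedPorts(string config)
    {
        // Pattern from the accepted answer, except for the final class
        // ([^,\r\n]+ instead of [^,]+), which is an assumption of this sketch.
        string pattern = @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,\r\n]+";
        var ports = new List<string>();
        foreach (Match m in Regex.Matches(config, pattern))
            ports.Add(m.Value);
        return ports;
    }

    public static void Main()
    {
        string config =
            "vlan 1\n" +
            "   no untagged 1-22,Trk1\n" +
            "   exit\n" +
            "vlan 32\n" +
            "   name \"cheese\"\n" +
            "   ip address 192.160.2.3 255.255.254.0\n" +
            "   tagged 1-3,5-6,12-13,19,22,Trk1\n" +
            "   exit\n" +
            "vlan 5\n" +
            "   tagged 1-3\n" +
            "   exit\n";

        // Prints: 1-3 5-6 12-13 19 22 Trk1
        Console.WriteLine(string.Join(" ", GetTaggedPorts(config)));
    }
}
```

The "tagged" line of vlan 5 is not picked up, because the part of the lookbehind between "vlan 32" and "tagged" is not allowed to step over another occurrence of "tagged".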

Assisted Solution

by:Michael74
Michael74 earned 75 total points
ID: 40537143
This is very rough, but it should give you an idea.

First you would need a data structure to hold the contents of each vlan, e.g.:

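using System.Collections.Generic;
using System.Linq;

public class Vlan
{
   // Public so that the parsing class can read and set them
   public int Num { get; set; }
   public string Name { get; set; }
   public string Ips { get; set; }
   public string Tagged { get; set; }   // raw port list, e.g. "1-3,5-6,Trk1"

   public Vlan() {}
   public Vlan(int num, string name, string ips, string tagged)
   {
      Num = num;
      Name = name;
      Ips = ips;
      Tagged = tagged;
   }

   public List<string> GetIPList()
   {
      // Returns a generic list of IP addresses, or null if empty.
      // Assumes Ips is stored comma-separated; adjust to match your parser.
      return SplitList(Ips);
   }

   public List<string> GetTaggedList()
   {
      // Returns a generic list of ports, or null if empty
      return SplitList(Tagged);
   }

   private static List<string> SplitList(string value)
   {
      if (string.IsNullOrWhiteSpace(value)) return null;
      return value.Split(',').Select(item => item.Trim()).ToList();
   }

   //Add other methods as needed to extract relevant information or hold new values
}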


Then you would create another class which reads the config and stores a list of entries, e.g.:
using System.Collections.Generic;
using System.IO;

public class VlanConfig
{
   public List<Vlan> Vlans { get; set; }

   public VlanConfig(string filePath)
   {
      Vlans = new List<Vlan>();
      ParseConfig(filePath);
   }

   private void ParseConfig(string filePath)
   {
      using (StreamReader sr = new StreamReader(filePath))
      {
         string line;
         while ((line = sr.ReadLine()) != null)
         {
            // Determine what the line of text refers to, then create/update a Vlan object and add it to the Vlans collection
         }
      }
   }

   // Here you can also add methods to modify the config, find specific values, search for entries, whatever you need
}
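To make the outline above a little more concrete, here is one possible way to fill in the line-by-line parsing, written as a self-contained C# sketch. The type and member names (VlanEntry, VlanConfigParser, Parse) are hypothetical and not the classes from this post, and the rules are assumptions based only on the sample configs in this thread: an unindented "vlan N" line opens a block, "exit" closes it, and lines beginning with "tagged" or "untagged" carry a comma-separated port list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical types for illustration only.
public class VlanEntry
{
    public int Id;
    public string Name;
    public List<string> Tagged = new List<string>();
    public List<string> Untagged = new List<string>();
}

public static class VlanConfigParser
{
    public static List<VlanEntry> Parse(TextReader reader)
    {
        var vlans = new List<VlanEntry>();
        VlanEntry current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string text = line.Trim();
            int id;
            if (line.StartsWith("vlan ", StringComparison.Ordinal)
                && int.TryParse(text.Substring(5), out id))
            {
                // An unindented "vlan N" header opens a new block.
                current = new VlanEntry { Id = id };
                vlans.Add(current);
            }
            else if (current == null)
                continue;                       // line outside any vlan block
            else if (text == "exit")
                current = null;                 // block closed
            else if (text.StartsWith("name ", StringComparison.Ordinal))
                current.Name = text.Substring(5).Trim('"');
            else if (text.StartsWith("tagged ", StringComparison.Ordinal))
                current.Tagged.AddRange(text.Substring(7).Split(','));
            else if (text.StartsWith("untagged ", StringComparison.Ordinal))
                current.Untagged.AddRange(text.Substring(9).Split(','));
            // "no untagged ..." begins with "no", so none of the cases above apply.
        }
        return vlans;
    }

    public static void Main()
    {
        string config =
            "vlan 1\n   name \"DEFAULT_VLAN\"\n   no untagged 1-22,Trk1\n   exit\n" +
            "vlan 32\n   name \"cheese\"\n   untagged 1-3,5-6\n" +
            "   ip address 192.168.1.8 255.255.254.0\n   tagged 12-13,19,22,Trk1\n   exit\n";

        foreach (VlanEntry vlan in Parse(new StringReader(config)))
            Console.WriteLine("vlan {0} tagged: {1}", vlan.Id, string.Join(" ", vlan.Tagged));
    }
}
```

Because each line is classified on its own, the order of the name, ip address, tagged and untagged lines inside a block does not matter, and a missing line simply leaves the corresponding field empty.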


Expert Comment

by:Michael74
ID: 40537145
Awesome kaufmed, nice article too

Author Comment

by:gacus
ID: 40537735
@kaufmed. That looks great, but I don't fully understand why my regular expression only returns the last groups instead of capturing all of them. Is there something I am missing about how regex works?
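For what it is worth, this looks like ordinary capture-group behaviour rather than a flaw in the pattern. When a capturing group sits inside a repeated construct, the group's Value is overwritten on every repetition, so only the last capture is left, and PowerShell's $Matches exposes just that value. In .NET the earlier captures are still available through Group.Captures. The C# sketch below uses only the "tagged ..." tail of the pattern from the question; the class and method names are invented for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // The "tagged ..." tail of the pattern from the question:
    // group 1 = ranges, group 2 = two-digit ports, group 3 = one-digit ports.
    const string Pattern = @"tagged (?:(?:(\d\d?-\d\d?)|(\d\d)|Trk\d|(\d)),?)*";

    // What Group.Value (and PowerShell's $Matches) reports: the last capture only.
    public static string LastCapture(string line, int group)
    {
        return Regex.Match(line, Pattern).Groups[group].Value;
    }

    // Every capture the group made while the surrounding (...)* repeated.
    public static string[] AllCaptures(string line, int group)
    {
        Match m = Regex.Match(line, Pattern);
        return m.Groups[group].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    public static void Main()
    {
        string line = "   tagged 1-3,5-6,12-13,19,22,Trk1";

        Console.WriteLine(LastCapture(line, 1));                    // 12-13
        Console.WriteLine(LastCapture(line, 2));                    // 22
        Console.WriteLine(string.Join(" ", AllCaptures(line, 1)));  // 1-3 5-6 12-13
        Console.WriteLine(string.Join(" ", AllCaptures(line, 2)));  // 19 22
    }
}
```

That accounts for the 12-13 and 22 reported earlier: they are the last range and the last two-digit port. Trk1 does not show up in any group because Trk\d is not wrapped in parentheses.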

Author Comment

by:gacus
ID: 40538007
I see now that I did not provide an example that covers every case. The regex works great for the one I did provide, but not well in practice, as the configs can take various forms.

There can be any number of vlans in the config, and vlan 32 can contain both untagged and tagged sections; I am looking to capture both sets of ports. The lines can also be in any order. For example, the IP address can be above or below untagged or tagged, or not exist at all. untagged and tagged can also show up in either order within the vlan.

vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   untagged 1-3,5-6,1,19,22,Trk1
   ip address 192.168.1.8 255.255.254.0
   tagged 1-3,5-6,12-13,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit



vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   ip address 192.168.1.8 255.255.254.0
   untagged 1-3,5-6,1,19,22,Trk1
   tagged 1-3,5-6,12-13,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit


vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   tagged 1-3,5-6,12-13,19,22,Trk1
   ip address 192.168.1.8 255.255.254.0
   untagged 1-3,5-6,1,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit

I think I should just start a new question since you did answer my original question.
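One order-independent way to approach the variations above, offered only as a sketch and not as anyone's posted answer: first isolate the block for a single vlan, then look for the "tagged" or "untagged" line inside that block. The C# below assumes the layout seen in the samples (an unindented "vlan N" header followed by indented lines); the class name, method name and parameters are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VlanBlockPorts
{
    // kind is "tagged" or "untagged"; returns an empty list if nothing is found.
    public static List<string> GetPorts(string config, int vlanId, string kind)
    {
        var ports = new List<string>();

        // Stage 1: the "vlan N" header plus every indented line directly below it.
        Match block = Regex.Match(config,
            @"^vlan +" + vlanId + @"[ \t]*\r?\n(?<body>(?:[ \t]+.*\n?)*)",
            RegexOptions.Multiline);
        if (!block.Success) return ports;

        // Stage 2: an indented line that starts with the keyword itself,
        // so "no untagged ..." and (for "tagged") "untagged ..." are skipped.
        Match line = Regex.Match(block.Groups["body"].Value,
            @"^[ \t]+" + kind + @" +(?<ports>[^\r\n]+)",
            RegexOptions.Multiline);
        if (line.Success)
            ports.AddRange(line.Groups["ports"].Value.Split(','));
        return ports;
    }

    public static void Main()
    {
        string config =
            "vlan 1\n   name \"DEFAULT_VLAN\"\n   no untagged 1-22,Trk1\n   exit\n" +
            "vlan 32\n   name \"cheese\"\n   untagged 1-3,5-6,1,19,22,Trk1\n" +
            "   ip address 192.168.1.8 255.255.254.0\n" +
            "   tagged 1-3,5-6,12-13,19,22,Trk1\n   exit\n" +
            "vlan 5\n   name \"xx\"\n   tagged 1-3,5-6,12-13,22,Trk1\n   exit\n";

        Console.WriteLine(string.Join(" ", GetPorts(config, 32, "tagged")));
        Console.WriteLine(string.Join(" ", GetPorts(config, 32, "untagged")));
    }
}
```

Since the second stage only ever sees the lines of one block, the position of the ip address line, or its absence, has no effect on the result.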
Question has a verified solution.
