Solved

.net regular expression to parse a file

Posted on 2015-01-07
Last Modified: 2015-01-08
I need to parse a configuration file similar to this:



snmp-server community "xxxx"
vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   ip address 192.160.2.3 255.255.254.0
   tagged 1-3,5-6,12-13,19,22,Trk1



I want to extract all the ports that are tagged to vlan 32.  Here is what I have so far, but it doesn't group all the ports.  I only get some of them.  
vlan 32\s\n\s*name "cheese".*\n\s*ip address.*\n(?:\n?\s*tagged (?:(?:(\d\d?-\d\d?)|(\d\d)|Trk\d|(\d)),?)*)

I am using powershell to read the file and match:

[IO.File]::ReadAllText($_.FullName) -imatch $regex

How do I create a group for each port so it will output these groups?
1-3
5-6
12-13
19
22
Trk1
Question by:gacus

Author Comment

by:gacus
It looks like I only get the last groups it matches, so in the case above I get 12-13 and 22. How do I tell it to return all the matches and not just the last two?

Expert Comment

by:Michael74
Instead of using a regex, why not write a parser that reads each entry into a collection of objects containing the vlan entries?

So you would have a vlan class or struct holding the data for each vlan entry, and then a class which holds a collection of vlan objects and methods for parsing the config file.

This would provide more flexibility and a structure that would enable future work on this config file.

Accepted Solution

by:
käµfm³d   👽 earned 425 total points
In terms of sustainability, I agree with Michael74; however, if this is just quick-and-dirty, then as a regex:

(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,]+


If you'd like a breakdown of what the pattern means let me know  = )


Expert Comment

by:Michael74
@kaufmed

I don't know about @gacus, but I would be interested in seeing a breakdown of this pattern. I can understand most of it, but there are parts I don't get. Great answer.

Just a thought on the subject and NOT for points. Using this pattern you would need a foreach loop to get to all the values. Depending on what you need, it could be easier to match the whole tagged entry and then use the split function to create an array. Doing this would also allow for the case where there is more than one "vlan 32" entry. Using kaufmed's response, the regex for this would be

(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^\n]+
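The match-then-split idea could look roughly like this in C#. It is only a sketch: the variable names and the trimmed-down sample config are assumptions made for illustration, and top-level statements need C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Trimmed-down sample: the vlan 32 block from the question, closed with "exit".
string config = string.Join("\n",
    "vlan 32",
    "   name \"cheese\"",
    "   ip address 192.160.2.3 255.255.254.0",
    "   tagged 1-3,5-6,12-13,19,22,Trk1",
    "   exit");

// [^\n]+ takes the rest of the tagged line as a single match.
string pattern = @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^\n]+";

// One match per "vlan 32" entry; Split then turns each matched line into ports.
string[] ports = Regex.Matches(config, pattern)
    .Cast<Match>()
    .SelectMany(m => m.Value.Trim().Split(','))
    .ToArray();

Console.WriteLine(string.Join(" ", ports));   // 1-3 5-6 12-13 19 22 Trk1
```

The Trim() call is there in case the file uses Windows line endings, where the match would otherwise end in a carriage return.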


Author Comment

by:gacus
I would like to see the breakdown as well.

I am also interested in an example of a parser for this.  I am very intrigued by that idea.

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 425 total points
Michael,

I used a lookbehind [ (?<=... ) ]. The lookbehind allows me to match some things without actually consuming them as a part of the normal match. Ordinarily, when you match with regex, the engine will inspect characters as it chugs along; all of the characters it inspects do not get visited again (not 100% true, but for this explanation...). The lookbehind circumvents this behavior in a sense.

The literal breakdown of the pattern:

(?<= ... )          - Positive lookbehind
vlan                - Literal text
 +                  - One or more ( + ) spaces
32                  - Literal text
(?: ... )+          - One or more ( + ) of the thing to the left, which is a non-capturing group [ (?: ... ) ]
[^t]|t(?!agged)     - Part of the non-capturing group; anything not ( ^ ) a "t" ( [^t] ) OR ( | ) a "t" that is not followed by the string "agged";
                      Negative lookahead used to check for "agged" without consuming it
tagged              - Literal text
 +                  - One or more ( + ) spaces
.*?                 - Zero or more ( * ) of any character ( . ), excluding newline; match only what is necessary/don't be greedy ( ? )
[^,]+               - One or more ( + ) of the thing to the left, which is anything not ( ^ ) a comma ( [^,] )


The net effect is that I still start matching at "vlan 32", but I don't actually consume that bit, so I can keep searching for that same sentinel text for each port that is found after it. Basically, for each port in the list (that I care about), I repeat the "vlan 32..." search; the dot-star at the end will intrinsically match the ports I previously found on each preceding run by the engine.

Note that this behavior only works in .NET (and I think the Boost regex library). Most regex engines do not support unbounded lookbehinds (i.e. you can't use dot-star/dot-plus in a lookbehind).
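To see the "one match per port" behavior described above, the pattern can be run through Regex.Matches. A minimal C# sketch, assuming the sample config from the question (ending at the tagged line) and variable names chosen only for illustration; top-level statements need C# 9 or later:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The sample config from the question, ending at the tagged line.
string config = string.Join("\n",
    "snmp-server community \"xxxx\"",
    "vlan 1",
    "   name \"DEFAULT_VLAN\"",
    "   no ip address",
    "   no untagged 1-22,Trk1",
    "   exit",
    "vlan 32",
    "   name \"cheese\"",
    "   ip address 192.160.2.3 255.255.254.0",
    "   tagged 1-3,5-6,12-13,19,22,Trk1");

string pattern = @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,]+";

// Each match is one port; the lookbehind is re-checked at every start position.
string[] ports = Regex.Matches(config, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string port in ports)
    Console.WriteLine(port);   // 1-3, 5-6, 12-13, 19, 22, Trk1 (one per line)
```

Two things to keep in mind. In PowerShell, -match stops at the first match, so [regex]::Matches($text, $pattern) is the call that returns all of them. Also, [^,] matches newlines too, so if more lines follow the tagged line, the last match runs on until the next comma; [^,\r\n]+ would be one way to stop it at the end of the line.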

Assisted Solution

by:Michael74
Michael74 earned 75 total points
This is very rough, but it should give you an idea.

First you would need a data structure to hold the contents of each vlan, e.g.

using System.Collections.Generic;
using System.Linq;

public class Vlan
{
   public int num { get; set; }
   public string name { get; set; }
   public string ips { get; set; }
   public string tagged { get; set; }

   public Vlan() {}

   public Vlan(int num, string name, string ips, string tagged)
   {
      this.num = num;
      this.name = name;
      this.ips = ips;
      this.tagged = tagged;
   }

   public List<string> GetIPList()
   {
      // Returns a generic list of IP addresses, or null if empty
      return SplitList(ips);
   }

   public List<string> GetTaggedList()
   {
      // Returns a generic list of ports, or null if empty
      return SplitList(tagged);
   }

   // Assumption: both values are stored as comma-separated text
   private static List<string> SplitList(string value)
   {
      return string.IsNullOrEmpty(value) ? null : value.Split(',').ToList();
   }

   // Add other methods as needed to extract relevant information or hold new values
}


Then you would create another class which reads the config and stores a list of entries, e.g.

using System.Collections.Generic;
using System.IO;

public class VlanConfig
{
   public List<Vlan> vlans { get; set; }

   public VlanConfig(string filePath)
   {
      vlans = new List<Vlan>();
      ParseConfig(filePath);
   }

   private void ParseConfig(string filePath)
   {
      using (StreamReader sr = new StreamReader(filePath))
      {
         string line;   // current line of the config file
         while ((line = sr.ReadLine()) != null)
         {
            // Determine what the line of text refers to, then create/update the Vlan object and add it to the vlans collection
         }
      }
   }

   // Here you can also add methods to modify the config, find a specific value, search for entries, whatever you need
}
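As a rough idea of what the body of that read loop might do, here is a small self-contained C# sketch. It is not the classes above: ParsePorts and the (Vlan, Kind, Port) rows are hypothetical names, the input is an inline string read through StringReader instead of a file, and it assumes a vlan block may hold `tagged` and `untagged` lines in any order. Top-level statements need C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: returns one (Vlan, Kind, Port) row per port found on a
// "tagged" or "untagged" line. Lines such as "no untagged ..." are ignored.
static List<(int Vlan, string Kind, string Port)> ParsePorts(TextReader reader)
{
    var rows = new List<(int Vlan, string Kind, string Port)>();
    int? current = null;   // vlan block currently being read, if any
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string[] words = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0) continue;                           // blank line
        if (words[0] == "exit") { current = null; continue; }      // end of block
        if (words.Length < 2) continue;

        if (words[0] == "vlan" && int.TryParse(words[1], out int id))
        {
            current = id;                                          // start of a new block
        }
        else if (current.HasValue && (words[0] == "tagged" || words[0] == "untagged"))
        {
            foreach (string port in words[1].Split(','))
                rows.Add((current.Value, words[0], port));
        }
    }
    return rows;
}

string config = string.Join("\n",
    "vlan 1",
    "   name \"DEFAULT_VLAN\"",
    "   no ip address",
    "   no untagged 1-22,Trk1",
    "   exit",
    "vlan 32",
    "   name \"cheese\"",
    "   untagged 1-3,5-6,1,19,22,Trk1",
    "   ip address 192.168.1.8 255.255.254.0",
    "   tagged 1-3,5-6,12-13,19,22,Trk1",
    "   exit");

var rows = ParsePorts(new StringReader(config));
var tagged32 = rows.Where(r => r.Vlan == 32 && r.Kind == "tagged").Select(r => r.Port).ToList();
var untagged32 = rows.Where(r => r.Vlan == 32 && r.Kind == "untagged").Select(r => r.Port).ToList();

Console.WriteLine(string.Join(" ", tagged32));     // 1-3 5-6 12-13 19 22 Trk1
Console.WriteLine(string.Join(" ", untagged32));   // 1-3 5-6 1 19 22 Trk1
```

Because each line is classified on its own, the order of the `ip address`, `tagged` and `untagged` lines inside a block does not matter. For a real file, a StreamReader could be passed in place of the StringReader.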


Expert Comment

by:Michael74
Awesome kaufmed, nice article too

Author Comment

by:gacus
@kaufmed. That looks great, but I don't fully understand why my regular expression captures only the last groups instead of all of them. Is there something I am missing about how regex works?
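A note on why only the last values come back: in .NET, a capturing group that sits inside a quantifier is overwritten on every repetition, so Group.Value (and PowerShell's $Matches) holds only the final repetition. The pattern in the question has three separate groups, which is consistent with seeing the last range (12-13) and the last two-digit port (22). Every repetition is still recorded in Group.Captures. A minimal C# sketch, assuming a simplified single-group pattern in place of the original one and a single line of input; top-level statements need C# 9 or later:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// One line from the sample config. The pattern is a simplified stand-in for
// the one in the question, with a single capturing group repeated by "+".
string line = "   tagged 1-3,5-6,12-13,19,22,Trk1";
Match m = Regex.Match(line, @"tagged (?:(\d+-\d+|\d+|Trk\d+),?)+");

// Group.Value exposes only the final repetition of the group.
string last = m.Groups[1].Value;

// Group.Captures keeps every repetition, in match order.
string[] all = m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine(last);                    // Trk1
Console.WriteLine(string.Join(" ", all));   // 1-3 5-6 12-13 19 22 Trk1
```

From PowerShell the same collection should be reachable with [regex]::Match($text, $regex).Groups[1].Captures, since -match itself only fills $Matches with the group values.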

Author Comment

by:gacus
I see now that I did not provide a fully representative example. The regex works great for the one I did provide, but not well in practice, as the configs can take various forms.

There can be any number of vlans in the config, and vlan 32 can contain both untagged and tagged sections; I am looking to capture the ports from both. The entries can also be in any order. For example, the IP address can be above or below untagged or tagged, or not exist at all. untagged and tagged can also show up in either order within the vlan.

vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   untagged 1-3,5-6,1,19,22,Trk1
   ip address 192.168.1.8 255.255.254.0
   tagged 1-3,5-6,12-13,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit



vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   ip address 192.168.1.8 255.255.254.0
   untagged 1-3,5-6,1,19,22,Trk1
   tagged 1-3,5-6,12-13,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit


vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   tagged 1-3,5-6,12-13,19,22,Trk1
   ip address 192.168.1.8 255.255.254.0
   untagged 1-3,5-6,1,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit

I think I should just start a new question since you did answer my original question.