Solved

.NET regular expression to parse a file

Posted on 2015-01-07
Last Modified: 2015-01-08
I need to parse a configuration file similar to this:



snmp-server community "xxxx"
vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   ip address 192.160.2.3 255.255.254.0
   tagged 1-3,5-6,12-13,19,22,Trk1



I want to extract all the ports that are tagged to vlan 32. Here is what I have so far, but it doesn't capture all the ports; I only get some of them.
vlan 32\s\n\s*name "cheese".*\n\s*ip address.*\n(?:\n?\s*tagged (?:(?:(\d\d?-\d\d?)|(\d\d)|Trk\d|(\d)),?)*)

I am using PowerShell to read the file and match:

[IO.File]::ReadAllText($_.FullName) -imatch $regex

How do I create a group for each port so it will output these groups?
1-3
5-6
12-13
19
22
Trk1
Question by:gacus
10 Comments
 
LVL 1

Author Comment

by:gacus
ID: 40536847
It looks like I only get the last groups it matches, so in the case above I get 12-13 and 22. How do I tell it to return all the matches and not just the last two?
 
LVL 23

Expert Comment

by:Michael74
ID: 40536876
Instead of using a regex, why not write a parser that reads each entry into a collection of objects containing the vlan entries?

So you would have a vlan class or struct holding the data for each vlan entry, and then a class which holds a collection of vlan objects and methods for parsing the config file.

This would provide more flexibility and a structure that would enable future work on this config file.
 
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 425 total points
ID: 40536898
In terms of sustainability, I agree with Michael74; however, if this is just quick-and-dirty, then as a regex:

(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,]+



If you'd like a breakdown of what the pattern means let me know  = )


 
LVL 23

Expert Comment

by:Michael74
ID: 40537086
@kaufmed

I don't know about @gacus but I would be interested in seeing a breakdown of this pattern, I can understand most but there are parts I don't get. Great answer

Just a thought on the subject, and NOT for points: with this pattern you would need a foreach loop to get at all the values. Depending on what you need, it could be easier to match the whole tagged entry and then use the split function to create an array. Doing this would also allow for the case where there is more than one "vlan 32" entry. Using kaufmed's response, the regex for this would be:

(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^\n]+
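
As a rough illustration of the match-then-split idea, here is a minimal C# sketch. The class and method names (`TaggedLineSplit`, `PortsForVlan32`) are made up for this example, and the final character class is written as `[^\r\n]+` rather than `[^\n]+` on the assumption that the file may use Windows line endings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class TaggedLineSplit
{
    // Returns the ports on the first "tagged" line that follows "vlan 32",
    // or an empty array when no such line is found.
    public static string[] PortsForVlan32(string config)
    {
        // Same lookbehind as in the pattern above; [^\r\n]+ takes the rest of the line.
        Match line = Regex.Match(
            config,
            @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^\r\n]+");

        return line.Success
            ? line.Value.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            : new string[0];
    }

    public static void Main()
    {
        string config =
            "vlan 32\n" +
            "   name \"cheese\"\n" +
            "   ip address 192.160.2.3 255.255.254.0\n" +
            "   tagged 1-3,5-6,12-13,19,22,Trk1\n";

        foreach (string port in PortsForVlan32(config))
            Console.WriteLine(port);
    }
}
```

One caveat with this sketch: the literal `tagged` also matches inside `untagged`, so if an `untagged` line comes first within the vlan 32 block, that line is the one returned.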

 
LVL 1

Author Comment

by:gacus
ID: 40537099
I would like to see the breakdown as well.

I am also interested in an example of a parser for this.  I am very intrigued by that idea.
 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 425 total points
ID: 40537108
Michael,

I used a lookbehind [ (?<=... ) ]. The lookbehind allows me to match some things without actually consuming them as a part of the normal match. Ordinarily, when you match with regex, the engine will inspect characters as it chugs along; all of the characters it inspects do not get visited again (not 100% true, but for this explanation...). The lookbehind circumvents this behavior in a sense.

The literal breakdown of the pattern:

(?<= ... )          - Positive lookbehind
vlan                - Literal text
 +                  - One or more ( + ) spaces
32                  - Literal text
(?: ... )+          - One or more ( + ) of the thing to the left, which is a non-capturing group [ (?: ... ) ]
[^t]|t(?!agged)     - Part of the non-capturing group; anything not ( ^ ) a "t" ( [^t] ) OR ( | ) a "t" that is not followed by the string "agged";
                      Negative lookahead used to check for "agged" without consuming it
tagged              - Literal text
 +                  - One or more ( + ) spaces
.*?                 - Zero or more ( * ) of any character ( . ), excluding newline; match only what is necessary/don't be greedy ( ? )
[^,]+               - One or more ( + ) of the thing to the left, which is anything not ( ^ ) a comma ( [^,] )



The net effect is that I still start matching at "vlan 32", but I don't actually consume that bit, so I can keep searching for that same sentinel text for each port that is found after it. Basically, for each port in the list (that I care about), I repeat the "vlan 32..." search; the dot-star at the end will intrinsically match the ports I previously found on each preceding run by the engine.

Note that this behavior depends on .NET's support for unbounded lookbehinds. Most regex engines do not support them (i.e. you can't use dot-star/dot-plus in a lookbehind).
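
To make the "one match per port" behavior concrete, here is a small C# sketch that runs the pattern with `Regex.Matches`. The names `LookbehindPorts` and `Extract` are invented for this example, and the sample text is the config from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class LookbehindPorts
{
    // Each match is one port; the lookbehind re-checks the "vlan 32 ... tagged" prefix every time.
    public static List<string> Extract(string config)
    {
        var ports = new List<string>();
        foreach (Match m in Regex.Matches(
            config, @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,]+"))
        {
            // Trim because [^,]+ also picks up a trailing line break after the last port.
            ports.Add(m.Value.Trim());
        }
        return ports;
    }

    public static void Main()
    {
        string config =
            "snmp-server community \"xxxx\"\n" +
            "vlan 1\n" +
            "   name \"DEFAULT_VLAN\"\n" +
            "   no ip address\n" +
            "   no untagged 1-22,Trk1\n" +
            "   exit\n" +
            "vlan 32\n" +
            "   name \"cheese\"\n" +
            "   ip address 192.160.2.3 255.255.254.0\n" +
            "   tagged 1-3,5-6,12-13,19,22,Trk1\n";

        foreach (string port in Extract(config))
            Console.WriteLine(port);
    }
}
```

If more lines follow the tagged line, the final match runs on to the next comma, because `[^,]+` also matches newlines. Ending the pattern with `[^,\r\n]+` instead is one way to stop it at the end of the line.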
 
LVL 23

Assisted Solution

by:Michael74
Michael74 earned 75 total points
ID: 40537143
This is very rough, but to give you an idea:

First you would need a data structure to hold the contents of each vlan, e.g.

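using System;
using System.Collections.Generic;

public class Vlan
{
   // Public so the parsing class can read and update the values
   public int num { get; set; }
   public string name { get; set; }
   public string ips { get; set; }
   public string tagged { get; set; }

   public Vlan() {}
   public Vlan(int num, string name, string ips, string tagged)
   {
      this.num = num;
      this.name = name;
      this.ips = ips;
      this.tagged = tagged;
   }

   public List<string> GetIPList()
   {
      // Returns the stored IP text split on spaces, or null if empty
      return SplitOrNull(ips, ' ');
   }

   public List<string> GetTaggedList()
   {
      // Returns the stored port text split on commas, or null if empty
      return SplitOrNull(tagged, ',');
   }

   // Assumes each field is stored as a single delimited string
   private static List<string> SplitOrNull(string text, char separator)
   {
      if (string.IsNullOrWhiteSpace(text)) return null;
      return new List<string>(text.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries));
   }

   //Add other methods as needed to extract relevant information or hold new values
}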



Then you would create another class which reads the config and stores a list of entries, e.g.
using System.Collections.Generic;
using System.IO;

public class VlanConfig
{
   public List<Vlan> vlans { get; set; }

   public VlanConfig(string filePath)
   {
      vlans = new List<Vlan>();
      ParseConfig(filePath);
   }

   private void ParseConfig(string filePath)
   {
      using (StreamReader sr = new StreamReader(filePath))
      {
         string line;
         // Read the config one line at a time until the end of the file
         while ((line = sr.ReadLine()) != null)
         {
            // Determine what the line of text refers to, create/update a Vlan object and add it to the vlans collection
         }
      }
   }

   // Here you can also add methods to modify the config, find a specific value, search for entries, whatever you need
}
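
As one possible way to fill in the parsing loop, here is a self-contained C# sketch. It is an illustration rather than part of the outline above: the `VlanPortParser` class and its `Parse` method are hypothetical names, and it assumes each vlan block starts with a `vlan <number>` line, ends with `exit`, and lists ports on lines beginning with `tagged` or `untagged`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only.
public static class VlanPortParser
{
    // Returns: vlan number -> ("tagged" or "untagged") -> list of ports.
    public static Dictionary<int, Dictionary<string, List<string>>> Parse(TextReader reader)
    {
        var vlans = new Dictionary<int, Dictionary<string, List<string>>>();
        Dictionary<string, List<string>> current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Split on whitespace so indentation does not matter.
            string[] words = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (words.Length == 0) continue;

            int number;
            if (words[0] == "exit")
            {
                current = null; // end of the current vlan block
            }
            else if (words.Length >= 2 && words[0] == "vlan" && int.TryParse(words[1], out number))
            {
                current = new Dictionary<string, List<string>>();
                vlans[number] = current;
            }
            else if (current != null && words.Length >= 2 &&
                     (words[0] == "tagged" || words[0] == "untagged"))
            {
                // Lines such as "no untagged ..." start with "no" and are skipped.
                current[words[0]] = new List<string>(words[1].Split(','));
            }
        }
        return vlans;
    }

    public static void Main()
    {
        string config =
            "vlan 1\n" +
            "   name \"DEFAULT_VLAN\"\n" +
            "   no untagged 1-22,Trk1\n" +
            "   exit\n" +
            "vlan 32\n" +
            "   name \"cheese\"\n" +
            "   untagged 1-3,5-6,1,19,22,Trk1\n" +
            "   ip address 192.168.1.8 255.255.254.0\n" +
            "   tagged 1-3,5-6,12-13,19,22,Trk1\n" +
            "   exit\n";

        var vlans = Parse(new StringReader(config));
        Console.WriteLine("tagged:   " + string.Join(" ", vlans[32]["tagged"]));
        Console.WriteLine("untagged: " + string.Join(" ", vlans[32]["untagged"]));
    }
}
```

Because each line is classified on its own, the order of the `ip address`, `tagged`, and `untagged` lines inside a block does not matter, and a missing line simply leaves that key out of the dictionary.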

 
LVL 23

Expert Comment

by:Michael74
ID: 40537145
Awesome kaufmed, nice article too
 
LVL 1

Author Comment

by:gacus
ID: 40537735
@kaufmed. That looks great, but I don't fully understand why my regular expression only captures the last groups rather than all of them. Is there something I am missing about how regex works?
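
A likely explanation: when a capturing group sits inside a repeated group, each repetition overwrites the group's value, so only the last capture is reported. That would fit the symptom above, where the range group ends on 12-13 and the two-digit group ends on 22. PowerShell's `$Matches` exposes only that final value per group, but .NET keeps the full history in `Group.Captures`. The C# sketch below uses a simplified pattern with a single capturing group (an assumption, not the original expression), and the names `RepeatedGroupDemo` and `AllCaptures` are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class RepeatedGroupDemo
{
    // Simplified pattern: group 1 is captured once per comma-separated port.
    private const string Pattern = @"tagged +(?:([^,\s]+),?)+";

    // Returns every value group 1 captured, in order.
    public static string[] AllCaptures(string line)
    {
        Match m = Regex.Match(line, Pattern);
        CaptureCollection captures = m.Groups[1].Captures;
        var values = new string[captures.Count];
        for (int i = 0; i < values.Length; i++)
            values[i] = captures[i].Value;
        return values;
    }

    public static void Main()
    {
        string line = "   tagged 1-3,5-6,12-13,19,22,Trk1";

        // Group.Value holds only the last repetition.
        Console.WriteLine(Regex.Match(line, Pattern).Groups[1].Value);   // prints "Trk1"

        // Group.Captures holds all of them.
        Console.WriteLine(string.Join(" ", AllCaptures(line)));          // prints "1-3 5-6 12-13 19 22 Trk1"
    }
}
```

From PowerShell, the same collection should be reachable by calling `[regex]::Match(...)` directly and reading `.Groups[1].Captures` instead of relying on `-match`.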
 
LVL 1

Author Comment

by:gacus
ID: 40538007
I see now that I did not provide an all-covering example. The regular expression works great for the one I did provide, but not well in practice, as the configs can take various forms.

There can be any number of vlans in the config, and vlan 32 can contain both untagged and tagged sections; I am looking to capture both sets of ports. The lines can also be in any order. For example, IP address can be above or below untagged or tagged, or not exist at all. untagged and tagged can also show up in either order within the vlan.

vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   untagged 1-3,5-6,1,19,22,Trk1
   ip address 192.168.1.8 255.255.254.0
   tagged 1-3,5-6,12-13,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit



vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   ip address 192.168.1.8 255.255.254.0
   untagged 1-3,5-6,1,19,22,Trk1
   tagged 1-3,5-6,12-13,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit


vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   tagged 1-3,5-6,12-13,19,22,Trk1
   ip address 192.168.1.8 255.255.254.0
   untagged 1-3,5-6,1,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit

I think I should just start a new question since you did answer my original question.