
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 200

.NET regular expression to parse a file

I need to parse a configuration file similar to this:



snmp-server community "xxxx"
vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   ip address 192.160.2.3 255.255.254.0
   tagged 1-3,5-6,12-13,19,22,Trk1



I want to extract all the ports that are tagged on vlan 32. Here is what I have so far, but it doesn't capture all the ports as groups; I only get some of them:
vlan 32\s\n\s*name "cheese".*\n\s*ip address.*\n(?:\n?\s*tagged (?:(?:(\d\d?-\d\d?)|(\d\d)|Trk\d|(\d)),?)*)

I am using PowerShell to read the file and match:

[IO.File]::ReadAllText($_.FullName) -imatch $regex

How do I create a group for each port so it will output these groups?
1-3
5-6
12-13
19
22
Trk1
Asked by: gacus
3 Solutions
 
gacus (Author) commented:
It looks like I only get the last value each group matched, so in the case above I get 12-13 and 22. How do I tell it to return all the matches and not just the last two?
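That is how .NET reports a group that sits inside a repeated construct: `Group.Value` (and, as far as I know, the `$Matches` table PowerShell fills in after `-imatch`) holds only the last capture, while every earlier iteration is kept in `Group.Captures`. A minimal C# sketch against the sample above, assuming the file has Windows CRLF line endings (the `vlan 32\s\n` part of the pattern cannot match plain LF endings):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The pattern from the question, unchanged.
string pattern = @"vlan 32\s\n\s*name ""cheese"".*\n\s*ip address.*\n(?:\n?\s*tagged (?:(?:(\d\d?-\d\d?)|(\d\d)|Trk\d|(\d)),?)*)";

// The sample from the question, assuming CRLF line endings as in a typical Windows file.
string config = string.Join("\r\n",
    "vlan 32",
    "   name \"cheese\"",
    "   ip address 192.160.2.3 255.255.254.0",
    "   tagged 1-3,5-6,12-13,19,22,Trk1") + "\r\n";

Match m = Regex.Match(config, pattern, RegexOptions.IgnoreCase); // -imatch is case-insensitive

// Group.Value only reports the last capture of each group...
string lastRange = m.Groups[1].Value;
string lastTwoDigit = m.Groups[2].Value;

// ...but Group.Captures remembers every iteration of the repeated group.
string[] ranges = m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
string[] twoDigit = m.Groups[2].Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine($"Value:    {lastRange} / {lastTwoDigit}");
Console.WriteLine($"Captures: {string.Join(",", ranges)} / {string.Join(",", twoDigit)}");
```

Note that `Trk1` does not appear in either list, because `Trk\d` is not inside a capturing group.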
 
Michael Fowler (Solutions Consultant) commented:
Instead of using a regex, why not write a parser that reads each entry into a collection of objects containing the vlan entries?

So you would have a vlan class or struct holding the data for each vlan entry, and then a class which holds a collection of vlan objects and methods for parsing the config file.

This would provide more flexibility and a structure that would enable future work on this config file.
 
käµfm³d 👽 commented:
In terms of maintainability, I agree with Michael74; however, if this is just quick-and-dirty, then as a regex:

(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,]+



If you'd like a breakdown of what the pattern means, let me know = )
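A minimal C# sketch of running this pattern with `Regex.Matches` (from PowerShell, `[regex]::Matches($text, $pattern)` reaches the same method, whereas `-match` stops at the first match). One assumption of this sketch, not part of the original pattern: the final `[^,]+` is narrowed to `[^,\r\n]+`, because `[^,]` also matches line breaks and would otherwise let the last port run on into whatever lines follow the tagged list.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// kaufmed's pattern, with [^,\r\n]+ instead of [^,]+ (an assumption of this sketch)
// so that a match cannot run past the end of the "tagged" line.
string pattern = @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,\r\n]+";

// The sample from the question.
string config = string.Join("\n",
    "snmp-server community \"xxxx\"",
    "vlan 1",
    "   name \"DEFAULT_VLAN\"",
    "   no ip address",
    "   no untagged 1-22,Trk1",
    "   exit",
    "vlan 32",
    "   name \"cheese\"",
    "   ip address 192.160.2.3 255.255.254.0",
    "   tagged 1-3,5-6,12-13,19,22,Trk1") + "\n";

// Each match is one port; the "vlan 32 ... tagged" prefix is checked but never consumed.
string[] ports = Regex.Matches(config, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string port in ports)
    Console.WriteLine(port);
```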


 
Michael Fowler (Solutions Consultant) commented:
@kaufmed

I don't know about @gacus, but I would be interested in seeing a breakdown of this pattern; I can understand most of it, but there are parts I don't get. Great answer.

Just a thought on the subject, and NOT for points. Using this pattern, you would need a foreach loop to get at all the values. Depending on what you need, it could be easier to match the whole tagged entry and then use the Split function to create an array. Doing this would also allow for the case where there is more than one "vlan 32" entry. Based on kaufmed's response, the regex for this would be:

(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^\n]+
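A sketch of that approach in C#, with the sample wrapped in CRLF line endings as an assumption about the real file. Since `[^\n]+` stops before the `\n` but not before a preceding `\r`, the matched line is trimmed before it is split on commas:

```csharp
using System;
using System.Text.RegularExpressions;

// Michael's variant: one match covering the rest of the "tagged" line.
string pattern = @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^\n]+";

// Sample from the question, assuming CRLF line endings.
string config = string.Join("\r\n",
    "vlan 32",
    "   name \"cheese\"",
    "   ip address 192.160.2.3 255.255.254.0",
    "   tagged 1-3,5-6,12-13,19,22,Trk1",
    "   exit") + "\r\n";

Match line = Regex.Match(config, pattern);

// [^\n]+ also takes the '\r' of a CRLF ending, so drop it before splitting on commas.
string[] ports = line.Value.TrimEnd('\r').Split(',');

foreach (string port in ports)
    Console.WriteLine(port);
```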


 
gacus (Author) commented:
I would like to see the breakdown as well.

I am also interested in an example of a parser for this.  I am very intrigued by that idea.
 
käµfm³d 👽 commented:
Michael,

I used a lookbehind [ (?<=... ) ]. The lookbehind allows me to match some things without actually consuming them as part of the normal match. Ordinarily, when you match with regex, the engine consumes characters as it chugs along, and characters it has already consumed are not visited again (not 100% true, but for this explanation...). The lookbehind circumvents this behavior in a sense.

The literal breakdown of the pattern:

(?<= ... )          - Positive lookbehind
vlan                - Literal text
 +                  - One or more ( + ) spaces
32                  - Literal text
(?: ... )+          - One or more ( + ) of the thing to the left, which is a non-capturing group [ (?: ... ) ]
[^t]|t(?!agged)     - Part of the non-capturing group; anything not ( ^ ) a "t" ( [^t] ) OR ( | ) a "t" that is not followed by the string "agged";
                      Negative lookahead used to check for "agged" without consuming it
tagged              - Literal text
 +                  - One or more ( + ) spaces
.*?                 - Zero or more ( * ) of any character ( . ), excluding newline; match only what is necessary/don't be greedy ( ? )
[^,]+               - One or more ( + ) of the thing to the left, which is anything not ( ^ ) a comma ( [^,] )



The net effect is that I still start matching at "vlan 32", but I don't actually consume that bit, so I can keep searching for that same sentinel text for each port that is found after it. Basically, for each port in the list (that I care about), I repeat the "vlan 32..." search; the dot-star at the end will intrinsically match the ports I previously found on each preceding run by the engine.

Note that this behavior only works in .NET (and I think the Boost regex library). Most regex engines do not support unbounded lookbehinds (i.e. you can't use dot-star/dot-plus in a lookbehind).
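To see the non-consuming behaviour directly, here is a small C# comparison (an illustration added here, not from the thread): the same prefix written as ordinary pattern text consumes everything from "vlan 32" onward, so the engine finds one port and stops, while the lookbehind version checks the prefix again at each position and returns every port.

```csharp
using System;
using System.Text.RegularExpressions;

string config = "vlan 32\n   name \"cheese\"\n   ip address 192.160.2.3 255.255.254.0\n   tagged 1-3,5-6,12-13,19,22,Trk1";

// Same prefix, but as ordinary (consuming) pattern text, with the port in group 1.
string consuming = @"vlan +32(?:[^t]|t(?!agged))+tagged +.*?([^,]+)";
// kaufmed's version: the prefix lives inside a lookbehind, so only the port is consumed.
string lookbehind = @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,]+";

MatchCollection a = Regex.Matches(config, consuming);
MatchCollection b = Regex.Matches(config, lookbehind);

// The consuming pattern uses up the text from "vlan 32" onward, so the engine cannot
// find the prefix again and stops after the first port.
Console.WriteLine($"consuming:  {a.Count} match(es), first port {a[0].Groups[1].Value}");
// The lookbehind pattern re-checks the prefix at every position and finds all six ports.
Console.WriteLine($"lookbehind: {b.Count} match(es), first starts at index {b[0].Index}");
```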
 
Michael Fowler (Solutions Consultant) commented:
This is very rough, but it should give you an idea.

First you would need a data structure to hold the contents of each vlan, e.g.:

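using System;
using System.Collections.Generic;
using System.Linq;

public class Vlan
{
   public int Num { get; set; }
   public string Name { get; set; }
   public string Ips { get; set; }      // text after "ip address", e.g. "192.168.1.8 255.255.254.0"
   public string Tagged { get; set; }   // text after "tagged", e.g. "1-3,5-6,Trk1"

   public Vlan() {}
   public Vlan(int num, string name, string ips, string tagged)
   {
      Num = num;
      Name = name;
      Ips = ips;
      Tagged = tagged;
   }

   public List<string> GetIPList()
   {
      // Splits the stored "address mask" text on spaces; null if the vlan has no IP address
      return string.IsNullOrWhiteSpace(Ips)
         ? null
         : Ips.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
   }

   public List<string> GetTaggedList()
   {
      // "1-3,5-6,Trk1" becomes "1-3", "5-6", "Trk1"; null if nothing is tagged
      return string.IsNullOrWhiteSpace(Tagged) ? null : Tagged.Split(',').ToList();
   }

   //Add other methods as needed to extract relevant information or hold new values
}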



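Then you would create another class which reads the config file and stores a list of entries, e.g.:

using System.Collections.Generic;
using System.IO;

public class VlanConfig
{
   public List<Vlan> Vlans { get; set; }

   public VlanConfig(string filePath)
   {
      Vlans = new List<Vlan>();
      ParseConfig(filePath);
   }

   private void ParseConfig(string filePath)
   {
      using (StreamReader sr = new StreamReader(filePath))
      {
         Vlan current = null;   // the vlan entry currently being read
         string line;
         while ((line = sr.ReadLine()) != null)
         {
            string text = line.Trim();
            if (text.StartsWith("vlan "))
            {
               // "vlan 32" starts a new entry; the following indented lines belong to it
               current = new Vlan { Num = int.Parse(text.Substring(5)) };
               Vlans.Add(current);
            }
            else if (current == null || text == "exit")
               current = null;   // outside a vlan block (e.g. "snmp-server ...") or at its end
            else if (text.StartsWith("name "))
               current.Name = text.Substring(5).Trim('"');
            else if (text.StartsWith("ip address "))
               current.Ips = text.Substring(11);
            else if (text.StartsWith("tagged "))
               current.Tagged = text.Substring(7);
            // lines such as "no ip address", "untagged ..." or "ip igmp" are ignored here
         }
      }
   }

   // Here you can also add methods to modify the config, find specific values, search for entries, or whatever you need
}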


 
Michael Fowler (Solutions Consultant) commented:
Awesome kaufmed, nice article too
 
gacus (Author) commented:
@kaufmed: That looks great, but I don't fully understand why my regular expression only captures the last ports instead of all of them. Is there something I am missing about how regex works?
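Besides the last-capture behaviour of `Group.Value`, the pattern in the question spreads the ports over three groups (ranges, two-digit and one-digit numbers) and leaves `Trk\d` outside any capturing group, so `Trk1` can never be returned. A sketch of one alternative, assuming a single named group `port` and a helper `PortsOf` (both hypothetical names) that accept any run of characters other than commas and whitespace, read back through `Group.Captures`. It only looks at a tagged line and leaves picking the right vlan to the surrounding pattern:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// One named group for every kind of port; \b stops "untagged" from matching as "tagged".
string pattern = @"\btagged +(?:(?<port>[^,\s]+),?)+";

// Hypothetical helper: every capture of the "port" group in the first tagged line of text.
string[] PortsOf(string text) =>
    Regex.Match(text, pattern).Groups["port"].Captures
        .Cast<Capture>()
        .Select(c => c.Value)
        .ToArray();

string[] ports = PortsOf("   tagged 1-3,5-6,12-13,19,22,Trk1\r\n");
string[] onlyTagged = PortsOf("   untagged 1,2\r\n   tagged 3,4\r\n");

Console.WriteLine(string.Join(Environment.NewLine, ports));
Console.WriteLine(string.Join(",", onlyTagged));
```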
 
gacus (Author) commented:
I see now that I did not provide an example that covers every case. The regex works great for the one I did provide, but not in practice, as the configs can take various forms.

There can be any number of vlans in the config, and vlan 32 can contain both untagged and tagged sections; I am looking to capture the ports from both. The lines can also be in any order. For example, the IP address can be above or below untagged or tagged, or not exist at all, and untagged and tagged can show up in either order within the vlan.

vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   untagged 1-3,5-6,1,19,22,Trk1
   ip address 192.168.1.8 255.255.254.0
   tagged 1-3,5-6,12-13,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit



vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   ip address 192.168.1.8 255.255.254.0
   untagged 1-3,5-6,1,19,22,Trk1
   tagged 1-3,5-6,12-13,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit


vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   tagged 1-3,5-6,12-13,19,22,Trk1
   ip address 192.168.1.8 255.255.254.0
   untagged 1-3,5-6,1,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit

I think I should just start a new question since you did answer my original question.
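Because the order of the lines inside a vlan varies, one option is to let a regex isolate the vlan block first and only then read its `tagged` and `untagged` lines, so their position no longer matters. A C# sketch under some assumptions: every block starts with a `vlan <id>` line and ends at an `exit` line (or the end of the file), `PortsForVlan` is a hypothetical helper name, and `no untagged ...` lines should be skipped:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the tagged and untagged ports of one vlan,
// whatever order its lines appear in.
static Dictionary<string, List<string>> PortsForVlan(string text, int vlanId)
{
    var ports = new Dictionary<string, List<string>>
    {
        ["tagged"] = new List<string>(),
        ["untagged"] = new List<string>(),
    };

    // The block runs from the "vlan <id>" line to the next "exit" line (or the end of the text).
    Match block = Regex.Match(text,
        @"^vlan[ \t]+" + vlanId + @"[ \t]*\r?$(?<body>.*?)(?:^[ \t]*exit\b|\z)",
        RegexOptions.Multiline | RegexOptions.Singleline);
    if (!block.Success)
        return ports;

    // Only lines that start with "tagged" or "untagged"; "no untagged ..." does not qualify.
    foreach (Match line in Regex.Matches(block.Groups["body"].Value,
                 @"^[ \t]*(?<kind>tagged|untagged)[ \t]+(?<list>\S+)", RegexOptions.Multiline))
    {
        ports[line.Groups["kind"].Value].AddRange(line.Groups["list"].Value.Split(','));
    }
    return ports;
}

// The third sample above (tagged, then ip address, then untagged), with CRLF line endings.
string config = string.Join("\r\n",
    "vlan 1", "   name \"DEFAULT_VLAN\"", "   no ip address", "   no untagged 1-22,Trk1", "   exit",
    "vlan 32", "   name \"cheese\"", "   tagged 1-3,5-6,12-13,19,22,Trk1",
    "   ip address 192.168.1.8 255.255.254.0", "   untagged 1-3,5-6,1,19,22,Trk1", "   exit",
    "vlan 5", "   name \"xx\"", "   no ip address", "   tagged 1-3,5-6,12-13,22,Trk1", "   ip igmp", "   exit");

Dictionary<string, List<string>> vlan32 = PortsForVlan(config, 32);
Console.WriteLine("tagged:   " + string.Join(",", vlan32["tagged"]));
Console.WriteLine("untagged: " + string.Join(",", vlan32["untagged"]));
```

The same helper returns two empty lists for vlan 1 in this sample, since its only port line starts with `no untagged`.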