
.NET regular expression to parse a file

Posted on 2015-01-07
Last Modified: 2015-01-08
I need to parse a configuration file similar to this:



snmp-server community "xxxx"
vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   ip address 192.160.2.3 255.255.254.0
   tagged 1-3,5-6,12-13,19,22,Trk1



I want to extract all the ports that are tagged on VLAN 32. Here is what I have so far, but it doesn't capture all the ports; I only get some of them.
vlan 32\s\n\s*name "cheese".*\n\s*ip address.*\n(?:\n?\s*tagged (?:(?:(\d\d?-\d\d?)|(\d\d)|Trk\d|(\d)),?)*)

I am using PowerShell to read the file and match:

[IO.File]::ReadAllText($_.FullName) -imatch $regex

How do I create a group for each port so it will output these groups?
1-3
5-6
12-13
19
22
Trk1
Question by:gacus

Author Comment

by:gacus
ID: 40536847
It looks like I only get the last value each group matches, so in the case above I get 12-13 and 22. How do I tell it to return all the matches and not just the last two?
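This is standard behaviour for a repeated capturing group: a group inside `( ... )*` is overwritten on every iteration, so `Group.Value` (and PowerShell's `$Matches`) only keeps what the last iteration matched. That is why the original pattern, with three separate groups, ended up with 12-13 and 22. .NET does, however, record every iteration in `Group.Captures`. The minimal sketch below assumes a simplified pattern with a single repeated group (not the exact regex from the question), the tagged line from the sample config, and C# 9 top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The tagged line from the question's sample config.
string line = "   tagged 1-3,5-6,12-13,19,22,Trk1";

// Simplified pattern: ONE capturing group, repeated by the trailing '*'.
// Each iteration matches one port plus an optional comma.
Match m = Regex.Match(line, @"tagged ((?:\d+-\d+|\d+|Trk\d+),?)*");

// Group.Value only holds what the last iteration matched.
string last = m.Groups[1].Value;

// Group.Captures holds one Capture per iteration, in order.
string[] all = m.Groups[1].Captures
    .Cast<Capture>()
    .Select(c => c.Value.TrimEnd(','))
    .ToArray();

Console.WriteLine(last);                    // Trk1
Console.WriteLine(string.Join(" ", all));   // 1-3 5-6 12-13 19 22 Trk1
```

From PowerShell, the same collection should be reachable as `[regex]::Match($text, $pattern).Groups[1].Captures`.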

Expert Comment

by:Michael74
ID: 40536876
Instead of using a regex, why not write a parser that reads each entry into a collection of objects holding the VLAN entries?

So you would have a Vlan class or struct holding the data for each VLAN entry, and then a class which holds a collection of Vlan objects and the methods for parsing the config file.

This would provide more flexibility, and it would give you a structure that makes future work on this config file easier.

Accepted Solution

by: käµfm³d 👽 (earned 425 total points)
ID: 40536898
In terms of sustainability, I agree with Michael74; however, if this is just quick-and-dirty, then as a regex:

(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,]+



If you'd like a breakdown of what the pattern means, let me know = )
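A minimal C# sketch of applying the accepted pattern, assuming the sample config from the question and C# 9 top-level statements. The key point is to ask for every match (`Regex.Matches`) rather than a single one: each match is one port, because the lookbehind is re-checked at every position and never consumes the `vlan 32` text. One assumed tweak: the final class is `[^,\r\n]+` instead of `[^,]+`, so the last port (Trk1) does not run on into the line break and whatever follows it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample config from the question, with a trailing newline as a real file would have.
string config = string.Join("\n",
    "snmp-server community \"xxxx\"",
    "vlan 1",
    "   name \"DEFAULT_VLAN\"",
    "   no ip address",
    "   no untagged 1-22,Trk1",
    "   exit",
    "vlan 32",
    "   name \"cheese\"",
    "   ip address 192.160.2.3 255.255.254.0",
    "   tagged 1-3,5-6,12-13,19,22,Trk1",
    "");

// Accepted pattern; the final class also excludes \r and \n (assumed tweak).
const string pattern = @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^,\r\n]+";

// Regex.Matches keeps scanning after each hit, so every port is its own Match.
string[] ports = Regex.Matches(config, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string port in ports)
    Console.WriteLine(port);
```

In PowerShell, `-imatch` only reports the first match in `$Matches`; `[regex]::Matches($text, $pattern)` should return the whole collection, as `Regex.Matches` does here (add `RegexOptions.IgnoreCase` if you want the case-insensitive behaviour of `-imatch`).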


Expert Comment

by:Michael74
ID: 40537086
@kaufmed

I don't know about @gacus, but I would be interested in seeing a breakdown of this pattern. I can understand most of it, but there are parts I don't get. Great answer.

Just a thought on the subject, and NOT for points. Using this pattern, you would need a foreach loop to get at all the values. Depending on what you need, it could be easier to match the whole tagged entry and then use the Split function to create an array. Doing this would also allow for the case where there is more than one "vlan 32" entry. Based on kaufmed's response, the regex for this would be:

(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^\n]+
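A hedged C# sketch of this variant, assuming a trimmed copy of the question's config and C# 9 top-level statements: one match returns the rest of the tagged line, and `Split(',')` turns it into an array. The `TrimEnd('\r')` is an added guard for files with Windows line endings, since `[^\n]+` would otherwise keep the carriage return on the last port.

```csharp
using System;
using System.Text.RegularExpressions;

// Trimmed copy of the question's sample, using Windows line endings on purpose.
string config = string.Join("\r\n",
    "vlan 32",
    "   name \"cheese\"",
    "   ip address 192.160.2.3 255.255.254.0",
    "   tagged 1-3,5-6,12-13,19,22,Trk1",
    "");

// Michael74's variant: the whole rest of the tagged line in a single match.
Match m = Regex.Match(config, @"(?<=vlan +32(?:[^t]|t(?!agged))+tagged +.*?)[^\n]+");

// [^\n]+ keeps a trailing '\r', so trim it before splitting on commas.
string[] ports = m.Success
    ? m.Value.TrimEnd('\r').Split(',')
    : Array.Empty<string>();

Console.WriteLine(string.Join(" | ", ports));
```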


Author Comment

by:gacus
ID: 40537099
I would like to see the breakdown as well.

I am also interested in an example of a parser for this.  I am very intrigued by that idea.

Assisted Solution

by: käµfm³d 👽
ID: 40537108
Michael,

I used a lookbehind [ (?<=... ) ]. The lookbehind lets me match some text without actually consuming it as part of the normal match. Ordinarily, when you match with a regex, the engine inspects characters as it chugs along, and the characters it has consumed are not visited again (not 100% true, but good enough for this explanation). The lookbehind circumvents this behavior, in a sense.

The literal breakdown of the pattern:

(?<= ... )          - Positive lookbehind
vlan                - Literal text
 +                  - One or more ( + ) spaces
32                  - Literal text
(?: ... )+          - One or more ( + ) of the thing to the left, which is a non-capturing group [ (?: ... ) ]
[^t]|t(?!agged)     - Part of the non-capturing group; anything not ( ^ ) a "t" ( [^t] ) OR ( | ) a "t" that is not followed by the string "agged";
                      Negative lookahead used to check for "agged" without consuming it
tagged              - Literal text
 +                  - One or more ( + ) spaces
.*?                 - Zero or more ( * ) of any character ( . ), excluding newline; match only what is necessary/don't be greedy ( ? )
[^,]+               - One or more ( + ) of the thing to the left, which is anything not ( ^ ) a comma ( [^,] )

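If the breakdown is easier to follow next to the pattern itself, .NET's `RegexOptions.IgnorePatternWhitespace` lets the same expression carry inline `#` comments. In that mode unescaped whitespace is ignored, so the literal spaces are written as `\x20`. This is an illustrative rewrite that should behave like the accepted pattern on the sample input, not a new technique; the only assumed change is that the last class also excludes line breaks, so the final port stops at the end of its line. It assumes C# 9 top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The accepted pattern written out with IgnorePatternWhitespace, so each piece sits next
// to its explanation. In this mode unescaped spaces and newlines are ignored and
// '#' starts a comment, so literal spaces are written as \x20.
const string pattern = @"
    (?<=                  # positive lookbehind: checked, not consumed
        vlan \x20+ 32     # 'vlan', one or more spaces, '32'
        (?:               # non-capturing group, one or more times:
            [^t]          #   any character that is not 't'
          | t(?!agged)    #   or a 't' not followed by 'agged' (negative lookahead)
        )+
        tagged \x20+      # 'tagged' and one or more spaces
        .*?               # lazily, the ports already matched on this line
    )
    [^,\r\n]+             # the port: up to the next comma or end of line";

string config = "vlan 32\n   name \"cheese\"\n   tagged 1-3,5-6,12-13,19,22,Trk1\n";

string[] ports = Regex.Matches(config, pattern, RegexOptions.IgnorePatternWhitespace)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(" ", ports));   // 1-3 5-6 12-13 19 22 Trk1
```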


The net effect is that I still start matching at "vlan 32", but I don't actually consume that bit, so I can keep searching for that same sentinel text for each port that is found after it. Basically, for each port in the list (that I care about), I repeat the "vlan 32..." search; the lazy dot-star at the end of the lookbehind absorbs the ports the engine already found on its earlier passes.

Note that this behavior only works in .NET (and I think the Boost regex library). Most regex engines do not support unbounded lookbehinds (i.e. you can't use dot-star/dot-plus in a lookbehind).
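For engines that do not support unbounded lookbehind, a common alternative (not from this thread) is the `\G` anchor, which only matches where the previous match ended. The first alternative consumes `vlan 32 ... tagged ` and captures the first port, and `\G,` then chains each following port onto it. .NET supports `\G`, as do Java and PCRE, for example; Python's built-in `re` module does not, so check your engine's documentation. The sketch below is C#, assuming the question's sample text and C# 9 top-level statements; because this pattern does consume the prefix, the port is read from capture group 1.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string config = string.Join("\n",
    "vlan 32",
    "   name \"cheese\"",
    "   ip address 192.160.2.3 255.255.254.0",
    "   tagged 1-3,5-6,12-13,19,22,Trk1",
    "   exit");

// No lookbehind: the first alternative consumes "vlan 32 ... tagged " and captures the
// first port; after that, \G only matches where the previous match ended, so "\G,"
// picks up each following ",port" and the chain stops at the end of the line.
const string pattern = @"(?:vlan +32(?:[^t]|t(?!agged))+tagged +|\G,)([^,\r\n]+)";

string[] ports = Regex.Matches(config, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)   // the port itself is in group 1
    .ToArray();

Console.WriteLine(string.Join(" ", ports));   // 1-3 5-6 12-13 19 22 Trk1
```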

Assisted Solution

by: Michael74 (earned 75 total points)
ID: 40537143
This is very rough, but it should give you an idea.

First you would need a data structure to hold the contents of each VLAN, e.g.:

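using System;
using System.Collections.Generic;
using System.Linq;

public class Vlan
{
   public int Num { get; set; }
   public string Name { get; set; }
   public string Ips { get; set; }     // e.g. "192.160.2.3 255.255.254.0"
   public string Tagged { get; set; }  // e.g. "1-3,5-6,12-13,19,22,Trk1"

   public Vlan() {}
   public Vlan(int num, string name, string ips, string tagged)
   {
      Num = num;
      Name = name;
      Ips = ips;
      Tagged = tagged;
   }

   // Splits a raw value into its entries; returns null if there is nothing to split.
   // Assumption: entries are separated by commas and/or spaces.
   private static List<string> ToList(string value)
   {
      if (string.IsNullOrWhiteSpace(value)) return null;
      return value.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
   }

   public List<string> GetIPList()
   {
      return ToList(Ips);
   }

   public List<string> GetTaggedList()
   {
      return ToList(Tagged);
   }

   //Add other methods as needed to extract relevant information or hold new values
}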



Then you would create another class which reads the config and stores a list of entries, e.g.:
using System.Collections.Generic;
using System.IO;

public class VlanConfig
{
   public List<Vlan> Vlans { get; set; }

   public VlanConfig(string filePath)
   {
      Vlans = new List<Vlan>();
      ParseConfig(filePath);
   }

   private void ParseConfig(string filePath)
   {
      using (StreamReader sr = new StreamReader(filePath))
      {
         string line;
         while ((line = sr.ReadLine()) != null)
         {
            // Determine what the line of text refers to, create/update a Vlan object
            // and add it to the Vlans collection
         }
      }
   }

   // Here you can also add methods to modify the config, find specific values, search for entries, whatever you need
}


Expert Comment

by:Michael74
ID: 40537145
Awesome kaufmed, nice article too

Author Comment

by:gacus
ID: 40537735
@kaufmed: That looks great, but I don't fully understand why my regular expression captures only the last groups instead of all of them. Is there something I am missing about how regex works?

Author Comment

by:gacus
ID: 40538007
I see now that I did not provide a comprehensive example. The regex works great for the one I did provide, but not so well in practice, because the configs can take various forms.

There can be any number of VLANs in the config, and vlan 32 can contain both untagged and tagged sections; I want to capture the ports from both. The lines can also appear in any order. For example, the IP address can be above or below untagged or tagged, or not exist at all, and untagged and tagged can also show up in either order within the VLAN.

vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   untagged 1-3,5-6,1,19,22,Trk1
   ip address 192.168.1.8 255.255.254.0
   tagged 1-3,5-6,12-13,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit



vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   ip address 192.168.1.8 255.255.254.0
   untagged 1-3,5-6,1,19,22,Trk1
   tagged 1-3,5-6,12-13,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit


vlan 1
   name "DEFAULT_VLAN"
   no ip address
   no untagged 1-22,Trk1
   exit
vlan 32
   name "cheese"
   tagged 1-3,5-6,12-13,19,22,Trk1
   ip address 192.168.1.8 255.255.254.0
   untagged 1-3,5-6,1,19,22,Trk1
   exit
vlan 5
   name "xx"
   no ip address
   tagged 1-3,5-6,12-13,22,Trk1
   ip igmp
   exit
vlan 6
   name "xx"
   no ip address
   tagged 2-3,5-6,12-13,19,22,Trk1
   ip igmp
   exit
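For configs like these (any number of VLANs, lines in any order, both tagged and untagged ports), a line-by-line parser along the lines Michael74 suggested is probably easier to get right than a single regex. A minimal C# sketch, assuming the layout shown: each section starts with a `vlan N` line and ends at `exit` (or at the next `vlan` line), every setting sits on its own line, and `no untagged ...` lines should be ignored. `GetPorts` is a hypothetical helper name, the port lists are returned as written (ranges such as `1-3` are not expanded), and C# 9 top-level statements are assumed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the tagged and untagged port lists of one VLAN,
// whatever order its lines appear in.
(List<string> Tagged, List<string> Untagged) GetPorts(string text, int vlanId)
{
    var tagged = new List<string>();
    var untagged = new List<string>();
    bool inTarget = false;   // true while reading the section of the requested VLAN

    foreach (string raw in text.Split('\n'))
    {
        string line = raw.Trim();   // drops the indentation and any trailing '\r'

        if (line.StartsWith("vlan ", StringComparison.Ordinal))
        {
            // A new VLAN section starts; remember whether it is the one we want.
            inTarget = line == "vlan " + vlanId;
            continue;
        }
        if (!inTarget) continue;
        if (line == "exit") { inTarget = false; continue; }

        // "no untagged ..." starts with "no", so neither check below picks it up.
        if (line.StartsWith("tagged ", StringComparison.Ordinal))
            tagged.AddRange(line.Substring("tagged ".Length).Split(','));
        else if (line.StartsWith("untagged ", StringComparison.Ordinal))
            untagged.AddRange(line.Substring("untagged ".Length).Split(','));
    }
    return (tagged, untagged);
}

// Third sample from above: tagged first, ip address in the middle, untagged last.
string config = string.Join("\n",
    "vlan 1",
    "   name \"DEFAULT_VLAN\"",
    "   no ip address",
    "   no untagged 1-22,Trk1",
    "   exit",
    "vlan 32",
    "   name \"cheese\"",
    "   tagged 1-3,5-6,12-13,19,22,Trk1",
    "   ip address 192.168.1.8 255.255.254.0",
    "   untagged 1-3,5-6,1,19,22,Trk1",
    "   exit",
    "vlan 5",
    "   name \"xx\"",
    "   no ip address",
    "   tagged 1-3,5-6,12-13,22,Trk1",
    "   ip igmp",
    "   exit");

var ports = GetPorts(config, 32);
Console.WriteLine("tagged:   " + string.Join(" ", ports.Tagged));
Console.WriteLine("untagged: " + string.Join(" ", ports.Untagged));
```

Because the parser only looks at the first word of each line inside the wanted section, reordering the `ip address`, `tagged` and `untagged` lines (as in the three samples) does not change the result.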

I think I should just start a new question since you did answer my original question.
