Text Parser for Hierarchically Structured Data via RegEx (not required) and VB.NET

Asked by Cumbrowski
Topics: Programming, Visual Basic.NET, Regular Expressions

I am trying to use VB.NET to parse text data that is structured hierarchically, like a classic "tree".

The obvious solution for me was to write a lot of code that traverses and parses the text character by character and builds the tree, with its nodes of data, slowly, one byte at a time.

Well, I hope that there is a better way than that. I was thinking of regular expressions. I am familiar with them, but far from considering myself an expert. I am also new to the VB.NET flavor and don't know what it can and cannot do. I hope that somebody here can help me find an elegant and efficient solution.

Here are 3 examples of text and its structure. I am using the character "{" as the begin-of-block marker and "}" as the end-of-block marker, but those could also be other characters or even keywords.


Example 1
{
"Block 1"
Any Characters, except for block marker,
Line Breaks, White Spaces, Letters, Numbers,
+-[]:;'?/\-()*&%$#@!.,<>A-Za-z0-9 TAB LF CR
}

--------------------------------

Example 2
{
"Block 1"
}
{
"Block 2"
}

Okay, those were the easy ones, which I don't have a problem solving. Here now is the problematic one.

--------------------------------
Example 3
{
"Block 1"
 .. (maybe) Data ...
  {
  "Block 1.1"
  .. (maybe) Data ...         
    {
    "Block 1.1.1"
    .. Data ...
    }
    .. (maybe) Data (for Block 1.1)
    {
    "Block 1.1.N"
    .. Data ...
    }      
   .. (maybe) Data ...
  }
 .. (maybe) Data..
  {
  "Block 1.N"
   .. Data ..
  }
   .. (maybe) Data ...
}
{
"Block 2"
.. Data..
}

As you can see, there can be 0-N levels of nesting within each block, regardless of the level at which the block is located: a typical tree hierarchy. And something like a tree is what I would like to get back. A reversed tree would be even better, starting with the deepest levels of blocks first and working my way up to the top level, because that is how I will have to process the data eventually anyway.
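
For illustration, one way to get such a tree without regular expressions is a single pass over the text with a stack of the currently open blocks. This is only a minimal sketch, not a definitive implementation: the names BlockNode, ParseBlocks and CollectDeepestFirst are made up for this example, and it assumes the single-character markers "{" and "}" used in the examples above. The "deepest first" order is produced by a post-order traversal, which lists every block only after all of the blocks nested inside it.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text

Module BlockTreeSketch

    ' Hypothetical node type: one node per {...} block.
    ' The root node stands for the whole input and holds the top-level blocks.
    Public Class BlockNode
        Public ReadOnly Data As New StringBuilder()           ' text directly inside this block
        Public ReadOnly Children As New List(Of BlockNode)()  ' nested blocks, in source order
    End Class

    ' Builds the tree in one pass. The top of the stack is always the innermost open block.
    Public Function ParseBlocks(text As String) As BlockNode
        Dim root As New BlockNode()
        Dim openBlocks As New Stack(Of BlockNode)()
        openBlocks.Push(root)

        For Each ch As Char In text
            If ch = "{"c Then
                ' Begin marker: create a child of the innermost open block and descend into it.
                Dim child As New BlockNode()
                openBlocks.Peek().Children.Add(child)
                openBlocks.Push(child)
            ElseIf ch = "}"c Then
                ' End marker: close the innermost open block (the root itself is never closed).
                If openBlocks.Count = 1 Then Throw New FormatException("Unmatched end-of-block marker.")
                openBlocks.Pop()
            Else
                ' Any other character is data of the innermost open block.
                openBlocks.Peek().Data.Append(ch)
            End If
        Next

        If openBlocks.Count > 1 Then Throw New FormatException("Missing end-of-block marker.")
        Return root
    End Function

    ' Post-order traversal: nested blocks are added to the result before the block containing them.
    Public Sub CollectDeepestFirst(node As BlockNode, result As List(Of BlockNode))
        For Each child As BlockNode In node.Children
            CollectDeepestFirst(child, result)
        Next
        result.Add(node)
    End Sub

    Sub Main()
        Dim tree As BlockNode = ParseBlocks("{A{B{C}}{D}}{E}")
        Dim ordered As New List(Of BlockNode)()
        For Each topLevel As BlockNode In tree.Children
            CollectDeepestFirst(topLevel, ordered)
        Next
        ' Prints the block data in the order C, B, D, A, E.
        For Each block As BlockNode In ordered
            Console.WriteLine(block.Data.ToString())
        Next
    End Sub

End Module
```

Each block's own data ends up in Data, so an existing RegEx for the block contents could be applied to block.Data.ToString() while walking the ordered list.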

I am not sure whether this can be done using regular expressions. I thought it might, and I already use RegEx to parse the data within those blocks anyway, but it does not have to be a RegEx solution. The only other requirement is that the block markers might be single characters, but they could also be multi-character (keywords, e.g. "While" ... "EndWhile").
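
To illustrate the multi-character case: a regular expression can be used only to locate the markers, while a simple counter (or a stack) tracks the nesting. The sketch below makes several assumptions: the function name Tokenize is made up for this example, the markers are matched case-sensitively as plain text, and no word-boundary check is done, so a marker that appears inside a longer word would still be treated as a marker.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module KeywordMarkerSketch

    ' Splits the text into tokens: each token is a begin marker, an end marker,
    ' or the chunk of data between two markers.
    Public Function Tokenize(text As String, beginMarker As String, endMarker As String) As List(Of String)
        ' Put the longer marker first so that a marker which is a prefix of the other cannot shadow it.
        Dim first As String = beginMarker
        Dim second As String = endMarker
        If endMarker.Length > beginMarker.Length Then
            first = endMarker
            second = beginMarker
        End If

        ' Regex.Escape makes markers such as "{" match literally.
        Dim pattern As String = Regex.Escape(first) & "|" & Regex.Escape(second)

        Dim tokens As New List(Of String)()
        Dim position As Integer = 0
        For Each m As Match In Regex.Matches(text, pattern)
            ' Text between the previous marker and this one is data.
            If m.Index > position Then tokens.Add(text.Substring(position, m.Index - position))
            tokens.Add(m.Value)
            position = m.Index + m.Length
        Next
        ' Trailing data after the last marker.
        If position < text.Length Then tokens.Add(text.Substring(position))
        Return tokens
    End Function

    Sub Main()
        Dim depth As Integer = 0
        For Each token As String In Tokenize("While a While b EndWhile c EndWhile", "While", "EndWhile")
            If token = "While" Then
                depth += 1
            ElseIf token = "EndWhile" Then
                depth -= 1
            Else
                ' Prints "depth 1: a", "depth 2: b", "depth 1: c".
                Console.WriteLine("depth " & depth.ToString() & ": " & token.Trim())
            End If
        Next
    End Sub

End Module
```

The same token list works for "{" and "}" by passing those as the markers, so one tokenizer could feed a tree builder regardless of which marker style the data uses.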

[Image: Data Structure Graphical Illustration]
ASKER CERTIFIED SOLUTION
Todd Gerbert
Senior Engineer