
Text Parser for Hierarchically Structured Data via RegEx (not required) and VB.NET

Last Modified: 2012-05-11


I am trying to use VB.NET to parse text data that is structured hierarchically, like a classic tree.

The obvious solution for me was to write a lot of code that walks through the text character by character and slowly builds the tree and its data nodes one byte at a time.

Well, I hope that there is a better way than that. I was thinking of regular expressions. I am familiar with them, but far from considering myself an expert. I am also new to the VB.NET flavor of regular expressions and don't know what it can and cannot do. I hope that somebody here can help me find an elegant and efficient solution.

Here are 3 examples of text and its structure. I am using the character "{" as the begin-of-block marker and "}" as the end-of-block marker, but these could also be other characters or even keywords.


Example 1
{
"Block 1"
Any Characters, except for block marker,
Line Breaks, White Spaces, Letters, Numbers,
+-[]:;'?/\-()*&%$#@!.,<>A-Za-z0-9 TAB LF CR
}

--------------------------------

Example 2
{
"Block 1"
}
{
"Block 2"
}

Okay, those were the easy ones, which I don't have a problem solving. Now here is the problematic one.

--------------------------------
Example 3
{
"Block 1"
 .. (maybe) Data ...
  {
  "Block 1.1"
  .. (maybe) Data ...         
    {
    "Block 1.1.1"
    .. Data ...
    }
    .. (maybe) Data (for Block 1.1)
    {
    "Block 1.1.N"
    .. Data ...
    }      
   .. (maybe) Data ...
  }
 .. (maybe) Data..
  {
  "Block 1.N"
   .. Data ..
  }
   .. (maybe) Data ...
}
{
"Block 2"
.. Data..
}

As you can see, there can be 0-N levels of nesting within each block, regardless of the level at which the block is located: a typical tree hierarchy. And something like a tree is what I would like to get back. A reversed tree would be even better, starting with the deepest levels of blocks first and working my way up to the top level, because that is how I will have to process the data eventually anyway.
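To make the requirement concrete, here is a minimal sketch of one way to get such a tree without regular expressions: a single pass over the text with a stack of currently open blocks. It is written in Python only to keep it short and testable; the same logic should port to VB.NET with a Stack(Of T). The names Block, parse_blocks and deepest_first are hypothetical, and the sketch assumes single-character markers that never appear inside the data itself. The "reversed" order is taken to mean post-order, where every child block is visited before its parent.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Block:
    """One tree node: text chunks and child blocks, kept in source order."""
    parts: List[Union[str, "Block"]] = field(default_factory=list)

    @property
    def children(self) -> List["Block"]:
        return [p for p in self.parts if isinstance(p, Block)]

    @property
    def text(self) -> str:
        # Data that belongs directly to this block, child blocks excluded.
        return "".join(p for p in self.parts if isinstance(p, str))


def parse_blocks(source: str, open_mark: str = "{", close_mark: str = "}") -> Block:
    """Build a tree from single-character block markers using a stack."""
    root = Block()      # synthetic root that holds the top-level blocks
    stack = [root]      # stack[-1] is the block currently being filled
    buffer = []         # characters collected since the last marker

    def flush() -> None:
        # Attach the collected text to the block that is currently open.
        if buffer:
            stack[-1].parts.append("".join(buffer))
            buffer.clear()

    for ch in source:
        if ch == open_mark:
            flush()
            child = Block()
            stack[-1].parts.append(child)
            stack.append(child)
        elif ch == close_mark:
            flush()
            if len(stack) == 1:
                raise ValueError("end marker without matching begin marker")
            stack.pop()
        else:
            buffer.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError("begin marker without matching end marker")
    return root


def deepest_first(block: Block):
    """Yield blocks in post-order: every child before its parent."""
    for child in block.children:
        yield from deepest_first(child)
    yield block
```

Iterating over deepest_first(parse_blocks(text)) visits the blocks of Example 3 in the order 1.1.1, 1.1.N, 1.1, 1.N, 1, 2, and finally the synthetic root, so each block is seen only after all of the blocks nested inside it.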

I am not sure whether this can be done using regular expressions, but I thought it might, and I already use RegEx to parse the data within those blocks anyway. It does not have to be a RegEx solution, though. The only other requirement is that the block markers might be single characters, but they could also be multi-character (like keywords, e.g. "While" ... "EndWhile").
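For multi-character markers, one possible approach is to use a regular expression only to locate the markers and to track the nesting with a stack, rather than trying to match whole nested blocks with a single pattern. The sketch below is in Python for brevity and is an illustration under assumptions, not a finished parser: parse_keyword_blocks is a hypothetical name, the two markers are assumed to be different, and markers are matched literally wherever they occur (there is no quoting or escaping, and a marker inside a longer word would still count). The result is a nested list in which strings are data and inner lists are child blocks.

```python
import re


def parse_keyword_blocks(source, open_mark="While", close_mark="EndWhile"):
    """Return nested lists: strings are data, inner lists are child blocks."""
    # Try the longer marker first so that a marker which is a prefix of the
    # other one cannot shadow it at the same position.
    markers = sorted([open_mark, close_mark], key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in markers))

    root = []
    stack = [root]   # stack[-1] is the block currently being filled
    pos = 0          # end of the previous marker

    for match in pattern.finditer(source):
        if match.start() > pos:
            # Text between two markers belongs to the currently open block.
            stack[-1].append(source[pos:match.start()])
        if match.group() == open_mark:
            child = []
            stack[-1].append(child)
            stack.append(child)
        else:
            if len(stack) == 1:
                raise ValueError("end marker without matching begin marker")
            stack.pop()
        pos = match.end()

    if pos < len(source):
        stack[-1].append(source[pos:])   # trailing text after the last marker
    if len(stack) != 1:
        raise ValueError("begin marker without matching end marker")
    return root
```

Called with open_mark="{" and close_mark="}", the same function handles the single-character case from the examples, so one code path can serve both kinds of markers.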

[Attached image: Data Structure Graphical Illustration]