Question

Parse string & return nested list

Asked by: HonorGod

I'm looking for an elegant solution for string parsing problem.
The string contains bracketed name/value pairs.

I want an elegant solution.
- Don't expect to get points for a quick and dirty solution
  Especially one that "almost" works
I don't want, and won't accept a LEX/YACC solution.
I must be able to implement it in Python 2.1 (so, pyParsing won't work).
I will accept a complete pseudo-code algorithm that I can implement

Notes:
- Name & value are separated by a single space
- Value may be empty (e.g., "[name ]")
- Values may contain other bracketed name/value pairs.
- Nested result may be arbitrarily deep
  - solution must be able to handle nesting at least 7 deep

Sample input:

[[XXXXXXXXXXXXXXXXXXXX XXXX] [XXXXXXXXXXXXXXXXXX XXXX] [XXXXXXXXXXXXXXXXXXX XXXX] [XXXXXXXXXXXXXXXXXXXXXX XXXXXXXX] [XXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXX] [XXXXXXXXXXXX 999] [XXXXXXXXXXXXXXXXXXXXXX XXXXX] [XXXXXXXXXXX9XXXXXXXX XXXXX] [XXXXXXXXXXXXXXXXXX XXXXX] [XXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXX] [XXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXX] [XXXXXXXXXX [[[XXXX XXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXXX] ][[XXXX XXX.XXX.XXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXXX] ][[XXXX XXX.XXX.XXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXXX] ][[XXXX XXX.XXX.XXX.XXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXXXX] ][[XXXX XXX.XXX.XX.XXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXXX] ][[XXXX XXX.XXX.XX.XXXXXXXX.XXXXXXXXXXXXXXXXXXXXX] [XXXXX XXXXX] ][[XXXX XXX.XXX.XXX.XXXXXXXXXXXXXXXXXXXXX] [XXXXX ] ][[XXXX XXX.XXX.XXX.XXXXXXXXXXXXXXXXXXXXX] [XXXXX XXXXXX.XXX_XXXXXXX] ][[XXXX XXX.XXX.XXX.XXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXXXXX.XXX_XXXXXXXX] ][[XXXX XXX.XXX.XX.XXXXXXXX.XXXXXXXXXXXXXXXXXXXXX] [XXXXX XXXXXX.XXX_XXXXXXX] ][[XXXX XXX.XXX.XX.XXXXXXXX.XXXXXXXXXXXXXXXXXX] [XXXXX XXXXXX.XXXXXXX] ][[XXXX XXX.XXX.XXXXX.XXXXXXXX.XXXX.XXXXXXXXXXXX] [XXXXX XXX.XXX.XX.XXXXXXXX.XXXX.XXXXXXXXXXXXXXXX|XXX.XXX.XX.XXXXXXXX.XXXX.XXXXXXXXX9XXXXXXX|XXX.XXX.XX.XXXXXXXX.XXXX.XXXXXXXXXXXXXXXXXXXXX] ][[XXXX XXX.XXX.XXXXX.XXXXXXXX.XXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXX.XXX.XX.XXXXXXXX.XXXX.XXXXXXXXXXXXXXXX] ][[XXXX XXX.XXX.XXXXX.XXXXXXXX.XXXXX.XXXXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXX.XXX.XX.XXXXXXXX.XXXX.XXXXXXXXXXXXXXXXXXXXX] ][[XXXX XXX.XXX.XXXXX.XXXXXXXX.XXXXX.XXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXX.XXX.XX.XXXXXXXX.XXXX.XXXXXXXXXXXXXXXXXXXXX] ][[XXXX XXX.XXX.XXXXX.XXXXXXXX.XXXXX.XXXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXX.XXX.XX.XXXXXXXX.XXXX.XXXXXXXXX9XXXXXXX] ][[XXXX XXX.XXX.XX.XXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXXX] ][[XXXX XXX.XXX.XXXXXXXX.XXXXXXX] [XXXXX XXXXX] ][[XXXX XXX.XXX.XXXXXXXXX.XXXXXXXX.XXXXXXXXXXXXX] [XXXXX XXX.XXX.XX.XXXXXXXX.XXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] ][[XXXX XXX.XXX.XX.XXXXXXXX.XXXXXXXXXXXXXXXXXXXXXX] [XXXXX XXX.XXX.XXXXXXXX.XXXX.*:XXXXX.XXXXXXXX.XXXX.XXXXXXXX.XXXXXXXXXXX:XXXXX.XXXXXXXX.XXXX.XXXXXXXX.XXXXXXXXXXXXXX] ][[XXXX XXX.XXX.XXXXXXXXX.XXXXXXXX.XXX.XXXXXXXXXXXXX] [XXXXX XXXX] ][[XXXX XXX.XXX.XXXXXXXXX.XXXXXXXX.XXX.XXXXXXXXX_XXXX] [XXXXX XXXX] ][[XXXX XXX.XXX.XXX.XXXXXXXXXXXXXXXXXXXXXXX] [XXXXX XX=XXXXXX99.XXX.XXXXXXX.XXX.XXX,XX=XXXXXX99XXXX99XXXX,XX=XXXXXX99XXXX99,X=XXX,X=XX] ][[XXXX XXX.XXX.XXX.XXXXXXXXXXXXXXXXX] [XXXXX [XX=XXXXXX99.XXX.XXXXXXX.XXX.XXX,XX=XXXX XXXXXXXXXXX,XX=XXXXXX99XXXX99XXXX,XX=XXXXXX99XXXX99,X=XXX,X=XX]] ][[XXXX XXX.XXX.XXX.XXXXXXXXXXXXXXXXX] [XXXXX 9999] ][[XXXX XXX.XXX.XXX.XXXXXXXXXXXXXXXXXX] [XXXXX 999] ]]] ]

                                  
1:

Select allOpen in new window

This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.

Subscribe now for full access to Experts Exchange and get

Instant Access to this Solution

  • Plus...
  • 30 Day FREE access, no risk, no obligation
  • Collaborate with the world's top tech experts
  • Unlimited access to our exclusive solution database
  • Never be left without tech help again

Subscribe Now

Asked On
2009-08-16 at 18:21:02ID24657015
Topics

Python Scripting Language

,

Algorithms

,

Parsers

Participating Experts
2
Points
500
Comments
19

Trusted by hundreds of thousands everyday for fast, accurate and reliable tech support.

  • "The time we save is the biggest benefit of Experts Exchange to Warner Bros. What could take multiple guys 2 hours or more each to find is accessed in around 15 minutes on Experts Exchange." Mike Kapnisakis, Warner Bros.
  • "Our team likes having a resource that is more secure than just using Google and most experts using this service really know their stuff. It's nice to look here first versus using Google." Dayna Sellner, Lockheed Martin
  • "Anytime that I've been stumped with a problem, 9 out of 10 times Experts Exchange has either the accepted solution or an open discussion of the potential solution to the problem." Kenny Red, eBay Inc.

See what Experts Exchange can do for you.

Got a question?

We've got the answer.

Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.

Screenshot of Experts Exchange Knowledgebase

Need individual assistance?

Our experts are ready to help.

If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.

Screenshot of Experts Exchange Knowledgebase

Want to learn from the best?

Read articles from industry experts.

Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.

Screenshot of an Article

Working on a long term project?

Store your work and research.

Save solutions to your questions, answers you’ve discovered through searching plus helpful articles in your personal knowledgebase for easy future access.

Screenshot of Experts Exchange Knowledgebase

Access the answers to your technology questions today.

Subscribe Now

30-day free trial. Register in 60 seconds.

What Makes Experts Exchange Unique?

Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Trusted by the world's most respected brands.

image of each brand's logo

Faithfully serving IT professionals since 1996.

Experts Exchange Logo

Try it out and discover for yourself.

Subscribe Now

30-day free trial. Register in 60 seconds.

Related Solutions

  1. par file
    Waht is a file with .par used for and what does this extension mean? Can I list the parameters for import in a .par file and use this file in imp, like imp file=filename.par Please suggest.
  2. par io err laserjet 4 plus
    Every time I start my computer my old laserjet 4 plus comes up with an error message on the control panel "err par io" the work around is to cycle power on the printer after the computer is loaded. This problem started when I got a new computer. I have tried chan...
  3. PAR Protocol
    How has the PAR Protocol been extended for use in the Internet?
  4. Converting from PERL to Executable using PAR
    I have a PERL script that begins with the following: use Win32::OLE qw(in with); use Win32::OLE::Const 'Microsoft Excel'; When I convert my PERL script to an executable using pp -o file.exe file.pl, there is no problem. But, when I run file.exe I receive the following pop-...
  5. What does the /par in a .bat file mean?
    I am learning to do batch files, and came across this /par in my .bat file. I dont know really how it got there, as I was just manually creating the batch. I did just install easy batch creator, but I didnt use that prog. Anyway here is the code that I have in my batch, but ...

Free Tech Articles

  1. WARNING: 5 Reasons why you should NEVER fix a computer for free.
    It is in our nature to love the puzzle. We are obsessed. The lot of us. We love puzzles. We love the challenge. We thrive on finding the answer. We hate disarray. It bothers us deep in our soul. W...
  2. SCCM OSD Basic troubleshooting
    SCCM 2007 OSD is a fantastic way to deploy operating systems, however, like most things SCCM issues can sometimes be difficult to resolve due to the sheer volume of logs to sift through and the dispe...
  3. Migrate Small Business Server 2003 to Exchange 2010 and Windows 2008 R2
    This guide is intended to provide step by step instructions on how to migrate from Small Business Server 2003 to Windows 2008 R2 with Exchange 2010. For this migration to work you will need the fo...
  4. Create a Win7 Gadget
    This article shows you how to create a simple "Gadget" -- a sort of mini-application supported by Windows 7 and Vista. Gadgets can be dropped anywhere on the desktop to provide instant information, ...
  5. Outlook continually prompting for username and password
    There have been a lot of questions recently regarding Outlook prompting for a username and password whilst using Exchange 2007. There are a few reasons why this would happen and I will try to cover t...
  6. Backup Exchange 2010 Information Store using Windows Backup
    There seems to be quite a lot of confusion around the ability to backup Exchange 2010 using the built in Windows Backup feature. This stems from the omission of this feature prior to Exchange 2007 s...

Cloud Class Webinars

  1. Avoiding Bugs in Microsoft Access
    Alison Balter takes and in-depth look at avoiding bugs in Access. In this webinar you will learn about using the immediate window to debug your applications, invoking the debugger, using breakpoints to troubleshoot, stepping through code, setting the next statement to execute, ...
  2. Top 10 Best New Features in Visio 2010
    Scott Helmers gives live demonstrations of the top 10 new features in Visio 2010. This webinar will teach you how to create compelling diagrams by adding shapes to the page with a single click, linking the shapes in a diagram to data in Excel (or SQL Server, or SharePoint), ...
  3. IT Consultant Business Secrets Revealed
    Michael Munger, Experts Exchange tech pro and IT consultant, pulls back the curtain on his very successful businesses and answers question on every IT consultant and business owner should know about. He shares secrets on what he did to solve the 5 most common problems in IT, ...
  4. Disaster Recovery and Business Continuity
    Quest CTO, Mike Billon, gives an overview of the steps involved in building a dunamic disaster recovery plan. Through case studies and an examination of software/hardware tooles for monitoring and testing, you'll gain a better understandin of where you are, where you want ...
  5. Organize Your Visio Diagrams with Containers and Lists
    Scott Helmers uses cross functional flowcharts, wireframe diagrams, data graphic legends and seating charts to teach you: how to ustilize all three new structured diagram components in Visio 2010, the best practices for organizeing shapes in previous version of Visio, how to organize ...
  6. How to Us Objects, Properties, Events and Methods in Microsoft Access
    Alison Dalter gives an in-depbth look at objects, properties, events and methods in Microsoft Access. In this webinar you will learn about using the object browser, referring to objects, working with properties and methods, working with object variables, understanding the ...

Join the Community

Give a Little. Get a Lot.

Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.

Join the Community

Answers

 

by: peprPosted on 2009-08-17 at 01:53:36ID: 25112461

There is a sequence like (spaces replaced by _ to visualize them)

[[XXXX_XXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX]_[XXXXX XXXX]_]

How this should be interpreted? To simplify further, the X'es shortened to A, B, C, D respectively:

[[A_B]_[C_D]_]

If the name-value are paired with possibly missing value, what is the name for the last empty value. In other words, is it only a syntax feature and the above should be considered the same as

[[A_B]_[C_D]]

I.e. what to do with the space in front of the last ] ?

 

by: peprPosted on 2009-08-17 at 01:55:32ID: 25112472

Should also be the simple name-value pairs be represented as lists with two elements?

 

by: HonorGodPosted on 2009-08-17 at 03:58:42ID: 25113066

Unfortunately, not everything is a name value pair...
Sorry for not being more clear.

I see that you put underscores where the blanks were.

Q: How should this be interpreted?
     [[XXXX_XXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX]_[XXXXX XXXX]_]

A: I would like to have a list returned with 2 name pair elements,
    the first would contain:
     [ 'XXXX', 'XXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX' ]
    and the second would contain:
     ['XXXXX', 'XXXX']

So, that specific string should return the following list:

[  [ 'XXXX', XXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX ],  [ 'XXXXX', 'XXXX' ] ]

The think with which I have been having real trouble is when a value contains nested brackets...

For example (with single character "items"):

'[ A [ [ [ B C ] [ D E ] [ [ F G ] [ H I ] ] ] [ [ J K ] [ L M ] [ [ N O ] [ P Q ] ] ] ] ]'

 

by: HonorGodPosted on 2009-08-17 at 04:00:19ID: 25113071

Interesting problem... eh?  ;-)

Thanks for taking a look at this.  I have been trying to force fit a few different things over the weekend, without success.

I have something that I thought of during my sleep last night that I want to try.

I'll let you know.

 

by: t0t0Posted on 2009-08-17 at 05:48:00ID: 25113748

Okay, to make things clearer.... what would be the output for these (given in the format you want the output to be in):

    1) [adam amanda]

    2) [[adam amanda][brian brenda]]

    3) [[adam amanda][brian brenda][craig ]]

    4) [adam ]

 

by: t0t0Posted on 2009-08-17 at 05:50:03ID: 25113774

and these:

    5) [adam amanda][brian brenda]

    6) [adam [brian [craig cassy]]]

 

by: t0t0Posted on 2009-08-17 at 05:50:38ID: 25113778

or am i completely off the track there?

 

by: HonorGodPosted on 2009-08-17 at 06:13:12ID: 25113982

Here is what I would like to see:

1) [adam amanda]
    ['adam', 'amanda']

2)  [[adam amanda][brian brenda]]
     [['adam', 'amanda'], ['brian', 'brenda'] ]

3) [[adam amanda][brian brenda][craig ]]
    [ ['adam', 'amanda'], ['brian', 'brenda'], ['craig', ''] ]

4) [adam ]
    ['adam', '' ]

5) [adam amanda][brian brenda]
    This would not occur.  The data is guaranteed to be well formed.
    go ahead and throw an exception

6) [adam [brian [craig cassy]]]
    ['adam', [ 'brian',  ['craig', 'cassy'] ] ]

>> or am i completely off the track there?

  Not at all.  You seem to understand my challenge... ;-)

 

by: peprPosted on 2009-08-17 at 06:16:23ID: 25114013

OK. Let's try the snippet below first to clarify how it should work. The code is based on finite automaton (processing single characters and changing the status to capture the situation). However, the finite automaton apparatus (theoretically of the same power as regular expressions) is not capable to capture nested pair structures. This way it could be done recursively (but not that easy or efficient because of the stream of processed characters) or using a stack that captures the not-finished levels. I have chosen the second approach. The resulting list can be accessed as stack[0]. The code should be cleaned, yet. It also depends on your clarification.

# Get the string from somewhere (here stored in the file).
f = open('a.txt')
s = f.read()
f.close()
 
# Introduce a stack for avoiding the need for a recursive algorithm.
# The stack is implemented using the built in list.
stack = []  # init as empty list
 
# Initialize status of the automaton.
status = 0
 
lst = None         # init -- current list
lstStr = None      # init -- collected string
 
# Process all characters in the loop.
for c in s:
 
    if status == 0:     # not collecting a string
    
        if c == '[':        # start the new list
            lst = []            # list of the next level
            stack.append(lst)   # push to the stack
            # do not change the status
            
        elif c == ' ':      
            pass            # ignore the extra separator
            
        elif c == ']':
            # If the list contains a single element, then append
            # the missing empty value.
            if len(lst) == 1:
                lst.append('')
            
            # Get the just-built list of the level (lst) and append it 
            # to the upper level list.
            stack.pop()          # remove the deeper-level list...
            if len(stack) > 0:
                stack[-1].append(lst) # append to the upper level
                lst = stack[-1]       # the upper level is the current list from now on
                status = 0            # keep the status
            else:
                stack.append(lst)     # should finish now
                status = 888          # final state
            
        else:
            assert lstStr is None
            lstStr = [c]    # first character to be collected
            status = 1      # change the status
 
            
    elif status == 1:   # collecting a string            
        
        if c == ' ':        # separate the just-collected
            s = ''.join(lstStr)  # collected string
            lstStr = None        # init
            lst.append(s)        # add to the current list
            status = 0           # change the status
 
        elif c == ']':
            s = ''.join(lstStr)  # collected string
            lstStr = None        # init
            lst.append(s)        # add to the current list
            
            # Get the just-built list of the level (lst) and append it 
            # to the upper level list.
            stack.pop()          # remove the deeper-level list...
            if len(stack) > 0:
                stack[-1].append(lst) # append to the upper level
                lst = stack[-1]       # the upper level is the current list from now on
                status = 0            # change the status
            else:
                stack.append(lst)     # should finish now
                status = 888          # final state
                
        else:
            lstStr.append(c)    # collect the next character
 
    
    elif status == 888:   # stop it
        break
    
    else:
        print 'Unknown status:', status
        print 'stack:', repr(stack)
        print 'lst:', repr(lst)
        print 'lstStr:', repr(lstStr)
    
assert len(stack) == 1
print repr(stack[0])

                                              
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:
46:
47:
48:
49:
50:
51:
52:
53:
54:
55:
56:
57:
58:
59:
60:
61:
62:
63:
64:
65:
66:
67:
68:
69:
70:
71:
72:
73:
74:
75:
76:
77:
78:
79:
80:
81:
82:
83:
84:
85:
86:
87:
88:
89:
90:

Select allOpen in new window

 

by: HonorGodPosted on 2009-08-17 at 06:17:19ID: 25114026

And here's an ugly one...

7) [adam [[[brian brenda][craig christina]][[doug debbie][eric ethal]]]]
    ['adam', [ [ ['brian', 'brenda'],['craig', 'christina']],[['doug', 'debbie'],['eric', 'ethal']]]]

 

by: HonorGodPosted on 2009-08-17 at 06:19:01ID: 25114052

Hopefully this make sense.

Thank you for your continued interest.

 

by: HonorGodPosted on 2009-08-17 at 06:20:39ID: 25114063

Studying http:#a25114013 ...

 

by: HonorGodPosted on 2009-08-17 at 06:35:12ID: 31616375

Outstanding.  Thank you ever so much.

 

by: HonorGodPosted on 2009-08-17 at 06:50:59ID: 25114384

pepr:  Thanks again.

However, I did find one little flaw in the code.

lines ( 55-57 & 61-63) - you "reuse" s which is the same variable name as the string over which the for loop is iterating.

Fortunately, it is easily corrected.

 

by: peprPosted on 2009-08-17 at 13:44:48ID: 25118218

Yes. Sorry for the "mistake". This was not intentional. You can also completely get rid of it by replacing

            s = ''.join(lstStr)  # collected string
            lstStr = None        # init
            lst.append(s)        # add to the current list

by

            lst.append(''.join(lstStr))  # add collected string to the current list
            lstStr = None                  # init

I am not sure, but even the original code could be correct. The s is reused while the reference to the original string can be remembered by the code implementing the for loop. Anyway, even it it was the case, I do not like tricks like that. The code should be as understandable as possible.

 

by: peprPosted on 2009-08-17 at 13:50:28ID: 25118282

The doc says (http://docs.python.org/reference/compound_stmts.html#the-for-statement)

"The expression list is evaluated once; it should yield an iterable object. An iterator is created for the result of the expression_list. The suite is then executed once for each item provided by the iterator..."

This means that you should not observe an error in the produced result. Anyway, you are right.

 

by: HonorGodPosted on 2009-08-17 at 14:51:34ID: 25118740

Yes, I did exactly that (http:#a25118218)

>> "The expression list is evaluated once; it should yield an iterable object. An iterator is created for the result of the expression_list. The suite is then executed once for each item provided by the iterator..."

  I agree completely.  However, as I pointed out, this is for Python 2.1, which preceded iterators... :-)

 

by: peprPosted on 2009-08-17 at 23:30:13ID: 25120537

You are definitely good in programming, so you know what to do ;) The code can be simplified. For example, you can remove the

            lstStr = None        # init
and
            assert lstStr is None

It is there only to make the potential errors more visible. You know. However, other people may be reading this...

Have a nice day ;)

 

by: HonorGodPosted on 2009-08-18 at 04:39:58ID: 25122017

You too.  Thanks again for all of your help with this question, and all of the questions on this site in which you participate.  Have a super day.

20120131-EE-VQP-002

3 Ways to Join

30-Day Free Trial

The Experts

98% positive feedback on 31,087 answers since March 2000. angeliii is a Microsoft Most Valuable Professional for his work with MS SQL Server & Develoment.

He has also proven his knowledge of Visual Basic Programming, PHP Scripting and Oracle Databases.

The Experts

97% positive feedback on 10,752 answers since July 2000. lrmoore has more than 18 years experience in the networking industry.

The six-time Mircosoft MVPs specialties include firewalls, virtual private networking, and network management.

Testimonials

"...and excellent source for support... Kind of like having your very own IT dept." Electriciansnet

Testimonials

"I was apprehensive at signing up at first. However... it has already made my life as an IT administrator much easier." JaCrews

Testimonials

"WOW! You guys have great, active, and knowledgeable people on here." moore50

Business Clients

Business Clients

In the Press

"If you’ve got a question... Experts Exchange can supply an answer.”

In the Press

"...an invaluable aid for both IT professionals and those who require tech support."

In the Press

"where IT professionals provide quick answers on just about any topic"

Business Account Plans

Loading Advertisement...