invsman249 asked:

Python parsing format similar to json?

OK, I'm trying to write a Python script that will parse a data file. I think the format is similar to JSON or some other serialized data.

I don't really know what the best approach is, and I'm looking for some help.

Everything is contained within {}, and each block can contain multiple nested subclasses; see the example below.

I was going to parse it and convert it to valid JSON somehow, but then I read about custom serialization in Python and think that may work.

Any advice is much appreciated.
nameofclass1
{
	"property1" "value1"
	"property2" "value2"
	nameofclass2
	{
		"property1" "value1"
		"property2" "value2"
	}
	nameofclass2
	{
		"property1" "value1"
		"property2" "value2"
		"property3" "value3"
	}
}
nameofclass3
{
    "property1" "value1"
}


pepr

Python has the json module, a standard module since Python 2.6. See http://docs.python.org/library/json.html#module-json
Also have a look at "Dive into Python 3" by Mark Pilgrim, Chapter 13, Serializing Python Objects (http://diveintopython3.org/serializing.html), namely these parts:

13.8. Saving Data to a JSON File  (http://diveintopython3.org/serializing.html#json-dump)
13.9. Mapping of Python Datatypes to JSON
13.10. Serializing Datatypes Unsupported by JSON
13.11. Loading Data from a JSON File

The book covers Python 3, but Python 2.6 was meant as a transitional version between Python 2 and Python 3, so the json module in Python 2.6 should work the same way as in Python 3.
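
For reference, a minimal round trip with the standard json module might look like the sketch below. The data and variable names are made up for illustration; only json.dumps and json.loads come from the library, and both are available in Python 2.6 and Python 3.

```python
import json

# Hypothetical data, only to show how Python types map to JSON and back.
data = {"name": "box", "size": [1, 2, 3], "visible": True, "parent": None}

text = json.dumps(data, sort_keys=True)   # Python object -> JSON text
restored = json.loads(text)               # JSON text -> Python object

print(text)
print(restored == data)
```

Note that True and None come out as true and null in the JSON text and are converted back on loading.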

invsman249 (ASKER)

I've tried the json library. The problem is that the file, in its current state, won't load because it doesn't have the commas, colons and quotes in the right places.

I have no control over how this data file is output, so I need to work with the format as it is.
I'm not sure the json library will do what I want. For example, how can I get it to read nameofclass1 when it's not in quotes?
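
To make the problem concrete, here is a small sketch of what happens when such text is handed to the json module. The sample string is a shortened copy of the example from the question, and the variable names are only for illustration.

```python
import json

# Shortened copy of the data format from the question (not valid JSON).
vmf_like = '''nameofclass1
{
    "property1" "value1"
}'''

try:
    json.loads(vmf_like)
    parsed = True
except ValueError as exc:
    # In Python 3 the error is json.JSONDecodeError, a subclass of ValueError.
    parsed = False
    print("json refused the input: %s" % exc)
```

The decoder gives up at the very first character, because a bare name such as nameofclass1 is not a JSON value.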

pepr

I see. Which version of Python do you use? And what result do you want from the parsing?

invsman249

Python 2.6. Ideally I'd like to be able to access each property and value of each of the classes. I imagine this may need some recursion, as there can be classes within classes.

However, the main bulk of what I want is a second-level class that is repeated a few times. For the example I have called it boxclass.

I would like to loop over the boxclasses and read the properties and values of each.

Hope that makes sense.
nameofclass1
{
        "property1" "value1"
        "property2" "value2"
        boxclass
        {
                "property1" "value1"
                "property2" "value2"
        }
        boxclass
        {
                "property1" "value1"
                "property2" "value2"
                "property3" "value3"
        }
        boxclass
        {
                "property1" "value1"
                "property2" "value2"
                "property3" "value3"
        }
}
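
One possible shape for the wanted result is sketched below. The tree is written by hand here rather than produced by a parser. Each block becomes a (name, properties, children) tuple, and children is a list so that repeated names such as boxclass are all kept. This layout and the helper find_blocks are assumptions made for illustration, and the dict assumes that property names are unique within one block.

```python
# Hand-written stand-in for what a parser might return for the example above.
# Each block is (name, properties, children); this layout is an assumption.
tree = ("nameofclass1",
        {"property1": "value1", "property2": "value2"},
        [
            ("boxclass", {"property1": "value1", "property2": "value2"}, []),
            ("boxclass", {"property1": "value1", "property2": "value2",
                          "property3": "value3"}, []),
            ("boxclass", {"property1": "value1", "property2": "value2",
                          "property3": "value3"}, []),
        ])

def find_blocks(block, wanted):
    """Recursively collect every nested block whose name equals wanted."""
    name, props, children = block
    found = []
    for child in children:
        if child[0] == wanted:
            found.append(child)
        found.extend(find_blocks(child, wanted))
    return found

boxes = find_blocks(tree, "boxclass")
for name, props, children in boxes:
    for key in sorted(props):
        print("%s: %s = %s" % (name, key, props[key]))
```

A plain dictionary keyed by block name would not work here, because the second and third boxclass would overwrite the first.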


pepr

- How dirty can the solution be?
- Do you have a description of the format, or are you forced to guess it?
- Taking the "boxclass" example, how do you want to call the future code from your script?
- What exactly do you expect to be returned in this case?

This is not JSON. You should specify your needs better. Are the pairs always formed by two strings? Are they always pairs? Are the blocks always enclosed in {} and preceded by a name that may repeat (such as 'boxclass')?

invsman249

- How dirty can the solution be?
For the time being, whatever works.

- Do you have a description of the format, or are you forced to guess it?
Here is the real description of the format:
http://developer.valvesoftware.com/wiki/VMF_documentation

- What exactly do you expect to be returned in this case?
A list, dictionary or array? Some variable that gives access to all the properties and values.

This is not JSON. You should specify your needs better. Are the pairs always formed by two strings? Are they always pairs? Are the blocks always enclosed in {} and preceded by a name that may repeat (such as 'boxclass')?

Of course it's not JSON, I never said it was. I said it had a similar format. If it were JSON, I wouldn't have this problem, now would I?

pepr

I am looking at the JSON implementation to see whether it can be (re)used to create the VMF parser. Probably tomorrow...

invsman249

Thanks, anything you can help with would be great.

LunarNRG (SOLUTION)

This solution is only available to members of Experts Exchange.

pepr

It is definitely a good idea to use an existing parser. If you want to write your own, it is better to do it in two phases. The first phase is a lexical analyzer (http://en.wikipedia.org/wiki/Lexical_analysis), which converts the text into more abstract pieces called tokens (or lexemes); the second phase, the parser proper, then works with those tokens instead of raw characters. Here is a "dirty" lexical analyzer for the purpose:

b.py
"""Dirty VMF lexical analyzer."""

import re

# Each pattern skips leading whitespace; group 1 captures the token text.
rexId = re.compile(r'\s*(\w+)')
rexLBrace = re.compile(r'\s*(\{)')
rexRBrace = re.compile(r'\s*(\})')
rexStr = re.compile(r'\s*"(.*?)"')
rexComment = re.compile(r'\s*(//.*?)$', re.MULTILINE)

def lex(s, pos=0):
    """Yield (kind, value) tuples for the tokens in s, starting at pos."""
    s = s.rstrip()  # trailing whitespace would otherwise end in an 'error' token
    while pos < len(s):

        # match(s, pos) starts matching at pos without copying the rest of s.
        m = rexId.match(s, pos)
        if m is not None:
            yield 'id', m.group(1)
            pos = m.end()
            continue

        m = rexLBrace.match(s, pos)
        if m is not None:
            yield 'lbrace', ''
            pos = m.end()
            continue

        m = rexRBrace.match(s, pos)
        if m is not None:
            yield 'rbrace', ''
            pos = m.end()
            continue

        m = rexStr.match(s, pos)
        if m is not None:
            yield 'str', m.group(1)
            pos = m.end()
            continue

        m = rexComment.match(s, pos)
        if m is not None:
            yield 'comment', m.group(1)
            pos = m.end()
            continue

        # Nothing matched: report the rest of the input and stop.
        yield 'error', s[pos:]
        pos = len(s)


s = '''  
// This is a comment.
ClassName_1
{
      "Property_1" "Value_1"
      "Property_2" "Value_2"
      ClassName_2
      {
            "Property_1" "Value_1"
      }
}'''
# print(t) with a single tuple prints the same text in Python 2.6 and Python 3.
for t in lex(s):
    print(t)




It produces the following output (one tuple = one token):

C:\tmp\___python\invsman249\Q_27026093>python b.py
('comment', '// This is a comment.')
('id', 'ClassName_1')
('lbrace', '')
('str', 'Property_1')
('str', 'Value_1')
('str', 'Property_2')
('str', 'Value_2')
('id', 'ClassName_2')
('lbrace', '')
('str', 'Property_1')
('str', 'Value_1')
('rbrace', '')
('rbrace', '')
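
The second phase could then be sketched as a small recursive-descent parser that consumes such tokens. This is not the accepted solution from this thread (that text is not available here); it is only one way to do it under stated assumptions. The names TOKEN, parse_block and parse are made up for the example, the lexer is a condensed single-regex variant repeated so that the block runs on its own, and storing properties in a dict assumes that property names do not repeat within one block.

```python
import re

# Condensed tokenizer: one alternation, tried after skipping whitespace.
TOKEN = re.compile(r'\s*(?:(?P<comment>//[^\n]*)|(?P<id>\w+)|(?P<lbrace>\{)'
                   r'|(?P<rbrace>\})|"(?P<str>[^"\n]*)")')

def lex(s):
    """Yield (kind, value) tuples; comments are dropped."""
    s = s.rstrip()
    pos = 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if m is None:
            raise ValueError("unexpected text at offset %d" % pos)
        pos = m.end()
        kind = m.lastgroup          # name of the alternative that matched
        if kind != 'comment':
            yield kind, m.group(kind)

def parse_block(tokens, top_level=False):
    """Parse the tokens of one block; return (properties, child blocks)."""
    props, children = {}, []
    for kind, value in tokens:
        if kind == 'rbrace':
            if top_level:
                raise ValueError("unmatched '}'")
            return props, children
        if kind == 'str':
            # A quoted key must be followed by a quoted value.
            vkind, vvalue = next(tokens, (None, None))
            if vkind != 'str':
                raise ValueError("missing value for %r" % value)
            props[value] = vvalue
        elif kind == 'id':
            # A bare name must be followed by '{' and a nested block.
            lkind, _ = next(tokens, (None, None))
            if lkind != 'lbrace':
                raise ValueError("expected '{' after %r" % value)
            cprops, cchildren = parse_block(tokens)   # recursion for nesting
            children.append((value, cprops, cchildren))
        else:
            raise ValueError("unexpected token %r" % kind)
    if not top_level:
        raise ValueError("missing '}'")
    return props, children

def parse(text):
    """Return (top-level properties, list of top-level blocks)."""
    return parse_block(lex(text), top_level=True)

sample = '''
// This is a comment.
nameofclass1
{
    "property1" "value1"
    boxclass
    {
        "property1" "value1"
    }
    boxclass
    {
        "property1" "value1"
        "property3" "value3"
    }
}
'''

_, blocks = parse(sample)
name, props, children = blocks[0]
for child_name, child_props, _ in children:
    if child_name == 'boxclass':
        print(sorted(child_props.items()))
```

The token generator is shared between the recursive calls, so each nested call simply continues where its caller stopped. Children are kept in a list of (name, properties, children) tuples because names such as boxclass repeat.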


ASKER CERTIFIED SOLUTION

This solution is only available to members of Experts Exchange.

invsman249

Thanks pepr, that's great and gives me something to work with now. Thank you!
Thank you for your input too, LunarNRG!

pepr

I am glad if it helped. Anyway, in my opinion, LunarNRG deserved at least a portion of the points. ;)

invsman249

Hmm, my bad. I haven't used Experts Exchange much; I should have given some points to LunarNRG. I don't think it will let me now...

pepr

You can possibly "Request attention" of the zone administrator and then re-assign the points.

invsman249

Thanks, I've tried that, pepr. Hopefully an admin sorts it out. Again, thank you to both of you :)

LunarNRG

I'm glad you found your answer; please don't worry about the points. I benefit from pepr's answers as well, as they are always very comprehensive and on target. Thank you for thinking of me! :)