Solved

Eval syntax error for long string

Posted on 2013-05-27
Last Modified: 2013-05-30
Hi.
I have never used Python, but I am trying to debug some old Python code that has started failing from time to time.
After tracing the module, I pinpointed the problem to an eval(someString) line that fails when someString is large.
The string holds records I receive from a web service. With up to 1000 records all is well, but beyond that it always fails.
Is there some way to increase that limit inside Python?

Thanks
Jaime
Question by:GreatSolutions
19 Comments
 

Expert Comment

by:clockwatcher
There's a known defect:  http://bugs.python.org/issue11383.  But it involves strings of over 70K.  Is that the kind of length you're talking about with 1000 records?  If you're really hitting the eval string length limit, there isn't a way to increase it.  If you can't change the web service, you'd need to come up with a way to parse the string into manageable chunks yourself.  What's the format of the eval string that you're getting?

Author Comment

by:GreatSolutions
Hi, thanks for your help.
Indeed, 70K is more or less where the problem starts. I am using version 2.7, by the way. I tried upgrading to 3.3 thinking it might overcome that limit, but then spent a few hours adapting the code to run under that version.
As for the string, I think this whole module just does string manipulations, then writes the result into a .txt file. I'll try to build a small set and paste it here; I am sure you guys will find a better way to do this without the eval().

Thanks
Jaime

Author Comment

by:GreatSolutions
OK, I managed to capture the raw data received from the web service for a very small dataset, attached as raw.txt.
The following code then runs on it, which as I understand it transforms the data into a dict object. The function's 'data' parameter holds the data I attached.

def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data

    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    return eval(data2)



Is there some other way to create that same dict object without the eval?

Thanks
Jaime
raw.txt

Assisted Solution

by:clockwatcher
clockwatcher earned 500 total points
It looks like your string ought to be parseable by a YAML parser.   Give PyYAML a try.  It works for your sample file.

from yaml import load

def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data
       
    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    return load(data2)

f = open('raw.txt','r')
data = f.read()
a = parse_data(data)
print a



Author Comment

by:GreatSolutions
Almost there!
It works in some tests, but in others I get the following error:
  File "c:\python27\lib\site-packages\yaml\reader.py", line 165, in update
    exc.encoding, exc.reason)
yaml.reader.ReaderError: 'utf8' codec can't decode byte #x96: invalid start byte

  in "<string>", position 1415854



Author Comment

by:GreatSolutions
Trying to remove the encoding error, I changed my code to:
def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data

    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    return load(data2.decode("windows-1252"))



It still fails in the same cases as before, but now with a different error:
  File "c:\python27\lib\site-packages\yaml\parser.py", line 484, in parse_flow_sequence_entry
    "expected ',' or ']', but got %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a flow sequence
  in "<unicode string>", line 1, column 573852:
     ... SD','Market','0','11161','698'],['4323809','9588126','TAYELET',' ...
                                         ^
expected ',' or ']', but got '<scalar>'
  in "<unicode string>", line 1, column 573929:
     ... ','','02/06+','Mythos Boutique 'Hotel'','0','','4','4','BB','AA' ...
                                         ^



Expert Comment

by:clockwatcher
It would help to see another raw data extract that included the problem.  

But if your data really has non-escaped single quotes embedded in it (and it's not the decode doing some odd character translation), then you've got a problem.  The web service is sending stuff that isn't escaped properly and your job just got a heck of a lot tougher (if not altogether impossible).  It could have been the real problem with the eval as well:

>>> a = "['hello', 'ain't gonna happen']"
>>> a
"['hello', 'ain't gonna happen']"
>>> eval(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    ['hello', 'ain't gonna happen']
                   ^
SyntaxError: invalid syntax

Do you have access to change the code of the web service that you're pulling the data from?  There are much better ways to send this data than as a python eval string -- especially as it appears that you're not even doing any true evaluation in it.  No offense but it's kind of crazy to try and serialize data in this way.
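
As an illustration of one such "better way": the comment above does not name a format, so JSON is used here purely as an assumption, and the record is made up. The sketch only shows that a standard serializer escapes an embedded quote on the sending side and restores it on the receiving side, with no eval() involved.

```python
import json

# Hypothetical record with an embedded single quote, the kind of value
# that breaks a hand-built eval string.
record = ["hello", "ain't gonna happen"]

# Sender side: the serializer takes care of quoting and escaping.
wire = json.dumps(record)

# Receiver side: json.loads rebuilds the same list without eval().
decoded = json.loads(wire)

print(wire)
print(decoded == record)
```

This only helps if the web service itself can be changed, which is the question being asked.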

Author Comment

by:GreatSolutions
I read you, and that's what I was suspecting (i.e. single quotes in the middle of field data). Alas, the data comes from a web service I cannot change...
I am attaching the raw data that fails (bad.txt), and also the txt file generated upon success (deals.txt), which is the whole purpose of this Python module. Maybe you have an idea...

By the way, doesn't the comma suffice in Python to distinguish the fields? If so, I could first run some routine that completely removes single quotes from all the data received...
bad.txt
deals.txt

Expert Comment

by:clockwatcher
Comment Utility
I'll take a look at it tonight, but unfortunately a comma alone isn't enough.  You need the quotes (because a string can include a comma).

a = [ 'hello, there', 'example']
a = [ 'hello', 'there', 'example']

Two completely different things.
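
A small sketch of why stripping the quotes loses information. The naive_split helper is hypothetical (it stands in for the "remove all single quotes, then split on commas" idea), and ast.literal_eval is used only as a quote-aware reference parser for these two short literals.

```python
import ast

one_field_with_comma = "['hello, there', 'example']"
three_fields = "['hello', 'there', 'example']"

def naive_split(text):
    # Hypothetical routine: drop brackets and quotes, then split on commas.
    stripped = text.strip("[]").replace("'", "")
    return [part.strip() for part in stripped.split(",")]

# Once the quotes are gone, both inputs collapse to the same three fields.
print(naive_split(one_field_with_comma))
print(naive_split(three_fields))

# A quote-aware parse keeps them apart: two items versus three.
print(ast.literal_eval(one_field_with_comma))
print(ast.literal_eval(three_fields))
```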

Expert Comment

by:clockwatcher
The decode is causing the problem.  Not really sure what your encoding is in this file but in the sample that you sent, the only bad character is the <96>.  I would stick with manually replacing that character.  Also glanced at the rest of your code and it appears that it's just enclosing the dictionary keys in quotes.  That should only be necessary for a python eval to work.  It shouldn't be necessary for a YAML file.  You can most likely get rid of it.

Try changing your code to just this:
from yaml import load
a = load(data.replace("\x96","-"))



Example with your bad.txt file:
from yaml import load
f = open('bad.txt','rb')
data = f.read()
a = load(data.replace("\x96","-"))
print a
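
For reference, a sketch of what that byte does under different decodings. It is written for Python 3 (the thread's code is Python 2.7), and the sample bytes are made up rather than taken from bad.txt.

```python
# Hypothetical input: 0x96 sits between two ASCII words.
raw = b"Deluxe \x96 Sea View"

# As UTF-8 the byte is invalid, which matches the ReaderError shown earlier.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Option 1: swap the byte for a plain hyphen before decoding.
replaced = raw.replace(b"\x96", b"-").decode("utf-8")

# Option 2: decode as windows-1252, where 0x96 is an en dash (U+2013).
decoded = raw.decode("windows-1252")

print(utf8_ok, replaced, repr(decoded))
```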



Author Comment

by:GreatSolutions
*** Edit *** Please disregard...

Author Comment

by:GreatSolutions
Still the same error message with the string...

My code now looks like:
def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data

    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    # return eval(data2)

    return load(data2.replace("\x96","-"))



Error message is:
  File "c:\python27\lib\site-packages\yaml\parser.py", line 484, in parse_flow_sequence_entry
    "expected ',' or ']', but got %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a flow sequence
  in "<string>", line 1, column 564026:
     ... SD','Market','0','11161','698'],['4323809','9588126','TAYELET',' ...
                                         ^
expected ',' or ']', but got '<scalar>'
  in "<string>", line 1, column 564103:
     ... ','','02/06+','Mythos Boutique 'Hotel'','0','','4','4','BB','AA' ...
                                         ^



Accepted Solution

by:clockwatcher
clockwatcher earned 500 total points
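Your decode wasn't causing the problem.  It's your other code, the loops that wrap the key names in quotes, that is adding single quotes where it shouldn't: 'Hotel' is in that list, so it also gets quoted inside the value Mythos Boutique Hotel.  That quoting was only needed for eval and shouldn't be necessary any longer.  Try: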

from yaml import load

def parse_data(data):
    return load(data.decode('windows-1252'))
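
To see the quoting problem in isolation, here is a minimal sketch using the same split/join trick as the original loops. The key list is shortened and the sample fragment is hypothetical (it is not taken from bad.txt); it needs no YAML library to run.

```python
# Shortened, illustrative key list; the original loops quote more names.
KEYS = ['Deals', 'Hotel', 'StarRating']

def quote_keys(data):
    # Same approach as the original parse_data: wrap every occurrence
    # of each key name in single quotes, wherever it appears.
    for key in KEYS:
        data = ("'%s'" % key).join(data.split(key))
    return data

# Hypothetical fragment: an unquoted key followed by already-quoted values.
sample = "{Hotel:['Mythos Boutique Hotel','4']}"

# The key gets quoted, but so does the word inside the hotel name.
print(quote_keys(sample))
```

The output contains 'Mythos Boutique 'Hotel'', the same broken token the parser error points at.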

Author Closing Comment

by:GreatSolutions
Thank you very much for your patience with this!!
It works perfectly now, and since it doesn't have to work that much on the string now, it's much faster :-)

Expert Comment

by:clockwatcher
No problem.  To be honest, I'm embarrassed I didn't notice the real issue a lot sooner.   Losing it in my old age.  Anyway... glad to have been of help. :-)

Author Comment

by:GreatSolutions
...spoke too soon.
Later in the module, I am now getting the following error:
  File "alp2.py", line 266, in create_txt_file
    file.write(new_line)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 173662: ordinal not in range(128)


After the parse we saw earlier, the result gets filtered a bit, then a list is built. new_line is a line from that list that is appended to a txt file.
It's as if I should somehow re-encode the result after the load...

Author Comment

by:GreatSolutions
...and the weirdest thing is that it all ran fine when I posted before. I copied it to the production server and got this error. Then I went back to my laptop (where it had worked) and started getting the same error there as well. That should be impossible!

Expert Comment

by:clockwatcher
It's being internally translated to unicode.  When you're writing it back out, it's trying to encode it as ascii.  Force it to whatever encoding you want to use.  Personally, I would go with utf-8.  

   file.write(new_line.encode('utf-8'))

But if whatever is using the file downstream isn't expecting utf-8, that might cause a problem.  You decoded it out from windows-1252.  So, that should be ok to put it back in as well:

   file.write(new_line.encode('windows-1252'))
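
A sketch of the difference between the two choices, written for Python 3 and using made-up bytes (the real new_line comes from the module and is not shown in the thread). It reproduces the ASCII failure and shows what each explicit encoding writes out.

```python
# Hypothetical bytes as received: 0x96 is an en dash in windows-1252.
raw = b"02/06 \x96 09/06"
new_line = raw.decode("windows-1252")   # text now contains U+2013

# Encoding to ASCII fails, which is the error reported above.
ascii_error = None
try:
    new_line.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc

# Either explicit encoding works; they just produce different bytes.
utf8_bytes = new_line.encode("utf-8")            # en dash becomes 3 bytes
cp1252_bytes = new_line.encode("windows-1252")   # round-trips to the input

print(ascii_error is not None, utf8_bytes, cp1252_bytes == raw)
```

Which of the two is right depends on what the program reading the file expects, as noted above.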

Author Comment

by:GreatSolutions
It works :-) A million thanks!