Solved

Eval syntax error for long string

Posted on 2013-05-27
Last Modified: 2013-05-30
Hi.
I've never used Python, but I'm trying to debug some old Python code that has started failing from time to time.
After tracing the module, I pinpointed the problem to an eval(someString) line that fails when someString is large.
The string holds records I receive from a web service. With up to 1000 records all is well, but beyond that it always fails.
Is there some way to increase that limit inside Python?

Thanks
Jaime
Question by:GreatSolutions
 
LVL 25

Expert Comment

by:clockwatcher
ID: 39199548
There's a known defect:  http://bugs.python.org/issue11383.  But it involves strings of over 70K.  Is that the kind of length you're talking about with 1000 records?  If you're really hitting the eval string length limit, there isn't a way to increase it.  If you can't change the web service, you'd need to come up with a way to parse the string into manageable chunks yourself.  What's the format of the eval string that you're getting?
 
LVL 2

Author Comment

by:GreatSolutions
ID: 39200254
Hi, thanks for your help.
Indeed, 70K is more or less where the problem starts. I am using version 2.7, by the way. I tried upgrading to 3.3 thinking it might overcome that limit, but then spent a few hours adapting the code to run under that version.
As for the string, I think this whole module just does string manipulation and then writes the result into a .txt file. I'll try to build a small set and paste it here; I am sure you guys will find a better way to do this without the eval().

Thanks
Jaime
 
LVL 2

Author Comment

by:GreatSolutions
ID: 39200739
OK, I managed to capture the raw data received from the web service for a very small dataset, attached as raw.txt.
Then the following code runs, which, as I understand it, also transforms the data into a dict object. The function's 'data' parameter holds the data I attached.

def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data

    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    return eval(data2)


Is there some other way to create that same dict object without the eval?

Thanks
Jaime
raw.txt
 
LVL 25

Assisted Solution

by:clockwatcher
clockwatcher earned 500 total points
ID: 39203291
It looks like your string ought to be parseable by a YAML parser.   Give PyYAML a try.  It works for your sample file.

from yaml import load

def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data
       
    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    return load(data2)

f = open('raw.txt','r')
data = f.read()
a = parse_data(data)
print a

 
LVL 2

Author Comment

by:GreatSolutions
ID: 39203791
Almost there!
It works in some tests, but in others I get the following error:
  File "c:\python27\lib\site-packages\yaml\reader.py", line 165, in update
    exc.encoding, exc.reason)
yaml.reader.ReaderError: 'utf8' codec can't decode byte #x96: invalid start byte

  in "<string>", position 1415854

 
LVL 2

Author Comment

by:GreatSolutions
ID: 39204019
Trying to get rid of the encoding error, I changed my code to:
def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data

    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    return load(data2.decode("windows-1252"))


It still fails in the same cases as before, but now with a different error:
  File "c:\python27\lib\site-packages\yaml\parser.py", line 484, in parse_flow_sequence_entry
    "expected ',' or ']', but got %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a flow sequence
  in "<unicode string>", line 1, column 573852:
     ... SD','Market','0','11161','698'],['4323809','9588126','TAYELET',' ...
                                         ^
expected ',' or ']', but got '<scalar>'
  in "<unicode string>", line 1, column 573929:
     ... ','','02/06+','Mythos Boutique 'Hotel'','0','','4','4','BB','AA' ...
                                         ^

 
LVL 25

Expert Comment

by:clockwatcher
ID: 39205277
It would help to see another raw data extract that included the problem.  

But if your data really has non-escaped single quotes embedded in it (and it's not the decode doing some odd character translation), then you've got a problem.  The web service is sending stuff that isn't escaped properly and your job just got a heck of a lot tougher (if not altogether impossible).  It could have been the real problem with the eval as well:

>>> a = "['hello', 'ain't gonna happen']"
>>> a
"['hello', 'ain't gonna happen']"
>>> eval(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    ['hello', 'ain't gonna happen']
                   ^
SyntaxError: invalid syntax

Do you have access to change the code of the web service that you're pulling the data from?  There are much better ways to send this data than as a python eval string -- especially as it appears that you're not even doing any true evaluation in it.  No offense but it's kind of crazy to try and serialize data in this way.
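
As a rough illustration of the "better ways" mentioned above, here is a minimal sketch using JSON from the Python standard library. It assumes the sending side could be changed to emit JSON, which is not the case for the web service in this thread, and the sample record is made up for illustration. It is written for Python 3.

```python
import json

# Hypothetical record with embedded single quotes, similar to the failing data.
record = ["hello", "ain't gonna happen", "Mythos Boutique 'Hotel'"]

# The serializer escapes whatever needs escaping, so the receiver never has
# to guess where a string ends.
wire = json.dumps(record)

# The receiver parses the text as data; nothing is evaluated as code.
restored = json.loads(wire)
print(restored == record)
```

The point is only that a real serialization format handles quoting on both ends, which a hand-built eval string does not.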
 
LVL 2

Author Comment

by:GreatSolutions
ID: 39205468
I hear you, and that's what I was suspecting (i.e. single quotes in the middle of field data). Alas, the data comes from a web service I cannot change...
I am attaching the raw data that fails (bad.txt), and also the txt file generated upon success (deals.txt), which is the whole purpose of this Python module. Maybe you have an idea...

By the way, doesn't the comma suffice in Python to distinguish the fields? If so, I could first run some routine that completely removes single quotes from all the data received...
bad.txt
deals.txt
 
LVL 25

Expert Comment

by:clockwatcher
ID: 39205842
I'll take a look at it tonight, but unfortunately a comma alone isn't enough.  You need the quotes (because a string can include a comma).

a = [ 'hello, there', 'example']
a = [ 'hello', 'there', 'example']

Two completely different things.
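
To make that concrete, here is a small sketch (Python 3, standard library only) that takes the first list from the lines above as text, then compares stripping the quotes and splitting on commas with a real parse. The variable names are only for illustration.

```python
import ast

text = "['hello, there', 'example']"

# Naive approach: drop the brackets and quotes, then split on every comma.
naive = text.strip("[]").replace("'", "").split(",")

# Real parse: the quotes tell the parser which commas are inside a string.
parsed = ast.literal_eval(text)

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split yields three fields where the data only has two, which is why the quotes cannot simply be removed.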
 
LVL 25

Expert Comment

by:clockwatcher
ID: 39206538
The decode is causing the problem.  Not really sure what your encoding is in this file but in the sample that you sent, the only bad character is the <96>.  I would stick with manually replacing that character.  Also glanced at the rest of your code and it appears that it's just enclosing the dictionary keys in quotes.  That should only be necessary for a python eval to work.  It shouldn't be necessary for a YAML file.  You can most likely get rid of it.

Try changing your code to just this:
from yaml import load
a = load(data.replace("\x96","-"))


Example with your bad.txt file:
from yaml import load
f = open('bad.txt','rb')
data = f.read()
a = load(data.replace("\x96","-"))
print a
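
For reference, a short sketch of what the 0x96 byte is doing. It assumes the data is windows-1252 encoded, which is the encoding the asker tried above, and the sample bytes are made up. It is written for Python 3, where byte strings need the b prefix.

```python
# Hypothetical field containing the problem byte.
raw = b"02/06 \x96 09/06"

# 0x96 cannot start a UTF-8 sequence, hence the "invalid start byte" error.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In windows-1252 the same byte is an en dash (U+2013).
as_cp1252 = raw.decode("windows-1252")

# The manual replacement suggested above turns it into a plain hyphen instead.
replaced = raw.replace(b"\x96", b"-").decode("utf-8")

print(utf8_ok, repr(as_cp1252), repr(replaced))
```

Either route gets the text past the UTF-8 check; decoding keeps the original character, while replacing changes it to ASCII.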

 
LVL 2

Author Comment

by:GreatSolutions
ID: 39207340
*** Edit *** Please disregard...
 
LVL 2

Author Comment

by:GreatSolutions
ID: 39207395
Still the same error message with the string...

My code now looks like:
def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data

    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    # return eval(data2)

    return load(data2.replace("\x96","-"))


The error message is:
  File "c:\python27\lib\site-packages\yaml\parser.py", line 484, in parse_flow_sequence_entry
    "expected ',' or ']', but got %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a flow sequence
  in "<string>", line 1, column 564026:
     ... SD','Market','0','11161','698'],['4323809','9588126','TAYELET',' ...
                                         ^
expected ',' or ']', but got '<scalar>'
  in "<string>", line 1, column 564103:
     ... ','','02/06+','Mythos Boutique 'Hotel'','0','','4','4','BB','AA' ...
                                         ^

 
LVL 25

Accepted Solution

by:clockwatcher
clockwatcher earned 500 total points
ID: 39207712
Your decode wasn't causing the problem.  It's your other code that is adding the single quotes where it shouldn't.  It shouldn't be necessary any longer.  Try:

from yaml import load  # PyYAML

def parse_data(data):
    return load(data.decode('windows-1252'))
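
A minimal sketch of why the key-quoting loops break the data. The sample string is made up, modelled on the fragment shown in the error message (the real payload's structure is not shown in this thread), and quote_keys is a hypothetical helper that does what the two loops in the old parse_data do. It is written for Python 3.

```python
def quote_keys(data, keys):
    # Same technique as the original loops: wrap every occurrence of each
    # key name in single quotes, wherever it appears in the text.
    for key in keys:
        data = ("'%s'" % key).join(data.split(key))
    return data

# Hypothetical payload: 'Hotel' is a key, but it also appears inside a value.
sample = "{Hotel:['Mythos Boutique Hotel','4'],Days:['7']}"

quoted = quote_keys(sample, ['Hotel', 'Days'])
print(quoted)
```

The word Hotel inside the hotel name gets wrapped in quotes as well, producing the 'Mythos Boutique 'Hotel'' fragment from the traceback. Dropping the quoting step avoids creating those stray quotes in the first place.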
 
LVL 2

Author Closing Comment

by:GreatSolutions
ID: 39207849
Thank you very much for your patience with this!!
It works perfectly now, and since it doesn't have to work that much on the string now, it's much faster :-)
 
LVL 25

Expert Comment

by:clockwatcher
ID: 39207938
No problem.  To be honest, I'm embarrassed I didn't notice the real issue a lot sooner.   Losing it in my old age.  Anyway... glad to have been of help. :-)
 
LVL 2

Author Comment

by:GreatSolutions
ID: 39208041
...I spoke too soon...
Later in the module, I am now getting the following error:
  File "alp2.py", line 266, in create_txt_file
    file.write(new_line)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 173662: ordinal not in range(128)

After the parse we saw earlier, the result gets filtered a bit, then a list is built. new_line is a line from that list that is appended to a txt file.
It's as if I should somehow re-encode the result after the load...
 
LVL 2

Author Comment

by:GreatSolutions
ID: 39208106
...and the weirdest thing is that it all ran fine when I posted before. I copied it to the production server and got this error. Then I went back to my laptop (where it had worked) and started getting the same error there as well. That's impossible!!
 
LVL 25

Expert Comment

by:clockwatcher
ID: 39208203
It's being internally translated to unicode.  When you're writing it back out, it's trying to encode it as ascii.  Force it to whatever encoding you want to use.  Personally, I would go with utf-8.  

   file.write(new_line.encode('utf-8'))

But if whatever is using the file downstream isn't expecting utf-8, that might cause a problem.  You decoded it out from windows-1252.  So, that should be ok to put it back in as well:

   file.write(new_line.encode('windows-1252'))
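
An alternative sketch, assuming you control how the output file is opened: io.open (available on Python 2.7 as well as Python 3) takes an encoding argument, so the file object does the encoding on every write. The file name and the sample line are made up for illustration.

```python
import io
import os
import tempfile

# Hypothetical output line containing the en dash that triggered the error.
new_line = u"02/06 \u2013 09/06\n"

# The default 'ascii' codec cannot represent U+2013.
try:
    new_line.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Open the file with an explicit encoding; newline="" keeps "\n" unchanged.
path = os.path.join(tempfile.mkdtemp(), "deals.txt")
with io.open(path, "w", encoding="windows-1252", newline="") as out:
    out.write(new_line)

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, "rb") as check:
    written = check.read()

print(ascii_ok, repr(written))
```

With windows-1252 the en dash goes back out as the single byte 0x96, matching what came in from the web service; with utf-8 it would be written as three bytes instead.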
 
LVL 2

Author Comment

by:GreatSolutions
ID: 39208228
It works :-) A million thanks!
