Eval syntax error for long string

Hi.
I have never used Python, but I am trying to debug some old Python code that has started failing intermittently.
After tracing the module, I pinpointed the problem to an eval(someString) line that fails when someString is large.
The string holds records I receive from a web service. With up to 1000 records all is well, but beyond that it always fails.
Is there some way to increase that limit inside Python?

Thanks
Jaime

clockwatcher

There's a known defect: http://bugs.python.org/issue11383. But it involves strings of over 70K. Is that the kind of length you're talking about with 1000 records? If you're really hitting the eval string length limit, there isn't a way to increase it. If you can't change the web service, you'd need to come up with a way to parse the string into manageable chunks yourself. What's the format of the eval string that you're getting?

GreatSolutions

ASKER

Hi, thanks for your help.
Indeed, 70K is more or less where the problem starts. I am using version 2.7, by the way. I tried upgrading to 3.3, thinking it might overcome that limit, but then spent a few hours adapting the code to run under that version.
As for the string, I think this whole module just does string manipulation and then writes the result into a .txt file. I'll try to build a small set and paste it here; I am sure you guys will find a better way to do this without the eval().

Thanks
Jaime

GreatSolutions

ASKER

OK, I managed to capture the raw data received from the web service for a very small dataset, attached as raw.txt.
The following code then runs, which I understand transforms this into a dict object. The function's 'data' parameter holds the data I attached.

def parse_data(data):
    # Wrap every occurrence of each top-level key name in single quotes.
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data

    # Do the same for the per-record key names.
    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    return eval(data2)

Is there some other way to create that same dict object without the eval?

Thanks
Jaime
raw.txt
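
For readers following along, one standard-library alternative to eval() is ast.literal_eval, which accepts only Python literals and refuses anything else. The sketch below is written for Python 3 and uses a made-up, already-quoted sample string; the real feed's exact layout is not shown in this thread. Note that literal_eval still goes through Python's own parser, so it should not be expected to lift a size limit by itself.

```python
import ast

# Hypothetical, already-quoted sample; the real feed is far larger.
sample = "{'Deals': [['4323809', 'TAYELET', '0']], 'Adds': []}"

# literal_eval accepts only Python literals (dicts, lists, strings, numbers, ...).
parsed = ast.literal_eval(sample)
print(parsed["Deals"][0][1])

# Anything that is not a plain literal is rejected instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The appeal here is safety rather than capacity: data arriving from a web service never gets the chance to run as code.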

SOLUTION

clockwatcher

This solution is only available to members.

GreatSolutions

ASKER

Almost there!
It works in some tests, but in others I get the following error:

  File "c:\python27\lib\site-packages\yaml\reader.py", line 165, in update
    exc.encoding, exc.reason)
yaml.reader.ReaderError: 'utf8' codec can't decode byte #x96: invalid start byte

  in "<string>", position 1415854

GreatSolutions

ASKER

Trying to remove the encoding error, I changed my code to:

def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data

    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    return load(data2.decode("windows-1252"))

It still fails in the same cases as before, but now with a different error:

  File "c:\python27\lib\site-packages\yaml\parser.py", line 484, in parse_flow_sequence_entry
    "expected ',' or ']', but got %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a flow sequence
  in "<unicode string>", line 1, column 573852:
     ... SD','Market','0','11161','698'],['4323809','9588126','TAYELET',' ...
                                         ^
expected ',' or ']', but got '<scalar>'
  in "<unicode string>", line 1, column 573929:
     ... ','','02/06+','Mythos Boutique 'Hotel'','0','','4','4','BB','AA' ...
                                         ^

clockwatcher

It would help to see another raw data extract that included the problem.

But if your data really has non-escaped single quotes embedded in it (and it's not the decode doing some odd character translation), then you've got a problem. The web service is sending stuff that isn't escaped properly and your job just got a heck of a lot tougher (if not altogether impossible). It could have been the real problem with the eval as well:

>>> a = "['hello', 'ain't gonna happen']"
>>> a
"['hello', 'ain't gonna happen']"
>>> eval(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
['hello', 'ain't gonna happen']
^
SyntaxError: invalid syntax

Do you have access to change the code of the web service that you're pulling the data from? There are much better ways to send this data than as a Python eval string -- especially as it appears that you're not even doing any true evaluation in it. No offense, but it's kind of crazy to try to serialize data this way.
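
As an illustration of what a sturdier wire format could look like if the service could be changed, here is a small JSON round trip in Python 3. The record is made up for the example (the field names are borrowed from the key list posted earlier); nothing here describes the actual service.

```python
import json

# Hypothetical record; field names borrowed from the key list in this thread.
record = {
    "Deals": [
        {"RowId": "4323809", "Hotel": "Mythos Boutique 'Hotel'", "StarRating": "4"}
    ]
}

# Serialize on the sending side...
wire = json.dumps(record)

# ...and parse on the receiving side, with no eval and no manual quoting.
restored = json.loads(wire)
print(restored["Deals"][0]["Hotel"])
```

Because the serializer handles quoting and escaping itself, a single quote inside a field survives the trip without any special handling.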

GreatSolutions

ASKER

I hear you, and that's what I was suspecting (i.e., single quotes in the middle of field data). Alas, the data comes from a web service I cannot change...
I am attaching the raw data that fails (bad.txt), and also the txt file generated upon success (deals.txt), which is the whole purpose of this Python module. Maybe you have an idea...

By the way, doesn't the comma suffice in Python to distinguish the fields? If so, I could first run some routine that completely removes single quotes from all the data received...
bad.txt
deals.txt

clockwatcher

I'll take a look at it tonight, but unfortunately a comma alone isn't enough. You need the quotes (because a string can include a comma).

a = [ 'hello, there', 'example']
a = [ 'hello', 'there', 'example']

Two completely different things.
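
To make the difference concrete, this Python 3 sketch splits one made-up line both ways: a naive split on commas, and a quote-aware parse using the standard csv module with a single quote as the quote character. The sample line is an assumption for illustration, not taken from the attachments.

```python
import csv

# Hypothetical line with a comma inside the first quoted field.
line = "'hello, there','example'"

# A naive split treats the comma inside the first field as a separator.
naive_fields = line.split(",")

# csv.reader honors the quote character, so the embedded comma survives.
quoted_fields = next(csv.reader([line], quotechar="'"))

print(naive_fields)
print(quoted_fields)
```

Stripping the quotes before splitting would leave only the naive result, which is why the quotes have to stay.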

clockwatcher

The decode is causing the problem. I'm not really sure what the encoding of this file is, but in the sample that you sent, the only bad character is the <96>. I would stick with manually replacing that character. I also glanced at the rest of your code, and it appears that it's just enclosing the dictionary keys in quotes. That should only be necessary for a Python eval to work. It shouldn't be necessary for a YAML file. You can most likely get rid of it.

Try changing your code to just this:

from yaml import load
a = load(data.replace("\x96","-"))

Example with your bad.txt file:

from yaml import load

# Read the raw bytes, then swap the stray \x96 byte for a plain hyphen before parsing.
with open('bad.txt', 'rb') as f:
    data = f.read()
a = load(data.replace("\x96", "-"))
print a
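
For anyone wondering why that one byte matters: 0x96 is not a valid start byte in UTF-8, while in windows-1252 it maps to the en dash (U+2013). The short sketch below shows both behaviors plus the byte-level replacement. It is written for Python 3, where bytes and text are separate types, and the sample bytes are made up.

```python
raw = b"02/06 \x96 09/06"  # hypothetical field containing the stray byte

# As UTF-8 the byte is rejected outright.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# As windows-1252 it decodes to an en dash.
as_cp1252 = raw.decode("windows-1252")

# Replacing the byte first leaves plain ASCII, so no decoding question arises.
as_ascii = raw.replace(b"\x96", b"-").decode("ascii")

print(utf8_ok)
print(repr(as_cp1252))
print(as_ascii)
```

Replacing the byte keeps everything ASCII, whereas decoding produces text that later has to be encoded again when it is written out.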

GreatSolutions

ASKER

*** Edit *** Please disregard...

GreatSolutions

ASKER

Still the same error message with the string...

My code now looks like this:

def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))

    data2 = data

    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))

    # return eval(data2)

    return load(data2.replace("\x96","-"))

Error message is:

  File "c:\python27\lib\site-packages\yaml\parser.py", line 484, in parse_flow_sequence_entry
    "expected ',' or ']', but got %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a flow sequence
  in "<string>", line 1, column 564026:
     ... SD','Market','0','11161','698'],['4323809','9588126','TAYELET',' ...
                                         ^
expected ',' or ']', but got '<scalar>'
  in "<string>", line 1, column 564103:
     ... ','','02/06+','Mythos Boutique 'Hotel'','0','','4','4','BB','AA' ...
                                         ^
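
One possible reading of this error, offered as a guess rather than something confirmed in the visible posts: 'Hotel' is in the key list, and split/join quotes every occurrence of a key name, including the word Hotel inside a field value. The Python 3 sketch below reproduces that effect on a made-up sample and contrasts it with quoting only names that are followed by a colon. The sample's shape (unquoted keys followed by a colon) and both helper names are assumptions for illustration.

```python
import ast
import re

def quote_everywhere(data, keys):
    # Same split/join technique as parse_data: quotes every occurrence.
    for key in keys:
        data = ("'%s'" % key).join(data.split(key))
    return data

def quote_keys_only(data, keys):
    # Quote a name only when it is a whole word directly followed by a colon.
    pattern = r"\b(%s)\b(?=\s*:)" % "|".join(re.escape(k) for k in keys)
    return re.sub(pattern, r"'\1'", data)

# Hypothetical sample: an unquoted key plus a value containing the same word.
sample = "{Hotel:['Mythos Boutique Hotel','4']}"

broken = quote_everywhere(sample, ["Hotel"])
fixed = quote_keys_only(sample, ["Hotel"])
parsed = ast.literal_eval(fixed)

print(broken)
print(fixed)
```

The first result contains the same 'Mythos Boutique 'Hotel'' pattern shown in the error output, while the second leaves the value untouched.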

ASKER CERTIFIED SOLUTION

clockwatcher

This solution is only available to members.

GreatSolutions

ASKER

Thank you very much for your patience with this!!
It works perfectly now, and since it doesn't have to work that much on the string now, it's much faster :-)

clockwatcher

No problem. To be honest, I'm embarrassed I didn't notice the real issue a lot sooner. Losing it in my old age. Anyway... glad to have been of help. :-)

GreatSolutions

ASKER

...spoke too soon.
Later in the module, I am now getting the following error:

  File "alp2.py", line 266, in create_txt_file
    file.write(new_line)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 173662: ordinal not in range(128)

After the parse we saw earlier, the result gets filtered a bit, then a list is built. new_line is a line from that list that is appended to a txt file.
It's as if I should somehow re-encode the result after the load...

GreatSolutions

ASKER

...and the weirdest thing is that it all ran fine when I posted before. I copied it to the production server and got this error. I went back to my laptop (where it worked) and started getting the same error there as well. That's impossible!

clockwatcher

It's being internally translated to Unicode. When you write it back out, Python tries to encode it as ASCII. Force it to whatever encoding you want to use. Personally, I would go with UTF-8.

file.write(new_line.encode('utf-8'))

But if whatever is using the file downstream isn't expecting UTF-8, that might cause a problem. You decoded it from windows-1252, so it should be fine to encode it back to that as well:

file.write(new_line.encode('windows-1252'))
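
An alternative to encoding each line by hand is to open the output file with an explicit encoding and write Unicode text to it directly. The sketch below uses io.open from the standard library and is written for Python 3; the sample line and the temporary path are made up for the demonstration.

```python
import io
import os
import tempfile

new_line = u"02/06 \u2013 09/06\n"  # hypothetical line containing an en dash

# Throwaway location for the demo; the real module writes its own file.
path = os.path.join(tempfile.mkdtemp(), "deals.txt")

# The file object encodes every write, so no per-line .encode() call is needed.
with io.open(path, "w", encoding="windows-1252") as out:
    out.write(new_line)

# Read the bytes back to see what actually landed on disk.
with open(path, "rb") as check:
    written = check.read()

print(repr(written))
```

Setting the encoding once at open time keeps the choice in one place, which makes it easier to change if the downstream consumer turns out to expect something else.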

GreatSolutions

ASKER

It works :-) A million thanks!