GreatSolutions
asked on
Eval syntax error for long string
Hi.
I've never used Python, but I am trying to debug some old Python code that has started failing from time to time.
After tracing the module, I pinpointed the problem to an eval(someString) line that fails when someString is big.
The string holds records I receive from a web service. Up to 1000 records all is well, but beyond that it always fails.
Is there some way to increase that limit inside Python?
Thanks
Jaime
There's a known defect: http://bugs.python.org/issue11383. But it involves strings of over 70K. Is that the kind of length you're talking about with 1000 records? If you're really hitting the eval string length limit, there isn't a way to increase it. If you can't change the web service, you'd need to come up with a way to parse the string into manageable chunks yourself. What's the format of the eval string that you're getting?
ASKER
Hi thanks for your help.
Indeed, 70K is more or less where the problem starts. I am using version 2.7, by the way. I tried upgrading to 3.3, thinking it might get around that limit, but then spent a few hours adapting the code to run under that version.
As for the string, I think this whole module just does string manipulations and then writes the result into a .txt file. I'll try to build a small set and paste it here; I am sure you guys will find a better way to do this without the eval().
Thanks
Jaime
ASKER
OK, I managed to capture the raw data received from the web service for a very small dataset, attached as raw.txt.
The following code then runs, which I understand transforms this into a dict object. The function's 'data' parameter holds the data I attached.
def parse_data(data):
    # Wrap every occurrence of each key name in single quotes so that
    # eval() reads the keys as string literals.
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))
    data2 = data
    # Same treatment for the remaining field names.
    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))
    return eval(data2)
Is there some other way to create that same dict object without the eval?
Thanks
Jaime
raw.txt
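To see what the two loops in parse_data are doing, here is a minimal sketch run against a made-up fragment. The real raw.txt isn't reproduced here, so the payload shape (bare key names followed by lists of single-quoted strings) is an assumption. It also swaps eval for ast.literal_eval from the standard library, which only accepts plain literals (strings, numbers, lists, dicts and so on) and does not run arbitrary code; whether that avoids the 2.7 length problem is not something this thread establishes.

```python
import ast

# Hypothetical miniature payload: bare key name, list of single-quoted records.
# The real format from raw.txt is assumed, not reproduced.
raw = "{Deals: [['4323809', '9588126', 'TAYELET']]}"

KEYS = ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']


def quote_keys(data, keys):
    """Wrap every occurrence of each key in single quotes.

    data.replace(key, ...) has the same effect as the
    replaced.join(data.split(key)) idiom in parse_data.
    """
    for key in keys:
        data = data.replace(key, "'%s'" % key)
    return data


quoted = quote_keys(raw, KEYS)
print(quoted)  # {'Deals': [['4323809', '9588126', 'TAYELET']]}

# literal_eval builds the same dict eval would here, but rejects
# anything that is not a plain Python literal.
result = ast.literal_eval(quoted)
print(result)  # {'Deals': [['4323809', '9588126', 'TAYELET']]}
```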
SOLUTION
ASKER
Almost there!
It works in some tests, but in others i get the following error:
File "c:\python27\lib\site-packages\yaml\reader.py", line 165, in update
exc.encoding, exc.reason)
yaml.reader.ReaderError: 'utf8' codec can't decode byte #x96: invalid start byte
in "<string>", position 1415854
ASKER
Trying to get rid of the encoding error, I changed my code to:
from yaml import load

def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))
    data2 = data
    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))
    # Decode the Python 2 byte string as windows-1252 before handing it to PyYAML.
    return load(data2.decode("windows-1252"))
It still fails in the same cases where it fails as mentioned before, but now with a different error:
File "c:\python27\lib\site-packages\yaml\parser.py", line 484, in parse_flow_sequence_entry
"expected ',' or ']', but got %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a flow sequence
in "<unicode string>", line 1, column 573852:
... SD','Market','0','11161','698'],['4323809','9588126','TAYELET',' ...
^
expected ',' or ']', but got '<scalar>'
in "<unicode string>", line 1, column 573929:
... ','','02/06+','Mythos Boutique 'Hotel'','0','','4','4','BB','AA' ...
^
It would help to see another raw data extract that included the problem.
But if your data really has non-escaped single quotes embedded in it (and it's not the decode doing some odd character translation), then you've got a problem. The web service is sending stuff that isn't escaped properly and your job just got a heck of a lot tougher (if not altogether impossible). It could have been the real problem with the eval as well:
>>> a = "['hello', 'ain't gonna happen']"
>>> a
"['hello', 'ain't gonna happen']"
>>> eval(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
['hello', 'ain't gonna happen']
^
SyntaxError: invalid syntax
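For contrast, the same list parses once the inner apostrophe is escaped with a backslash. This sketch uses ast.literal_eval rather than eval (our substitution, not something from the thread); both reject the unescaped version with a SyntaxError.

```python
import ast

broken = "['hello', 'ain't gonna happen']"     # apostrophe ends the string early
escaped = "['hello', 'ain\\'t gonna happen']"  # text contains ain\'t


def parses(text):
    """Return True if text is a valid Python literal, otherwise False."""
    try:
        ast.literal_eval(text)
        return True
    except (SyntaxError, ValueError):
        return False


print(parses(broken))   # False
print(parses(escaped))  # True
print(ast.literal_eval(escaped))  # ['hello', "ain't gonna happen"]
```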
Do you have access to change the code of the web service you're pulling the data from? There are much better ways to send this data than as a Python eval string, especially as it appears that you're not even doing any true evaluation in it. No offense, but it's kind of crazy to try to serialize data this way.
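As one example of a format that handles quoting for you (JSON is our pick here; the thread doesn't name a specific alternative), a serializer escapes quotes on the way out, so the receiving side never has to guess where a field ends. The field values below are made up.

```python
import json

# Hypothetical field values, including both kinds of quote.
records = ['hello', "ain't gonna happen", 'a "quoted" word']

payload = json.dumps(records)
print(payload)  # ["hello", "ain't gonna happen", "a \"quoted\" word"]

# Round trip: the receiver gets back exactly what was sent.
assert json.loads(payload) == records
```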
ASKER
I read you, and that's what I was suspecting (i.e. single quotes in the middle of field data). Alas, the data comes from a web service I cannot change...
I am attaching the raw data that fails (bad.txt), and also the .txt file generated on success (deals.txt), which is the whole purpose of this Python module. Maybe you have an idea...
By the way, isn't the comma enough in Python to separate the fields? If so, I could first run a routine that removes all single quotes from the data received...
bad.txt
deals.txt
I'll take a look at it tonight, but unfortunately a comma alone isn't enough. You need the quotes (because a string can include a comma).
a = [ 'hello, there', 'example']
a = [ 'hello', 'there', 'example']
Two completely different things.
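The same point in code: splitting on commas breaks the first field apart, while a quote-aware reader keeps it whole. This uses the standard csv module with quotechar="'" purely as an illustration; csv expects an embedded quote to be doubled (''), so it would not rescue data with unescaped quotes either.

```python
import csv

line = "'hello, there', 'example'"

# Naive split: the comma inside the first string becomes a field separator.
naive = line.split(',')
print(naive)  # ["'hello", " there'", " 'example'"]

# Quote-aware parse: single quotes delimit fields, spaces after commas are skipped.
parsed = next(csv.reader([line], quotechar="'", skipinitialspace=True))
print(parsed)  # ['hello, there', 'example']
```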
The decode is causing the problem. I'm not really sure what encoding your file is in, but in the sample you sent, the only bad character is the byte 0x96 (shown as <96>). I would stick with manually replacing that character. I also glanced at the rest of your code, and it appears that it's just enclosing the dictionary keys in quotes. That should only be necessary for a Python eval to work; it shouldn't be necessary for a YAML file. You can most likely get rid of it.
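It's worth seeing concretely why the key-quoting loop is fragile. Because it does plain substring replacement, any field value that happens to contain a key name gets quoted too, which matches the 'Mythos Boutique 'Hotel'' fragment in the parser errors earlier in the thread. The sketch reproduces that on a made-up record and, as a hypothetical alternative, quotes a key only when it is a whole word followed by a colon. That is an assumption about the raw format, and it would still misfire if a value itself contained something like "Hotel:".

```python
import ast
import re

# Made-up record; the hotel name comes from the error message in the thread.
raw = "{Hotel: ['Mythos Boutique Hotel', '4']}"


def quote_keys_naive(data, keys):
    # Same effect as parse_data's split/join loop: every occurrence is wrapped.
    for key in keys:
        data = data.replace(key, "'%s'" % key)
    return data


def quote_keys_bare(data, keys):
    # Hypothetical alternative: wrap a key only when it is a whole word
    # followed by optional spaces and a colon.
    pattern = re.compile(r"\b(%s)(?=\s*:)" % "|".join(map(re.escape, keys)))
    return pattern.sub(r"'\1'", data)


naive = quote_keys_naive(raw, ['Hotel'])
print(naive)  # {'Hotel': ['Mythos Boutique 'Hotel'', '4']}

bare = quote_keys_bare(raw, ['Hotel'])
print(bare)  # {'Hotel': ['Mythos Boutique Hotel', '4']}
print(ast.literal_eval(bare))  # {'Hotel': ['Mythos Boutique Hotel', '4']}
```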
Try changing your code to just this:
from yaml import load
a = load(data.replace("\x96","-"))
Example with your bad.txt file:
from yaml import load
f = open('bad.txt','rb')
data = f.read()
a = load(data.replace("\x96","-"))
print a
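Why byte 0x96 in particular: in the windows-1252 code page it is the en dash (U+2013), but in UTF-8 it is not a valid first byte, which is exactly the 'utf8' codec error reported above. A small sketch on a made-up byte string, written for Python 3 (the thread itself is on 2.7), comparing strict UTF-8, a windows-1252 decode, and the byte-replacement workaround from the reply above:

```python
# Made-up bytes standing in for part of the web-service response.
raw = b"Mythos Boutique \x96 Hotel"

# Strict UTF-8 decoding rejects 0x96 as a start byte.
utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc.reason
print(utf8_error)  # invalid start byte

# windows-1252 maps 0x96 to U+2013 EN DASH.
text = raw.decode("windows-1252")
print(text == u"Mythos Boutique \u2013 Hotel")  # True

# The replace approach: swap the byte for a plain hyphen before parsing.
patched = raw.replace(b"\x96", b"-")
print(patched.decode("ascii"))  # Mythos Boutique - Hotel
```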
ASKER
Still the same error message with the string...
My code now looks like this:
from yaml import load

def parse_data(data):
    for i in ['Deals', 'Adds', 'Outbound', 'Inbound', 'Pairs', 'Flights']:
        replaced = "'%s'" % (i,)
        data = replaced.join(data.split('%s' % (i,)))
    data2 = data
    for i in ['MaxPrice','Chain','DealFeature','DealDestination','RowId','Provider','Product','FareBasis','Days','Hotel','StarRating','RetDates','toDAN','fromDAN','toFlightId','fromFlightId','ToClass','FromClass','Filters','MinPrices']:
        replaced = "'%s'" % (i,)
        data2 = replaced.join(data2.split('%s' % (i,)))
    # return eval(data2)
    # Replace the stray 0x96 byte with a hyphen, then parse as YAML.
    return load(data2.replace("\x96", "-"))
The error message is:
File "c:\python27\lib\site-packages\yaml\parser.py", line 484, in parse_flow_sequence_entry
"expected ',' or ']', but got %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a flow sequence
in "<string>", line 1, column 564026:
... SD','Market','0','11161','698'],['4323809','9588126','TAYELET',' ...
^
expected ',' or ']', but got '<scalar>'
in "<string>", line 1, column 564103:
... ','','02/06+','Mythos Boutique 'Hotel'','0','','4','4','BB','AA' ...
^
ASKER CERTIFIED SOLUTION
ASKER
Thank you very much for your patience with this!!
It works perfectly now, and since it doesn't have to work that much on the string now, it's much faster :-)
No problem. To be honest, I'm embarrassed I didn't notice the real issue a lot sooner. Losing it in my old age. Anyway... glad to have been of help. :-)
ASKER
...spoke too soon...
Later in the module, i am now getting the following error:
File "alp2.py", line 266, in create_txt_file
file.write(new_line)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 173662: ordinal not in range(128)
After the parse we saw earlier, this gets filtered a bit, then a list is built. new_line is a line from the list that is appended to a .txt file. It's as if I should somehow re-encode the result after the load...
ASKER
...and the weirdest thing is that it all ran fine when I posted before. I copied it to the production server and got this error. I went back to my laptop (where it worked) and started getting the same error there as well. That's impossible!!
It's being internally translated to unicode. When you're writing it back out, it's trying to encode it as ascii. Force it to whatever encoding you want to use. Personally, I would go with utf-8.
file.write(new_line.encode('utf-8'))
But if whatever is using the file downstream isn't expecting utf-8, that might cause a problem. You decoded it out from windows-1252. So, that should be ok to put it back in as well:
file.write(new_line.encode('windows-1252'))
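An alternative to calling .encode() on every line is to open the output file with an explicit encoding, so the text is encoded as it is written and Python never falls back to ASCII. A sketch using io.open, which exists in both 2.7 and 3.x; the file names and the sample line are made up.

```python
import io
import os
import shutil
import tempfile

# Hypothetical output line containing the en dash from the traceback.
new_line = u"Mythos Boutique \u2013 Hotel"


def write_line(path, line, encoding):
    # io.open returns a text file that encodes on write with the given codec.
    with io.open(path, "w", encoding=encoding) as fh:
        fh.write(line)


def read_bytes(path):
    with io.open(path, "rb") as fh:
        return fh.read()


tmpdir = tempfile.mkdtemp()
try:
    utf8_path = os.path.join(tmpdir, "deals_utf8.txt")
    cp1252_path = os.path.join(tmpdir, "deals_cp1252.txt")
    write_line(utf8_path, new_line, "utf-8")
    write_line(cp1252_path, new_line, "windows-1252")
    utf8_bytes = read_bytes(utf8_path)      # dash stored as b"\xe2\x80\x93"
    cp1252_bytes = read_bytes(cp1252_path)  # dash stored as b"\x96"
finally:
    shutil.rmtree(tmpdir)
```

Which encoding to pick still depends on what reads the file downstream, as noted above.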
ASKER
It works :-) A million thanks!