GVNPublic123
Encoding issues in Python
OK, so I have a string called title that holds movie titles in many languages (English, Russian, Japanese, etc.).
My script keeps erroring out because it can't encode the title and save it to the MySQL database.
So help me out: how do I encode it (Unicode or something) so it works for all languages and character sets? Right now my code is:
title = result.group(1).strip().replace("'", "")[0:40]+'...'
title = unicode(title, "utf-8")
ASKER
Also, my table collation is utf8_general_ci.
ASKER
Looks like MySQL-python was using the collation from the database, not the table, so changing the database collation to utf-8 fixed all the latin-1 errors. Now I'm stuck with utf8 ones like:
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 39-40: invalid data
args = ('utf8', '\xd0\x92\xd0\xbb\xd0\xb0\xd0\xb4\xd0\xb8\xd0\xbc\xd0\xb8\xd1\x80 \xd0\x92\xd1\x8b\xd1\x81\xd0\xbe\xd1\x86\xd0\xba\xd0\xb8\xd0\xb9 \xd0\xb2 \xd1\x81\xd0...', 39, 41, 'invalid data')
How should I sanitize strings to only allow utf-8 encodable characters?
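One likely cause of the 'utf8' error above, offered as a reading of the traceback rather than a confirmed diagnosis: the [0:40] slice runs on the raw byte string before it is decoded, so it can cut a two-byte Cyrillic character in half (the args dump ends in a lone \xd0 followed by the appended '...'). Below is a minimal sketch in Python 3 syntax (the code in the question uses Python 2's unicode()) that decodes first and truncates afterwards. The helper name make_title and the sample bytes are hypothetical.

```python
def make_title(raw, limit=40):
    """Decode UTF-8 bytes first, then truncate by characters, not bytes."""
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising
    text = raw.decode("utf-8", errors="replace")
    text = text.strip().replace("'", "")
    if len(text) > limit:
        text = text[:limit] + "..."
    return text


# Hypothetical sample title, encoded the way it might arrive from a scraper
raw = "Владимир Высоцкий в кино и на сцене театра".encode("utf-8")

# Slicing the bytes first can split a multi-byte character and break decoding
try:
    (raw[:40] + b"...").decode("utf-8")
    byte_slice_ok = True
except UnicodeDecodeError:
    byte_slice_ok = False

# Decoding first keeps every character whole
title = make_title(raw)
```

Whether replacing bad bytes with U+FFFD is acceptable depends on the data; errors="ignore" would drop them silently, and leaving the default errors="strict" would surface them as exceptions.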
ASKER
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 246-254: ordinal not in range(256)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 39-40: invalid data
args = ('utf8', '\xd0\x92\xd0\xbb\xd0\xb0\
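The 'latin-1' error above is what Python raises whenever text outside Latin-1 is pushed through a Latin-1 encoder, which fits the connection (rather than the table) using a latin1 character set. Here is a small sketch in Python 3 syntax, with a made-up sample title. With the MySQLdb driver, passing charset="utf8" and use_unicode=True to connect() is the usual way to request a UTF-8 connection, though whether that applies to this exact setup is an assumption.

```python
title = "Владимир Высоцкий"  # hypothetical sample title

# Cyrillic code points are above 255, so Latin-1 cannot represent them
try:
    title.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can encode this text; each Cyrillic letter takes two bytes
utf8_bytes = title.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
```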