Pyhon with Oracle encoding issue

Zberteoc
Zberteoc used Ask the Experts™
on
I am using Python 3.3 on Windows 7x64 machine to connect to an Oracle server over the VPN tunnel to dump locally data in a CSV file. I am using cx_Oracle module to connect to Oracle. Everything is fine until the cursor hits a character in some string column, which causes the module to fail with error:

...
for row in orcl_cur.execute(sql_select):
        File "C:\Python33\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
      UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 426: character maps to <undefined>

AT this point the dump stops to whatever rows were saved so far.

How can I avoid this?
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
If the database content is correct, then it does not use cp1252 encoding. The character 0x8f is not defined -- see here https://en.wikipedia.org/wiki/Windows-1252 or here http://www.cp1252.com/ Firstly, I suggest to double-check, whether the database really uses that encoding.

Another possibility is that the database should contain the cp1252, but there is a bug in the data.

If you need to solve the situation, you can wrap the decoding of the character the try-except construct, replace the character by something or very visible (to be able to see the problem) or by something very invisible (to mask the problem). You can also write some warning into a log in the case.

It is difficult to say without seeing something more.
I appears that Python uses that code file, cp1252.py, by default. Can I change that?
Is the database column od the unicode type (nvarchar or so)?
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Only VARCHAR2.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
If that character is in the database I'm betting the database is Unicode.
select value from nls_database_parameters where parameter='NLS_CHARACTERSET';

If you cannot get Python to handle that characterset, you might look at encoding the column.  I've not done a lot with multi-byte character sets so I'm not sure the "official" Oracle way.

Off the top of my head there is utl_i18n.escape_reference:
http://docs.oracle.com/cd/E11882_01/appdev.112/e40758/u_i18n.htm#ARPLS71120
If it is VARCHAR2, then you probably have to tell the encoding explicitly somehow. The cp1252 is probably a default. However, I do not know the library. I guess it will be or the connection object parameter or the cursor parameter. The only other possibility is that the cursor.execute is also capable to return the binary type in the row (if possible, then it should be set similarly to the encoding). I do not know the answer.
The only way I could get pass the querying the Oracle server without actually failing was to change the NLS_LANG registry key value to AMERICAN_AMERICA.UTF8 on the Windows machine where I was running teh Python module and restart it.  Before was some Canadian setting. The key is here:

HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_XE\NLS_LANG

After that it won't give that error anymore and I could deal with the "strange" characters inside Python. They looked like: Ï¿Ï¿½ , this was breaking the cx_Oracle cursor when querying the Oracle server.
The solution was suggested by a co-worker of mine and I found reference to that NLS_LANG windows key on the net.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial