Avatar of Zberteoc
Zberteoc
Flag for Canada asked on

Pyhon with Oracle encoding issue

I am using Python 3.3 on Windows 7x64 machine to connect to an Oracle server over the VPN tunnel to dump locally data in a CSV file. I am using cx_Oracle module to connect to Oracle. Everything is fine until the cursor hits a character in some string column, which causes the module to fail with error:

...
for row in orcl_cur.execute(sql_select):
        File "C:\Python33\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
      UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 426: character maps to <undefined>

AT this point the dump stops to whatever rows were saved so far.

How can I avoid this?
Oracle DatabasePython

Avatar of undefined
Last Comment
Zberteoc

8/22/2022 - Mon
pepr

If the database content is correct, then it does not use cp1252 encoding. The character 0x8f is not defined -- see here https://en.wikipedia.org/wiki/Windows-1252 or here http://www.cp1252.com/ Firstly, I suggest to double-check, whether the database really uses that encoding.

Another possibility is that the database should contain the cp1252, but there is a bug in the data.

If you need to solve the situation, you can wrap the decoding of the character the try-except construct, replace the character by something or very visible (to be able to see the problem) or by something very invisible (to mask the problem). You can also write some warning into a log in the case.

It is difficult to say without seeing something more.
Zberteoc

ASKER
I appears that Python uses that code file, cp1252.py, by default. Can I change that?
pepr

Is the database column od the unicode type (nvarchar or so)?
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
Zberteoc

ASKER
Only VARCHAR2.
slightwv (䄆 Netminder)

If that character is in the database I'm betting the database is Unicode.
select value from nls_database_parameters where parameter='NLS_CHARACTERSET';

If you cannot get Python to handle that characterset, you might look at encoding the column.  I've not done a lot with multi-byte character sets so I'm not sure the "official" Oracle way.

Off the top of my head there is utl_i18n.escape_reference:
http://docs.oracle.com/cd/E11882_01/appdev.112/e40758/u_i18n.htm#ARPLS71120
pepr

If it is VARCHAR2, then you probably have to tell the encoding explicitly somehow. The cp1252 is probably a default. However, I do not know the library. I guess it will be or the connection object parameter or the cursor parameter. The only other possibility is that the cursor.execute is also capable to return the binary type in the row (if possible, then it should be set similarly to the encoding). I do not know the answer.
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
ASKER CERTIFIED SOLUTION
Zberteoc

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
Zberteoc

ASKER
The solution was suggested by a co-worker of mine and I found reference to that NLS_LANG windows key on the net.