'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I'm getting this error:
Exception has occurred: UnicodeDecodeError
'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Using this code:

json_obj = urllib.request.urlopen(url).read() 

response = urllib.request.urlopen(url).read()

json_obj = str(response, 'utf-8')

data = json.loads(json_obj)

gelonida

It seems the response is not encoded as UTF-8.

You might try:

json_obj = str(response, 'cp1252')

This is probably the second most popular encoding.

If the HTTP response is 'clean', its headers should tell you which encoding was used, so you can use that encoding instead of guessing.
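Reading the encoding from the headers could look roughly like the sketch below. It assumes the server declares a charset in its Content-Type header and falls back to UTF-8 when it does not. The helper names `decode_body` and `fetch_json` are made up for illustration.

```python
import json
import urllib.request


def decode_body(body, headers, default="utf-8"):
    """Decode raw bytes using the charset declared in the Content-Type header.

    `headers` is the message object found on `response.headers`.
    Falls back to `default` (an assumption) when no charset is declared.
    """
    charset = headers.get_content_charset() or default
    return body.decode(charset)


def fetch_json(url):
    """Fetch `url` once and parse the body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
        return json.loads(decode_body(body, response.headers))
```

This only helps when the body really is text in the declared charset. If the bytes are compressed or otherwise not text, decoding will still fail.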

bschwarting

ASKER

This is the error I get now:

Exception has occurred: UnicodeDecodeError
'charmap' codec can't decode byte 0x9d in position 246: character maps to <undefined>

Dr. Klahn

Perhaps I am missing something, but it appears that json_obj is loaded and then almost immediately overwritten.

json_obj = urllib.request.urlopen(url).read()
response = urllib.request.urlopen(url).read()
json_obj = str(response, 'utf-8')

bschwarting

ASKER

I changed it to this, just to make sure, and I get the same error:

json_obj = urllib.request.urlopen(url).read()
response = urllib.request.urlopen(url).read()
json_obj2 = str(response, 'utf-8')
data = json.loads(json_obj2)

bschwarting

ASKER

What's the proper syntax for specifying the encoding when opening the URL? I found the example below:

with open('unicode.txt', encoding='utf-8') as f:
    for line in f:
        print(repr(line))

bschwarting

ASKER

Any thoughts?

pepr

Try the following code for your URL to learn what is actually read.

import binascii
import urllib.request

url = 'http://python.org/'
response = urllib.request.urlopen(url)  # returns an HTTPResponse object open for reading
buf = response.read(50)                 # read 50 bytes of the response
print(binascii.hexlify(buf))
print(repr(buf))

This prints:

d:\__Python\ee29119512>py a.py
b'3c21646f63747970652068746d6c3e0a3c212d2d5b6966206c7420494520375d3e2020203c68746d6c20636c6173733d226e'
b'<!doctype html>\n<!--[if lt IE 7]>   <html class="n'

bschwarting

ASKER

Here is the result. What is this?

b'1f8b0800000000000000ed9c7f6fdb389ac7ff9f57c10db0e9dd229445fd568b62e03869e39bb4c9d5c9748bbb43415194ad'
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00\xed\x9c\x7fo\xdb8\x9a\xc7\xff\x9fW\xc1\r\xb0\xe9\xdd"\x94E\xfdV\x8bb\xe08i\xe3\x9b\xb4\xc9\xd5\xc9t\x8b\xbbCAQ\x94\xad'
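For reference, the leading bytes `1f 8b` in the dump above are the gzip magic number, and the third byte `08` is the deflate compression method. That suggests the body is gzip-compressed rather than plain text, which would explain why no text codec can decode it. The sketch below assumes that is the case and that the decompressed payload is UTF-8 JSON. The function name `decode_json_body` is made up for illustration.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_json_body(body, encoding="utf-8"):
    """Decompress `body` if it looks like gzip, then parse it as JSON.

    Assumes the (decompressed) payload is JSON text in `encoding`.
    """
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return json.loads(body.decode(encoding))
```

It could then be used as `data = decode_json_body(urllib.request.urlopen(url).read())`, fetching the URL only once.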

ASKER CERTIFIED SOLUTION

pepr
