bschwarting
asked on
'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I'm getting this error:
Exception has occurred: UnicodeDecodeError
'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Using this code:
json_obj = urllib.request.urlopen(url).read()
response = urllib.request.urlopen(url).read()
json_obj = str(response, 'utf-8')
data = json.loads(json_obj)
ASKER
This is the response I get now.
Exception has occurred: UnicodeDecodeError
'charmap' codec can't decode byte 0x9d in position 246: character maps to <undefined>
Perhaps I am missing something but it appears that json_obj is loaded and then almost immediately overwritten.
json_obj = urllib.request.urlopen(url).read()
response = urllib.request.urlopen(url).read()
json_obj = str(response, 'utf-8')
ASKER
I changed it to this, just to make sure, and the same error:
json_obj = urllib.request.urlopen(url).read()
response = urllib.request.urlopen(url).read()
json_obj2 = str(response, 'utf-8')
data = json.loads(json_obj2)
ASKER
What's the proper syntax I should use to send the encoding on the open? I found an example below:
with open('unicode.txt', encoding='utf-8') as f:
    for line in f:
        print(repr(line))
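For what it's worth, urllib.request.urlopen() has no encoding parameter the way open() does; it returns a binary stream. A minimal sketch of how the same idea is usually applied to such a stream, assuming the body really is text in the given encoding (the helper name read_text and the io.BytesIO stand-in are illustrative, not from the thread):

```python
import io

def read_text(binary_stream, encoding="utf-8"):
    """Decode a binary file-like object (such as an HTTPResponse) as text."""
    # TextIOWrapper plays the role that open(..., encoding=...) plays for files.
    with io.TextIOWrapper(binary_stream, encoding=encoding) as text_stream:
        return text_stream.read()

# Stand-in for urllib.request.urlopen(url); a real response is wrapped the same way.
fake_response = io.BytesIO('{"name": "café"}'.encode("utf-8"))
print(read_text(fake_response))

# Equivalent form when the whole body is read into memory first:
# text = urllib.request.urlopen(url).read().decode("utf-8")
```

This only helps when the bytes are actually text in that encoding; it raises the same UnicodeDecodeError otherwise.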
ASKER
Any thoughts?
Try the following code for your URL to learn what is actually read.
import binascii
import urllib.request
url = 'http://python.org/'
response = urllib.request.urlopen(url)  # returns an HTTPResponse object open for reading
buf = response.read(50) # read 50 bytes of the response
print(binascii.hexlify(buf))
print(repr(buf))
With the python.org URL, it prints:
d:\__Python\ee29119512>py a.py
b'3c21646f63747970652068746d6c3e0a3c212d2d5b6966206c7420494520375d3e2020203c68746d6c20636c6173733d226e'
b'<!doctype html>\n<!--[if lt IE 7]>   <html class="n'
ASKER
Here is the result. What is this?
b'1f8b0800000000000000ed9c7f6fdb389ac7ff9f57c10db0e9dd229445fd568b62e03869e39bb4c9d5c9748bbb43415194ad'
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00\xed\x9c\x7fo\xdb8\x9a\xc7\xff\x9fW\xc1\r\xb0\xe9\xdd"\x94E\xfdV\x8bb\xe08i\xe3\x9b\xb4\xc9\xd5\xc9t\x8b\xbbCAQ\x94\xad'
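The first two bytes, 1f 8b, are the gzip magic number, and the third byte, 08, is the deflate compression method, so the body looks gzip-compressed rather than plain UTF-8 text. That would also explain the original error: 0x8b at position 1 is not a valid UTF-8 start byte. A minimal sketch, assuming the server really sent gzip data (the helper name maybe_gunzip and the sample payload are made up for illustration):

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"

def maybe_gunzip(raw: bytes) -> bytes:
    """Decompress raw if it starts with the gzip magic number, else return it unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw

# Simulated response body; in the thread this would be urllib.request.urlopen(url).read()
raw = gzip.compress(json.dumps({"status": "ok"}).encode("utf-8"))
data = json.loads(maybe_gunzip(raw).decode("utf-8"))
print(data)
```

Checking the magic bytes keeps the helper safe to call on responses that were not compressed.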
you might try:
This is probably the second most popular encoding.
If the HTTP response is 'clean', its headers should tell you which encoding was used, so you can use that encoding instead of guessing.
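As a sketch of reading the encoding from the headers rather than guessing: the object returned by urllib.request.urlopen() exposes them as response.headers, where get_content_charset() returns the charset declared in Content-Type and get('Content-Encoding') reports compression such as gzip. The helper decode_body, the utf-8 fallback, and the sample headers below are assumptions for illustration.

```python
import gzip
from email.message import Message

def decode_body(raw: bytes, headers: Message, default_charset: str = "utf-8") -> str:
    """Decode a response body using what the HTTP headers declare."""
    # Undo compression first, if the server declared it.
    if (headers.get("Content-Encoding") or "").lower() == "gzip":
        raw = gzip.decompress(raw)
    # Then decode with the declared charset, falling back to an assumed default.
    charset = headers.get_content_charset() or default_charset
    return raw.decode(charset)

# Stand-in for response.headers; urllib's header object is a subclass of Message.
headers = Message()
headers["Content-Type"] = "application/json; charset=iso-8859-1"
headers["Content-Encoding"] = "gzip"
body = gzip.compress('{"city": "München"}'.encode("iso-8859-1"))
print(decode_body(body, headers))
```

With a real request this would be called as decode_body(response.read(), response.headers).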