Strange codec error

Posted on 2005-04-04
Last Modified: 2012-05-05
I have been playing around with pyGoogle (a Python wrapper for the Google API), and I have come across an error that I have never seen before. Here is a code snippet:

for r in data.results:
    print 'Title: ',r.title
    print 'URL: ',r.URL
    print 'Summary: ',r.snippet
    print

And here is the error:
UnicodeEncodeError: 'ascii' codec can't encode character '\ua9' in position 119: ordinal not in range(128)

The error happens specifically when the script tries to print r.snippet (the summary snippet of the site that Google returns).

I have found very little on the net about this error message or how to correct it.  Any thoughts?

Thanks,
Brian
Question by:bnblazer
LVL 1

Author Comment

by:bnblazer
ID: 13700026
OK, I think I have a fix for this, but it requires that I know what the text encoding of the data is:

print unicode(r.snippet, 'whateverTheEncodingIs').encode('whateverYouWantItEncodedInto')

So here is the next question. How can I determine in Python what the encoding is? Is there a function for this?
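
For illustration, the decode-then-encode idea above can be sketched as follows. This is only a sketch, assuming the raw data arrives as a UTF-8 byte string; the names `recode` and `raw_snippet` are hypothetical and are not part of pyGoogle. It avoids version-specific syntax, so it should run on both Python 2 and Python 3.

```python
def recode(raw_bytes, source_encoding, target_encoding):
    """Decode bytes from source_encoding, then re-encode them to target_encoding."""
    text = raw_bytes.decode(source_encoding)
    # 'replace' substitutes '?' for any character the target encoding lacks,
    # instead of raising UnicodeEncodeError
    return text.encode(target_encoding, 'replace')


# Hypothetical sample data: "Copyright (c) 2005" with a real copyright sign, as UTF-8
raw_snippet = b'Copyright \xc2\xa9 2005'

latin1_bytes = recode(raw_snippet, 'utf-8', 'latin-1')
ascii_bytes = recode(raw_snippet, 'utf-8', 'ascii')
```

As far as I know, the standard library has no function that reliably detects the encoding of arbitrary bytes, so the source encoding normally has to come from the API documentation or from metadata sent with the data.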

Thanks,
brian
 
LVL 15

Expert Comment

by:mish33
ID: 13704523
According to http://www.google.com/apis/reference.html#2_5
r.snippet (like all the other result fields) is UTF-8.
 
LVL 1

Author Comment

by:bnblazer
ID: 13706575
Then doesn't Python handle UTF-8 well? \ua9 is the copyright sign, right? Do I need to convert it to ASCII?
 
LVL 15

Expert Comment

by:mish33
ID: 13708072
u'\u00a9' is the Unicode copyright sign (©), and it doesn't exist in ASCII.
In the Windows encodings, '\xa9' is ©.
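
A short sketch of the point above: the copyright sign has no ASCII representation, while Latin-1 and the Windows cp1252 code page both map it to the single byte 0xA9. The variable names are illustrative only, and the snippet is written to run on both Python 2 and Python 3.

```python
copyright_sign = u'\u00a9'

try:
    copyright_sign.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    # ASCII only covers code points 0-127, and U+00A9 is 169
    ascii_ok = False

latin1_bytes = copyright_sign.encode('latin-1')   # one byte
cp1252_bytes = copyright_sign.encode('cp1252')    # same byte in the Windows code page
utf8_bytes = copyright_sign.encode('utf-8')       # two bytes in UTF-8
```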
 
LVL 15

Expert Comment

by:mish33
ID: 13709304
To print it in a Windows environment, use:
  print u'\u00a9'.encode('latin1')
 
LVL 1

Author Comment

by:bnblazer
ID: 13709342
I should say that this is happening on Mac OS X 10.3.7. I am beginning to wonder if this is an issue with the wrapper around the original Java-based Google API.

Brian
 
LVL 15

Expert Comment

by:mish33
ID: 13709500
Try .encode('mac_roman')
 
LVL 1

Author Comment

by:bnblazer
ID: 13709875
OK, I tried the encoding and it didn't help: same error. I just tried it on a Windows box and a Linux box and got the same error message.

I am really beginning to think I may have stumbled on a bug here.

Brian
 
LVL 15

Expert Comment

by:mish33
ID: 13711717
Yes, I saw your bug #1177232. It's not a bug, it's a feature. ;)
Please post the output of data.results[3].encode('latin1') from the Mac run, as I don't have a Mac around.
 
LVL 1

Author Comment

by:bnblazer
ID: 13712188
Here is the output from using that line....

Traceback (most recent call last):
  File "/Users/brianblazer/Documents/workspace/pythonStuff/search.py", line 24, in ?
    print data.results[3].encode('latin1')
AttributeError: SearchResult instance has no attribute 'encode'
 
LVL 15

Accepted Solution

by:
mish33 earned 1000 total points
ID: 13712252
Well, that one was the unicode one (as it was in my run).
Then try:
for r in data.results:
  if isinstance(r.snippet, unicode):
    print r.snippet.encode('latin1')
  else:
    print r.snippet
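
A more general variant of the same idea, offered only as a sketch: instead of hard-coding 'latin1', it targets whatever encoding the terminal reports and replaces characters that encoding cannot represent. The helper name `to_printable` is hypothetical, and the assumption that byte strings coming from the API are UTF-8 is taken from the Google reference cited earlier in this thread. It is written with Python 3 in mind but avoids version-specific syntax.

```python
import sys


def to_printable(value, encoding=None):
    """Return text that can be printed without raising UnicodeEncodeError."""
    # Fall back to ASCII when stdout reports no encoding (for example, when piped)
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    if isinstance(value, bytes):
        # Assumption: byte strings from the API are UTF-8
        value = value.decode('utf-8', 'replace')
    # Round-trip through the target encoding; unencodable characters become '?'
    return value.encode(encoding, 'replace').decode(encoding)
```

With a helper like this, the loop body would reduce to printing `to_printable(r.snippet)` for every result, whether the snippet is a unicode string or a byte string.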
 
LVL 1

Author Comment

by:bnblazer
ID: 13712317
You are the winner!!!!  That one solved it all!

Thank you for your help.

Brian