
What is the best way to remove unicode character from a tuple in Python w/Sqlite?

When I execute a SELECT on an SQLite database, the returned data has a u' prefix. It returns a tuple like ([(u'ATT',), (u'TIER',), (u'TIO',)]. How can I get it into a list like ('ATT', 'TIER', 'TMO')?
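For reference, here is a minimal reproduction using Python's built-in sqlite3 module with an in-memory database. The table name `carriers` and column `name` are hypothetical. It shows that `fetchall()` returns a list of 1-tuples, one per row, even when only one column is selected, which is the shape described in the question. This sketch assumes Python 3, where TEXT values come back as `str` and are displayed without a u prefix.

```python
import sqlite3

# Hypothetical single-column table, built in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carriers (name TEXT)")
conn.executemany("INSERT INTO carriers (name) VALUES (?)",
                 [("ATT",), ("TIER",), ("TIO",)])

# Each row is a tuple, even though only one column is selected.
rows = conn.execute("SELECT name FROM carriers ORDER BY rowid").fetchall()
print(rows)  # [('ATT',), ('TIER',), ('TIO',)] under Python 3
conn.close()
```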
Asked by: abuhaneef
 
clockwatcher commented:
The encode() method of a string (http://docs.python.org/library/stdtypes.html#str.encode) converts a unicode string into a byte string in the encoding you pass it, such as ascii.
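A quick check of what encode() produces. This sketch assumes Python 3 (the thread itself uses Python 2): there, encode() returns a `bytes` object, so the result prints with a b prefix rather than a u prefix, and decode() converts it back to `str`.

```python
# Python 3: str.encode() returns bytes, bytes.decode() returns str.
s = u"ATT"              # the u prefix is accepted (and ignored) in Python 3.3+
b = s.encode("ascii")
print(type(b), b)       # <class 'bytes'> b'ATT'
print(b.decode("ascii") == s)  # True
```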

Your parens don't match, so it's hard to tell what you've really got there. Assuming the opening paren is something you added, it looks like a list of tuples. If that's the case:



origlist = [(u'ATT',), (u'TIER',), (u'TIO',)]
newlist = []
for tup in origlist:
    newlist.extend(item.encode('ascii','backslashreplace') for item in tup)  # escape chars ASCII can't hold

print(newlist)
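If the tuples might hold more than one column, one alternative to building the list with a loop is `itertools.chain.from_iterable`, which flattens one level of nesting. This is a swapped-in technique, not clockwatcher's code; the sketch assumes Python 3 and keeps the values as `str` unless you explicitly encode them.

```python
from itertools import chain

origlist = [(u'ATT',), (u'TIER',), (u'TIO',)]

# Flatten one level: every item of every tuple, in order.
newlist = list(chain.from_iterable(origlist))
print(newlist)  # ['ATT', 'TIER', 'TIO']

# Encoding can be applied in the same pass if bytes are really wanted.
encoded = [item.encode('ascii', 'backslashreplace')
           for item in chain.from_iterable(origlist)]
print(encoded)  # [b'ATT', b'TIER', b'TIO']
```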

 
gelonida commented:
Please explain exactly what you would like to happen.

Would you like all Unicode characters that do not exist in ASCII to be replaced by a special character, to be escaped, to be removed, or to be kept as they are (so the string may look unreadable but is saved unchanged)?

For example, to ignore (skip) any character that cannot be encoded as ASCII,

you would have to change clockwatcher's script from

encode('ascii','backslashreplace') to
encode('ascii','ignore')

backslashreplace is probably what you want, though.
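To see the difference between the error handlers, here is a small comparison on a hypothetical string with one non-ASCII character (é). The sketch assumes Python 3, where the results are bytes. 'strict' is the default and raises an exception instead of producing output.

```python
text = u"caf\u00e9"  # "café": the last character has no ASCII equivalent

for handler in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace"):
    print(handler, text.encode("ascii", handler))
# ignore b'caf'
# replace b'caf?'
# backslashreplace b'caf\\xe9'
# xmlcharrefreplace b'caf&#233;'

try:
    text.encode("ascii")  # same as errors='strict'
except UnicodeEncodeError as exc:
    print("strict raises:", exc.reason)  # ordinal not in range(128)
```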




 
pepr commented:
The u in u'ATT' does not mean that the string contains an extra u character. It is only how Python shows you that 'ATT' is a Unicode string. In other words, the 'A', 'T', and 'T' characters are all stored as Unicode, and there is nothing extra to remove.
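One way to convince yourself that the u is not part of the data: it does not count toward the length, and the string compares equal to the plain literal. This sketch assumes Python 3.3+, which accepts the u prefix; the commented output is what Python 3 prints.

```python
s = u'ATT'
print(len(s))       # 3: only 'A', 'T', 'T'
print(s == 'ATT')   # True
print(list(s))      # ['A', 'T', 'T']
```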

Perhaps you want to convert the Unicode strings to ASCII or to some other encoding. Clockwatcher has shown this in the encode() call inside his loop, and gelonida added some notes on it. The second argument of encode() controls the error handling, i.e. what happens when a character cannot be converted.

You may also be confused by the trailing comma in (whatever,). Together with the parentheses, it only indicates that the value is a tuple with a single element.
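A short illustration of why the comma matters: without it, the parentheses are just grouping and the value is a plain string. The variable names are illustrative only; the sketch assumes Python 3.

```python
one = ('ATT',)          # tuple with a single element
not_a_tuple = ('ATT')   # only a parenthesized string

print(type(one).__name__, len(one))                  # tuple 1
print(type(not_a_tuple).__name__, len(not_a_tuple))  # str 3
print(one[0])                                        # ATT
```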

If each tuple contains a single element, clockwatcher's code could be replaced by a one-liner using a list comprehension (the same construct clockwatcher used inside his loop); see the last line below.
lst1 = [(u'ATT',), (u'TIER',), (u'TIOX',)]

lst2 = [t[0] for t in lst1]  # keeps the unicode strings as they are
print(lst2)

print([t[0].encode('ascii', 'backslashreplace') for t in lst1])  # with conversion to ASCII
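A variant of the one-liner, assuming every tuple has exactly one element: unpack each tuple directly in the comprehension. Unlike t[0], this fails loudly with a ValueError if a row unexpectedly has more than one column, which can catch a wrong SELECT early. The sketch assumes Python 3.

```python
lst1 = [(u'ATT',), (u'TIER',), (u'TIOX',)]

# (s,) unpacks each 1-tuple; a tuple of any other length raises ValueError.
lst2 = [s for (s,) in lst1]
print(lst2)  # ['ATT', 'TIER', 'TIOX']

try:
    [s for (s,) in [(u'ATT', u'extra')]]
except ValueError as exc:
    print("unexpected row shape:", exc)
```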

 
abuhaneef (author) commented:
Thanks to all.  I actually came up with this:

li = []
for row in data:
    li.append(str(row[0]))

but I find clockwatcher's to be more acceptable.

Thanks
 
pepr commented:
Your code can be shortened to the single line below (identical behaviour). Note that the built-in str() function returns the "informal" string representation of an object; in Python 2, calling str() on a unicode string encodes it with the default codec (normally ASCII), so it raises UnicodeEncodeError if the string contains non-ASCII characters. You are probably right to prefer the explicit encoding (clockwatcher's approach). On the other hand, str() also works in Python 3, where it is effectively a no-op for a value that is already a string.
li = [str(row[0]) for row in data]
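Another option is to let the sqlite3 module return the type you want in the first place, through the connection's `text_factory` attribute. In Python 3 the default is `str`; setting it to `bytes` returns TEXT values as bytes. (In Python 2 the documented equivalent was setting it to `str` to get bytestrings instead of unicode objects.) A minimal sketch assuming Python 3 and a hypothetical in-memory table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carriers (name TEXT)")  # hypothetical table
conn.executemany("INSERT INTO carriers VALUES (?)",
                 [("ATT",), ("TIER",), ("TIO",)])
query = "SELECT name FROM carriers ORDER BY rowid"

# Default text_factory is str: TEXT columns come back as str.
as_str = [row[0] for row in conn.execute(query)]
print(as_str)    # ['ATT', 'TIER', 'TIO']

# With text_factory = bytes, the same column comes back as bytes.
conn.text_factory = bytes
as_bytes = [row[0] for row in conn.execute(query)]
print(as_bytes)  # [b'ATT', b'TIER', b'TIO']
conn.close()
```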
