Solved

What is the best way to remove the unicode character from a tuple in Python with SQLite?

Posted on 2010-08-21
Last Modified: 2012-05-10
When I execute a SELECT on a SQLite database, the returned data has a u' prefix. It returns a tuple like ([(u'ATT',), (u'TIER',), (u'TIO',)]. How can I get it into a list like ('ATT', 'TIER', 'TMO')?
Question by: abuhaneef
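A minimal sketch of the situation described, assuming Python 3's standard sqlite3 module and a hypothetical in-memory table named carriers with one column, name (both names are illustrative). fetchall() returns a list with one tuple per row, so a single-column query gives one-element tuples, and a list comprehension pulls out the first column.

```python
import sqlite3

# Hypothetical in-memory table standing in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carriers (name TEXT)")
conn.executemany("INSERT INTO carriers (name) VALUES (?)",
                 [("ATT",), ("TIER",), ("TIO",)])

rows = conn.execute("SELECT name FROM carriers ORDER BY rowid").fetchall()
print(rows)   # each row is a 1-tuple: [('ATT',), ('TIER',), ('TIO',)]

# Take the first (and only) column of every row.
names = [row[0] for row in rows]
print(names)  # ['ATT', 'TIER', 'TIO']

conn.close()
```

Under Python 3 the values are str and print without the u prefix; under Python 2 the same query displays them as u'ATT' and so on.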
5 Comments

Accepted Solution

by: clockwatcher
The encode() method of a string (http://docs.python.org/library/stdtypes.html#str.encode) converts a unicode string into a byte string in the encoding you name, here ASCII.

Your parens don't match, so it's hard to tell what you've really got there. Assuming the opening paren is something you added, it looks like a list of tuples. If that's the case:



origlist = [(u'ATT',), (u'TIER',), (u'TIO',)]
newlist = []
for tup in origlist:
    newlist.extend([item.encode('ascii', 'backslashreplace') for item in tup])  # encode every item of the tuple

print(newlist)
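clockwatcher's loop flattens tuples of any length. Below is a Python 3 sketch of the same idea, assuming itertools.chain.from_iterable for the flattening. Because encode() returns bytes in Python 3, each result is decoded back to str so the list holds ordinary strings. The helper name to_ascii is hypothetical.

```python
from itertools import chain

def to_ascii(text):
    # Escape anything outside ASCII (e.g. 'é' becomes the four characters \xe9),
    # then decode the resulting ASCII bytes back into a str.
    return text.encode("ascii", "backslashreplace").decode("ascii")

origlist = [(u"ATT",), (u"TIER",), (u"TIO",)]

# chain.from_iterable walks every item of every tuple, in order.
newlist = [to_ascii(item) for item in chain.from_iterable(origlist)]
print(newlist)                 # ['ATT', 'TIER', 'TIO']

print(to_ascii(u"caf\u00e9"))  # caf\xe9
```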



Assisted Solution

by: gelonida
Please explain exactly what you would like to happen.

Would you like all unicode characters that do not exist in ASCII to be replaced by a special character, to be escaped, or to be removed? Or should the string be saved as it is, even if it ends up unreadable?

For example, to ignore (skip) any non-ASCII character, you would change clockwatcher's script from

encode('ascii','backslashreplace') to
encode('ascii','ignore')

backslashreplace is probably what you want, though.
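A small Python 3 sketch comparing these error handlers on a string that contains non-ASCII characters. 'replace' (substitute '?') is an additional standard handler not mentioned above, and 'strict' is encode()'s default. In Python 3 encode() returns bytes, hence the b'' prefix in the output.

```python
text = u"Caf\u00e9 T\u00fcr"  # 'Café Tür': \u00e9 and \u00fc are not ASCII

# The second argument of encode() selects the error handler.
results = {errors: text.encode("ascii", errors)
           for errors in ("backslashreplace", "ignore", "replace")}

for errors, encoded in results.items():
    print(errors, encoded)
# backslashreplace b'Caf\\xe9 T\\xfcr'
# ignore b'Caf Tr'
# replace b'Caf? T?r'

# The default handler, 'strict', raises instead of silently changing the data.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print("strict raised:", strict_failed)  # strict raised: True
```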





Expert Comment

by: pepr
The u in u'ATT' does not mean that there is an extra u in the data. It is only how Python tells you that 'ATT' is a Unicode string. In other words, the 'A', 'T', and 'T' characters are all Unicode characters. You probably do not want to remove them.
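A short illustration, runnable in Python 3 (which accepts the u prefix but no longer displays it): the prefix is part of the value's repr(), which is what you see when a list or tuple is printed, not part of the stored characters.

```python
s = u"ATT"

print(len(s))     # 3: no extra u is stored in the string
print(s)          # ATT: print shows the characters themselves
print(repr(s))    # 'ATT' in Python 3; Python 2 shows u'ATT'
print([(s,)])     # containers show their items' repr: [('ATT',)]
```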

It could be that you want to convert the Unicode string to ASCII or to some other encoding. Clockwatcher showed this on line 4 of the code above, and gelonida added some notes about it. The second argument of encode() controls error handling, that is, what happens when a character cannot be converted.

You may also be confused by the trailing comma in (whatever,). Together with the parentheses, it only indicates that this is the representation of a tuple with a single element.
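A minimal sketch of why the comma matters: parentheses alone only group an expression, and it is the trailing comma that makes a one-element tuple.

```python
one = ("ATT",)         # trailing comma: a tuple with a single element
not_a_tuple = ("ATT")  # parentheses alone just group: this is the str 'ATT'

print(type(one).__name__, len(one))  # tuple 1
print(type(not_a_tuple).__name__)    # str
print(one[0])                        # ATT
```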

If each tuple contains a single element, clockwatcher's code can be replaced by a one-liner using a list comprehension (the construct on clockwatcher's line 4); see the last line below.
lst1 = [(u'ATT',), (u'TIER',), (u'TIOX',)]

lst2 = [t[0] for t in lst1]  # still unicode strings here
print(lst2)

print([t[0].encode('ascii', 'backslashreplace') for t in lst1])  # with conversion to ASCII
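Another option, sketched here assuming Python 3's sqlite3 module: set the connection's row_factory to a callable that receives (cursor, row) and returns only the first column, so fetchall() already yields a flat list. The table name carriers is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carriers (name TEXT)")  # hypothetical table
conn.executemany("INSERT INTO carriers VALUES (?)",
                 [("ATT",), ("TIER",), ("TIOX",)])

# sqlite3 calls row_factory(cursor, row) for each fetched row; keep column 0 only.
conn.row_factory = lambda cursor, row: row[0]

names = conn.execute("SELECT name FROM carriers ORDER BY rowid").fetchall()
print(names)  # ['ATT', 'TIER', 'TIOX']

conn.close()
```

The factory applies to every cursor created on that connection afterwards, so this suits single-column queries; multi-column queries on the same connection would also lose their other columns.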



Author Comment

by: abuhaneef
Thanks to all.  I actually came up with this:

li = []
for row in data:
    li.append(str(row[0]))

but I find clockwatcher's to be more acceptable.

Thanks

Assisted Solution

by: pepr
Your code can be shortened to the single line below (identical behaviour). Note that the built-in str() function returns the "informal" string representation of an object, and in Python 2 it raises UnicodeEncodeError for a unicode value that contains non-ASCII characters. You are probably right to prefer the explicit encoding (clockwatcher's approach). On the other hand, str() also works in Python 3, because there str() is effectively a no-op on a value that is already a string.
li = [ str(row[0]) for row in data ]
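A Python 3 sketch of the points above: str() leaves a str unchanged, but because it returns the informal representation, applying it to bytes produces the text b'ATT' rather than decoding it. bytes.decode() is the explicit route. The data list is a stand-in shaped like the thread's example rows.

```python
# str() on a value that is already a str returns an equal string.
print(str("ATT") == "ATT")      # True

# str() gives the informal representation, so bytes are NOT decoded.
print(str(b"ATT"))              # b'ATT'

# To turn bytes into text, decode explicitly instead.
print(b"ATT".decode("ascii"))   # ATT

data = [("ATT",), ("TIER",), ("TIO",)]  # stand-in rows, one column each
li = [str(row[0]) for row in data]
print(li)                       # ['ATT', 'TIER', 'TIO']
```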

