Solved

What is the best way to remove unicode character from  a tuple in Python w/Sqlite?

Posted on 2010-08-21
5
567 Views
Last Modified: 2012-05-10
When I execute a select on a sqlite database, the returning data has a u'.  It returns a tuple like ([(u'ATT',), (u'TIER',), (u'TIO',)].  How can I get it into a list like ('ATT',, 'TIER', 'TMO')?
0
Comment
Question by:abuhaneef
5 Comments
 
LVL 25

Accepted Solution

by:
clockwatcher earned 180 total points
ID: 33494238
The encode() method of a string (http://docs.python.org/library/stdtypes.html#str.encode) will change the encoding from unicode to ascii.  

Your parens don't match so it's hard to tell what you've really got there.  Assuming the opening paren is something you added it looks like a list of tuples.  If that's the case,




origlist=[(u'ATT',), (u'TIER',), (u'TIO',)]

newlist = []

for tup in origlist:

    newlist = newlist + [item.encode('ascii','backslashreplace') for item in tup]



print newlist

Open in new window

0
 
LVL 16

Assisted Solution

by:gelonida
gelonida earned 40 total points
ID: 33496027
Please explain exactly what you would like to happen exactly.

Would you like, that all unicode characters not existing with ASCII encoding are replaced by a special character,
that special characters are escaped,  that the characters are removed or that the string will be unreadable, but be saved as it is?

for example to ignore (skip) any unicode character

you had to change clockwatcher' s script from

encode('ascii','backslashreplace') to
encode('ascii','ignore')

backslashreplace is probably what you want though




0
 
LVL 28

Expert Comment

by:pepr
ID: 33518352
The u'ATT' does not mean that there is some extra u.  It is only the way how Python tells you that the 'ATT' is a Unicode string.  In other words, all of the 'A', 'T', and 'T' characters are in Unicode.  You probably do not want to remove them.

It could be the case that you want to convert the unicode to ASCII or to some other encoding. Clockwatcher has shown this at the line 4 and gelonida added some notes to that.  The second argument is related to error handling (when conversion of a character cannot be done).

I can also imagine that you may be confused by (whathever, ) -- the trailing comma. It only says (together) with the parenthesis) that the visual representation means representation of a tuple with a single element.

If the tuples contain a single element, the clockwatcher's code could be replaced by one-liner using the list comprehension construct (the clockwatcher's line 4) -- see the last line below.
lst1 = [(u'ATT',), (u'TIER',), (u'TIOX',)]

lst2 = [ t[0] for t in lst1 ]  # without removing the unicode here
print lst2

print [ t[0].encode('ascii', 'backslashreplace') for t in lst1 ]  # with conversion to ASCII

Open in new window

0
 

Author Comment

by:abuhaneef
ID: 33519382
Thanks to all.  I actually came up with this:

li=[]
for row in data:
      li.append(str(row[0]))

but I find clockwatcher's to be more acceptable.

Thanks
0
 
LVL 28

Assisted Solution

by:pepr
pepr earned 30 total points
ID: 33519539
Your code may be shortened to the single line below (identical behaviour).  The truth is that the str() built in function returns "informal" string representation of the object.  You are probably right to use the explicit encoding (clockwatcher).  On the other hand, the str() will work also in Python 3, because the str() will is the empty operation with respect to the string type.
li = [ str(row[0]) for row in data ]

Open in new window

0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Variable is a place holder or reserved memory locations to store any value. Which means whenever we create a variable, indirectly we are reserving some space in the memory. The interpreter assigns or allocates some space in the memory based on the d…
Introduction On September 29, 2012, the Python 3.3.0 was released; nothing extremely unexpected,  yet another, better version of Python. But, if you work in Microsoft Windows, you should notice that the Python Launcher for Windows was introduced wi…
Learn the basics of modules and packages in Python. Every Python file is a module, ending in the suffix: .py: Modules are a collection of functions and variables.: Packages are a collection of modules.: Module functions and variables are accessed us…
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now