Solved

Python 2.7 - Normalizing French strings

Posted on 2016-09-22
Last Modified: 2016-10-01
Hi there,

In Python 2.7, I need a simple way to replace accented French characters with their non-accented equivalents.

For example:
a = "René"
print a
[desired output] > Rene

I tried the following but it did not work:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', "René").encode('ascii','ignore')
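For reference, here is a minimal sketch of the same NFKD idea, written so it runs on both Python 2.7 and Python 3. It assumes the input is unicode text: in Python 2.7 a plain "René" literal is a byte string (str), and unicodedata.normalize() rejects it with "must be unicode, not str". The function name strip_accents_ascii is hypothetical, and the accented letters are written as \u escapes so the file needs no encoding declaration.

```python
import unicodedata


def strip_accents_ascii(text):
    """Return `text` with accents removed and other non-ASCII characters dropped."""
    # NFKD splits u'\u00e9' (e-acute) into u'e' followed by the combining accent U+0301.
    decomposed = unicodedata.normalize('NFKD', text)
    # Encoding to ASCII with 'ignore' discards the combining marks; decoding again
    # returns text (unicode) rather than bytes on both Python 2.7 and Python 3.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# Note the u prefix: the input must be unicode, not a Python 2 byte string.
print(strip_accents_ascii(u'Ren\u00e9'))  # prints: Rene
```

Characters with no ASCII base letter are silently dropped by the 'ignore' error handler rather than transliterated.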

Thanks for your help,
René
Question by:ReneGe
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41810568
Here is something that works:
import unicodedata
import string


def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")

print(remove_accents('Paixão Côrtes'))

 
LVL 10

Author Comment

by:ReneGe
ID: 41810904
Hi Walter,

Thanks for your prompt reply :)

I tried your script and got the following error message:

  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 8
SyntaxError: Non-ASCII character '\xe9' in file C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py on line 8, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Thanks and cheers
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41811777
I think it is because the UTF-8 encoding declaration is missing.
Put this as the first line of your script:

# -*- coding: utf-8 -*-

LVL 10

Author Comment

by:ReneGe
ID: 41811847
Hi Walter,

Aren't lines starting with # just comments?
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41811861
Yes, but this particular comment has a purpose: it declares the encoding of the source file (see PEP 263).
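To illustrate why that comment is special: the interpreter checks the first two lines of a source file against a regular expression, and a matching comment is treated as the encoding declaration. That is also why both "# -*- coding: utf-8 -*-" and "# coding=utf-8" work. The sketch below uses the pattern given in PEP 263 and the Python language reference; it is simplified (the reference adds further conditions, such as what line 1 may contain when the declaration is on line 2, which this sketch ignores), and the function name declared_encoding is hypothetical.

```python
import re

# Pattern from PEP 263 / the Python language reference. A comment on line 1 or 2
# that matches it is read as the source-file encoding declaration.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(source_lines):
    """Return the encoding declared in the first two lines, or None (simplified)."""
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


print(declared_encoding(['# -*- coding: utf-8 -*-', 'import string']))  # utf-8
print(declared_encoding(['# coding=utf-8']))                               # utf-8
print(declared_encoding(['# just a comment']))                             # None
```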
 
LVL 10

Author Comment

by:ReneGe
ID: 41811863
Ok thanks :)

Here is what I get.

C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames>1.py
Traceback (most recent call last):
  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 9, in <module>
    print(remove_accents('PaixÚo C¶rtes'))
  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 7, in remove_accents
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")
TypeError: must be unicode, not str
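The TypeError above comes from passing a Python 2 byte string (str) to unicodedata.normalize(), which only accepts unicode. A minimal sketch of one way around it, assuming the bytes' encoding is known: decode them first. The extra encoding parameter is a hypothetical addition to the remove_accents function from this thread; the filter itself is unchanged.

```python
import string
import unicodedata


def remove_accents(data, encoding='utf-8'):
    """Same filter as in this thread, but also accepts byte strings (Python 2 str)."""
    if isinstance(data, bytes):
        # In Python 2 a plain '...' literal is a byte string, and
        # unicodedata.normalize() rejects it with "must be unicode, not str".
        data = data.decode(encoding)
    return u''.join(x for x in unicodedata.normalize('NFKD', data)
                    if x in string.ascii_letters or x == u' ')


print(remove_accents(b'Paix\xc3\xa3o C\xc3\xb4rtes'))             # UTF-8 bytes
print(remove_accents(b'Paix\xe9o C\xf4rtes', encoding='cp1252'))  # Windows-1252 bytes
```

Decoding with the wrong encoding either raises UnicodeDecodeError or silently produces the wrong letters, so the encoding argument has to match how the bytes were actually produced.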
 
LVL 10

Author Comment

by:ReneGe
ID: 41811865
This is the script that produced it:

# -*- coding: utf-8 -*-
import unicodedata
import string

def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")

print(remove_accents('Paixéo Côrtes'))

 
LVL 16

Assisted Solution

by:Walter Ritzel
Walter Ritzel earned 2000 total points
ID: 41812217
Try this:
# -*- coding: utf-8 -*-
import unicodedata
import string


def remove_accents(data):
    return u''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == u" ")

print(remove_accents(u'Paixão Côrtes é è ò'))
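One side effect of this filter is that it keeps only ASCII letters and spaces. Digits, hyphens, underscores and dots are dropped too (u'Paixão 2016' becomes u'Paixao ' with a trailing space). That may matter if, as the script path suggests, the goal is normalizing folder names. A sketch of a variant that removes only combining marks (accents) and keeps every other character is shown below. The name strip_combining_marks is hypothetical, and unlike the ASCII filter the result is not guaranteed to be pure ASCII.

```python
import unicodedata


def strip_combining_marks(text):
    """Remove accents but keep every other character (digits, '-', '_', '.')."""
    decomposed = unicodedata.normalize('NFKD', text)
    # unicodedata.combining() returns a non-zero class for combining marks such as
    # the acute accent U+0301, and 0 for ordinary characters.
    stripped = u''.join(c for c in decomposed if not unicodedata.combining(c))
    # Recompose whatever NFKD split apart that was not an accent.
    return unicodedata.normalize('NFC', stripped)


print(strip_combining_marks(u'Paix\u00e3o C\u00f4rtes 2016-09_v1'))  # Paixao Cortes 2016-09_v1
```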

 
LVL 10

Author Comment

by:ReneGe
ID: 41812230
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xe3 in position 0: unexpected end of data
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41812250
This is probably environment-related. The code itself works fine.
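One possible explanation, offered as an assumption rather than a diagnosis: byte 0xe3 is how 'ã' is stored in Windows-1252 (cp1252), the usual ANSI code page on Western-European Windows systems. If the .py file was saved in that encoding while declaring "coding: utf-8", Python fails exactly when it tries to read that byte as UTF-8. This is consistent with the earlier traceback, where 'Paixéo Côrtes' was shown as 'PaixÚo C¶rtes': the cp1252 bytes 0xE9 and 0xF4 display as 'Ú' and '¶' in a console using code page 850 (also an assumption about the setup). If so, either re-save the file as UTF-8 in the editor or change the declaration to "# -*- coding: cp1252 -*-". The sketch below only demonstrates the byte-level mismatch.

```python
# Hypothesis: the script file was saved as Windows-1252 (cp1252), not UTF-8.
raw = u'\u00e3'.encode('cp1252')  # the letter a-tilde becomes the single byte 0xE3
print(repr(raw))

# A "coding: utf-8" line tells Python to decode the file as UTF-8, which fails here.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print('utf-8 decode failed: ' + str(exc))

# Decoding with the encoding the file was actually saved in recovers the letter.
text = raw.decode('cp1252')
```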
 
LVL 10

Accepted Solution

by:ReneGe
ReneGe earned 0 total points
ID: 41816595
Hi Walter,

Here is what worked for me:

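# coding=utf-8
""" Normalise (normalize) unicode data in Python to remove umlauts, accents etc. """
import sys
from unidecode import unidecode  # third-party package: pip install unidecode

# raw_input() returns a byte string, so decode it with the console's input encoding,
# falling back to UTF-8 when stdin is redirected and its encoding is None.
encoding = sys.stdin.encoding or 'utf-8'
data = raw_input("enter the string : ")
data = data.decode(encoding)
normal = unidecode(data)  # transliterate the unicode text to plain ASCII
print normal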

 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41816836
OK, good that you have your problem solved. But the other solution works fine as well.
In the end, you installed a third-party module to deal with a problem that Python can solve by itself (with the solution I wrote, for example).
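One French-specific caveat for the standard-library approach: the ligatures œ, Œ, æ and Æ have no Unicode decomposition, so NFKD leaves them unchanged and an ASCII filter drops them. "cœur" becomes "cur". A transliteration library such as unidecode is designed to cover such cases, which may be one reason to install it. If staying with the standard library, a small explicit table can be applied first, as in the sketch below. The table and the name to_ascii_french are illustrative assumptions, not a complete transliteration.

```python
import unicodedata

# Ligatures that NFKD does not decompose; without this table the ASCII filter
# would silently drop them (u'c\u0153ur' would become u'cur').
# Illustrative assumption: covers the common French ligatures only.
FRENCH_LIGATURES = {
    u'\u0153': u'oe',  # small o-e ligature
    u'\u0152': u'OE',  # capital O-E ligature
    u'\u00e6': u'ae',  # small a-e ligature
    u'\u00c6': u'AE',  # capital A-E ligature
}


def to_ascii_french(text):
    """Spell out French ligatures, then strip accents with NFKD + ASCII encoding."""
    for ligature, replacement in FRENCH_LIGATURES.items():
        text = text.replace(ligature, replacement)
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii_french(u'c\u0153ur \u00e0 \u00e9t\u00e9'))  # coeur a ete
```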
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41816840
Oh, a note: if you have solved your own question, please remember to close the question and say so.
 
LVL 10

Author Closing Comment

by:ReneGe
ID: 41824641
Thanks for all your help :)
Greatly appreciated!