• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 74
  • Last Modified:

Python 2.7 - Normalizing French strings

Hi there,

In Python 2.7, I need a simple way to replace French accents by there non-accented strings.

For example:
a="René"
print a
[output] >Rene

I tried the following but it did not work:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', "René").encode('ascii','ignore')

Thanks for your help,
René
0
ReneGe
Asked:
ReneGe
  • 7
  • 7
2 Solutions
 
Walter RitzelSenior Software EngineerCommented:
Here is something that works:
import unicodedata
import string


def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")

print(remove_accents('Paixão Côrtes'))

Open in new window

0
 
ReneGeAuthor Commented:
Hi Walter,

Thanks for your prompt reply :)

I tried you script and I got the following error message;

  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 8
SyntaxError: Non-ASCII character '\xe9' in file C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py on line 8, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Thanks and cheers
0
 
Walter RitzelSenior Software EngineerCommented:
I think it is because the shebang for utf 8 is missing.
put this as the first line of your script:

# -*- coding: utf-8 -*-

Open in new window

0
Cloud Class® Course: Microsoft Exchange Server

The MCTS: Microsoft Exchange Server 2010 certification validates your skills in supporting the maintenance and administration of the Exchange servers in an enterprise environment. Learn everything you need to know with this course.

 
ReneGeAuthor Commented:
Hi Walter,

Aint lines starting with # are just comments?
0
 
Walter RitzelSenior Software EngineerCommented:
Yes, but this line have a purpose of set the encoding of the code.
0
 
ReneGeAuthor Commented:
Ok thanks :)

Here is what I get.

C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames>1.py
Traceback (most recent call last):
  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 9, in <module>
    print(remove_accents('PaixÚo C¶rtes'))
  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 7, in remove_accents
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")
TypeError: must be unicode, not str
0
 
ReneGeAuthor Commented:
From

# -*- coding: utf-8 -*-
import unicodedata
import string

def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")

print(remove_accents('Paixéo Côrtes'))

Open in new window

0
 
Walter RitzelSenior Software EngineerCommented:
Try this:
# -*- coding: utf-8 -*-
import unicodedata
import string


def remove_accents(data):
    return u''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == u" ")

print(remove_accents(u'Paixão Côrtes é è ò'))

Open in new window

0
 
ReneGeAuthor Commented:
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xe3 in position 0: unexpected end of data
0
 
Walter RitzelSenior Software EngineerCommented:
this should be environment related. The code works fine.
0
 
ReneGeAuthor Commented:
Hi Walter,

Here what worked for me.

# coding=utf-8
from unidecode import unidecode
import sys
""" Normalise (normalize) unicode data in Python to remove umlauts, accents etc. """
encoding=sys.stdout.encoding
data = raw_input("enter the string : ")
data = data.decode(encoding)
normal = unidecode(data)
print normal

Open in new window

0
 
Walter RitzelSenior Software EngineerCommented:
Ok, good that you have your problem solved. But the other solution was working fine as well.
At the end, you have installed a module to deal with a problem that python can solve for itself (with the solution I wrote, for example).
0
 
Walter RitzelSenior Software EngineerCommented:
Oh, a note: If you have solved your own question, please remember to close the question saying it so.
0
 
ReneGeAuthor Commented:
Thanks for all your help :)
Greatly appreciated!
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Cloud Class® Course: Ruby Fundamentals

This course will introduce you to Ruby, as well as teach you about classes, methods, variables, data structures, loops, enumerable methods, and finishing touches.

  • 7
  • 7
Tackle projects and never again get stuck behind a technical roadblock.
Join Now