Solved

Python 2.7 - Normalizing French strings

Posted on 2016-09-22
Last Modified: 2016-10-01
Hi there,

In Python 2.7, I need a simple way to replace French accented characters with their unaccented equivalents.

For example:
a="René"
print a
[output] >Rene

I tried the following but it did not work:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', "René").encode('ascii','ignore')

Thanks for your help,
René
Question by:ReneGe

Expert Comment

by:Walter Ritzel
Here is something that works:
import unicodedata
import string


def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")

print(remove_accents('Paixão Côrtes'))

Author Comment

by:ReneGe
Hi Walter,

Thanks for your prompt reply :)

I tried your script and got the following error message:

  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 8
SyntaxError: Non-ASCII character '\xe9' in file C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py on line 8, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Thanks and cheers

Expert Comment

by:Walter Ritzel
I think it is because the UTF-8 encoding declaration is missing (this is a source encoding declaration, not a shebang).
Put this as the first line of your script:

# -*- coding: utf-8 -*-

Author Comment

by:ReneGe
Hi Walter,

Aren't lines starting with # just comments?

Expert Comment

by:Walter Ritzel
Yes, but this line has a purpose: it declares the encoding of the source file (see PEP 263, linked in your error message).
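
To illustrate why that comment is special, here is a minimal sketch of how such a declaration can be recognized. The regular expression is the one given in the Python language reference for encoding declarations; the helper name declared_encoding is hypothetical, and the check is simplified (the interpreter applies a few extra rules).

```python
import re

# Pattern from the Python language reference for encoding declarations.
CODING_RE = re.compile(r'coding[=:]\s*([-\w.]+)')


def declared_encoding(source_text):
    """Return the encoding named in a comment on line 1 or 2, else None.

    Hypothetical helper for illustration only; a simplified version of
    what the interpreter does when it reads a source file.
    """
    for line in source_text.splitlines()[:2]:
        if line.lstrip().startswith('#'):
            match = CODING_RE.search(line)
            if match:
                return match.group(1)
    return None


print(declared_encoding("# -*- coding: utf-8 -*-\nimport unicodedata\n"))
print(declared_encoding("# coding=utf-8\n"))
# A declaration below the second line is treated as an ordinary comment.
print(declared_encoding("import string\n\n# coding: utf-8\n"))
```

Both spellings used in this thread ("# -*- coding: utf-8 -*-" and "# coding=utf-8") match the pattern, which is why either one works as long as it sits on the first or second line.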

Author Comment

by:ReneGe
Ok thanks :)

Here is what I get.

C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames>1.py
Traceback (most recent call last):
  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 9, in <module>
    print(remove_accents('PaixÚo C¶rtes'))
  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 7, in remove_accents
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")
TypeError: must be unicode, not str

Author Comment

by:ReneGe
That output is from this script:

# -*- coding: utf-8 -*-
import unicodedata
import string

def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")

print(remove_accents('Paixéo Côrtes'))

Assisted Solution

by:Walter Ritzel
Walter Ritzel earned 500 total points
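Try this. The string literals are now marked as unicode (u'...'), because in Python 2 unicodedata.normalize() requires a unicode object, not a str: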
# -*- coding: utf-8 -*-
import unicodedata
import string


def remove_accents(data):
    return u''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == u" ")

print(remove_accents(u'Paixão Côrtes é è ò'))
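
Note that the filter above keeps only ASCII letters and spaces, so digits, hyphens and apostrophes are dropped as well. As a hedged alternative sketch (not the code posted in this thread), one can remove only the combining marks that NFKD decomposition produces, using unicodedata.combining(). The function name strip_accents is hypothetical; the __future__ imports are there so the same file runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import unicodedata


def strip_accents(text):
    """Return text with combining marks (accents) removed.

    Hypothetical helper: decomposes each character with NFKD, then drops
    only the combining marks, leaving digits and punctuation untouched.
    """
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("René, l'été à 18h: Côte-d'Or"))
```

Characters that have no decomposition, such as "œ", pass through unchanged with this approach, so it is not a full transliteration.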

Author Comment

by:ReneGe
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xe3 in position 0: unexpected end of data

Expert Comment

by:Walter Ritzel
This is probably related to your environment. The code works fine.
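
One possible cause of that error, assuming (this is not confirmed in the thread) that the editor saved the script as Windows-1252 or Latin-1 while its first line declares UTF-8: the byte 0xe3 is "ã" in those encodings, but on its own it is not valid UTF-8. A small sketch of the mismatch:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

text = 'ã'

# The same character is one byte in Latin-1/Windows-1252, two bytes in UTF-8.
latin1_bytes = text.encode('latin-1')
utf8_bytes = text.encode('utf-8')
print(repr(latin1_bytes), repr(utf8_bytes))

# Bytes written as Latin-1 are not valid UTF-8, so decoding them fails.
try:
    latin1_bytes.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print('UTF-8 decode failed:', exc.reason)

# Decoding with the encoding the bytes were actually written in works.
print(latin1_bytes.decode('latin-1') == text)
```

If that assumption holds, re-saving the file as UTF-8 in the editor (or declaring the encoding the file is really saved in) should make the declaration and the file contents agree.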

Accepted Solution

by:ReneGe
ReneGe earned 0 total points
Hi Walter,

Here is what worked for me.

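# coding=utf-8
"""Normalise (normalize) unicode data in Python to remove umlauts, accents etc."""
import sys

from unidecode import unidecode  # third-party package, not in the standard library

# Use the console's input encoding; it is None when input is piped, so fall back to UTF-8.
encoding = sys.stdin.encoding or 'utf-8'
data = raw_input("Enter the string: ")
data = data.decode(encoding)  # Python 2: convert the byte str to unicode
normal = unidecode(data)      # transliterate to plain ASCII
print normal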

Expert Comment

by:Walter Ritzel
OK, good that your problem is solved. But the other solution worked fine as well.
In the end, you installed a module to deal with a problem that Python can solve by itself (with the solution I wrote, for example).

Expert Comment

by:Walter Ritzel
Oh, a note: if you have solved your own question, please remember to close the question and say so.

Author Closing Comment

by:ReneGe
Thanks for all your help :)
Greatly appreciated!