Solved

Python 2.7 - Normalizing French strings

Posted on 2016-09-22
Last Modified: 2016-10-01
Hi there,

In Python 2.7, I need a simple way to replace French accented characters with their unaccented equivalents.

For example:
a="René"
print a
[output] >Rene

I tried the following but it did not work:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', "René").encode('ascii','ignore')

Thanks for your help,
René
Question by:ReneGe
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41810568
Here is something that works:
import unicodedata
import string


def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")

print(remove_accents('Paixão Côrtes'))

 
LVL 10

Author Comment

by:ReneGe
ID: 41810904
Hi Walter,

Thanks for your prompt reply :)

I tried your script and got the following error message:

  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 8
SyntaxError: Non-ASCII character '\xe9' in file C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py on line 8, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Thanks and cheers
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41811777
I think it is because the UTF-8 encoding declaration is missing.
Put this as the first line of your script:

# -*- coding: utf-8 -*-

 
LVL 10

Author Comment

by:ReneGe
ID: 41811847
Hi Walter,

Aren't lines starting with # just comments?
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41811861
Yes, but this line has a purpose: it declares the encoding of the source file.
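
For illustration, here is a minimal sketch of what the declaration does, following PEP 263 (the document the error message points to). In Python 2.7 the interpreter assumes ASCII source unless a comment of this form appears on the first or second line of the file; it looks like a comment, but the parser reads it before compiling. The variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# The line above must be the first or second line of the file (PEP 263),
# and the editor must really save the file as UTF-8.
import unicodedata

name = u'René'  # non-ASCII literal, accepted because of the declaration

# NFKD splits 'é' into a plain 'e' followed by a combining acute accent.
decomposed = unicodedata.normalize('NFKD', name)

# Dropping everything outside ASCII removes the combining accent.
print(decomposed.encode('ascii', 'ignore').decode('ascii'))
```

The declaration only tells Python how to read the bytes in the file; it does not convert a file that was saved in another encoding.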
 
LVL 10

Author Comment

by:ReneGe
ID: 41811863
Ok thanks :)

Here is what I get.

C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames>1.py
Traceback (most recent call last):
  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 9, in <module>
    print(remove_accents('PaixÚo C¶rtes'))
  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 7, in remove_accents
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")
TypeError: must be unicode, not str
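
The TypeError arises because, in Python 2.7, a plain quoted literal is a byte string (`str`), while `unicodedata.normalize` accepts only `unicode`. A minimal sketch of one way around it, assuming the bytes are UTF-8 (the helper name `to_ascii` is hypothetical, not part of any library): decode first, then normalize.

```python
import unicodedata


def to_ascii(raw, encoding='utf-8'):
    """Decode bytes to text if needed, then drop accents (hypothetical helper)."""
    if isinstance(raw, bytes):       # in Python 2.7, bytes is an alias of str
        raw = raw.decode(encoding)   # byte string -> unicode text
    decomposed = unicodedata.normalize('NFKD', raw)
    # Encoding to ASCII with 'ignore' discards the combining accents.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii(b'Ren\xc3\xa9'))  # UTF-8 bytes for 'René'
print(to_ascii(u'Ren\u00e9'))    # text that is already decoded
```

Note that `'ignore'` also silently drops any character that has no ASCII decomposition, so this approach suits accented Latin letters rather than arbitrary text.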
 
LVL 10

Author Comment

by:ReneGe
ID: 41811865
From this script:

# -*- coding: utf-8 -*-
import unicodedata
import string

def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")

print(remove_accents('Paixéo Côrtes'))

 
LVL 16

Assisted Solution

by:Walter Ritzel
Walter Ritzel earned 500 total points
ID: 41812217
Try this:
# -*- coding: utf-8 -*-
import unicodedata
import string


def remove_accents(data):
    return u''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == u" ")

print(remove_accents(u'Paixão Côrtes é è ò'))

 
LVL 10

Author Comment

by:ReneGe
ID: 41812230
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xe3 in position 0: unexpected end of data
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41812250
This is probably environment-related. The code works fine here.
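
One plausible cause, offered as an assumption rather than a diagnosis: the file declares UTF-8 but the editor saved it in a Windows code page such as cp1252, where 'ã' is the single byte 0xe3. That byte sequence is not valid UTF-8, which the sketch below reproduces without depending on how this snippet itself is saved.

```python
text = u'Paix\u00e3o'               # 'Paixão', written with an escape

as_cp1252 = text.encode('cp1252')   # how a cp1252 editor would store it
as_utf8 = text.encode('utf-8')      # how a UTF-8 editor would store it

try:
    as_cp1252.decode('utf-8')       # what the coding declaration promises
    decoded_ok = True
except UnicodeDecodeError:
    # 0xe3 announces a multi-byte UTF-8 sequence that never completes.
    decoded_ok = False

print(decoded_ok)
print(as_utf8.decode('utf-8') == text)
```

If that is what happened, re-saving the script as UTF-8 in the editor, or writing the literals with `\u` escapes as above, should avoid the error.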
 
LVL 10

Accepted Solution

by:
ReneGe earned 0 total points
ID: 41816595
Hi Walter,

Here is what worked for me.

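# coding=utf-8
# Normalise (normalize) Unicode data in Python 2.7 to remove umlauts, accents, etc.
import sys

from unidecode import unidecode  # third-party module, installed separately

encoding = sys.stdout.encoding  # console encoding, used to decode the typed input
data = raw_input("enter the string : ")
data = data.decode(encoding)  # byte string -> unicode
normal = unidecode(data)  # transliterate to plain ASCII
print normal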

 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41816836
OK, good that your problem is solved. But the other solution was working fine as well.
In the end, you installed a module to deal with a problem that Python can solve by itself (with the solution I wrote, for example).
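
As a rough illustration of a standard-library-only approach, here is a variant sketch that removes only the combining marks instead of filtering on `string.ascii_letters`. The function name `strip_accents` is hypothetical, and this is not the exact code posted earlier in the thread.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining marks removed (hypothetical helper)."""
    decomposed = unicodedata.normalize('NFKD', text)
    # unicodedata.combining() is non-zero for accents and other combining marks.
    return u''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents(u'Paix\u00e3o C\u00f4rtes, 2016'))
```

Unlike the letters-and-spaces filter, this keeps digits and punctuation. Letters with no decomposition, such as 'ø' or 'ß', pass through unchanged, which is where a transliteration library such as unidecode may do better.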
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41816840
Oh, a note: if you have solved your own question, please remember to close the question and say so.
 
LVL 10

Author Closing Comment

by:ReneGe
ID: 41824641
Thanks for all your help :)
Greatly appreciated!