Solved

Python 2.7 - Normalizing French strings

Posted on 2016-09-22
Last Modified: 2016-10-01
Hi there,

In Python 2.7, I need a simple way to replace French accented characters with their unaccented equivalents.

For example:
a="René"
print a
[output] >Rene

I tried the following but it did not work:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', "René").encode('ascii','ignore')

Thanks for your help,
René
Question by:ReneGe
14 Comments
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41810568
Here is something that works:
import unicodedata
import string


def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")

print(remove_accents('Paixão Côrtes'))

 
LVL 10

Author Comment

by:ReneGe
ID: 41810904
Hi Walter,

Thanks for your prompt reply :)

I tried your script and got the following error message:

  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 8
SyntaxError: Non-ASCII character '\xe9' in file C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py on line 8, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Thanks and cheers
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41811777
I think it is because the encoding declaration for UTF-8 is missing.
Put this as the first line of your script:

# -*- coding: utf-8 -*-


 
LVL 10

Author Comment

by:ReneGe
ID: 41811847
Hi Walter,

Aren't lines starting with # just comments?
0
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41811861
Yes, but this line has a purpose: it declares the encoding of the source file.
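
As a rough illustration of why a comment can matter here: PEP 263 has the interpreter look for an encoding declaration in the first two lines of a source file. The sketch below uses a simplified pattern modeled on the one in PEP 263; `declared_encoding` is a hypothetical helper for illustration, not part of Python.

```python
from __future__ import print_function
import re

# Simplified pattern modeled on the one given in PEP 263 (an approximation).
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(source_lines):
    """Return the encoding named in the first two lines, or None."""
    # Only the first two lines are inspected; later comments are ignored.
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


print(declared_encoding(['# -*- coding: utf-8 -*-', 'import unicodedata']))
print(declared_encoding(['# coding=utf-8']))
print(declared_encoding(['import unicodedata', 'import string']))
```

The declaration only tells Python how to decode the file, so the editor also has to save the file in that same encoding.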
 
LVL 10

Author Comment

by:ReneGe
ID: 41811863
Ok thanks :)

Here is what I get.

C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames>1.py
Traceback (most recent call last):
  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 9, in <module>
    print(remove_accents('PaixÚo C¶rtes'))
  File "C:\Users\Rene\Documents\_Python\2.7\NormalizingFolderNames\1.py", line 7, in remove_accents
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")
TypeError: must be unicode, not str
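
In Python 2.7, `unicodedata.normalize` accepts only `unicode` objects, while a plain `'...'` literal is a byte string, which is what this `TypeError` is reporting. Below is a minimal sketch of one way around it, assuming the incoming bytes are UTF-8 (on a Windows console they may be in a code page instead). `strip_accents` is a hypothetical name, and the strings use escapes so the example does not depend on how the file is saved.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import unicodedata


def strip_accents(value, encoding='utf-8'):
    """Return value with accents removed, decoding byte strings first."""
    # In Python 2.7 a plain '...' literal is a byte string (bytes is str there).
    if isinstance(value, bytes):
        value = value.decode(encoding)  # assumption: the caller knows the encoding
    # NFKD splits u'\u00e9' into 'e' plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize('NFKD', value)
    # Encoding to ASCII with 'ignore' drops everything that is not ASCII.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(strip_accents(u'Ren\u00e9'))    # unicode input
print(strip_accents(b'Ren\xc3\xa9'))  # the same name as UTF-8 bytes
```

Because `'ignore'` discards every non-ASCII character, letters that have no decomposition (the French ligature U+0153, for example) are dropped entirely rather than transliterated.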
 
LVL 10

Author Comment

by:ReneGe
ID: 41811865
That came from this script:

# -*- coding: utf-8 -*-
import unicodedata
import string

def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == " ")

print(remove_accents('Paixéo Côrtes'))

 
LVL 16

Assisted Solution

by:Walter Ritzel
Walter Ritzel earned 2000 total points
ID: 41812217
Try this:
# -*- coding: utf-8 -*-
import unicodedata
import string


def remove_accents(data):
    return u''.join(x for x in unicodedata.normalize('NFKD', data) if x in string.ascii_letters or x == u" ")

print(remove_accents(u'Paixão Côrtes é è ò'))
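
One caveat with the filter above: keeping only `string.ascii_letters` and spaces also discards digits, hyphens and other punctuation. A possible variant, sketched here under the assumption that only the accents should go, drops just the combining marks that NFKD produces. `remove_accents_keep_punct` is a hypothetical name.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import unicodedata


def remove_accents_keep_punct(text):
    """Drop combining marks but keep digits, hyphens and punctuation."""
    decomposed = unicodedata.normalize('NFKD', text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return u''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_accents_keep_punct(u'Jean-Fran\u00e7ois, 2 \u00e9t\u00e9s'))
```

Characters without a decomposition pass through unchanged with this approach, so the result is not guaranteed to be pure ASCII.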

 
LVL 10

Author Comment

by:ReneGe
ID: 41812230
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xe3 in position 0: unexpected end of data
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41812250
This is probably environment related. The code works fine.
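
One plausible reading of the two `SyntaxError` messages in this thread (bytes `\xe9` and `0xe3`) is that the editor saved the script in a Windows code page such as cp1252 rather than UTF-8, so the `coding: utf-8` declaration did not match the bytes on disk. That is an assumption based on the byte values, not something confirmed in the thread. A short sketch of the mismatch:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# 'Paixao' with a tilde on the second a, written with an escape to stay ASCII-only.
text = u'Paix\u00e3o'

cp1252_bytes = text.encode('cp1252')  # one byte (0xe3) for the accented letter
utf8_bytes = text.encode('utf-8')     # two bytes (0xc3 0xa3) for the same letter

print(repr(cp1252_bytes))
print(repr(utf8_bytes))

# Decoding cp1252 bytes as UTF-8 fails, much as the interpreter did on the script.
try:
    cp1252_bytes.decode('utf-8')
    mismatch_detected = False
except UnicodeDecodeError as exc:
    mismatch_detected = True
    print('decode failed:', exc)
```

If that is what happened, either re-saving the file as UTF-8 or changing the declaration to name the file's real encoding should resolve it.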
 
LVL 10

Accepted Solution

by:ReneGe
ReneGe earned 0 total points
ID: 41816595
Hi Walter,

Here is what worked for me.

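# coding=utf-8
"""Normalise (normalize) Unicode data in Python to remove umlauts, accents etc."""
import sys

from unidecode import unidecode  # third-party package, installed separately

# raw_input() returns bytes in Python 2.7; decode them with the console encoding.
# sys.stdin.encoding is None when input is piped, so fall back to UTF-8.
encoding = sys.stdin.encoding or 'utf-8'
data = raw_input("enter the string : ")
data = data.decode(encoding)
normal = unidecode(data)
print normal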

 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41816836
OK, good that you have your problem solved. But the other solution was working fine as well.
In the end, you installed a module to deal with a problem that Python can solve by itself (with the solution I wrote, for example).
 
LVL 16

Expert Comment

by:Walter Ritzel
ID: 41816840
Oh, a note: if you have solved your own question, please remember to close the question and say so.
 
LVL 10

Author Closing Comment

by:ReneGe
ID: 41824641
Thanks for all your help :)
Greatly appreciated!

Question has a verified solution.
