Solved

Python 2.7 - French characters

Posted on 2016-09-17
Last Modified: 2016-10-02
Hi there,

1.html contains French characters (éèÉç etc...)

In Python 2.7, I need to print the file content with the proper French characters.

Thanks for your help,
Rene

f = open('1.html', 'r')
file_contents = f.read()
print (file_contents)
f.close()

Question by:ReneGe

Accepted Solution

by: pepr (earned 2000 total points)
ID: 41803395
In Python 2.x, open() returns a file object that looks as if it reads text, but it does no decoding at all: read() returns the raw bytes of the file in a str object, because a Python 2 str is a string of bytes. The only reliable way to handle national alphabets is to work with unicode strings (u'prefixed' string literals, or byte strings that have been decoded to unicode).

In Python 2.x you can use the codecs.open() function from the standard codecs module. It differs from open() in its encoding argument, which tells how the bytes from the file should be converted to a unicode string.

When a unicode string is printed to the console, it is likely to be converted to the console's encoding correctly.

In Python 3.x, the str type is what u'string' is in Python 2, and open() does what codecs.open() did in Python 2.

import codecs

# codecs.open() decodes the bytes of the file to unicode using the given encoding.
with codecs.open('1.html', 'r', encoding='utf-8') as f:
    content = f.read()
    print content
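
For code that has to run on both Python 2.7 and 3.x, io.open() from the standard io module takes the same encoding argument. What follows is only a minimal sketch and not part of the original answer: the helper name read_text and the temporary demo file are assumptions for illustration, and 'cp1252' is just the encoding chosen for the demo.

```python
import io
import os
import tempfile


def read_text(path, encoding):
    """Return the content of the file decoded to a unicode string."""
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


# Demo only: write a small cp1252-encoded file, then read it back.
sample = u'<p>\xe9\xe8\xc9\xe7</p>'          # the characters e-acute, e-grave, E-acute, c-cedilla
fd, demo_path = tempfile.mkstemp(suffix='.html')
os.close(fd)
with io.open(demo_path, 'w', encoding='cp1252') as f:
    f.write(sample)

text = read_text(demo_path, 'cp1252')        # unicode in 2.7, str in 3.x
os.remove(demo_path)
```

With a real file, print(read_text('1.html', 'cp1252')) should then show the accented characters, provided the console encoding can represent them.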


Author Comment

by:ReneGe
ID: 41803408
Hi pepr,

Thanks for your prompt response, explanation, and code.

I tried your code and this is what I got.

Traceback (most recent call last):
  File "1.py", line 3, in <module>
    content = f.read()
  File "C:\Python27\lib\codecs.py", line 674, in read
    return self.reader.read(size)
  File "C:\Python27\lib\codecs.py", line 480, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xab in position 1246: invalid start byte


Cheers

Assisted Solution

by: pepr (earned 2000 total points)
ID: 41803942
If your file uses a different encoding, you have to pass that encoding instead of UTF-8. If the file was generated on Windows, you should probably use 'cp1252'. If it was stored on a Unix-based system, it may be ISO-8859-15.
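
The byte 0xab from the traceback is a useful hint: in cp1252 (and in ISO-8859-15) it is the opening French quotation mark «, while in UTF-8 it can never start a character. When the encoding is unknown, one option is to try a few candidates in order. This is only a sketch: the helper name decode_with_fallback and the order of candidates are assumptions, and a decode that succeeds does not prove the guess is right, so check the result by eye.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'cp1252', 'iso-8859-15')):
    """Return (text, encoding) for the first candidate that decodes the bytes."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError('none of the candidate encodings could decode the data')


# Bytes as a cp1252 editor would store a short quoted word with an accent.
raw = b'\xab caf\xe9 \xbb'
text, used = decode_with_fallback(raw)
```

UTF-8 is tried first because it rejects most non-UTF-8 data, whereas the single-byte encodings accept almost any byte sequence.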

If the HTML was constructed properly, you can find the encoding at the beginning, in the head section.
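
Looking up that declaration can also be done in code. A rough sketch, assuming the page declares its encoding in a meta tag near the top: the helper name declared_charset, the 2048-byte search window, and the 'cp1252' default are all assumptions, and a real HTML parser would be more robust than this regular expression.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
_CHARSET_RE = re.compile(
    br'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def declared_charset(raw, default='cp1252'):
    """Return the charset named in the first bytes of raw HTML, or the default."""
    match = _CHARSET_RE.search(raw[:2048])
    if match:
        return match.group(1).decode('ascii')
    return default


page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=windows-1252"></head>'
        b'<body>\xab \xe9t\xe9 \xbb</body></html>')
encoding = declared_charset(page)
text = page.decode(encoding)
```

The file is read as bytes first (mode 'rb'), the charset is extracted, and only then are the bytes decoded.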

Expert Comment

by:pepr
ID: 41825667
Hi ReneGe. Have you found a solution?

Author Comment

by:ReneGe
ID: 41825669
Hi pepr,

Sorry for taking so long to reply.

cp1252 worked :)

Thank you so much for your help :)

Cheers mate!

Author Comment

by:ReneGe
ID: 41825670
Thanks