Solved

Python 2.7 - French characters

Posted on 2016-09-17
Last Modified: 2016-10-02
Hi there,

1.html contains French characters (éèÉç etc...)

In Python 2.7, I need to print the file content with the proper French characters.

Thanks for your help,
Rene

f = open('1.html', 'r')
file_contents = f.read()
print (file_contents)
f.close()


Question by:ReneGe

Accepted Solution

by: pepr
In Python 2.x, the open() function returns a file object that looks as if it returns text, but it does not apply any encoding: read() returns a stream of bytes in a str variable, because the Python 2 str type is a string of bytes. The only reliable way to handle national alphabets is to use unicode strings (u'prefixed string literals', or byte strings converted to unicode with .decode()).
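A short sketch of the byte/character difference, runnable on both Python 2.7 and Python 3. The sample letters and the UTF-8/cp1252 pair are only illustrative choices:

```python
# -*- coding: utf-8 -*-
# A unicode string holds characters; encoding it produces bytes.
text = u'éèÉç'
data = text.encode('utf-8')   # each of these letters takes 2 bytes in UTF-8

print(len(text))   # 4
print(len(data))   # 8

# Decoding with the right codec gives the original characters back...
assert data.decode('utf-8') == text
# ...while decoding with the wrong one silently produces mojibake.
garbled = data.decode('cp1252')
print(repr(garbled))
```

The last line shows why the bytes alone are not enough: the same 8 bytes read as cp1252 give different, wrong characters without raising any error.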

In Python 2.x you can use the codecs.open() function from the standard codecs module. It differs from the built-in open() by its encoding argument, which tells it how to convert the bytes from the file into a unicode string.

When a unicode string is printed to the console, Python 2 encodes it using the console's encoding (sys.stdout.encoding), so the characters are likely to come out correctly.

In Python 3.x, the str type is what unicode (u'string') is in Python 2, and the built-in open() is what codecs.open() was in Python 2.

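import codecs

# codecs.open() decodes the file's bytes into a unicode object while reading.
# 'utf-8' is an assumption: pass whatever encoding the file was saved in.
with codecs.open('1.html', 'r', encoding='utf-8') as f:
    content = f.read()  # unicode, not a str of bytes
    print content       # Python 2 print statement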
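Code that has to run on both Python 2 and Python 3 can use io.open(), which exists since Python 2.6 and is the same function as Python 3's built-in open(). A minimal sketch, assuming a throwaway file (demo.html is a hypothetical name) written in cp1252:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Create a throwaway file with known cp1252 bytes (file name is hypothetical).
path = os.path.join(tempfile.mkdtemp(), 'demo.html')
with open(path, 'wb') as raw:
    raw.write(u'<p>« Élève »</p>'.encode('cp1252'))

# io.open() decodes while reading, much like codecs.open() above.
with io.open(path, 'r', encoding='cp1252') as f:
    content = f.read()

print(content == u'<p>« Élève »</p>')   # True
```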



Author Comment

by:ReneGe
Hi pepr,

Thanks for your prompt response, explanation, and code.

I tried your code and this is what I got.

Traceback (most recent call last):
  File "1.py", line 3, in <module>
    content = f.read()
  File "C:\Python27\lib\codecs.py", line 674, in read
    return self.reader.read(size)
  File "C:\Python27\lib\codecs.py", line 480, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xab in position 1246: invalid start byte


Cheers

Assisted Solution

by:pepr
If your file uses a different encoding, you have to pass that encoding instead of UTF-8. If the file was generated on Windows, you should probably use 'cp1252'. If it was stored on a Unix-based system, it may be ISO-8859-15. (The byte 0xab from your traceback is « in both of these encodings, a character common in French text, which fits.)
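When the encoding is not known in advance, one pragmatic approach is to try a few candidates in order. The helper below is a sketch: decode_with_fallback is a hypothetical name and the candidate list is an assumption. It reproduces the failure from the traceback (0xab is an invalid start byte in UTF-8) and the cp1252 fix. Note that single-byte codecs like cp1252 accept almost any input, so a successful decode only means the bytes are valid, not that the guess is right:

```python
# -*- coding: utf-8 -*-

def decode_with_fallback(data, encodings=('utf-8', 'cp1252', 'iso-8859-15')):
    """Return (text, encoding) for the first encoding that decodes data.

    Order matters: UTF-8 is strict, so it goes first; cp1252 and
    ISO-8859-15 decode almost any byte string, so they act as fallbacks.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError('none of %r could decode the data' % (encodings,))


# 0xab is '«' in cp1252 but an invalid start byte in UTF-8.
sample = u'« Café »'.encode('cp1252')
text, used = decode_with_fallback(sample)
print(used)   # cp1252
```

To use it on the file, read the raw bytes with open('1.html', 'rb') and pass f.read() to the helper.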

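If the HTML was constructed properly, the encoding is declared near the beginning, in the head section, typically as <meta charset="..."> or <meta http-equiv="Content-Type" content="text/html; charset=...">.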
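Reading that declaration can be automated. The sketch below scans raw bytes (for example the first kilobyte of the file) for a meta charset with a regular expression; sniff_charset is a hypothetical helper, and a regex is only a heuristic, not a real HTML parser:

```python
import re

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_CHARSET_RE = re.compile(br'<meta[^>]+charset=["\']?([\w.:-]+)', re.IGNORECASE)


def sniff_charset(head_bytes):
    """Return the declared charset (lower-cased), or None if none is found."""
    match = _CHARSET_RE.search(head_bytes)
    if match is None:
        return None
    return match.group(1).decode('ascii').lower()


page = b'<html><head><meta charset="windows-1252"><title>t</title></head>'
print(sniff_charset(page))   # windows-1252
```

The result can be passed as the encoding argument to codecs.open() or io.open(); Python's codec lookup accepts aliases such as windows-1252 for cp1252.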

Expert Comment

by:pepr
Hi ReneGe. Have you found a solution?

Author Comment

by:ReneGe
Hi pepr,

Sorry for taking so long to reply.

cp1252 worked :)

Thank you so much for your help :)

Cheers mate!

Author Comment

by:ReneGe
Thanks

Question has a verified solution.
