Solved

Python 2.7 - French characters

Posted on 2016-09-17
Last Modified: 2016-10-02
Hi there,

1.html contains French characters (éèÉç etc...)

In Python 2.7, I need to print the file content with the proper French characters.

Thanks for your help,
Rene

f = open('1.html', 'r')
file_contents = f.read()
print (file_contents)
f.close()

Question by:ReneGe
LVL 28

Accepted Solution

by:
pepr earned 500 total points
ID: 41803395
In Python 2.x, the open() function returns a file object that looks as if it reads text, but it does no decoding at all: read() returns the raw bytes of the file in a str object, because a Python 2 str is a string of bytes. The only reliable way to handle national alphabets is to work with unicode strings (u'prefixed string literals', or byte strings that have been decoded with the right encoding).

In Python 2.x you can use the codecs.open() function from the standard codecs module. It differs from open() in its encoding argument, which tells Python how the bytes from the file should be converted to a unicode string.

When a unicode string is printed to the console, it is likely to be converted to the console's encoding.

In Python 3.x, the str type is what u'string' is in Python 2, and the built-in open() accepts an encoding argument, as codecs.open() does in Python 2.

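import codecs

# Decode the bytes to unicode while reading.
# 'utf-8' must match the encoding the file was actually saved in.
with codecs.open('1.html', 'r', encoding='utf-8') as f:
    content = f.read()
print content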
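
As a rough sketch of the same idea, io.open() (available since Python 2.6, and the same function as the built-in open() in Python 3) also takes an encoding argument. The file name demo_fr.html and the sample text are made up for illustration, and cp1252 is assumed only to keep the example self-contained.

```python
import io
import os
import tempfile

# Hypothetical sample: the four French characters from the question,
# written with escapes so the source file itself stays pure ASCII.
sample = u'\xe9\xe8\xc9\xe7'

# Store the sample on disk as cp1252 bytes (assumed encoding).
path = os.path.join(tempfile.mkdtemp(), 'demo_fr.html')
with io.open(path, 'wb') as f:
    f.write(sample.encode('cp1252'))

# Binary mode: no decoding, the result is a string of bytes.
with io.open(path, 'rb') as f:
    raw = f.read()

# Text mode with an explicit encoding: the bytes are decoded to unicode.
with io.open(path, 'r', encoding='cp1252') as f:
    text = f.read()
```

Here raw holds four bytes, while text holds the four decoded characters and compares equal to sample.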

 
LVL 10

Author Comment

by:ReneGe
ID: 41803408
Hi pepr,

Thanks for your prompt response, explanation, and code.

I tried your code and this is what I got.

Traceback (most recent call last):
  File "1.py", line 3, in <module>
    content = f.read()
  File "C:\Python27\lib\codecs.py", line 674, in read
    return self.reader.read(size)
  File "C:\Python27\lib\codecs.py", line 480, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xab in position 1246: invalid start byte


Cheers
 
LVL 28

Assisted Solution

by:pepr
pepr earned 500 total points
ID: 41803942
If your file uses a different encoding, you have to pass that encoding instead of UTF-8. If the file was generated on Windows, you should probably use 'cp1252'. If it was stored on a Unix-based system, it may be ISO-8859-15.
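
The traceback above fits this explanation: byte 0xab cannot start a UTF-8 sequence, whereas in both cp1252 and ISO-8859-15 it is the opening guillemet («), which is common in French text. A minimal sketch, using made-up sample bytes rather than the real file:

```python
# Hypothetical bytes, as they might appear in a cp1252-encoded French page.
raw = b'\xab Bonjour \xbb'

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    # 0xab has the bit pattern of a UTF-8 continuation byte,
    # so it is rejected as an invalid start byte.
    utf8_ok = False

# The same bytes decode cleanly once the right encoding is named.
text = raw.decode('cp1252')
```

After this runs, utf8_ok is False and text is the unicode string with both guillemets.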

If the HTML was constructed properly, you can find the encoding at the beginning, in the head section.
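
One rough way to look for that declaration is to search the first bytes of the file for a charset= attribute in a meta tag. The helper name sniff_html_charset, the 2048-byte window, and the cp1252 fallback are assumptions for illustration only; a real HTML parser covers more cases.

```python
import re

# Matches charset=... inside a <meta> tag, for example
# <meta charset="utf-8"> or
# <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">.
_CHARSET_RE = re.compile(
    br'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def sniff_html_charset(raw, default='cp1252'):
    """Return the charset declared near the start of raw HTML bytes.

    Falls back to `default` (an assumed guess) when nothing is declared.
    """
    match = _CHARSET_RE.search(raw[:2048])
    if match:
        return match.group(1).decode('ascii')
    return default


# Hypothetical page content; the markup itself is assumed to be ASCII-compatible.
page = (b'<html><head><meta charset="ISO-8859-15"></head>'
        b'<body>\xe9t\xe9</body></html>')
encoding = sniff_html_charset(page)
text = page.decode(encoding)
```

The declared name can then be passed as the encoding argument when opening the file. If the declaration is missing or wrong, decoding may still fail or produce the wrong characters, so the result is only a hint.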
LVL 28

Expert Comment

by:pepr
ID: 41825667
Hi ReneGe. Have you found a solution?
 
LVL 10

Author Comment

by:ReneGe
ID: 41825669
Hi pepr,

Sorry for taking so long to reply.

cp1252 worked :)

Thank you so much for your help :)

Cheers mate!
 
LVL 10

Author Comment

by:ReneGe
ID: 41825670
Thanks