Solved

Reading text file in Python gives strange results

Posted on 2016-11-23
Last Modified: 2016-11-26
I have a simple text file that looks like this (Please ignore all the line numbers in the output.):
YES
NO
YES
YES
NO
...


It's just a bunch of yes's and no's in a straight text file in Windows 7.  However, I've tried reading the file several ways, and I get unexpected results. When I do this:
file1 = list(open(sys.argv[1]))


I get:
['\xff\xfeY\x00E\x00S\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '
\x00N\x00O\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00Y\ .......]


and when I do:
f1 = open(sys.argv[1], "r")
lines1 = f1.readlines()
for i in lines1:
  print i.strip()


I get:
 ■Y E S

 Y E S

 N O

 N O

 N O

 N O

 Y E S

 Y E S

 Y E S

 N O

 Y E S

 Y E S


What is going on here and how do I read this as I expect it to be in the file?  


Python 2.7
Windows 7
Question by:ugeb
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41900092
Basically, the following will return a list of the file's lines:
open("filename.txt").readlines()


Your first test should work, but I think the problem is due to your file's encoding.
Try this instead:
import codecs
f = codecs.open("filename.txt", "r", "cp1252")
lines = list(f)

In your second test you are calling strip() with no argument! You should strip '\n'.
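
As a quick check on the strip() suggestion, here is a small sketch using one line copied from the list() output in the question. It only illustrates what strip() does to those bytes; the variable names are made up for the example.

```python
# One line as it appears in the question's list() output (raw bytes).
line = b"\x00Y\x00E\x00S\x00\r\x00\n"

stripped_default = line.strip()       # no argument: strips ASCII whitespace
stripped_newline = line.strip(b"\n")  # explicit newline only

# Both calls remove just the trailing newline. The \x00 bytes are not
# whitespace, so they (and the \r sitting behind one of them) stay put.
print(repr(stripped_default))
print(stripped_default == stripped_newline)
```

So the choice of strip() argument does not appear to be what produces the odd output; the \x00 bytes survive either way.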

Cheers
 
LVL 11

Author Comment

by:ugeb
ID: 41901959
Unfortunately that didn't fix the problem.

In Emacs the file appears to contain the proper characters, and I don't see the extraneous spaces, null characters, etc.  The file was generated as redirected output from a script run in PowerShell on Windows 7.  When I read in the file and print out every character and its ASCII code, I can see that, for some reason, there are apparently null characters in between the ASCII letters.

 :13, " ":0,  Y E S , "E":69, " ":0, "S":83, " ":0, "
 :13, " ":0,  Y E S , "E":69, " ":0, "S":83, " ":0, "
" ":0,


I don't know how those extra nulls got there, since the output was generated by a Python script using simple print statements.  In the dump above, each character is quoted for identification, followed by its ASCII code.

This is very confusing to me.
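
A leading \xff\xfe followed by a \x00 after every letter is the pattern you would expect from UTF-16 little-endian text with a byte order mark (BOM). A minimal sketch for checking a file's first bytes, assuming only the standard codecs constants; sniff_bom and the sample bytes (taken from the list() output in the question) are hypothetical names for illustration:

```python
import codecs


def sniff_bom(data):
    """Return a codec name suggested by a leading BOM, or None."""
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
    # begins with the same two bytes as the UTF-16 LE BOM.
    candidates = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    return None


# First line of the file as shown in the question, plus the \x00 that
# the question's output shows at the start of the following line.
head = b"\xff\xfeY\x00E\x00S\x00\r\x00\n\x00"

guess = sniff_bom(head)
decoded = head.decode(guess) if guess else None
print(guess)
print(repr(decoded))
```

In practice you would pass in the first few bytes read from the file opened in binary mode ("rb").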
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41901973
Can you show exactly how you generate the output file?

 
LVL 11

Author Comment

by:ugeb
ID: 41902430
The PowerShell command is:
type .\stdin2.txt | python .\hackerrank2.py > con_output.txt


and the Python print statements in hackerrank2.py are:
  if balanced_brackets(expression) == True:
    print "YES"
  else:
    print "NO"


 
LVL 9

Accepted Solution

by:
Moussa Mokhtari earned 500 total points
ID: 41902644
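I think I see the problem. When you use > to redirect output to a text file in Windows PowerShell, it saves the file as "Unicode", which there means UTF-16 little-endian with a byte order mark. That is why you see \xff\xfe at the start and a \x00 after every character when you read the file in Python without decoding it, and why you don't get the result you expect.
To change the output file's encoding, you can redirect the output like this: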

type .\stdin2.txt | python .\hackerrank2.py | Out-File con_output.txt -encoding ASCII
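
If regenerating the file isn't convenient, another option is to decode it on the Python side. A minimal sketch, assuming the file is UTF-16 with a byte order mark, as the \xff\xfe prefix in the question suggests. read_lines and the temporary demo file are hypothetical; io.open is used because it accepts an encoding argument in Python 2.7 as well as Python 3.

```python
import io
import os
import tempfile


def read_lines(path, encoding="utf-16"):
    """Return the file's lines as text, without line endings."""
    # The "utf-16" codec reads the BOM to pick the byte order and
    # drops it, so no stray characters end up in the first line.
    with io.open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\r\n") for line in f]


# Demo data: the same byte layout as in the question
# (BOM, then two bytes per character, CRLF line endings).
raw = b"\xff\xfe" + u"YES\r\nNO\r\nYES\r\n".encode("utf-16-le")

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as tmp:
    tmp.write(raw)

lines = read_lines(path)
os.remove(path)
print(lines)
```

With the real file, the call would be read_lines(sys.argv[1]) in place of the temporary file used here.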



Cheers
 
LVL 11

Author Closing Comment

by:ugeb
ID: 41902665
Wow, that's an insidious error.  Thanks!
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41902667
You're welcome