Reading text file in Python gives strange results

I have a simple text file that looks like this:
YES
NO
YES
YES
NO
...


It's just a bunch of yes's and no's in a straight text file in Windows 7.  However, I've tried reading the file several ways, and I get unexpected results. When I do this:
file1 = list(open(sys.argv[1]))


I get:
['\xff\xfeY\x00E\x00S\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '
\x00N\x00O\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00Y\ .......]


and when I do:
f1 = open(sys.argv[1], "r")
lines1 = f1.readlines()
for i in lines1:
  print i.strip()


I get:
 ■Y E S

 Y E S

 N O

 N O

 N O

 N O

 Y E S

 Y E S

 Y E S

 N O

 Y E S

 Y E S


What is going on here and how do I read this as I expect it to be in the file?  


Python 2.7
Windows 7
Asked by: ugeb
 
Moussa Mokhtari (Entrepreneur) commented:
I think I see the problem: when you use > to redirect output to a text file in PowerShell, the file is saved as Unicode (UTF-16), which is why you don't get the expected result when you read it in Python.
To change the output file's encoding, redirect the output like this:

type .\stdin2.txt | python .\hackerrank2.py | Out-File con_output.txt -encoding ASCII
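
If regenerating the file is not convenient, the existing file can probably be read as-is by telling Python which encoding to use. This is a minimal sketch, assuming the file is UTF-16 with a byte order mark (which the leading \xff\xfe in the question suggests). The helper name read_utf16_lines is hypothetical and not part of the original scripts.

```python
import io


def read_utf16_lines(path):
    """Return the lines of a UTF-16 text file, without line endings.

    Hypothetical helper for illustration only.
    """
    # encoding="utf-16" consumes the byte order mark at the start of the
    # file and uses it to choose the byte order. io.open behaves the same
    # way on Python 2.7 and Python 3.
    with io.open(path, "r", encoding="utf-16") as f:
        return [line.rstrip(u"\r\n") for line in f]
```

Calling read_utf16_lines("con_output.txt") on the file from the question should then give plain strings such as "YES" and "NO", with no null characters in between.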



Cheers
 
Moussa Mokhtari (Entrepreneur) commented:
Basically, the following returns a list of the file's lines:
open("filename.txt").readlines()



Your first test should work, but I think the problem is due to your encoding.
Try this instead:
import codecs
f = codecs.open("filename.txt", "r", "cp1252")
list(f)

In your second test you are calling strip() with no argument; you should strip '\n'.
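
A quick way to test an encoding guess is to decode the raw bytes from the question directly. The sketch below takes the first line of the list output shown in the question, plus the one trailing \x00 that completes the two-byte newline, and decodes it two ways. It only illustrates how the two codecs treat these bytes.

```python
# First line of the file as shown in the question, followed by the \x00
# that completes the two-byte encoding of "\n".
raw = b"\xff\xfeY\x00E\x00S\x00\r\x00\n\x00"

# cp1252 maps every byte to one character, so the NUL bytes survive
# as NUL characters and the two marker bytes become visible characters.
as_cp1252 = raw.decode("cp1252")

# utf-16 treats \xff\xfe as a byte order mark and reads two bytes per
# character, so the NULs disappear.
as_utf16 = raw.decode("utf-16")

print(repr(as_cp1252))
print(repr(as_utf16))
```

If the second result shows clean text, the file is most likely UTF-16 rather than cp1252.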

Cheers
 
ugeb (Author) commented:
Unfortunately that didn't fix the problem.

In Emacs it looks like the file contains the proper characters, and I don't see the extraneous spaces, null characters, etc. The file was generated as redirected output from a script run in PowerShell on Windows 7. When I read in the file and print out every character and its ASCII code, I can see that, for some reason, there are apparently null characters between the ASCII letters.

 :13, " ":0,  Y E S , "E":69, " ":0, "S":83, " ":0, "
 :13, " ":0,  Y E S , "E":69, " ":0, "S":83, " ":0, "
" ":0,


I don't know how those extra nulls got there, since the output was generated by a Python script using simple print statements. Each character is quoted for identification, followed by its ASCII code.
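
Nulls between every letter are a common sign of a two-byte encoding, and one way to check is to look at the first few bytes of the file for a byte order mark. This is a rough sketch only. The function name sniff_bom is hypothetical, and a file without a mark simply yields None, so it is a hint rather than a reliable detector.

```python
import codecs


def sniff_bom(path):
    """Return a codec name suggested by the file's byte order mark, or None.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: its little-endian mark starts with the same
    # two bytes as the UTF-16 little-endian mark.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None
```

For a file that begins with \xff\xfe followed by a letter and a null, as in the question, this returns "utf-16".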

This is very confusing to me.

 
Moussa Mokhtari (Entrepreneur) commented:
Can you show exactly how you generate the output file?
 
ugeb (Author) commented:
The PowerShell command is:
type .\stdin2.txt | python .\hackerrank2.py > con_output.txt


and the Python print statements in hackerrank2.py are:
  if balanced_brackets(expression) == True:
    print "YES"
  else:
    print "NO"


 
ugeb (Author) commented:
Wow, that's an insidious error.  Thanks!
 
Moussa Mokhtari (Entrepreneur) commented:
You're welcome
Question has a verified solution.
