Solved

Reading text file in Python gives strange results

Posted on 2016-11-23
7
116 Views
Last Modified: 2016-11-26
I have a simple text file that looks like this (Please ignore all the line numbers in the output.):
YES
NO
YES
YES
NO
...

Open in new window

It's just a bunch of yes's and no's in a straight text file in Windows 7.  However, I've tried reading the file several ways, and I get unexpected results. When I do this:
file1 = list(open(sys.argv[1]))

Open in new window

I get:
['\xff\xfeY\x00E\x00S\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '
\x00N\x00O\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00Y\ .......]

Open in new window

and when I do:
f1 = open(sys.argv[1], "r")
lines1 = f1.readlines()
for i in lines1:
  print i.strip()

Open in new window

I get:
 ■Y E S

 Y E S

 N O

 N O

 N O

 N O

 Y E S

 Y E S

 Y E S

 N O

 Y E S

 Y E S

Open in new window

What is going on here and how do I read this as I expect it to be in the file?  


Python 2.7
Windows 7
0
Comment
Question by:ugeb
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
7 Comments
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41900092
basically the following will return a list with the file lines
open("filename.txt").readlines()

Open in new window


your 1st test should work but I think its due to your  encoding
try thins instead
import codecs
f = codecs.open("filename.txt", "r", "cp1252");
list(f);

in your 2nd test you are using strip with no param ! you should strip '\n'  

Cheers
0
 
LVL 11

Author Comment

by:ugeb
ID: 41901959
Unfortunately that didn't fix the problem.

In Emacs it looks like the file contains the proper characters, and I don't see the extraneous spaces and null characters etc.  The file was generated as redirected output from a script run in PowerShell on Windows 7.  When I read in the file and print out every character and it's ascii code, I can see that, for some reason, there are apparently null characters in between the ascii letters.

 :13, " ":0,  Y E S , "E":69, " ":0, "S":83, " ":0, "
 :13, " ":0,  Y E S , "E":69, " ":0, "S":83, " ":0, "
" ":0,

Open in new window

I don't know how those extra nulls got there, as it was a python file that generated the output as they were simple print statements.  You can see that each character is quoted for identification, followed by it's ascii code.

This is very confusing to me.
0
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41901973
Can you provide exactly how you generate the output file ?
0
Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
LVL 11

Author Comment

by:ugeb
ID: 41902430
The powershell command is:
type .\stdin2.txt | python .\hackerrank2.py > con_output.txt

Open in new window

and the python print statements in hackerrank2 are:
  if balanced_brackets(expression) == True:
    print "YES"
  else:
    print "NO"

Open in new window

0
 
LVL 9

Accepted Solution

by:
Moussa Mokhtari earned 500 total points
ID: 41902644
I think I see the problem when you use > to redirect the output to text file in Powershell it saves it using Unicode and that's why when you attempt to read the file via python you don't get the desired result !
To change the output file encoding you can redirect the output like this

type .\stdin2.txt | python .\hackerrank2.py | Out-File con_output.txt -encoding ASCII

Open in new window


Cheers
0
 
LVL 11

Author Closing Comment

by:ugeb
ID: 41902665
Wow, that's an insidious error.  Thanks!
0
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41902667
You're welcome
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Less strange, but still introduction This introduction was added (1st August, 2011) to reflect some reactions.  Firstly, the term basics in the title of the article...  As any other word, it is a symbol with meaning attached to the word by some a…
Article by: Swadhin
Introduction of Lists in Python: There are six built-in types of sequences. Lists and tuples are the most common one. In this article we will see how to use Lists in python and how we can utilize it while doing our own program. In general we can al…
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…

691 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question