Solved

Reading text file in Python gives strange results

Posted on 2016-11-23
7
83 Views
Last Modified: 2016-11-26
I have a simple text file that looks like this (Please ignore all the line numbers in the output.):
YES
NO
YES
YES
NO
...

Open in new window

It's just a bunch of yes's and no's in a straight text file in Windows 7.  However, I've tried reading the file several ways, and I get unexpected results. When I do this:
file1 = list(open(sys.argv[1]))

Open in new window

I get:
['\xff\xfeY\x00E\x00S\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '
\x00N\x00O\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00Y\ .......]

Open in new window

and when I do:
f1 = open(sys.argv[1], "r")
lines1 = f1.readlines()
for i in lines1:
  print i.strip()

Open in new window

I get:
 ■Y E S

 Y E S

 N O

 N O

 N O

 N O

 Y E S

 Y E S

 Y E S

 N O

 Y E S

 Y E S

Open in new window

What is going on here and how do I read this as I expect it to be in the file?  


Python 2.7
Windows 7
0
Comment
Question by:ugeb
  • 4
  • 3
7 Comments
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41900092
basically the following will return a list with the file lines
open("filename.txt").readlines()

Open in new window


your 1st test should work but I think its due to your  encoding
try thins instead
import codecs
f = codecs.open("filename.txt", "r", "cp1252");
list(f);

in your 2nd test you are using strip with no param ! you should strip '\n'  

Cheers
0
 
LVL 11

Author Comment

by:ugeb
ID: 41901959
Unfortunately that didn't fix the problem.

In Emacs it looks like the file contains the proper characters, and I don't see the extraneous spaces and null characters etc.  The file was generated as redirected output from a script run in PowerShell on Windows 7.  When I read in the file and print out every character and it's ascii code, I can see that, for some reason, there are apparently null characters in between the ascii letters.

 :13, " ":0,  Y E S , "E":69, " ":0, "S":83, " ":0, "
 :13, " ":0,  Y E S , "E":69, " ":0, "S":83, " ":0, "
" ":0,

Open in new window

I don't know how those extra nulls got there, as it was a python file that generated the output as they were simple print statements.  You can see that each character is quoted for identification, followed by it's ascii code.

This is very confusing to me.
0
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41901973
Can you provide exactly how you generate the output file ?
0
Live: Real-Time Solutions, Start Here

Receive instant 1:1 support from technology experts, using our real-time conversation and whiteboard interface. Your first 5 minutes are always free.

 
LVL 11

Author Comment

by:ugeb
ID: 41902430
The powershell command is:
type .\stdin2.txt | python .\hackerrank2.py > con_output.txt

Open in new window

and the python print statements in hackerrank2 are:
  if balanced_brackets(expression) == True:
    print "YES"
  else:
    print "NO"

Open in new window

0
 
LVL 9

Accepted Solution

by:
Moussa Mokhtari earned 500 total points
ID: 41902644
I think I see the problem when you use > to redirect the output to text file in Powershell it saves it using Unicode and that's why when you attempt to read the file via python you don't get the desired result !
To change the output file encoding you can redirect the output like this

type .\stdin2.txt | python .\hackerrank2.py | Out-File con_output.txt -encoding ASCII

Open in new window


Cheers
0
 
LVL 11

Author Closing Comment

by:ugeb
ID: 41902665
Wow, that's an insidious error.  Thanks!
0
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41902667
You're welcome
0

Featured Post

Courses: Start Training Online With Pros, Today

Brush up on the basics or master the advanced techniques required to earn essential industry certifications, with Courses. Enroll in a course and start learning today. Training topics range from Android App Dev to the Xen Virtualization Platform.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Python 2.7 windows 7 Looping through and executing SQL files in a directory 11 88
Need a python script 5 94
Macports Import Problem 4 59
Python Regex Problem 24 134
Variable is a place holder or reserved memory locations to store any value. Which means whenever we create a variable, indirectly we are reserving some space in the memory. The interpreter assigns or allocates some space in the memory based on the d…
This article will show the steps for installing Python on Ubuntu Operating System. I have created a virtual machine with Ubuntu Operating system 8.10 and this installing process also works with upgraded version of Ubuntu OS. For installing Py…
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …

813 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now