[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Reading text file in Python gives strange results

Posted on 2016-11-23
7
Medium Priority
?
162 Views
Last Modified: 2016-11-26
I have a simple text file that looks like this (Please ignore all the line numbers in the output.):
YES
NO
YES
YES
NO
...

Open in new window

It's just a bunch of yes's and no's in a straight text file in Windows 7.  However, I've tried reading the file several ways, and I get unexpected results. When I do this:
file1 = list(open(sys.argv[1]))

Open in new window

I get:
['\xff\xfeY\x00E\x00S\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '
\x00N\x00O\x00\r\x00\n', '\x00N\x00O\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00Y\x00E\x00S\x00\r\x00\n', '\x00Y\ .......]

Open in new window

and when I do:
f1 = open(sys.argv[1], "r")
lines1 = f1.readlines()
for i in lines1:
  print i.strip()

Open in new window

I get:
 ■Y E S

 Y E S

 N O

 N O

 N O

 N O

 Y E S

 Y E S

 Y E S

 N O

 Y E S

 Y E S

Open in new window

What is going on here and how do I read this as I expect it to be in the file?  


Python 2.7
Windows 7
0
Comment
Question by:ugeb
  • 4
  • 3
7 Comments
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41900092
basically the following will return a list with the file lines
open("filename.txt").readlines()

Open in new window


your 1st test should work but I think its due to your  encoding
try thins instead
import codecs
f = codecs.open("filename.txt", "r", "cp1252");
list(f);

in your 2nd test you are using strip with no param ! you should strip '\n'  

Cheers
0
 
LVL 11

Author Comment

by:ugeb
ID: 41901959
Unfortunately that didn't fix the problem.

In Emacs it looks like the file contains the proper characters, and I don't see the extraneous spaces and null characters etc.  The file was generated as redirected output from a script run in PowerShell on Windows 7.  When I read in the file and print out every character and it's ascii code, I can see that, for some reason, there are apparently null characters in between the ascii letters.

 :13, " ":0,  Y E S , "E":69, " ":0, "S":83, " ":0, "
 :13, " ":0,  Y E S , "E":69, " ":0, "S":83, " ":0, "
" ":0,

Open in new window

I don't know how those extra nulls got there, as it was a python file that generated the output as they were simple print statements.  You can see that each character is quoted for identification, followed by it's ascii code.

This is very confusing to me.
0
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41901973
Can you provide exactly how you generate the output file ?
0
Important Lessons on Recovering from Petya

In their most recent webinar, Skyport Systems explores ways to isolate and protect critical databases to keep the core of your company safe from harm.

 
LVL 11

Author Comment

by:ugeb
ID: 41902430
The powershell command is:
type .\stdin2.txt | python .\hackerrank2.py > con_output.txt

Open in new window

and the python print statements in hackerrank2 are:
  if balanced_brackets(expression) == True:
    print "YES"
  else:
    print "NO"

Open in new window

0
 
LVL 9

Accepted Solution

by:
Moussa Mokhtari earned 2000 total points
ID: 41902644
I think I see the problem when you use > to redirect the output to text file in Powershell it saves it using Unicode and that's why when you attempt to read the file via python you don't get the desired result !
To change the output file encoding you can redirect the output like this

type .\stdin2.txt | python .\hackerrank2.py | Out-File con_output.txt -encoding ASCII

Open in new window


Cheers
0
 
LVL 11

Author Closing Comment

by:ugeb
ID: 41902665
Wow, that's an insidious error.  Thanks!
0
 
LVL 9

Expert Comment

by:Moussa Mokhtari
ID: 41902667
You're welcome
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The really strange introduction Once upon a time there were individuals who intentionally put the grass seeds to the soil with anticipation of solving their nutrition problems. Or they maybe only played with seeds and noticed what happened... Som…
Variable is a place holder or reserved memory locations to store any value. Which means whenever we create a variable, indirectly we are reserving some space in the memory. The interpreter assigns or allocates some space in the memory based on the d…
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…
Suggested Courses
Course of the Month19 days, 11 hours left to enroll

873 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question