sara_bellum asked:
python, reading file input by line number fails
I'm trying to copy lines from file A to file B, starting at each point where a station ID is declared and covering the 15 lines of parameters that follow each station ID. My code is below.
The only things that get copied (when it works at all) are the station IDs themselves; the data following each station ID is not copied, and the latest error is: 'invalid syntax for index < 16'
I also tried using the csv reader to get this to work, but it gives me errors too, including: '_csv.writer' object is not iterable
What am I missing??
import csv
configFile = open('/home/user/Documents/python/config.ini', 'r')
hash_source = open('/home/user/Documents/python/myhash.csv', 'r')
hash_dest = open('/home/user/Documents/python/myhash2.csv', 'w')
configSrc = csv.reader(configFile)
hashSrc = csv.writer(hash_source)
hashDest = csv.writer(hash_dest)
data = []
#for row in configSrc:
for line in configFile:
if line:
#print 'config file is readable'
if "stn1" in line:
#print 'Found a 1-series station'
stnLines = 16
index = 0
for (index < 16):
data.append(line)
elif "stn 2" in line:
#print 'Found a 2-series station'
stnLines = 16
data.append(line)
else:
#print 'Nothing could be read'
continue
for line in range(stnLines):
while index < 16
data.append(configFile.next())
print data
#hash_dest.write(data)
configFile.close()
hash_source.close()
hash_dest.close()
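The "'_csv.writer' object is not iterable" error is what you get when you loop over a csv.writer: a writer only offers writerow()/writerows(), while a csv.reader is what yields rows. In the code above, hashSrc is created with csv.writer even though hash_source is opened for reading. A minimal Python 3 sketch, using io.StringIO as a stand-in for the real files (the sample content is made up):

```python
import csv
import io

# Stand-ins for the source and destination files (hypothetical content).
source = io.StringIO("a;1\nb;2\n")
dest = io.StringIO()

reader = csv.reader(source, delimiter=';')   # iterable: yields each row as a list
writer = csv.writer(dest, delimiter=';')     # not iterable: it only writes rows

for row in reader:
    writer.writerow(row)

copied = dest.getvalue()   # csv.writer ends each row with '\r\n' by default
print(repr(copied))

# Looping over a writer reproduces the error from the question.
try:
    iter(csv.writer(io.StringIO()))
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```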
ASKER
You're right, the result is that each station name is printed 16 times. I fail to understand why the index doesn't advance so that the next line of text is copied instead. Maybe it would work if I used configSrc = csv.reader(configFile)? (Actually I tried, and it doesn't, which I also fail to understand.)
The config file format is copied below:
# code:
for index, line in enumerate(configFile):
    if "stn1" in line:
        #print 'Found a stn-1 station'
        index = 0
        while index < 16:
            data.append(line)
            index = index + 1
# config file:
<lots of lines omitted>
[[stn1-A]]
param 1 = int
param 2 = float
param 3 = string
param 4 = list
etc... to param 15
[[stn1-B]]
param 1 = int
param 2 = float
param 3 = string
param 4 = list
etc... to param 15, for 8 stations in series
[[stn2-A]]
param 1 = int
param 2 = float
param 3 = string
param 4 = list
etc... to param 15
[[stn2-B]]
param 1 = int
param 2 = float
param 3 = string
param 4 = list
etc...to param 15, for 4 stations in series
<lots of lines omitted>
Well, I will try to explain the reason briefly.
for index, line in enumerate(configFile): # here the index is the number of the line
    if "stn1" in line:
        #print 'Found a stn-1 station'
        index = 0 # here the index variable is reused for another purpose
        while index < 16:
            data.append(line) # the line is constant here, so you collect 16 times the same line including its \n
            index = index + 1
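If you do want to keep a fixed count for now, the usual fix for the constant-line problem is to pull the following lines from the file iterator itself instead of appending the same line again. A Python 3 sketch using itertools.islice, assuming (as the question states) that each "stn1" header is followed by exactly 15 parameter lines; the function name copy_station_blocks is hypothetical, and a fixed count breaks as soon as the file layout changes:

```python
import io
import itertools

PARAMS_PER_STATION = 15  # assumption taken from the question

def copy_station_blocks(lines, marker="stn1", count=PARAMS_PER_STATION):
    """Return every line containing `marker` plus the `count` lines after it."""
    it = iter(lines)
    data = []
    for line in it:
        if marker in line:
            data.append(line)
            # islice consumes the next `count` lines from the same iterator,
            # so the outer loop resumes right after the block.
            data.extend(itertools.islice(it, count))
    return data

sample = io.StringIO("junk\n[[stn1-A]]\np1 = 1\np2 = 2\nother\n")
print(copy_station_blocks(sample, count=2))
```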
Now, first simplify the script so that it processes config.ini correctly. Study the snippet below. In cases like this, always avoid the temptation to decide by counting lines: you never know how config.ini will look in the future. The script should still be able to process the file when it changes, and it should report unexpected situations so that you know what to reimplement (fix).
The finite automaton has only two states, but resist the temptation to simplify the status to a bool. Keep it visible that this is a finite automaton: it will be easier to recognize in the code later, to draw as a picture, to modify, etc. For the config.ini sample above, it prints this in my case:
C:\tmp\___python\sara_bellum>scr090720.py
Ignored0: # config file:
Ignored0: <lots of lines omitted>
Param: stn1-A param 1 = int
Param: stn1-A param 2 = float
Param: stn1-A param 3 = string
Param: stn1-A param 4 = list
Ignored1: etc... to param 15
Param: stn1-B param 1 = int
Param: stn1-B param 2 = float
Param: stn1-B param 3 = string
Param: stn1-B param 4 = list
Ignored1: etc... to param 15, for 8 stations in series
Param: stn2-A param 1 = int
Param: stn2-A param 2 = float
Param: stn2-A param 3 = string
Param: stn2-A param 4 = list
Ignored1: etc... to param 15
Param: stn2-B param 1 = int
Param: stn2-B param 2 = float
Param: stn2-B param 3 = string
Param: stn2-B param 4 = list
Ignored1: etc...to param 15, for 4 stations in series
Ignored0: <lots of lines omitted>
Notice that it recognized the param lines together with the remembered station identification. Now, tell me how you want to process the parameters of each station.
f = open('/home/user/Documents/python/config.ini') # use the wanted config.ini
status = 0 # status of the finite automaton
section = 'init' # auxiliary variable that stores the section name (station)
for line in f:
    if status == 0:
        if line.startswith('[['):
            # Get the name of the section by removing the double
            # brackets. Then jump to another status where item
            # lines are expected.
            section = line.strip()[2:-2]
            status = 1
        else:
            # Ignore the lines until you detect the section (i.e.
            # keep the status).
            print 'Ignored0:', line.strip()
    elif status == 1: # Expecting param items
        if line.startswith('param'):
            # Process the parameter of the section -- here only displayed.
            print 'Param:', section, line.strip()
        elif line.startswith('[['):
            # Get the name of the section by removing the double
            # brackets. Then keep the status 1 where item
            # lines are expected.
            section = line.strip()[2:-2]
        else:
            # Unknown form of line. Ignore it and jump to the state 0.
            print 'Ignored1:', line.strip()
            section = 'init'
            status = 0
    else:
        print 'Unknown status:', status, '(i.e. not implemented).'
f.close()
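For reference, the same two-state automaton can be written as a Python 3 generator that yields events instead of printing them, which makes it easy to test or to feed into other processing. This is a sketch: it strips each line before testing the "[[" and "param" prefixes, and the event tuples and the name scan are illustrative choices, not part of the reply above:

```python
# States of the finite automaton, kept explicit rather than collapsed to a bool.
OUTSIDE, IN_SECTION = 0, 1

def scan(lines):
    """Yield ('param', section, line) or ('ignored', state, line) events."""
    status = OUTSIDE
    section = None
    for raw in lines:
        line = raw.strip()
        if status == OUTSIDE:
            if line.startswith('[['):
                section = line[2:-2]          # strip the [[ ]] brackets
                status = IN_SECTION
            else:
                yield ('ignored', status, line)
        elif status == IN_SECTION:
            if line.startswith('param'):
                yield ('param', section, line)
            elif line.startswith('[['):
                section = line[2:-2]          # next section, same state
            else:
                yield ('ignored', status, line)
                section = None
                status = OUTSIDE
        else:
            raise ValueError('unknown status: %r' % status)
```

Usage would be along the lines of: for event in scan(open('config.ini')): print(event).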
ASKER
Thanks very much. I can appreciate the need not to use line numbers as the primary criterion for file input. But although the parameter names are the same for each station, each of the 15 parameters has a different name, so string matching would be very cumbersome. I guess the script could iterate and copy lines until it reaches the next station. I'll try that when I get to the office, where I have the Wing IDE software that helps me with indentation errors.
Finally, I'd like to stop the iteration once the stations are finished, but I could use string matching for that. I'll try it and get back to you.
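The idea of copying lines until the next station header, and stopping once the stations are finished, can be sketched in Python 3 as below. It assumes station headers look like [[name]]; the stop condition (a line starting with a single "[", as a top-level section would in ConfigObj-style files, which this format resembles) is a hypothetical choice, and collect_stations is an illustrative name:

```python
def collect_stations(lines):
    """Map each [[station]] name to the non-empty lines that follow it."""
    stations = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith('[['):
            current = line[2:-2]              # new station: start a new list
            stations[current] = []
        elif line.startswith('['):
            if current is not None:
                break                         # hypothetical: a top-level section ends the stations
        elif current is not None and line:
            stations[current].append(line)    # copy everything up to the next header
    return stations
```

No parameter names are matched here; every line between two headers is copied.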
Yes. This is a case where finite automata are handy. Here the line is the unit the automaton processes; therefore, the automaton is fed lines in the loop. The above code could be drawn on paper as shown in the left image. Actually, the file is rather simple, and the full finite automaton could be collapsed to a single state (see the right image). This way, the parameters could be collected per station as shown in the snippet below.
f = open('config.ini')
status = 0
station = 'dummy'
d = {} # dictionary of stations
for line in f:
    line = line.rstrip() # remove the trailing \n and whitespace
    if line.startswith('[['):
        # Get the name of the section by removing the double
        # brackets. Introduce a new subdictionary for the station.
        station = line[2:-2]
        assert station not in d
        d[station] = {}
    elif line.startswith('#') or line == '':
        # Ignore the comments and empty lines.
        print 'Comment:', line
    elif line.find('=') > 0:
        # Let it be the param line. Split it into key and value
        # (at the first '=' only, in case the value contains '=').
        k, v = line.split('=', 1)
        d[station][k.strip()] = v.strip()
    else:
        # Unrecognized kind of line.
        print 'Bad line:', line
f.close()
# Display the collected info.
for station in d:
    print station
    ds = d[station]
    for key in ds:
        print '\t' + key + ':\t' + ds[key]
    print
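A Python 3 version of the single-state collector, wrapped in a function so it can be tested. It is a sketch: it uses split('=', 1) so that a value containing "=" stays intact, raises ValueError on a duplicate station instead of using assert, and treats "=" lines before the first station as bad lines. The name parse_stations is illustrative:

```python
def parse_stations(lines):
    """Return {station: {key: value}} from [[station]] sections of key = value lines."""
    d = {}
    station = None
    for raw in lines:
        line = raw.rstrip()
        if line.startswith('[['):
            station = line[2:-2]
            if station in d:
                raise ValueError('duplicate station: %s' % station)
            d[station] = {}
        elif line.startswith('#') or line == '':
            continue                          # comments and empty lines
        elif line.find('=') > 0 and station is not None:
            k, v = line.split('=', 1)         # split at the first '=' only
            d[station][k.strip()] = v.strip()
        else:
            print('Bad line:', line)
    return d
```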
[Attached image: finiteAutomaton.png]
ASKER
Brilliant! It's late so I haven't researched the sort function: I wanted the stations to print in alphanumeric order, but when I tried 'print sorted(d.keys())' it prints only the station names of course.
Then the station parameters, although consistently listed, aren't listed in the same order as in the original config file. I suppose that if I want them to print in the same order, I have to assign them to d[station] = {} like so: d{'param1', 'param2', 'param3'}?
Let me know, thanks :)
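On the two questions above: sorted(d) returns the station names in order, and each name is then used to look up that station's parameters in d. For keeping the parameters in file order, a plain dict preserves insertion order only from Python 3.7 on; in Python 2.7 a collections.OrderedDict (or a list of (key, value) tuples) does the job. A small sketch with made-up data:

```python
from collections import OrderedDict

# Hypothetical parsed data: station -> parameters in file order.
d = {
    'stn2-A': OrderedDict([('param 1', 'int'), ('param 2', 'float')]),
    'stn1-B': OrderedDict([('param 1', 'int'), ('param 2', 'float')]),
    'stn1-A': OrderedDict([('param 1', 'int'), ('param 2', 'float')]),
}

lines = []
for station in sorted(d):                     # stations in alphanumeric order
    lines.append(station)
    for key, value in d[station].items():     # parameters in insertion (file) order
        lines.append('\t%s:\t%s' % (key, value))

print('\n'.join(lines))
```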
ASKER
When it comes to associating keys and values, I can only count to two so far: since my initial goal was simply to copy an excerpt from a config file and format the data, I'm sticking to that until I can learn more. Thanks for pointing me in the right direction!
I realigned the parameters for each station horizontally so they're easier to read in a spreadsheet; it turns out I have 35 stations, so this helps! I spent a lot of time trying to figure out why my columns weren't lining up in an Excel spreadsheet, until I figured out that about a third of the stations had no key/value pair for magnetic declination, so I removed that line. The header line should print first, but I couldn't figure out how to do that, and somehow the StationID prints once more at the end of the header line; but those are easy edits to make in Excel.
On to learning dictionaries...
configFile = open('config.ini', 'r')
destFile = open('output.csv', 'w')
stations = []
params = []
header = []
for line in configFile:
    line = line.rstrip()
    if '[[' in line:
        station = line[4:-2]
        header.append('StationID')
        destFile.write('\n' + station + '; ')
    if line.startswith('#') or line == '':
        continue
    elif line.find('=') > 0:
        if 'MagneticDeclination' in line:
            del line
        else:
            k, v = line.split('=')
            params.append((k.strip(),v.strip()))
            header.append(k.strip())
            if isinstance(v, list):
                destFile.write(','.join(item))
            else:
                destFile.write( v )
            destFile.write( '; ' )
    else:
        continue
destFile.write('\n')
for index, item in enumerate(header):
    if index < 17:
        destFile.write(item + '; ')
configFile.close()
destFile.close()
Well, thanks for the points. However, that was only the first part of the question; it is time to use the csv module. Please attach a sample of config.ini and an example of how the final result should look in Excel.
ASKER
Wow, OK then! Below is a sample station from the config file. (I ended up removing the non-station data from the config file manually, because it had a lot of '=' signs and other formatting that would have required a bunch of extra checks to skip those lines, and the station data is grouped together anyway.)
I attach the csv output file, which I open in Excel using a semicolon as the delimiter. Each station has a unique name, but some use an alphanumeric sequence as a file prefix and others use a location name. Other sets of stations have subdirectories and these do not, so there's a blank field.
[[DUS2]]
PakbusID = 145
StationDescriptiveName = Some River Water Level Station
StationLocationDescription = Some River
Latitude = 11 22 33.00
Longitude = -111 22 333.00
ElevationFt = 371
ElevationM = 113
DataFileDirectory =
DataFileNameBase = SOMERIVER_
ImportDataFileSuffixes = Daily.dat,HrlyAtms.dat,HrlyDiag.dat,HrlyRaw.dat,HrlySubs.dat,HrlyWtr.dat,QtrHrWtr.dat,SR50.dat,SR50Q.dat,TwoMinWd.dat,RainEvnt.dat
CurrentConditionsDataFileSuffixes = HrlyAtms.dat,HrlyDiag.dat,HrlyWtr.dat
OutputDir = DUS2
ColumnMap = stn
ColumnOffset = 3
Graphs = Air,Wind,Diagnostics,Water_Level,Water_Level_15min
[Attachment: output-xls.csv]
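Some values in this sample are comma-separated lists (ImportDataFileSuffixes, Graphs) and one is empty (DataFileDirectory). If you later want them as Python lists rather than one text cell, a small helper could split them. This Python 3 sketch treats every value containing a comma as a list, which is an assumption that happens to fit this sample; parse_value and parse_param are illustrative names:

```python
def parse_value(text):
    """Return '' for an empty value, a list for a comma-separated one, else the text."""
    value = text.strip()
    if ',' in value:
        return [item.strip() for item in value.split(',')]
    return value

def parse_param(line):
    key, _, value = line.partition('=')   # split at the first '=' only
    return key.strip(), parse_value(value)

print(parse_param('Graphs = Air,Wind,Diagnostics,Water_Level,Water_Level_15min'))
print(parse_param('DataFileDirectory ='))
```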
Try the following code for processing your data. Notice that 'MagneticDeclination' is also processed. The "wanted" list defined at the beginning says which elements are to be exported and in what order. 'StationID' is added to the parameters to make their later processing more uniform. Notice also that the parameters are converted to a dictionary (unordered). This way, the code could easily go back to the earlier version with subdictionaries.
Notice also that the value of a parameter is not obtained via the param[key] syntax; param.get(key, default) is used instead. This way it also works when the parameter is not defined (say, 'MagneticDeclination'). The wanted list is used directly as the header. However, you could also define a fixed transformation (a pre-filled dictionary) that maps the keys to more readable column titles.
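The get() default and the pre-filled-dictionary idea can be shown together in a few lines of Python 3. The readable titles below are made up for illustration; keys without an entry fall back to their own name, and a missing parameter becomes an empty cell so the columns stay aligned:

```python
wanted = ['StationID', 'PakbusID', 'MagneticDeclination', 'ElevationM']

# Hypothetical human-readable titles for some of the keys.
titles = {'PakbusID': 'PakBus ID', 'ElevationM': 'Elevation (m)'}

header = [titles.get(key, key) for key in wanted]

# Values taken from the DUS2 sample; MagneticDeclination is deliberately missing.
param = {'StationID': 'DUS2', 'PakbusID': '145', 'ElevationM': '113'}
row = [param.get(key, '') for key in wanted]   # '' keeps missing keys in their column

print(header)
print(row)
```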
import csv
f = open('config.ini')
status = 0
station = 'dummy'
wanted = [ 'StationID', 'PakbusID', 'StationDescriptiveName', 'StationLocationDescription',
           'Latitude', 'Longitude', 'ElevationFt', 'ElevationM', 'DataFileDirectory', 'DataFileNameBase',
           'ImportDataFileSuffixes', 'CurrentConditionsDataFileSuffixes', 'OutputDir', 'ColumnMap',
           'ColumnOffset', 'Graphs' ]
d = {} # dictionary of stations
for line in f:
    line = line.rstrip() # remove the trailing \n and whitespace
    if line.startswith('[['):
        # Get the name of the section by removing the double
        # brackets. Introduce the list of the processed parameters
        # with StationID repeated as the first one.
        station = line[2:-2]
        assert station not in d
        d[station] = [ ('StationID', station) ]
    elif line.startswith('#') or line == '':
        # Ignore the comments and empty lines.
        print 'Comment:', line
    elif line.find('=') > 0:
        # Let it be the param line. Split it into key and value
        # (at the first '=' only). Then append the tuple to the list for the station.
        k, v = line.split('=', 1)
        d[station].append((k.strip(), v.strip()))
    else:
        # Unrecognized kind of line.
        print 'Bad line:', line
f.close()
# Export the collected info.
fout = open('output.csv', 'wb')
writer = csv.writer(fout, dialect='excel', delimiter=';', quoting=csv.QUOTE_ALL)
# Export the header
writer.writerow(wanted)
# For all stations
for station in sorted(d):
    param = dict( d[station] ) # dictionary of parameters for the station
    row = [] # init the row
    for key in wanted:
        row.append(param.get(key, '')) # default if unknown
    writer.writerow(row)
fout.close()
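The export above is Python 2 (print statements, 'wb' mode). In Python 3 the csv documentation asks for the output file to be opened in text mode with newline='' instead of 'wb'. A sketch of the export step, writing to an io.StringIO so the result can be inspected; the station data is a cut-down version of the DUS2 sample:

```python
import csv
import io

wanted = ['StationID', 'PakbusID', 'DataFileDirectory', 'Graphs']
d = {
    'DUS2': [('StationID', 'DUS2'), ('PakbusID', '145'),
             ('DataFileDirectory', ''), ('Graphs', 'Air,Wind,Diagnostics')],
}

# With a real file in Python 3: fout = open('output.csv', 'w', newline='')
fout = io.StringIO()
writer = csv.writer(fout, dialect='excel', delimiter=';', quoting=csv.QUOTE_ALL)
writer.writerow(wanted)                                   # header first
for station in sorted(d):
    param = dict(d[station])
    writer.writerow([param.get(key, '') for key in wanted])

output = fout.getvalue()
print(output)
```

Because the delimiter is ';' and every field is quoted, the comma-separated Graphs value stays in a single cell.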
ASKER
Brilliant!! Thanks very much :)
Next, I tried importing the header from another file with
reader = csv.reader(otherFile)
fieldNames = reader.next()
header = []
for item in fieldNames:
    header.append(item)
but then when processing the rows, making lists of files (data file suffixes, graphs) appear as one entry in the Excel spreadsheet didn't work, so that'll be another question for another day :-)
Thanks again for your persistence in helping me get this right.
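In Python 3, reader.next() is spelled next(reader), and it already returns a list, so the copy loop is not needed. Reading with the same ';' delimiter that was used for writing keeps a quoted, comma-separated value as one field. A sketch with made-up header-file content:

```python
import csv
import io

# Hypothetical header file plus one data row, ';'-delimited as in the thread.
other_file = io.StringIO('StationID;PakbusID;Graphs\n'
                         '"DUS2";"145";"Air,Wind,Diagnostics"\n')

reader = csv.reader(other_file, delimiter=';')
header = next(reader)        # Python 3 spelling of reader.next(); already a list
first_row = next(reader)

print(header)
print(first_row)
```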
while index < 16:
    data.append(line)
    index = index + 1
Another syntax bug, and possibly an intention bug too, can be seen on line 29 (while index < 16). There should be "if" instead of "while", and the line must end with a colon. The indentation must also be corrected. On the other hand, that line is never reached, even if it were syntactically correct...
More generally, this approach seems rather "unpleasant". It would probably be better if you attached an example of the file being processed and described what should be done. The solution will probably look somewhat different.