
python, reading file input by line number fails

I'm trying to copy lines from file A to file B, starting from the point where a station ID is declared and covering the 15 lines of parameters that follow each station ID. My code is below.

The only things that get copied (when it runs at all) are the station IDs themselves; the data following each station ID is not copied. The latest error is: 'invalid syntax for index < 16'

I tried using the csv reader to get this to work, but it also gives me errors, including: '_csv.writer' object is not iterable

What am I missing??

import csv
 
configFile = open('/home/user/Documents/python/config.ini', 'r')
hash_source = open('/home/user/Documents/python/myhash.csv', 'r')
hash_dest = open('/home/user/Documents/python/myhash2.csv', 'w')
 
configSrc = csv.reader(configFile)
hashSrc = csv.writer(hash_source)
hashDest = csv.writer(hash_dest)
data = []
#for row in configSrc:
for line in configFile:
    if line:
	#print 'config file is readable'
    	if "stn1" in line:
		#print 'Found a 1-series station'
		stnLines = 16	
		index = 0
		for (index < 16):	
			data.append(line)
    	elif "stn 2" in line:
		#print 'Found a 2-series station'
		stnLines = 16	
		data.append(line)
    	else: 
		#print 'Nothing could be read'
		continue
    		for line in range(stnLines):
       			while index < 16
        		data.append(configFile.next())
 
print data
#hash_dest.write(data)
 
configFile.close()
hash_source.close()
hash_dest.close()


pepr

The error probably comes from line 19 of your code ('for (index < 16):'), which really is invalid syntax in Python. But you must also increment the index somewhere. Mechanically, you could repair it this way:

while index < 16:
    data.append(line)
    index = index + 1

Another syntax error, and possibly also a logic error, can be seen on line 29 ('while index < 16'). There should be "if" instead of "while", and the line must end with a colon. The indentation must also be corrected. On the other hand, that line is never executed even if it were syntactically correct, because it comes right after the "continue" in the else branch...

More generally, this approach seems rather "unpleasant". It would be better if you attached an example of the file being processed and described what should be done with it. The solution will probably look quite different.

SOLUTION

pepr


sara_bellum

ASKER

You're right: the result is that each station name is printed 16 times. I fail to understand why the index doesn't increment so that the next line of text is copied instead. Maybe if I try this:
configSrc = csv.reader(configFile)
it would work? (Actually, I tried, and it doesn't, which I also fail to understand.)

The config file format is copied below:

# code:

for index, line in enumerate(configFile):
    if "stn1" in line:
        #print 'Found a stn-1 station'
        index = 0
        while index < 16:
            data.append(line)
            index = index + 1

# config file:
<lots of lines omitted>
[[stn1-A]]
param 1 = int
param 2 = float
param 3 = string
param 4 = list
etc... to param 15
[[stn1-B]]
param 1 = int
param 2 = float
param 3 = string
param 4 = list
etc... to param 15, for 8 stations in series
[[stn2-A]]
param 1 = int
param 2 = float
param 3 = string
param 4 = list
etc... to param 15
[[stn2-B]]
param 1 = int
param 2 = float
param 3 = string
param 4 = list
etc...to param 15, for 4 stations in series
<lots of lines omitted>


pepr

Well, I will try to explain the reason briefly.

for index, line in enumerate(configFile):   # here index is the number of the line
    if "stn1" in line:
        #print 'Found a stn-1 station'
        index = 0            # here the index variable is reused for another purpose
        while index < 16:
            data.append(line)    # line is constant here, so the same line (with its \n) is collected 16 times
            index = index + 1
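The inner loop above never reads a new line from the file, which is why the same line repeats. For comparison, here is a minimal sketch (Python 3) of the line-counting approach the question was aiming for. It assumes, as in the sample, that each matching header is followed by exactly 15 parameter lines, and it pulls those lines from the *same* iterator with itertools.islice. The names copy_station_blocks and PARAM_COUNT are hypothetical.

```python
import itertools

PARAM_COUNT = 15  # assumption: every station header is followed by 15 param lines

def copy_station_blocks(lines, marker="stn1"):
    """Return each line containing `marker` plus the PARAM_COUNT lines after it.

    `lines` may be an open file or any iterable of strings.
    """
    data = []
    it = iter(lines)  # one shared iterator: consuming lines here also advances the loop
    for line in it:
        if marker in line:
            data.append(line)
            # Take the next PARAM_COUNT lines from the same iterator; the outer
            # loop then resumes after them instead of re-reading them.
            data.extend(itertools.islice(it, PARAM_COUNT))
    return data
```

With a real file this would be called as `with open('config.ini') as f: data = copy_station_blocks(f)`. As the reply below points out, counting lines is fragile: if a station ever gains or loses a parameter, the blocks silently go out of step.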

Now, first simplify the script so that it processes config.ini correctly. Study the snippet below. In cases like this, always resist the temptation to decide things by counting lines. You never know how config.ini will look in the future. The script should still be able to process it when it changes, and it should report unexpected situations to help you reimplement (fix) it.

The finite automaton now has only 2 states, but resist the temptation to simplify the status to a bool. Keep it visible that this is a finite automaton: it will be easier to recognize it in the code later, draw its diagram, modify it, etc. For the config.ini sample above, it prints the following in my case:

C:\tmp\___python\sara_bellum>scr090720.py
Ignored0: # config file:
Ignored0: <lots of lines omitted>
Param: stn1-A param 1 = int
Param: stn1-A param 2 = float
Param: stn1-A param 3 = string
Param: stn1-A param 4 = list
Ignored1: etc... to param 15
Param: stn1-B param 1 = int
Param: stn1-B param 2 = float
Param: stn1-B param 3 = string
Param: stn1-B param 4 = list
Ignored1: etc... to param 15, for 8 stations in series
Param: stn2-A param 1 = int
Param: stn2-A param 2 = float
Param: stn2-A param 3 = string
Param: stn2-A param 4 = list
Ignored1: etc... to param 15
Param: stn2-B param 1 = int
Param: stn2-B param 2 = float
Param: stn2-B param 3 = string
Param: stn2-B param 4 = list
Ignored1: etc...to param 15, for 4 stations in series
Ignored0: <lots of lines omitted>

Notice that it recognized the param lines together with the remembered station identification. Now you should say how you want to process the parameters of each station.

f = open('/home/user/Documents/python/config.ini') # use the wanted config.ini
 
status = 0            # status of the finite automaton
section = 'init'      # auxiliary variable that stores the section name (station)
 
for line in f:
    if status == 0:   
        if line.startswith('[['):  
            # Get the name of the section by removing the special 
            # parentheses. Then jump to another status where item
            # lines are expected.
            section = line.strip()[2:-2]
            status = 1
            
        else:    
            # Ignore the lines until a section is detected (i.e.
            # keep the status).
            print 'Ignored0:', line.strip()
        
    elif status == 1:  # Expecting param items
        if line.startswith('param'):
            # Process the parameter of the section -- here only displayed.
            print 'Param:', section, line.strip()
            
        elif line.startswith('[['):  
            # Get the name of the section by removing the special 
            # parentheses. Then keep the status 1 where item
            # lines are expected.
            section = line.strip()[2:-2]
            
        else:
            # Unknown form of line. Ignore it and jump to the state 0.
            print 'Ignored1:', line.strip()
            section = 'init'
            status = 0
            
    else:
        print 'Unknown status:', status, '(i.e. not implemented).'
 
f.close()
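To make the automaton easier to test, the same two states can be wrapped in a function that returns records instead of printing them. This is a sketch in Python 3; the state names OUTSIDE and IN_SECTION and the function name classify_lines are hypothetical (the code above uses 0 and 1). One small deviation: each line is stripped before the checks, so indented headers would also be recognized.

```python
# States of the automaton (hypothetical names; the code above uses 0 and 1)
OUTSIDE = 0      # before any section, or after an unexpected line
IN_SECTION = 1   # inside a [[...]] section, expecting param lines

def classify_lines(lines):
    """Feed lines to the two-state automaton; return (kind, section, text) tuples."""
    status = OUTSIDE
    section = 'init'
    records = []
    for line in lines:
        text = line.strip()
        if status == OUTSIDE:
            if text.startswith('[['):
                section = text[2:-2]      # remember the station name
                status = IN_SECTION
            else:
                records.append(('Ignored0', None, text))
        elif status == IN_SECTION:
            if text.startswith('param'):
                records.append(('Param', section, text))
            elif text.startswith('[['):
                section = text[2:-2]      # new station, stay in IN_SECTION
            else:
                records.append(('Ignored1', None, text))
                section = 'init'
                status = OUTSIDE
        else:
            raise ValueError('Unknown status: %r' % status)
    return records
```

Returning data rather than printing lets the same automaton feed a report, a CSV writer, or a test.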


sara_bellum

ASKER

Thanks very much. I can appreciate the need not to use line numbers as the primary criterion for file input. But although the parameter names are the same for each station, each of the 15 parameters has a different name, so string matching would be very cumbersome. I guess the script could iterate and copy lines until it gets to the next station. I'll try that when I get to the office, where I have Wing IDE, which helps me with indentation errors.

Finally, I'd like to stop the iteration once the stations are finished, but I could use string matching for that. I'll try that and get back to you.
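The "copy until the next station" idea needs no matching on parameter names at all: only the [[...]] headers are recognized. A minimal sketch in Python 3, where group_station_lines is a hypothetical name and end_marker is a hypothetical string you would choose to mark the end of the station data:

```python
def group_station_lines(lines, end_marker=None):
    """Collect the lines under each [[...]] header until the next header.

    No parameter names are matched: every non-empty line after a header is
    copied. If `end_marker` appears in a line, reading stops there.
    """
    stations = []        # list of (station_name, [lines]) in file order
    current = None
    for line in lines:
        text = line.strip()
        if end_marker is not None and end_marker in text:
            break                      # stations are finished; stop iterating
        if text.startswith('[[') and text.endswith(']]'):
            current = (text[2:-2], [])
            stations.append(current)
        elif current is not None and text:
            current[1].append(text)    # any non-empty line inside a station is copied
    return stations
```

Lines before the first header are skipped because no station is open yet, so the station count never has to be known in advance.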

pepr

Yes. This is exactly the case where finite automata are handy. Here the line is the unit processed by the automaton; therefore, the automaton is fed lines in the loop. The code above could be drawn on paper as shown in the left image. Actually, the file is rather simple, and the full finite automaton could be collapsed into a single state (see the right image). This way the parameters can be collected by station, as shown in the snippet below.

f = open('config.ini')
 
status = 0
station = 'dummy'
 
d = {}  # dictionary of stations
 
for line in f:
    line = line.rstrip()   # remove the trailing \n and whitespace
    
    if line.startswith('[['):  
        # Get the name of the section by removing the special 
        # parentheses. Introduce new subdictionary for the station.
        station = line[2:-2]
        assert station not in d
        d[station] = {}
        
    elif line.startswith('#') or line == '':    
        # Ignore the comments and empty lines.
        print 'Comment:', line
    
    elif line.find('=') > 0:
        # Let it be the param line. Split it to key and value.
        k, v = line.split('=', 1)  # split at the first '=' only
        d[station][k.strip()] = v.strip()
        
    else:
        # Unrecognized kind of line.
        print 'Bad line:', line
        
 
f.close()
 
# Display the collected info.
for station in d:
    print station
    ds = d[station]
    for key in ds:
        print '\t' + key + ':\t' + ds[key]
        
    print


finiteAutomaton.png

SOLUTION

pepr


sara_bellum

ASKER

Brilliant! It's late so I haven't researched the sort function: I wanted the stations to print in alphanumeric order, but when I tried 'print sorted(d.keys())' it prints only the station names of course.

Then the station parameters, although consistently listed, aren't listed in the same order as in the original config file. I suppose that if I want them to print in the same order, I have to assign them to d[station] = {} like so: d{'param1', 'param2', 'param3'}?

Let me know thanks :)
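Both wishes can be met with standard tools. What follows is a sketch, not the accepted answer: iterate over sorted(d) to visit station names in sorted order while still looking up each station's parameters, and store the parameters in a collections.OrderedDict so they come back in file order (since Python 3.7 a plain dict also preserves insertion order). The function names parse_ordered and report are hypothetical.

```python
from collections import OrderedDict

def parse_ordered(lines):
    """Parse [[station]] sections; keep parameters in the order they appear."""
    d = OrderedDict()
    station = None
    for line in lines:
        line = line.strip()
        if line.startswith('[[') and line.endswith(']]'):
            station = line[2:-2]
            d[station] = OrderedDict()
        elif station is not None and '=' in line and not line.startswith('#'):
            key, value = line.split('=', 1)   # split once: values may contain '='
            d[station][key.strip()] = value.strip()
    return d

def report(d):
    """Stations in sorted order; each station's params in file order."""
    out = []
    for station in sorted(d):                    # sorted station names
        out.append(station)
        for key, value in d[station].items():    # insertion (file) order
            out.append('\t%s:\t%s' % (key, value))
    return out
```

Sorting only the outer keys keeps the parameter order untouched, which is what makes the two requirements independent.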

ASKER CERTIFIED SOLUTION

pepr


sara_bellum

ASKER

When it comes to associating keys and values, I can only count to two so far: since my initial goal was simply to copy an excerpt from a config file and format the data, I'm sticking to that until I can learn more. Thanks for pointing me in the right direction!

I realigned the parameters for each station horizontally so they're easier to read in a spreadsheet; it turns out I have 35 stations, so this helps! I spent a lot of time trying to figure out why my columns weren't lining up in an Excel spreadsheet, until I figured out that about a third of the stations had no key/value pair for magnetic declination, so I removed that line. The header line should print first, but I couldn't figure out how to do that, and somehow the StationID prints out once again at the end of the header line; those are easy edits to make in Excel, though.

On to learning dictionaries...

configFile = open('config.ini', 'r')
destFile  = open('output.csv', 'w')
 
stations = []
params = []
header = []
 
for line in configFile:
    line = line.rstrip()   
    if '[[' in line:
        station = line[4:-2]
        header.append('StationID')
        destFile.write('\n' + station + '; ')
 
    if line.startswith('#') or line == '':
        continue
 
    elif line.find('=') > 0:
        if 'MagneticDeclination' in line:
            del line
        else:
            k, v = line.split('=')
            params.append((k.strip(),v.strip()))
            header.append(k.strip())
            if isinstance(v, list):
                destFile.write(','.join(item))
            else:
                destFile.write( v )
                destFile.write( '; ' )
 
    else:
        continue
 
destFile.write('\n')
 
for index, item in enumerate(header):
    if index < 17:
        destFile.write(item + '; ')
 
configFile.close()
destFile.close()


pepr

Well, thanks for the points. However, that was only the first part of the question. It is time to use the csv module. Attach a sample of config.ini here, along with an example of how the final result should look in Excel.

sara_bellum

ASKER

Wow, ok then! I copy a sample station from the config file below (I ended up removing the non-station data from the config file manually because it had a lot of '=' signs and other formatting that would have required a bunch of arguments to remove those lines, and the station data is grouped together anyway).

I attach the csv file output which I open in Excel, using a semi-colon as the delimiter. Each station has a unique name but some use an alphanumeric sequence as a file prefix and others use a location name. Other sets of stations have subdirectories and these do not, so there's a blank field.

[[DUS2]]
  PakbusID = 145
  StationDescriptiveName = Some River Water Level Station
  StationLocationDescription = Some River
  Latitude = 11 22 33.00
  Longitude = -111 22 333.00
  ElevationFt = 371
  ElevationM = 113
  DataFileDirectory =
  DataFileNameBase = SOMERIVER_
  ImportDataFileSuffixes = Daily.dat,HrlyAtms.dat,HrlyDiag.dat,HrlyRaw.dat,HrlySubs.dat,HrlyWtr.dat,QtrHrWtr.dat,SR50.dat,SR50Q.dat,TwoMinWd.dat,RainEvnt.dat
  CurrentConditionsDataFileSuffixes = HrlyAtms.dat,HrlyDiag.dat,HrlyWtr.dat
  OutputDir = DUS2
  ColumnMap = stn
  ColumnOffset = 3
  Graphs = Air,Wind,Diagnostics,Water_Level,Water_Level_15min
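Two details of this sample matter for parsing: the parameter lines are indented, and some values are empty ('DataFileDirectory =') or comma-separated lists. A small Python 3 sketch that parses an abbreviated copy of the block above into a dict. The name parse_block is hypothetical, and splitting the list-valued entries with str.split(',') is an assumption about how you want to use them.

```python
SAMPLE = """\
[[DUS2]]
  PakbusID = 145
  DataFileDirectory =
  DataFileNameBase = SOMERIVER_
  Graphs = Air,Wind,Diagnostics,Water_Level,Water_Level_15min
"""

def parse_block(text):
    """Return (station_name, {param: value}) for one [[...]] block."""
    params = {}
    station = None
    for line in text.splitlines():
        line = line.strip()              # also removes the leading indentation
        if line.startswith('[[') and line.endswith(']]'):
            station = line[2:-2]
        elif '=' in line:
            key, value = line.split('=', 1)
            params[key.strip()] = value.strip()   # 'DataFileDirectory =' gives ''
    return station, params

station, params = parse_block(SAMPLE)
graphs = params['Graphs'].split(',')     # list-valued entry as a Python list
```

An empty value becomes an empty string rather than a missing key, so it still produces its own (blank) column later.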


output-xls.csv

pepr

Try the following code to process your data. Notice that 'MagneticDeclination' is also processed. The "wanted" list defined at the beginning says which elements are exported and in what order. 'StationID' is added to the parameters to make their later processing more uniform. Notice also that the parameters are converted to a dictionary (unordered). This way, the code could easily be switched back to the earlier example with subdictionaries.

Notice also that the value of a parameter is not obtained via the param[key] syntax; param.get(key, default) is used instead. This way it also works when a parameter is not defined for a station (say 'MagneticDeclination'). The wanted list is used directly as the header. However, you could also define a fixed mapping (a pre-filled dictionary) to more readable column titles.

import csv
 
f = open('config.ini')
 
status = 0
station = 'dummy'
 
wanted = [ 'StationID', 'PakbusID', 'StationDescriptiveName', 'StationLocationDescription',
           'Latitude', 'Longitude', 'ElevationFt', 'ElevationM', 'DataFileDirectory', 'DataFileNameBase',
           'ImportDataFileSuffixes', 'CurrentConditionsDataFileSuffixes', 'OutputDir', 'ColumnMap',
           'ColumnOffset', 'Graphs' ]
 
d = {}  # dictionary of stations
 
for line in f:
    line = line.rstrip()   # remove the trailing \n and whitespace
    
    if line.startswith('[['):  
        # Get the name of the section by removing the special 
        # parentheses. Introduce the list of the processed parameters
        # with StationID repeated as the first one.
        station = line[2:-2]
        assert station not in d
        d[station] = [ ('StationID', station) ]
        
    elif line.startswith('#') or line == '':    
        # Ignore the comments and empty lines.
        print 'Comment:', line
    
    elif line.find('=') > 0:
        # Let it be the param line. Split it to key and value.
        # Then append the tuple to the list for the station.
        k, v = line.split('=', 1)  # split at the first '=' only
        d[station].append((k.strip(), v.strip()))
        
    else:
        # Unrecognized kind of line.
        print 'Bad line:', line
        
f.close()
 
 
# Export the collected info.
fout = open('output.csv', 'wb')
writer = csv.writer(fout, dialect='excel', delimiter=';', quoting=csv.QUOTE_ALL)
 
# Export the header
writer.writerow(wanted)
 
# For all stations
for station in sorted(d):
    param = dict( d[station] )  # dictionary of parameters for the station
    row = []                    # init the row
    for key in wanted:
        row.append(param.get(key, ''))  # default if unknown
       
    writer.writerow(row)
    
fout.close()
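The code above is Python 2 (print statements, a file opened in 'wb' mode for the csv module). Under Python 3, the csv documentation asks for text mode with newline='' instead. A sketch of the same export step, writing to an io.StringIO so the output can be inspected; the shortened WANTED list, the function name export, and the station 'ABC1' are hypothetical.

```python
import csv
import io

WANTED = ['StationID', 'PakbusID', 'DataFileDirectory', 'Graphs']  # shortened list

def export(stations, wanted, fout):
    """Write a header row, then one row per station in sorted order.

    `stations` maps station name -> list of (key, value) tuples, as above.
    """
    writer = csv.writer(fout, dialect='excel', delimiter=';', quoting=csv.QUOTE_ALL)
    writer.writerow(wanted)                                   # header first
    for station in sorted(stations):
        param = dict(stations[station])
        writer.writerow([param.get(key, '') for key in wanted])  # '' if missing

d = {
    'DUS2': [('StationID', 'DUS2'), ('PakbusID', '145'),
             ('DataFileDirectory', ''), ('Graphs', 'Air,Wind')],
    'ABC1': [('StationID', 'ABC1'), ('PakbusID', '7')],   # hypothetical, no Graphs
}
buf = io.StringIO()
export(d, WANTED, buf)
```

To write a real file, open it with `open('output.csv', 'w', newline='')`. Because every row is built from the same wanted list, a missing parameter yields an empty cell instead of shifting the remaining columns.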


sara_bellum

ASKER

Brilliant!! Thanks very much :)

Next, I tried importing the header from another file with:

reader = csv.reader(otherFile)
fieldNames = reader.next()
header = []
for item in fieldNames:
    header.append(item)

but then when processing the rows, making lists of files (data file suffixes, graphs) appear as one entry in the Excel spreadsheet didn't work, so that'll be another question for another day :-)
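For list-valued entries, the csv module keeps a field in one cell as long as each list is passed as a single string: with the default comma delimiter, csv.writer quotes any field that contains a comma. A Python 3 sketch (where next(reader) replaces reader.next()). The function name copy_with_header and the header content are hypothetical; the StringIO stands in for the other file.

```python
import csv
import io

def copy_with_header(header_src, rows):
    """Read the header row from header_src, then write it and `rows` as CSV text."""
    reader = csv.reader(header_src)
    header = next(reader)           # Python 3 spelling of reader.next()
    out = io.StringIO()
    writer = csv.writer(out)        # default: comma delimiter, QUOTE_MINIMAL
    writer.writerow(header)
    for row in rows:
        # Join any Python list back into one string so it stays one field;
        # csv.writer then quotes it because it contains commas.
        writer.writerow([','.join(v) if isinstance(v, list) else v for v in row])
    return out.getvalue()

header_file = io.StringIO('StationID,Graphs\n')   # stands in for the other file
text = copy_with_header(header_file, [['DUS2', ['Air', 'Wind', 'Diagnostics']]])
```

Reading the result back with csv.reader gives the list field as one value again, which is the same rule Excel applies when it opens the file.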

Thanks again for your persistence in helping me get this right.