I have a script that grabs lines from a raw data (csv) file and copies the lines associated with 1-minute intervals to one file, those associated with 60-minute intervals to another, and cleans up the headers (see attached code).
In each of the 24 archive files I need to process, there are 39 headers / parameters followed by the raw data (see attached excerpt). Right now each header parameter is listed sequentially, followed by data in timestamp sequence. What I'm trying to do is put the headers at the top of the file in a single row, and line up the metrics into 39 columns. The timestamps begin in December 07 and go to present, so there are a huge number of lines, 150,000 or more.
I've tried to create a single dictionary list from the headers, but all I've managed to do is print a selected item from the original data by row (39 separate lists with 1 item each). I will need to associate the raw data with the header array, presumably with name-value pairs, but didn't get that far. Let me know what you can do to help, TIA.
fin = open('rawData.csv', 'rb')
source = csv.reader(fin, quoting=csv.QUOTE_NONNUMERIC)
fout_hr = open('hr_sorted.csv', 'wb')
hr_writer = csv.writer(fout_hr, quoting=csv.QUOTE_NONNUMERIC)
fout_min = open('min_sorted.csv', 'wb')
min_writer = csv.writer(fout_min, quoting=csv.QUOTE_NONNUMERIC)
status = 0 # wait for the header
header = None # collecting header lines of the section
writer = None # reference to the wanted output writer
for row in source:
if status == 0: # wait a section header rows
if row == 'Point Name:':
header = [ row[1:] ] # the list with the info from the first header row
status = 1
elif status == 1: # collecting some header rows
if row == '1 Minutes':
writer = min_writer
elif row == '60 Minutes':
writer = hr_writer
elif row == 'Date Range':
pass # Ignore this header row.
elif row == 'Report Timings:':
# Ignore this header row. Do output the collected header.
for r in header:
status = 2 # Get ready for for the value rows.
elif status == 2: # collecting the values
if len(row) == 4: # detect the row with values
writer.writerow(row[0:3]) # output the first three columns only
writer.writerow(['']) # separator
status = 0 # wait for another section
print 'Unexpected status', status