This is a variation on my previously posted question, since the actual input file format is slightly
different from what we had initially anticipated, and the required output is also slightly different.
We receive a flat text file that we need to convert to a "\" delimited csv format.
The "records" in the file have some fields that are always on the same line number
within the "record" and some that are not.
The "record" always starts with a date, and there are no blank lines in the file.
So a "record" in this flat file looks like this:
Date ... always the 1st line of the "record"
Author ... always the 2nd line of the "record"
Title ... always the 3rd line of the "record"
Descript1 ... first line of description is always 4th line of "record"
.
. ... any number of description strings;
. but there must be at least one description line
.
DescriptNN
Required Reading ... fixed string "Required Reading" always in the "record"
<filenameA1>CLASS ... may or may not be present; if present, it will be terminated by "CLASS"
<filenameA2>LAB ... may or may not be present; if present, it will be terminated by "LAB"
Optional ... fixed string "Optional" always in the "record"
<filenameB1>CLASS ... may or may not be present; if present, it will be terminated by "CLASS"
<filenameB2>LAB ... may or may not be present; if present, it will be terminated by "LAB"
Sample input file is:
12/30/07
Dr. J. Smith
Working with Genes
A short overview of Basic DNA.
Required prerequisite courses are mandatory.
Required Reading
<basic-dna.txt>LAB
Optional
<basic-dna.doc>CLASS
<basic-dna.pdf>LAB
1/10/08
Dr. J. Smith
Advanced Gene Therapy
This course requires knowledge of basic dna structures.
Also needed are pre-reqs: adv-bio, adv-physics.
Students are expected to attend all labs.
Required Reading
<Advanced-gene.txt>CLASS
<Advanced-bio.txt>LAB
Optional
<advanced-gene2.doc>CLASS
2/15/08
Dr. J. Smith
Preq Bio
Prequsite for other courses.
Required Reading
Optional
<prereq-bio.pdf>CLASS
The desired output is: (each line starts with date; lines are not wrapped in output file)
12/30/07\Dr. J. Smith\Working with Genes\A short overview of Basic DNA\Required prerequisite courses are mandatory.\Must1=LAB=basic-dna.txt\Optional1=CLASS=basic-dna.doc\Optional2=LAB=basic-dna.pdf
1/10/08\Dr. J. Smith\Advanced Gene Therapy\This course requires knowledge of basic dna structures.\Also needed are pre-reqs: adv-bio, adv-physics.\Students are expected to attend all labs.\Must1=CLASS=Advanced-gene.txt\Must2=LAB=Advanced-bio.txt\Optional1=CLASS=advanced-gene2.doc
2/15/08\Dr. J. Smith\Preq Bio\Prequsite for other courses.\Optional1=CLASS=prereq-bio.pdf
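To make the mapping from input to output concrete, here is a rough awk sketch of the rules as I understand them. It is only an illustration, not a tested solution for the real data. It assumes that only the first line of a "record" looks like a date (digits/digits/digits), that no description line is exactly "Required Reading" or "Optional", and that filenames never contain ">". The function name convert is made up for this example.

```shell
# convert: hypothetical helper name. Reads the flat file on stdin and
# writes one "\"-delimited line per record on stdout.
convert() {
  awk '
    { sub(/\r$/, "") }                      # tolerate Windows CRLF line endings

    # Assumption: only the first line of a record looks like m/d/yy.
    /^[0-9]+\/[0-9]+\/[0-9]+$/ {
      if (rec != "") print rec              # flush the previous record
      rec = $0; section = ""
      next
    }

    # The two fixed marker lines switch the label prefix and restart the counter.
    $0 == "Required Reading" { section = "Must";     n = 0; next }
    $0 == "Optional"         { section = "Optional"; n = 0; next }

    # <filename>CLASS or <filename>LAB after a marker: append LabelN=TYPE=filename.
    section != "" && /^<.+>(CLASS|LAB)$/ {
      gt = index($0, ">")                   # position of the closing ">"
      n++
      rec = rec "\\" section n "=" substr($0, gt + 1) "=" substr($0, 2, gt - 2)
      next
    }

    # Everything else (author, title, description lines) is copied verbatim.
    { rec = rec "\\" $0 }

    END { if (rec != "") print rec }        # flush the last record
  '
}
```

In a Unix-style shell this would be run as something like convert < input.txt > output.csv (both filenames are placeholders). Under the Windows command prompt, the single-quoted program would most likely have to be saved to its own file and run with awk -f instead, since cmd.exe does not treat single quotes as quoting.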
The answer to my previous question no longer works for the sections that start with "Required Reading" and "Optional".
Again, this "conversion" process will be run on a Windows XP Pro workstation, which has a basic set of unix utilities installed from unixutls.sourceforge.net. We would prefer NOT to install perl or any other programming language. We're hoping to do this using the typical basic commands such as grep, awk, sed, tr, and cut.
Any help out there?