help with formating text file in shell

I have this file "generated" from a script but I want to get the formatting better so that when the text is imported into Excel as CSV, each "counter" type gets a column.

If you look at the attached document, it has a bunch of data as per below...

0323-1135      DB Writes      2315
0323-1205      DB Writes      3078
0304-0705      BI Reads      25
0304-0735      BI Reads      9
0304-0705      BI Writes      1642
0304-0735      BI Writes      301
0304-0705      AI Writes      1148
0304-0735      AI Writes      232

Ideally, I want the script to format this better, and the output to be as below:

Date,DB Writes,Bi Writes,AI Writes
0323-1135,2315,25,1642,1148,1148

For each of the dates, there is a value for the multiple counters.

If I am able to archive this, it makes it much easier to import into excel, and have excel graph the data.

Any help would be appreciated.

Thanks.

chart-20100415-1915.csv
mirdeAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Just_RCCommented:
awk -F" " 'BEGIN {print "Date", "DBWrite", "BiWrites" "AIWrites" }  {print $1"," $2 "," $3}' datafile
0
Just_RCCommented:
DOH...you had 4 columns....add the comma and the last field:

awk -F" " 'BEGIN {print "Date", "DBWrite", "BiWrites" "AIWrites" }  {print $1"," $2 "," $3 "," $4}' datafile
0
simon3270Commented:
You need to sort the input so that all of the entries for a single date-time are together, then extract the entries you want for each day.

Save the attached code as a file called chrt.awk in the same directory as the csv file.

Then using your csv file, run the command

   sort < chart-20100415-1915.csv | awk -F, -f chrt.awk > output.csv

The first few lines would be:

Date,DB Writes,BI Writes,AI Writes
0304-0705,3428,1642,1148
0304-0735,2094,301,232
0304-0805,3132,2358,1632
0304-0835,1810,324,257
0304-0905,3805,2175,1714

One limitation with this is that it only reads the last of each field type - for example, there are always two "BI Reads" lines for each date-time - luckily they always report the same value!

If you want to add more fields to the output, just add another scan line to the awk script, such as:
  $2 ~ /^BI Reads$/{BIR=$3}
then add BIR to the list of output fields in the "printf" line, add ",BI Reads" to the heading in the BEGIN line, and set BIR to zero where  the others are set.  The heading would now be:
  print "Date,DB Writes,BI Writes,AI Writes,BI Reads"
and the printf line would become:
   printf("%s,%s,%s,%s,%s\n", old_date, DB, BI, AI, BIR);

If a particular item doesn't appear for a specific date-time, 0 will be written to the output file.

BEGIN{print "Date,DB Writes,BI Writes,AI Writes";old_date="0000-0000"}
{if (old_date != $1) {
   if (old_date != "0000-0000") {
     printf("%s,%s,%s,%s\n", old_date, DB, BI, AI);
   }
   DB=0;
   AI=0;
   BI=0;
   old_date=$1;
 }
}
$2 ~ /^AI Writes$/{AI=$3}
$2 ~ /^BI Writes$/{BI=$3}
$2 ~ /^DB Writes$/{DB=$3}

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Introduction to R

R is considered the predominant language for data scientist and statisticians. Learn how to use R for your own data science projects.

mirdeAuthor Commented:
Simon,

Thank you, that works like wonders and saves a heck of a lot of time not having to manually cut and paste stuff into excel when I want to create a statistic chart.

One question that comes to mind, to expand it further.. if you look at the whole code.. you will see there are a few formatting steps etc.. this is to get the data to the point where your code above takes over (the actual reports are long and ugly, so i pull the important counters out of the reports).

Anyway, onto my question.. right now the script is called with "perf2csv [dbname]".

Where the dbname tells the script in which directory to look for, for the report log. Could I control the final output with a date range?

Because all the counters in the file that is sorted by awk have a date stamp, can i somehow add a function so that I can pass a date i want the report on?

example.. perf2csv [dbname] [start date] [end date] dbperf 0401 0415

then, only data between that date range is passed through the awk code you provided.

There would be dates "missing", in the range above since we don't take samples on sunday (dbs offline, maintenance day).

Thanks for your help.

#!/bin/ksh

if [[ ! -n $1 ]] ; then
 clear
 echo " Call the script with: perf2csv [database]"
 exit 99
fi

BASE=/home/mirde
DBNAME=$1

case "$1" in
	oppy1)		DBNAME=oppy1;;
	oppy2)		DBNAME=oppy2;;
	oppy3)		DBNAME=oppy3;;
	oppy4)		DBNAME=oppy4;;
	oppy5)		DBNAME=oppy4;;
	data450)	DBNAME=data450;;
	datar450)	DBNAME=datar450;;
	oppyh)		DBNAME=oppyh;;
	oppyw)		DBNAME=oppyw;;
	sbs450)		DBNAME=sbs450;;
 *) clear
echo "Your syntax is incorrect!" ; exit 99;;
esac

LOGDIR=$BASE/dbperf/log/${DBNAME}
CSVDIR=$BASE/dbperf/csv/data.csv
TMPDIR=$BASE/dbperf/csv/data.tmp
AWK1=$BASE/dbperf/awk/chart.awk
REPORT=$BASE/dbperf/csv/data.rpt

cd $LOGDIR
#
# Formatting phase 1!
#
awk -f /home/mirde/dbperf/awk/filter.awk perf*.txt > $TMPDIR
sleep 1
#
# Formatting phase 2!
#
for x in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
  case "$x" in
    1) cat $TMPDIR | grep "DB Writes" >> $CSVDIR ;;
    2) cat $TMPDIR | grep "BI Reads" >> $CSVDIR ;;
    3) cat $TMPDIR | grep "BI Reads" >> $CSVDIR ;;
    4) cat $TMPDIR | grep "BI Writes" >> $CSVDIR ;;
    5) cat $TMPDIR | grep "AI Writes" >> $CSVDIR ;;
    6) cat $TMPDIR | grep "Record Reads" >> $CSVDIR ;;
    7) cat $TMPDIR | grep "Record Updates" >> $CSVDIR ;;
    8) cat $TMPDIR | grep "Record Creates" >> $CSVDIR ;;
    9) cat $TMPDIR | grep "Record Deletes" >> $CSVDIR ;;
    10) cat $TMPDIR | grep "Record Locks" >> $CSVDIR ;;
    11) cat $TMPDIR | grep "Index operations" >> $CSVDIR ;;
    12) cat $TMPDIR | grep "Record operations" >> $CSVDIR ;;
    13) cat $TMPDIR | grep "Total OS IO" >> $CSVDIR ;;
    14) cat $TMPDIR | grep "Total OS Reads" >> $CSVDIR ;;
    15) cat $TMPDIR | grep "Total OS Writes" >> $CSVDIR ;;
  esac
done > $CSVDIR
#
# Formatting phase 3!
#
cat $CSVDIR | sed 's/2010//g' > $TMPDIR
sleep 1
rm $CSVDIR
time_stamp=$(date +%Y%m%d-%H%M)
#
# Formatting phase 4!
#
sort < $TMPDIR | awk -F, -f $AWK1 > $REPORT
rm $TMPDIR
cp $REPORT $BASE/dbperf/csv/${DBNAME}-${time_stamp}.csv
rm $REPORT

Open in new window

0
simon3270Commented:
You can either add this to one of your existing awk scripts, or just add another processing step.  At the start of the script (after the "DBNAME=$1" line) have:

STARTD=$2
ENDD=$3

then, later on, possibly at the end of the script (and certainly after the date format has been set to the same as in your output file):

if [ "$ENDD" != "" ]
then
  awk -F, -vstartd=$STARTD -vendd=$ENDD '{if (($1 >= startd) && ($1 <= endd)) {print}}' input_file > output_file
else
  cp input_file output_file
fi

This will select only those dates from the start to the end, including both specified dates (it doesn't matter if those exact dates aren't actually in the file - it will just pick dates between the specified ones).  The "if" test is so that if ENDD is empty, it assumes you didn't want to filter the date range at all.  if you don't filter the date range, the script just copies the input file to the output file (to be consistent).

I've got a couple of other comments about the script:

The bit which scans $TMPDIR for the 15 different strings seems a bit confused - you append the output of each indivdual "grep" to $CVSDIR (lines 45 to 59), but then redirect the output of the entire "for" loop to $CVSDIR (line 61).  You don't even need the loop - just have the 15 "grep" lines, one after the other (delete lines 41 to 44, and 60 and 61, and remove the "n)" part of lines 45 to 59).  Even better would be to put the 15 names in a file, and use that as an argument to grep - for example:
  cat > name.lst <<EOF
DB Writes
BI Reads
BI Reads
BI Writes
AI Writes
Record Reads
Record Updates
Record Creates
Record Deletes
Record Locks
Index operations
Record operations
Total OS IO
Total OS Reads
Total OS Writes
EOF
  grep -f name.lst $TMPDIR > $CVSDIR
  rm name.lst

or you could have name.lst as a permanently available file, and not recreate it each time you run the program - in that case you would just have:
  MYNAMELST=$BASE/dbperf/awk/name.lst
in the bit where you initialise variables such as AWK1 (where $BASE/dbperf/awk/name.lst contains the 15 lines), then the entire section from line 42 to line 61 would be replaced by:
  grep -f $MYNAMELST $TMPDIR > $CVSDIR


The following is a bit dangerous:
  cat $CSVDIR | sed 's/2010//g' > $TMPDIR
If any of your data values happen to be (or contain) "2010", that value would also be deleted!  It would be safer to use the following, which removes "2010" from the start of any 8-digit value.
  cat $CVSDIR | sed 's/2010\([0-9][0-9][0-9][0-9]\)/\1/' > $TMPDIR
Even better, if you know that there is only one date in the line and it is at the beginning, would be this, which just removes the first 4 digits in the line:
  cat $CVSDIR | sed 's/^[0-9][0-9][0-9][0-9]//' > $TMPDIR
That also means that the script will still work in 2011!
0
mirdeAuthor Commented:
Hi Simon,

Thanks for your suggestions, i have implemented them and the date stuff, it works perfectly.

Below is the end result, anything else you can spot out that could be dangerous/changed?

You are right about the 2010, with the amount of data going in, i am sure there would have been occurrences of 2010 digits, and end results could have been thrown off, database activity appearing "less" busy than what it actually is. :)


#!/bin/ksh

if [[ ! -n $1 ]] ; then
 clear
 echo "Call the script with: perf2csv [database] [date-start] [date-end]"
 echo ""
 echo "-> The example below will produce a report for month of April"
 echo "-> example: perf2csv oppy1 0401 0431"
 echo ""
 echo "If you do not provide a start/end date, it will parse the entire db perflog."
 exit 99
fi

BASE=/home/mirde
DBNAME=$1
STARTD=$2
ENDD=$3
LOGDIR=$BASE/dbperf/log/${DBNAME}

case "$1" in
	oppy1)		DBNAME=oppy1;;
	oppy2)		DBNAME=oppy2;;
	oppy3)		DBNAME=oppy3;;
	oppy4)		DBNAME=oppy4;;
	oppy5)		DBNAME=oppy5;;
	data450)	DBNAME=data450;;
	datar450)	DBNAME=datar450;;
	oppyh)		DBNAME=oppyh;;
	oppyw)		DBNAME=oppyw;;
	sbs450)		DBNAME=sbs450;;
 *) clear
echo "Your syntax is incorrect!" ; exit 99;;
esac

# Required input files
AWK1=$BASE/dbperf/include/chart.awk
REPORT=$BASE/dbperf/csv/data.rpt
COUNTERS=$BASE/dbperf/include/name.lst

# Temp files.
TMPDIR=$BASE/dbperf/csv/data.tmp
CSVDIR=$BASE/dbperf/csv/data.csv
OUTPUT=$BASE/dbperf/csv/data.out

# Formatting (1/4).
cd $LOGDIR
awk -f /home/mirde/dbperf/include/filter.awk perf*.txt > $TMPDIR

# Formatting (2/4).
grep -f $COUNTERS $TMPDIR > $CSVDIR
rm $TMPDIR

# Formatting (3/4).
cat $CSVDIR | sed 's/^[0-9][0-9][0-9][0-9]//' > $TMPDIR
rm $CSVDIR
if [ "$ENDD" != "" ]
then
  awk -F, -vstartd=$STARTD -vendd=$ENDD '{if (($1 >= startd) && ($1 <= endd)) {print}}' $TMPDIR > $OUTPUT
else
  cp $TMPDIR $OUTPUT
fi
rm $TMPDIR

# Formatting (4/4).
sort < $OUTPUT | awk -F, -f $AWK1 > $REPORT
rm $OUTPUT
time_stamp=$(date +%Y%m%d-%H%M)
cp $REPORT $BASE/dbperf/csv/${DBNAME}.csv
rm $REPORT

# Report to screen.
clear
echo "Performance counters extracted.

Open in new window

0
simon3270Commented:
That all looks fine now.  There are a couple of things I might do differently, but nothing which is necessarily "better" than what you already have.  If it works now, leave well alone!
0
mirdeAuthor Commented:
Works as expected, thank you.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Shell Scripting

From novice to tech pro — start learning today.