Script for converting HTML to text

Hi People,

I am trying to convert an HTML report by adapting the script below, which was written by someone I don't know. Before I can do anything, I need to understand what every line is doing so that I can adjust it to work for my report. Could someone explain each line so that I can copy the idea and adapt my own script? I urgently need help, please, as I am very new to scripting. I have been reading about each command, but I can't seem to work out the overall logic.

for I in `find ./ -name tb_report.htm`;

echo "converting" $I;

TMPFILE=`/usr/local/bin/mktemp -t rpt.XXXXXX`;
TMPFILE2=`/usr/local/bin/mktemp -t rpt.XXXXXX`;
chmod 777 $TMPFILE $TMPFILE2;
cat $I|tr -d '_' > $TMPFILE2;
cat $TMPFILE2|sed -e 's/COLSPAN=//g' > $I;
/usr/local/bin/elinks -dump -dump-width 130 $I|sed 's/[ \t]*$//' > $TMPFILE;
echo $TMPFILE;
brn=`cat $TMPFILE |sed -n '5p'|awk -F"  " '{print $15}'`
echo $brn;

mkdir -p  `date +%d.%m.%y`/`echo rep`;
cat $TMPFILE > `date +%d.%m.%y`/`echo rep`/`echo BL011.out,$brn`;
echo "done";

Any detailed explanation will be appreciated.

for I in `find ./ -name tb_report.htm`; # loop over every file named tb_report.htm under the current directory
   do # start of the loop body

echo "converting" $I; # print "converting" followed by the file name

TMPFILE=`/usr/local/bin/mktemp -t rpt.XXXXXX`;  # create a uniquely named temp file and store its path in TMPFILE
TMPFILE2=`/usr/local/bin/mktemp -t rpt.XXXXXX`; # create a second temp file the same way
chmod 777 $TMPFILE $TMPFILE2; # give everyone read, write, and execute permission on both temp files
cat $I|tr -d '_' > $TMPFILE2;  # copy the report into the second temp file, deleting every "_" character
cat $TMPFILE2|sed -e 's/COLSPAN=//g' > $I; # remove every "COLSPAN=" and write the result back over the original report
/usr/local/bin/elinks -dump -dump-width 130 $I|sed 's/[ \t]*$//' > $TMPFILE; # use elinks to render the HTML as plain text 130 columns wide; sed strips trailing spaces and tabs from each line
echo $TMPFILE;  # print the path of the first temp file
brn=`cat $TMPFILE |sed -n '5p'|awk -F"  " '{print $15}'` # take line 5 of the text output and store its 15th field in brn (fields are separated by two spaces)
echo $brn; # print the value of brn

mkdir -p  `date +%d.%m.%y`/`echo rep`; # create a directory named after today's date (dd.mm.yy) with a subdirectory "rep"
cat $TMPFILE > `date +%d.%m.%y`/`echo rep`/`echo BL011.out,$brn`; # copy the converted text into that directory as a file named BL011.out,<value of brn>
echo "done"; # print "done"
done # end of the loop
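
To see what the text-manipulation steps do without needing elinks, here is a minimal sketch on made-up input. The sample HTML fragment, the five-line sample report, and the helper name extract_field are assumptions for illustration only; the real report layout is not shown in the question.

```shell
#!/bin/sh
# Hypothetical HTML fragment; the real tb_report.htm is not shown in the thread.
html='<TD COLSPAN=2>TOTAL_____</TD>'

# Same two clean-up steps as the script: delete every underscore,
# then delete the literal text "COLSPAN=".
cleaned=$(printf '%s\n' "$html" | tr -d '_' | sed -e 's/COLSPAN=//g')
echo "$cleaned"    # prints: <TD 2>TOTAL</TD>

# extract_field FILE LINE FIELD (hypothetical helper)
# Same idea as:  sed -n '5p' | awk -F"  " '{print $15}'
# awk splits on exactly two spaces, so a single space stays inside a field.
extract_field() {
    sed -n "${2}p" "$1" | awk -F'  ' -v n="$3" '{ print $n }'
}

# Hypothetical text dump: line 5 holds fields separated by two spaces.
sample=$(mktemp)
printf '%s\n' 'l1' 'l2' 'l3' 'l4' 'Trial Balance  Branch 011  Page 1' > "$sample"
brn=$(extract_field "$sample" 5 2)
echo "$brn"        # prints: Branch 011
rm -f "$sample"
```

In your own report, open the text output, count which line and which two-space-separated field hold the value you want, and change the 5 and the 15 accordingly.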

You could fetch the HTML with wget and then run an HTML stripper script on it.
ackimc (Author) commented:
coredatarecovery, do you have an HTML stripper script available to play with?
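
For reference, a very naive stripper can be sketched with sed alone. This is not a script posted by anyone in this thread; the function name strip_tags is made up, and it assumes every tag opens and closes on the same line, so it will not cope with multi-line tags, scripts, or most character entities.

```shell
#!/bin/sh
# strip_tags: naive HTML-to-text filter (reads stdin, writes stdout).
# Assumption: every tag opens and closes on the same line.
strip_tags() {
    # 1. delete anything that looks like <...>
    # 2. decode a few common entities (&amp; last, to avoid double decoding)
    # 3. strip trailing whitespace, then drop lines that ended up empty
    sed -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d'
}

printf '%s\n' '<p>Hello <b>World</b> &amp; all</p>' | strip_tags
# prints: Hello World & all
```

On a saved report it would be used as: strip_tags < tb_report.htm > tb_report.txt. It does not lay out tables the way a text browser does, so for a tabular report a real renderer is usually the better choice.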
Lynx will convert it for you for free...

lynx --dump 'URL' > txtfile.txt
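
Lynx also accepts a local file in place of a URL, which fits a report that is already on disk. A small sketch, assuming lynx is installed and the file ends in .htm or .html; the function name html_to_text and the default width of 130 (borrowed from the elinks call in the question) are my own choices.

```shell
#!/bin/sh
# html_to_text FILE [WIDTH]: dump a local HTML file as plain text using lynx.
# Hypothetical wrapper; returns 127 if lynx is not installed.
html_to_text() {
    command -v lynx >/dev/null 2>&1 || { echo "lynx not found" >&2; return 127; }
    # -dump   : write the rendered page to stdout and exit
    # -nolist : leave out the numbered list of links at the end
    # -width  : number of columns used when formatting the dump
    lynx -dump -nolist -width="${2:-130}" "$1"
}
```

For example: html_to_text tb_report.htm > tb_report.txt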
