Hi,
I have been given the task of collecting usage stats from the multiple virtual hosts on our Linux web server and putting a summary of the data into a new text file or Excel spreadsheet. The details are below.
We have a Linux web server that hosts multiple virtual hosts. At the end of each month, each hosted site generates an HTML file that contains the following information:
Total Hits 418741
Total Files 377639
Total Pages 143324
Total Visits 29678
Total KBytes 15715617
Total Unique Sites 11912
Total Unique URLs 2722
Total Unique Referrers 2314
Total Unique Usernames 2
Total Unique User Agents 1251
This HTML file is generated by Webalizer. It is stored in /var/www/vhosts/<vhostname>/statistics/webstat/usage_<year><month>.html
For example: /var/www/vhosts/xyz.com/statistics/webstat/usage_200806.html
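To make the naming pattern concrete, here is a minimal sketch that builds the report path from a vhost name and a <year><month> stamp. The function name usage_path is made up for illustration; only the directory layout comes from the description above.

```shell
#!/bin/sh
# usage_path is a hypothetical helper, not part of Webalizer.
# $1 = vhost name (e.g. xyz.com), $2 = <year><month> stamp (e.g. 200806)
usage_path() {
    printf '/var/www/vhosts/%s/statistics/webstat/usage_%s.html\n' "$1" "$2"
}
```

Calling usage_path xyz.com 200806 prints the example path shown above. The stamp for the current month could come from date +%Y%m.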
The following is the HTML from the file that displays the data above:
<TR><TD WIDTH=380><FONT SIZE="-1">Total Hits</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>279242</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Files</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>209861</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Pages</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>44464</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Visits</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>9029</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total KBytes</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>15578780</B></FONT></TD></TR>
<TR><TH HEIGHT=4></TH></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Unique Sites</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>5606</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Unique URLs</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>204</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Unique Referrers</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>1025</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Unique User Agents</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>605</B></FONT></TD></TR>
Now, I have a second Linux box that can access this web server via ssh or sftp with root credentials. I want a shell script that runs on this second Linux box (not on the web server) and does the following:
1. Copy the latest usage_<year><month>.html from each virtual host's webstat directory on the web server. (Please note that the HTML file names are the same for every host, so the vhost name should be prepended to the file name; the new file will be <vhostname>usage_<year><month>.html.)
2. Grab the data I want (the fields listed at the beginning) from the HTML files and put it all into one single file, with each vhost's data on one line, so I can paste it into Excel. The format should be something like this: <vhostname> <Total Hits> <Total Files> <Total Pages> <Total Visits> and so on ....
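To illustrate what I am after, the two steps might be sketched roughly as below. This is only a sketch under assumptions, not a tested solution: WEBHOST, DEST, the summary file name and all function names are placeholders I made up, it assumes each value sits on the line after its label exactly as in the HTML above, and it assumes every directory under /var/www/vhosts is a vhost.

```shell
#!/bin/sh
# Placeholders (assumptions): adjust to the real host and local directory.
WEBHOST=${WEBHOST:-root@webserver.example.com}
DEST=${DEST:-./webstats}

# Step 1: copy one vhost's report, prefixing the local name with the vhost.
fetch_report() {
    scp -q "$WEBHOST:/var/www/vhosts/$1/statistics/webstat/usage_$2.html" \
        "$DEST/${1}usage_$2.html"
}

# Step 2: print "<vhostname> <Total Hits> <Total Files> ..." on one
# tab-separated line. A field missing from the report is left empty so
# the columns stay aligned across vhosts.
summary_line() {
    awk -v vhost="$1" '
        BEGIN {
            n = split("Total Hits,Total Files,Total Pages,Total Visits,Total KBytes,Total Unique Sites,Total Unique URLs,Total Unique Referrers,Total Unique Usernames,Total Unique User Agents", want, ",")
        }
        /<TD WIDTH=380>/ {            # label row, e.g. "Total Hits"
            label = $0
            sub(/.*<FONT SIZE="-1">/, "", label)
            sub(/<\/FONT>.*/, "", label)
        }
        label != "" && /<B>[0-9]+<\/B>/ {   # value for the pending label
            v = $0
            sub(/.*<B>/, "", v)
            sub(/<\/B>.*/, "", v)
            val[label] = v
            label = ""
        }
        END {
            line = vhost
            for (i = 1; i <= n; i++) line = line "\t" val[want[i]]
            print line
        }
    ' "$2"
}

# Loop over every vhost directory on the web server for one YYYYMM stamp.
main() {
    stamp=$1
    mkdir -p "$DEST"
    for vhost in $(ssh "$WEBHOST" ls /var/www/vhosts); do
        fetch_report "$vhost" "$stamp" || continue
        summary_line "$vhost" "$DEST/${vhost}usage_$stamp.html"
    done > "$DEST/summary_$stamp.txt"
}

# Example call (commented out because it needs the real server):
# main 200806
```

The output is tab-separated so that it pastes into Excel as columns. Looking up values by label rather than by position matters here because the HTML sample above has no Total Unique Usernames row, so that column would simply be empty for such a vhost.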
I am an absolute beginner in shell scripting.