
Need a shell script to copy HTML files from a remote server, extract particular data from them, and write that data to a new file

Asked by kvjajoo
Topics: Linux, Shell Scripting
Hi,

I have been given the task of collecting usage stats from the multiple virtual hosts on our Linux web server and putting a summary of the data into a new text file or Excel spreadsheet. The details are below.

We have a Linux web server that hosts multiple virtual hosts. At the end of each month, each hosted site generates an HTML file containing the following information:

Total Hits       418741
Total Files       377639
Total Pages       143324
Total Visits       29678
Total KBytes       15715617
Total Unique Sites       11912
Total Unique URLs       2722
Total Unique Referrers       2314
Total Unique Usernames       2
Total Unique User Agents       1251

This HTML file is generated by Webalizer and is stored at /var/www/vhosts/<vhostname>/statistics/webstat/usage_<year><month>.html
For example: /var/www/vhosts/xyz.com/statistics/webstat/usage_200806.html

The following HTML from the file displays the data above:

<TR><TD WIDTH=380><FONT SIZE="-1">Total Hits</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>279242</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Files</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>209861</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Pages</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>44464</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Visits</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>9029</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total KBytes</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>15578780</B></FONT></TD></TR>
<TR><TH HEIGHT=4></TH></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Unique Sites</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>5606</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Unique URLs</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>204</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Unique Referrers</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>1025</B></FONT></TD></TR>
<TR><TD WIDTH=380><FONT SIZE="-1">Total Unique User Agents</FONT></TD>
<TD ALIGN=right COLSPAN=2><FONT SIZE="-1"><B>605</B></FONT></TD></TR>
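
Given that layout, one way to pull a single number out of such a file is sketched below. It assumes the label cell and its <B>value</B> cell always sit on consecutive lines, as in the excerpt; `get_stat` is a hypothetical helper name, not part of Webalizer.

```shell
#!/bin/sh
# get_stat LABEL FILE
# Hypothetical helper: print the number that follows LABEL in a
# Webalizer usage_<year><month>.html file.
# Assumption: the value is on the line right after the label line,
# wrapped in <B>...</B>, as in the excerpt above.
get_stat() {
    awk -v label="$1" '
        found {
            # Line after the label: extract the digits between <B> and </B>.
            if (match($0, /<B>[0-9]+<\/B>/))
                print substr($0, RSTART + 3, RLENGTH - 7)
            exit
        }
        # Exact label match, e.g. ">Total Hits</FONT>".
        index($0, ">" label "</FONT>") { found = 1 }
    ' "$2"
}
```

For example, `get_stat "Total Hits" usage_200806.html` would print the hits figure, and it prints nothing if the label is not in the file.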


I have a second Linux box that can access the web server via ssh or sftp with root credentials. I want a shell script that runs on this second box (not on the web server) and does the following:

1. Copy the latest usage_<year><month>.html from each virtual host's webstat directory on the web server. (Please note that the HTML file names are the same for every host, so the vhost name should be prepended to the file name; the new file will be <vhostname>usage_<year><month>.html.)

2. Extract the data I want (listed at the beginning) from the HTML files and put it into one single file, with one vhost's data per line, so I can paste it into Excel. The format should be something like this: <vhostname>  <Total Hits> <Total Files> <Total Pages> <Total Visits> and so on ...
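
Step 1 might be sketched roughly as follows. This is only an outline under assumptions: ssh and scp work non-interactively (for example with key-based login), and `WEBHOST`, `DEST`, and the function names are placeholders chosen for illustration. Because <year><month> is a fixed-width number, a plain sort puts the newest file last for each vhost.

```shell
#!/bin/sh
# Placeholders (assumptions, not from the original setup):
WEBHOST=${WEBHOST:-root@webserver.example.com}   # ssh login for the web server
DEST=${DEST:-./webstats}                         # local directory for the copies

# local_name REMOTE_PATH
# Turn /var/www/vhosts/<vhostname>/statistics/webstat/usage_<year><month>.html
# into <vhostname>usage_<year><month>.html
local_name() {
    vhost=$(printf '%s\n' "$1" | cut -d/ -f5)
    printf '%s%s\n' "$vhost" "$(basename "$1")"
}

# Read remote paths on stdin, print only the newest usage file per vhost.
latest_per_vhost() {
    LC_ALL=C sort |
        awk -F/ '{ latest[$5] = $0 } END { for (v in latest) print latest[v] }' |
        LC_ALL=C sort
}

# List the usage files on the web server, then copy the newest one
# for each vhost into $DEST under its new name.
fetch_latest() {
    mkdir -p "$DEST" || return 1
    ssh "$WEBHOST" 'ls /var/www/vhosts/*/statistics/webstat/usage_[0-9]*.html' |
        latest_per_vhost |
        while IFS= read -r remote; do
            # </dev/null keeps scp from consuming the list on stdin.
            scp -q "$WEBHOST:$remote" "$DEST/$(local_name "$remote")" </dev/null
        done
}
```

Calling `fetch_latest` on the second box would then leave files such as xyz.comusage_200806.html in `$DEST`.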

I am an absolute beginner in shell scripting.
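
For step 2, a possible sketch that builds the one-line-per-vhost summary in a single pass over each copied file. It assumes the files are named <vhostname>usage_<year><month>.html and follow the Webalizer layout quoted earlier; `STAT_LABELS`, `summary_line`, and `summarize_dir` are hypothetical names. Fields are tab-separated so they paste into separate Excel columns, and a label missing from a file leaves its column empty.

```shell
#!/bin/sh
# Column order for the summary, taken from the list at the top of the post.
STAT_LABELS='Total Hits|Total Files|Total Pages|Total Visits|Total KBytes'
STAT_LABELS="$STAT_LABELS|Total Unique Sites|Total Unique URLs"
STAT_LABELS="$STAT_LABELS|Total Unique Referrers|Total Unique Usernames"
STAT_LABELS="$STAT_LABELS|Total Unique User Agents"

# summary_line VHOST FILE
# Print: <vhostname> TAB <Total Hits> TAB <Total Files> ... for one file.
summary_line() {
    awk -v vhost="$1" -v list="$STAT_LABELS" '
        BEGIN { n = split(list, labels, "|") }
        pending != "" {
            # Line after a label: remember the number inside <B>...</B>.
            if (match($0, /<B>[0-9]+<\/B>/))
                value[pending] = substr($0, RSTART + 3, RLENGTH - 7)
            pending = ""
        }
        match($0, /<FONT SIZE="-1">Total [^<]*<\/FONT>/) {
            # Label text sits between the 16-char open tag and </FONT>.
            pending = substr($0, RSTART + 16, RLENGTH - 23)
        }
        END {
            line = vhost
            for (i = 1; i <= n; i++)
                line = line "\t" ((labels[i] in value) ? value[labels[i]] : "")
            print line
        }
    ' "$2"
}

# summarize_dir DIR
# Print one summary line for every <vhostname>usage_*.html file in DIR.
summarize_dir() {
    for f in "$1"/*usage_*.html; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        summary_line "${name%usage_*}" "$f"
    done
}
```

Running `summarize_dir ./webstats > summary.txt` would write one line per vhost to summary.txt.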