Solved

Shell Script To Monitor Webpage For specific word

Posted on 2010-08-26
1,067 Views
Last Modified: 2013-12-21
I want to have a shell script monitor a webpage for the word "Offline" and have it e-mail me if the word appears on the page.  How can I do this? Thanks
Question by:newmacguy
19 Comments
 
LVL 40

Expert Comment

by:omarfarid
ID: 33536426
You may monitor it with wget (see man wget) to fetch the page, then use grep for that word, and if it is there, use mail or mailx to send mail.
 

Author Comment

by:newmacguy
ID: 33536446
Any chance you can provide some sample code? I'm not very good at shell scripting. Thanks
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 33536513
lynx might be easier:

lynx -dump http://my.url.com | grep -q "Offline" && echo "$(date) Offline Alert!" | mailx -s 'String "Offline" found!' newmacguy@domain.tld
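
To actually monitor the page rather than check it once, the one-liner can go into cron. A sketch of a crontab entry, assuming a five-minute interval (the schedule and addresses are placeholders; edit with crontab -e):

*/5 * * * * lynx -dump http://my.url.com | grep -q "Offline" && echo "$(date) Offline Alert!" | mailx -s 'String "Offline" found!' newmacguy@domain.tld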

wmp

 

Author Comment

by:newmacguy
ID: 33536646
I tried that but I get this result:
  Not Acceptable

   An appropriate representation of the requested resource / could not be
   found on this server.

   Additionally, a 404 Not Found error was encountered while trying to use
   an ErrorDocument to handle the request.

This command works on other websites, but not this one for some reason.
 
LVL 18

Expert Comment

by:TobiasHolm
ID: 33536679
406 Error (Not Acceptable)

Are you seeing this error when trying to access a page? This is due to Apache mod_security, which is turned on by default. You can use the following directive to diagnose the problem (turning the filter off should confirm whether mod_security is the cause):
SecFilterEngine off

It's important to leave the filter on, as it helps prevent spam and injection attacks. What you need to do is figure out (ask your hosting provider if you need help) which phrases are triggering the filter. Typically these will include possible commands such as "ftp," "telnet," "ssh," etc. Once you know which ones they are, you can modify the filter to permit the page that is not loading (scan your logs or Google sitemap for 406 errors to find all the pages experiencing this problem).

Ref: http://community.contractwebdevelopment.com/406-error-not-acceptable-an-appropriate-representation-of-the-requested-resource-could-not-be-found-on-this-server
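
If you cannot change the server's configuration, these filters often match on the client's User-Agent string (wget and lynx announce themselves), so presenting a browser-like string may get past the 406. A sketch, with an arbitrary browser string and the same placeholder URL and address as above:

wget -q -O - --user-agent="Mozilla/5.0" http://my.url.com | grep -q "Offline" && echo "$(date) Offline Alert!" | mailx -s 'String "Offline" found!' newmacguy@domain.tld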
 

Author Comment

by:newmacguy
ID: 33536696
I don't control that particular web server.  I'm trying the wget method as that seems to work.  How can I make it send the e-mail if the grep returns results from the wgetted file?
 
LVL 40

Expert Comment

by:omarfarid
ID: 33536705
try this

wget http://domain.com/file.html
grep -w Offline file.html
if [ $? -eq 0 ]
then
      echo "Offline detected" | mailx -s "Offline found" user@yourdomain   # mailx reads the message body from stdin
fi
rm file.html
 

Author Comment

by:newmacguy
ID: 33536765
I'm getting an error on the if statement line.
 

Author Comment

by:newmacguy
ID: 33536770
line 3: [1: command not found
 
LVL 18

Expert Comment

by:TobiasHolm
ID: 33536777
Try omarfarid's code in a bash script, but I suppose you know how to make such a script file?

In case you don't:

Put the code in a file. Make the file executable with: chmod a+x yourfilename.sh

Then run the script with: ./yourfilename.sh
#!/bin/bash
wget http://domain.com/file.html
grep -w Offline file.html
if [ $? -eq 0 ]
then
      echo "Offline detected" | mailx -s "Offline found" user@yourdomain
fi
rm file.html

 
LVL 40

Expert Comment

by:omarfarid
ID: 33536791
Did you copy the script exactly? What shell are you using?
 
LVL 40

Expert Comment

by:omarfarid
ID: 33536798
you need to leave a space between [ and $?
 

Author Comment

by:newmacguy
ID: 33536880
I copied it exactly. I think the problem is that there are some $ symbols in the grepped text
 
LVL 40

Expert Comment

by:omarfarid
ID: 33536909
The error is related to the missing space between [ and $?.
Can you post the script you ran?
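
For illustration, bash parses [ as an ordinary command name, so the spacing matters; without the space, $? gets glued onto it and bash looks for a command literally named "[1" (or "[0"), which matches the error posted above:

if [$? -eq 0]      # wrong: bash reports "[1: command not found"
if [ $? -eq 0 ]    # right: [ needs a space after it (and before ])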
 
LVL 16

Expert Comment

by:gelonida
ID: 33536910
The '[' and ']' need to be separated by a space from the preceding and following words, as omarfarid mentioned.


Please note also that you have to remove file.html after having grepped it; otherwise the next wget command would save the result as file.html.1.

If you want to keep the html file, you could add the following command between lines 6 and 7 of TobiasHolm's script:

cp file.html saved_file.html
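
As an aside, a small variation sidesteps the file.html.1 renaming altogether: wget's -O option writes to a fixed filename and overwrites it on each run, so the rm (or cp) bookkeeping goes away. A sketch, with the same placeholder URL and address as the scripts above:

wget -q -O file.html http://domain.com/file.html
grep -w Offline file.html && echo "Offline detected" | mailx -s "Offline found" user@yourdomain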


 
LVL 3

Expert Comment

by:egarciat
ID: 33536941
You may want to try:


#!/bin/bash

alert_string="offline"

alertme() {
    echo "Offline detected" | mailx -s "Offline found" user@yourdomain
}

wget -t 1 -T 5 --quiet -O - "http://yourdomain.com" 2>/dev/null | grep -i "$alert_string" && alertme

 
LVL 3

Expert Comment

by:egarciat
ID: 33536994
The "-t" argument to wget is the number of tries it does before exiting, if you do not set this param, I think wget tries several times (which may be desired), the "-T" is the time out time in secons, if you do not specify this and for some reason the page is not reachable, wget may block your script for some predefined default wich I think is 45 secs.

You can specify whatever you want in the function "alertme". Personally I use sendmail for such operations, but any other program may work:

alertme() {
    echo -e "Subject: Offline Alert " | sendmail -falert@yourdomain.com youruser@youremaildomain &
}
 
LVL 3

Accepted Solution

by:
egarciat earned 500 total points
ID: 33537024
I apologize for posting again.

I forgot to tell you that, although not necessary, it would be a good idea to set the content type of the web page you are downloading to "text/plain"; you may want to use an HTML META tag.
 

Author Closing Comment

by:newmacguy
ID: 33673237
Sorry for being away so long. This appears to have fixed it though.