Solved

bash script logging missing hops during outage

Posted on 2014-07-17
Medium Priority
500 Views
Last Modified: 2014-07-26
I've posted this same question repeatedly but with specific tools in mind.
I think I need to ask my question in a different way in order to get the proper help I need.

Using a bash script, I want to monitor which hops went down during a network outage.

I've tried all kinds of tools and the problem is that all of them end up generating a lot of logging. Since I don't have much space on the device, I cannot log ongoing information and only want to know which hops went down with a time stamp.

Rather than log all kinds of repetitive information, I only want to log as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever is last reached
-Did any other hops go missing and if so, timestamp those changes
-Don't log anything else if nothing much has changed so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp

Currently, my outage function is kind of nuts, is mainly meant for static setups, and is not quite what I need. I would rather send a few variables to the PHP app, which in turn stores these things in MySQL. I would also like to remove all of the tmp files and other useless stuff that seems to have become a part of this function.

function mtr_report() {
      # one mtr cycle; keep the first three fields of each output line, comma-separated
      mtr --no-dns --report --report-cycles=1 smartpet.com -p | awk '{printf "%s,%s,%s\n", $1,$2,$3}'
}
function outage()
{
    echo "Server is down looking for the problem"
    #while the network error persists check where is the error
    #this needs to be accurate in terms of timestamps

    # clean report file
    > "$OUTAGE_REPORT"

    # quoted so the test does not break if primary_check prints several words
    while [ -z "$(primary_check)" ]; do
        NOW="$(date +"%F %T")#"
        mtr_report > "${FILE_TMP}"
        # only log when the trace differs from the previous one
        if ! cmp -s "$FILE_TMP" "$FILE_TMP_OLD"  ; then
          echo -n "$NOW" >> "${OUTAGE_REPORT}"
          # hop down
          grep -vf "${FILE_STATIC_TRACE}" "${FILE_TMP}" |head -n 1 | tr '\n' '#' | sed 's/#/,0#/'  >> "${OUTAGE_REPORT}"
          # hop up
          grep -vf "${FILE_TMP}" "${FILE_TMP_OLD}" |head -n 1 | tr '\n' '#' | sed 's/#/,1#/'  >> "${OUTAGE_REPORT}"
          #cat ${FILE_TMP} | tr '\n' '#' >> ${OUTAGE_REPORT}
          echo >> "${OUTAGE_REPORT}"
        fi
        mv "$FILE_TMP" "$FILE_TMP_OLD"
    done
    # -f: do not fail if the loop never ran and the file does not exist
    rm -f "$FILE_TMP_OLD"

    # when server is back send a report
    $CURL -F function=outage  -F "level=@${OUTAGE_REPORT}" "$SERVER_URL/app.php"
}
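
For the goal stated above of sending a few variables to the PHP app instead of uploading a report file, a minimal bash sketch might look like the following. It reuses $CURL, $SERVER_URL and the function=outage field from the existing call. The function name send_event and the field names event, hop and time are hypothetical and would have to match whatever app.php is changed to accept.

```shell
# Sketch only: the form field names "event", "hop" and "time" are assumptions;
# app.php would need to be adapted to read them.
# Usage: send_event EVENT HOP
send_event() {
    local event=$1 hop=$2
    # $CURL is left unquoted, as in the original call, so it may carry options
    ${CURL:-curl} --silent \
        --form "function=outage" \
        --form "event=${event}" \
        --form "hop=${hop}" \
        --form "time=$(date '+%F %T')" \
        "${SERVER_URL}/app.php"
}
```

Called as, for example, `send_event outage_start 10.0.0.2`, this posts one small record per event, so no temporary report file has to be kept on the device.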
Question by:projects
21 Comments
 
LVL 62

Expert Comment

by:gheist
ID: 40203830
Routing globally does not work like this. You need to monitor BGP to get any sense out of routers down.
 

Author Comment

by:projects
ID: 40204870
I'm posting here to get a code solution which would allow me to replace this function. I am not a programmer but am having to take care of some software problems. I can do some basic things however such as replacing functions, changing some php settings to accept different variables to be put into mysql and changing mysql tables/fields etc. I often ask for snippets on this site.

I am not wanting to monitor all of the internet, I just want to know which hop/s went down where there is a problem.

And I don't want to get into trouble by having my ICMP traffic blocked either.
 
LVL 62

Accepted Solution

by:gheist
gheist earned 1000 total points
ID: 40205169
Your script assumes static routes that end a couple of hops outside your house.
You can check some popular service like Facebook or Gmail.
 

Author Comment

by:projects
ID: 40205558
I know it is meant for static, that's why I asked for help in the question needing code so that I can fix this function.
 
LVL 62

Expert Comment

by:gheist
ID: 40205942
Your script is incomplete. Can you share example input files at least?
 

Author Comment

by:projects
ID: 40206405
The input is the following function.

function mtr_report() {
      mtr --no-dns --report --report-cycles=1 smartpet.com -p | awk '{printf "%s,%s,%s\n", $1,$2,$3}'
}

There is no value in sharing the full script, and I can't anyhow. That said, it's useless as long as it uses this function, which is why I am asking for help to replace all of this with the following:

I've posted this same question repeatedly but with specific tools in mind.
I think I need to ask my question in a different way in order to get the proper help I need.

Using a bash script, I want to monitor which hops went down during a network outage.

I've tried all kinds of tools and the problem is that all of them end up generating a lot of logging. Since I don't have much space on the device, I cannot log ongoing information and only want to know which hops went down with a time stamp.

Rather than log all kinds of repetitive information, I only want to log as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever is last reached
-Did any other hops go missing and if so, timestamp those changes
-Don't log anything else if nothing much has changed so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp
 
LVL 62

Expert Comment

by:gheist
ID: 40209940
In 2-3 hops you have multiple routing paths.
No offence, but where is the example input FILE?
 

Author Comment

by:projects
ID: 40210041
No offense taken, not sure what you are asking me. This example is what the last programmer left behind and it's useless. There is no input file and I am not trying to fix that mess.

I am looking for a solution which does the following:

Rather than log all kinds of repetitive information, I only want to log as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever is last reached
-Did any other hops go missing and if so, timestamp those changes
-Don't log anything else if nothing much has changed so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp
 
LVL 62

Expert Comment

by:gheist
ID: 40213694
That's how it looks... do you think you can use some monitoring package for your purposes? E.g. smokeping, nagios?
 
LVL 71

Assisted Solution

by:Qlemo
Qlemo earned 1000 total points
ID: 40213950
Leaving all scripting aside, the basics are wrong, as said. Unless you have full control over how packets are routed, e.g. because everything is "inside" your company network (be it VPN, leased line or whatever kind of WAN), you never know which would be the next failing hop. As has been said, packets can take different paths at different times.

Technically you would need to ask the last responding router for its specific routing info, and extract the gateway from there.
An additional issue is that some routers do not respond to the ICMP packets.

So it doesn't make sense to even consider any scripting. Of course you could record the hops when all is fine, but you would need to record all possible combinations, and then guess (most of the time) which exact route was chosen on failure.
 

Author Comment

by:projects
ID: 40214126
Come on now experts... there's always an answer :).

First of all, yes, we use a number of software packages, large and small but sometimes, we just need some very basic tools such as in this case.

@Qlemo; I'm not trying to map the internet so I don't care about routing information from routers. I think you are looking at this in a more complicated way than I am asking :).
I simply want to log which hop was down and to make a note of it. Since it's down, obviously I cannot record that but I CAN record that I could reach the previous hop.
If I was mapping something, then yes, I would need a lot more detail but I don't. I just want to know which hop went down. Knowing which hop was last seen will help me to determine where the problem was/is.

The programmer who created the above code feels that it is complete, works as it should and is of value. He also thinks I asked him to write the code based on static nets which I would never ask for because on huge corporate networks, things aren't that static.
The code is useless and is overly complicated for what I need which is again, as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever is last reached
-Did any other hops go missing and if so, timestamp those changes
-Don't log anything else if nothing much has changed so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp
 
LVL 29

Expert Comment

by:Jan Springer
ID: 40214178
The mtr command that you are running is good.

There are problems with the logic of monitoring path:

1) a missing hop (because ICMP is blocked or the interface is RFC1918) does not mean that you will not reach the destination.

2) if you check for a missing hop and some intermediate router re-routes, it will look like a particular hop came up when in fact another hop was used.
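
Point 1 can be handled in a script by not treating a silent hop as an outage on its own. The following bash sketch assumes the trace has already been reduced to lines of the form hop,address,loss, that hops which did not answer show ??? as their address, and that the target's IP address is known. The function name path_status and its output format are made up for illustration.

```shell
# Sketch only: assumes "hop,address,loss" lines on stdin and "???" for hops
# that did not answer. TARGET_IP must be the numeric address of the target.
# Usage: some_trace_command | path_status TARGET_IP
path_status() {
    awk -F, -v target="$1" '
        $1 ~ /^[0-9]/ {                 # skip header lines, keep hop lines
            if ($2 == "???") silent++   # hop did not answer
            else last = $2              # remember the last hop that answered
        }
        END {
            if (last == target)
                printf "reached silent_hops=%d\n", silent
            else
                printf "broken last_reachable=%s\n", (last == "" ? "none" : last)
        }'
}
```

With this, a trace that still ends at the target is reported as reached even if hops in between stayed silent. Point 2 is not addressed here: after a re-route the last reachable address can change without any hop actually having gone down.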
 
LVL 71

Expert Comment

by:Qlemo
ID: 40214197
Again, you can record the last answering hop before the failing one, but not the failing one, unless you know the exact route reliably. And you can't. As said above, re-routing happens all the time.

That is why I say "I'm reaching X as the last address, which belongs to company Y, and after that I do not know what happens" when I'm troubleshooting a reachability issue. You can tell in which area a Black Hole might be from monitoring the surroundings, but never where it really is ;-).
 

Author Comment

by:projects
ID: 40214198
Yes, the mtr command is fine but it's the code that isn't. The code doesn't give me anything that I need based on the requirements I posted above.

That is what I am looking for in terms of a solution to this problem.
 

Author Comment

by:projects
ID: 40214206
@Qlemo; Correct, things change and are in potentially constant flux. Again however, I am not trying to map everything, I simply want to know which hop was down. I don't care if when it's up, it changes path, that is not important to me. I only want to log which hop went missing, how long, when it came back.

Just as the requirements are above :)
 
LVL 71

Expert Comment

by:Qlemo
ID: 40215035
And again, to know that, you need to compare the failure with a success, to get the difference. Without, you only know there is something missing, but not which hop.
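
The comparison described here could be sketched in bash as follows, assuming two plain files with one hop address per line: one recorded while everything worked and one taken during the failure. The function name first_missing_hop and the file layout are assumptions. As noted earlier in the thread, the recorded path may not be the path in use when the failure happened, so the result is only a guess.

```shell
# Sketch only: both files are assumed to hold one hop address per line.
# Prints the first address from BASELINE_FILE that is absent from
# CURRENT_FILE, or nothing if every baseline hop is still present.
# Usage: first_missing_hop BASELINE_FILE CURRENT_FILE
first_missing_hop() {
    awk -v cur="$2" '
        BEGIN {
            # load the hops seen during the failure into a lookup table
            while ((getline line < cur) > 0) seen[line] = 1
        }
        !($0 in seen) { print; exit }   # first baseline hop not seen now
    ' "$1"
}
```

The baseline is read in order, so the first hop printed is the one closest to the monitoring device.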
 

Author Comment

by:projects
ID: 40215054
@Qlemo, we are not communicating well and are both saying the same thing :).

I already said yes, I understand that I won't know what the next hop is, but I don't care. That isn't what is important to me. I simply want to know which hop I CAN reach, and that will already tell me which one I cannot reach. Even if the next hop changes, in my application, I don't care; I just need to know which last hop could be reached.
 
LVL 71

Expert Comment

by:Qlemo
ID: 40215558
Ah, that's different and feasible. Now, since we've made that clear, the *nix shell experts can take over again. My *nix times are long gone ...
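
With the requirement narrowed to the last reachable hop, the logging asked for in the question (outage start, changes only, outage end) could be sketched in bash roughly like this. It relies on the primary_check and mtr_report functions from the question and assumes that mtr_report emits hop,address,loss lines with ??? for hops that did not answer. The names last_reachable_hop, log_change, watch_outage and CHECK_INTERVAL, as well as the log format, are assumptions.

```shell
# Sketch only: primary_check and mtr_report come from the question;
# everything else here is an illustrative assumption.

# Print the address of the last hop that answered ("hop,address,loss" on stdin).
last_reachable_hop() {
    awk -F, '$1 ~ /^[0-9]/ && $2 != "" && $2 != "???" { last = $2 }
             END { if (last != "") print last }'
}

# Append one timestamped line, but only when the value has changed.
# Usage: log_change LOGFILE PREVIOUS CURRENT
log_change() {
    local logfile=$1 previous=$2 current=$3
    if [ "$current" != "$previous" ]; then
        printf '%s last_reachable=%s\n' "$(date '+%F %T')" "${current:-none}" >> "$logfile"
    fi
}

# Log outage start, every change of the last reachable hop, and outage end.
# Usage: watch_outage LOGFILE
watch_outage() {
    local logfile=$1 previous="unset" current   # "unset" forces a first entry
    printf '%s outage_start\n' "$(date '+%F %T')" >> "$logfile"
    while [ -z "$(primary_check)" ]; do
        current=$(mtr_report | last_reachable_hop)
        log_change "$logfile" "$previous" "$current"
        previous=$current
        sleep "${CHECK_INTERVAL:-5}"
    done
    printf '%s outage_end\n' "$(date '+%F %T')" >> "$logfile"
}
```

Because the previous value is kept in a shell variable, no temporary files are needed, and an outage during which nothing changes produces only three log lines regardless of how long it lasts.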
 

Author Comment

by:projects
ID: 40215717
@Qlemo

I don't understand why that was not clear before however. I kept saying I know I cannot record the missing hop and that I don't care to.
 

Author Comment

by:projects
ID: 40215782
How about the following instead.

How can I run mtr continuously, in report mode, in a bash script, ending its process properly so that it exits while correctly spitting out its report?

That's really all I am trying to accomplish and have been for weeks. I have tried and tried to figure this out. I have hired help which cannot seem to get this done and I have asked here repeatedly in many different ways.

There must be at least one expert on this site that understands what I am trying to do here. It is not that complex but not once has someone either told me it can be done or this cannot be done.
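
One way to approach this, offered as a sketch rather than a tested solution: in report mode mtr is not a long-running process. With --report-cycles=N it sends N rounds of probes, prints its report, and exits on its own, so nothing has to be killed. Running it continuously then means calling it in a loop. The function name run_reports, its arguments, and the overridable MTR variable below are assumptions for illustration.

```shell
# Sketch only: run_reports and the MTR variable are illustrative names.
# Usage: run_reports TARGET [CYCLES] [ROUNDS]
#   CYCLES  probes per report (default 5)
#   ROUNDS  number of reports to produce; 0 (default) means run until stopped
run_reports() {
    local target=$1 cycles=${2:-5} rounds=${3:-0} n=0
    while [ "$rounds" -eq 0 ] || [ "$n" -lt "$rounds" ]; do
        # timestamp line so each report can be told apart
        printf '# %s\n' "$(date '+%F %T')"
        # mtr exits by itself once the requested cycles are done
        ${MTR:-mtr} --no-dns --report --report-cycles="$cycles" "$target"
        n=$((n + 1))
    done
}
```

For example, `run_reports smartpet.com 5` would print one complete report after every five probe cycles until the script is interrupted, and every report that was printed is already complete at that point.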
 

Author Comment

by:projects
ID: 40221587
I've asked this question in many ways so far and each time, they turn into very long threads.
I try to close them if/when there are actual answers which might help someone else.

That said, each time I have, I've come away with a little more information or knowledge or simply, the ability to ask again, this time with better details or description.

I don't want to leave all kinds of garbage questions on the site so figure with the information garnered from one question, I can ask a better question the next time.

Question has a verified solution.