projects

bash script logging missing hops during outage

I've posted this same question repeatedly but with specific tools in mind.
I think I need to ask my question in a different way in order to get the proper help I need.

Using a bash script, I want to monitor which hops went down during a network outage.

I've tried all kinds of tools and the problem is that all of them end up generating a lot of logging. Since I don't have much space on the device, I cannot log ongoing information and only want to know which hops went down with a time stamp.

Rather than log all kinds of repetitive information, I only want to log as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever was last reached
-Did any other hops go missing, and if so, timestamp those changes
-Don't log anything else if nothing much has changed, so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp

Currently, my outage function is kind of nuts and mainly for static setups and is not quite what I need. I would rather send a few variables to the php app which in turn stores these things into mysql. I also would like to remove all of the tmp files and other useless stuff that seems to have become a part of this function.

function mtr_report() {
    # one mtr cycle, reduced to the first three fields of each output line
    mtr --no-dns --report --report-cycles=1 smartpet.com -p | awk '{printf "%s,%s,%s\n", $1,$2,$3}'
}
function outage()
{
    echo "Server is down, looking for the problem"
    # while the network error persists, check where the error is
    # this needs to be accurate in terms of timestamps

    # clean report file
    : > "$OUTAGE_REPORT"

    # quoted so the test still works if primary_check prints several words
    while [ -z "$(primary_check)" ]; do
        NOW="$(date +"%F %T")#"
        mtr_report > "${FILE_TMP}"
        # only write to the report when the trace differs from the previous one
        if ! cmp -s "$FILE_TMP" "$FILE_TMP_OLD"; then
            echo -n "$NOW" >> "${OUTAGE_REPORT}"
            # hop down
            grep -vf "${FILE_STATIC_TRACE}" "${FILE_TMP}" | head -n 1 | tr '\n' '#' | sed 's/#/,0#/' >> "${OUTAGE_REPORT}"
            # hop up
            grep -vf "${FILE_TMP}" "${FILE_TMP_OLD}" | head -n 1 | tr '\n' '#' | sed 's/#/,1#/' >> "${OUTAGE_REPORT}"
            #cat ${FILE_TMP} | tr '\n' '#' >> ${OUTAGE_REPORT}
            echo >> "${OUTAGE_REPORT}"
        fi
        mv "$FILE_TMP" "$FILE_TMP_OLD"
    done
    rm -f "$FILE_TMP_OLD"

    # when server is back send a report
    $CURL -F function=outage -F "level=@${OUTAGE_REPORT}" "$SERVER_URL/app.php"
}
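
For what it's worth, the logging rules listed in the question (outage start, changes in the last reachable hop, outage end, and nothing in between) come down to writing a line only when the state changes. Below is a minimal sketch of that idea, assuming the caller already knows the current state. `record_state`, `prev_state`, `LOG_FILE` and the wording of the log lines are made-up names for illustration and are not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch only: log state changes, never every check cycle.
# LOG_FILE, prev_state and record_state are hypothetical names.

LOG_FILE="${LOG_FILE:-/tmp/outage.log}"
prev_state="up"   # last state that was written to the log

# record_state STATE
#   STATE is "up" when the target answers; otherwise it is the address of
#   the last hop that still answered (or "none" if nothing answered).
record_state() {
    local state="$1" now
    if [ "$state" = "$prev_state" ]; then
        return 0                      # nothing changed: log nothing
    fi
    now="$(date +'%F %T')"
    if [ "$prev_state" = "up" ]; then
        echo "$now outage_start last_hop=$state" >> "$LOG_FILE"
    elif [ "$state" = "up" ]; then
        echo "$now outage_end" >> "$LOG_FILE"
    else
        echo "$now hop_change last_hop=$state" >> "$LOG_FILE"
    fi
    prev_state="$state"
}
```

Calling `record_state` once per check keeps the log at one line per change, however many checks run during the outage. The same three values (timestamp, event, last hop) could just as well be sent to the PHP app instead of a file.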
gheist

Global routing does not work like this. You need to monitor BGP to make any sense of which routers are down.
projects

ASKER

I'm posting here to get a code solution which would allow me to replace this function. I am not a programmer but am having to take care of some software problems. I can do some basic things however such as replacing functions, changing some php settings to accept different variables to be put into mysql and changing mysql tables/fields etc. I often ask for snippets on this site.

I don't want to monitor all of the internet, I just want to know which hop/s went down when there is a problem.

And I don't want to get into trouble by having my ICMP traffic blocked either.
ASKER CERTIFIED SOLUTION
gheist

I know it is meant for static, that's why I asked for help in the question needing code so that I can fix this function.
Your script is incomplete. Can you share example input files at least?
The input is the following function.

function mtr_report() {
    mtr --no-dns --report --report-cycles=1 smartpet.com -p | awk '{printf "%s,%s,%s\n", $1,$2,$3}'
}

There is no value in sharing the full script, and I can't anyhow. That said, it's useless as long as it uses this function, which is why I am asking for help replacing all of this with the following:

Rather than log all kinds of repetitive information, I only want to log as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever was last reached
-Did any other hops go missing, and if so, timestamp those changes
-Don't log anything else if nothing much has changed, so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp
Within 2-3 hops you have multiple routing paths.
No offence, but where is the example input FILE?
No offense taken, not sure what you are asking me. This example is what the last programmer left behind and it's useless. There is no input file and I am not trying to fix that mess.

I am looking for a solution which does the following:

Rather than log all kinds of repetitive information, I only want to log as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever was last reached
-Did any other hops go missing, and if so, timestamp those changes
-Don't log anything else if nothing much has changed, so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp
That's how it looks... Do you think you can use some monitoring package for your purposes, e.g. smokeping or nagios?
SOLUTION
Qlemo

Come on now experts... there's always an answer :).

First of all, yes, we use a number of software packages, large and small but sometimes, we just need some very basic tools such as in this case.

@Qlemo; I'm not trying to map the internet so I don't care about routing information from routers. I think you are looking at this in a more complicated way than I am asking :).
I simply want to log which hop was down and to make a note of it. Since it's down, obviously I cannot record that but I CAN record that I could reach the previous hop.
If I was mapping something, then yes, I would need a lot more detail but I don't. I just want to know which hop went down. Knowing which hop was last seen will help me to determine where the problem was/is.

The programmer who created the above code feels that it is complete, works as it should and is of value. He also thinks I asked him to write the code based on static nets which I would never ask for because on huge corporate networks, things aren't that static.
The code is useless and is overly complicated for what I need which is again, as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever was last reached
-Did any other hops go missing, and if so, timestamp those changes
-Don't log anything else if nothing much has changed, so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp
The mtr command that you are running is good.

There are problems with the logic of monitoring the path:

1) A missing hop (because ICMP is blocked or the interface is RFC1918) does not mean that you will not reach the destination.

2) If you check for a missing hop and some intermediate router re-routes, it will look like a particular hop came up when in fact another hop was used.
Again, you can record the last answering hop before the failing one, but not the failing one, unless you know the exact route reliably. And you can't. As said above, re-routing happens all the time.

That is why I say "I'm reaching X as the last address, which belongs to company Y, and after that I do not know what happens" when I'm troubleshooting a reachability issue. You can tell in which area a black hole might be from monitoring the surroundings, but never where it really is ;-).
Yes, the mtr command is fine but it's the code that isn't. The code doesn't give me anything that I need based on the requirements I posted above.

That is what I am looking for in terms of a solution to this problem.
@Qlemo; Correct, things change and are potentially in constant flux. Again however, I am not trying to map everything, I simply want to know which hop was down. I don't care if the path changes once it's up again, that is not important to me. I only want to log which hop went missing, for how long, and when it came back.

Just as the requirements are above :)
And again, to know that, you need to compare the failure with a success, to get the difference. Without, you only know there is something missing, but not which hop.
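
To illustrate the comparison being described: given one file with the hop addresses from a known-good trace and one with the hops seen during the failure (one address per line), the difference can be taken with `grep`. This is only a sketch. `missing_hops` and the one-address-per-line file layout are assumptions, and because of re-routing the result is a hint about the affected area, not proof of which device failed.

```shell
#!/usr/bin/env bash
# Sketch only: missing_hops is a hypothetical helper name.

# missing_hops BASELINE CURRENT
#   Prints every line of BASELINE (known-good trace) that does not appear
#   in CURRENT (trace taken during the failure).
#   -F treats addresses as fixed strings (dots are not wildcards),
#   -x requires whole-line matches (10.0.0.1 does not match 10.0.0.10).
missing_hops() {
    grep -vxFf "$2" "$1" || true   # grep exits 1 when nothing is missing
}
```

For example, `missing_hops good_trace.txt failed_trace.txt` (both file names invented here) would list the baseline hops that were not seen in the failed trace.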
@Qlemo, we are not communicating well and are both saying the same things :).

I already said yes, I understand that I won't know what the next hop is, but I don't care. That isn't what is important to me. I simply want to know which hop I CAN reach, and that will already tell me which one I cannot reach. Even if the next hop changes, in my application I don't care, I just need to know the last hop that could be reached.
Ah, that's different and feasible. Now, since we've made that clear, the *nix shell experts can take over again. My *nix times are long gone ...
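
A rough sketch of that idea: pull the last answering hop out of a single `mtr --report` run. This assumes the usual report layout, where each hop line looks like `  3.|-- 10.0.0.1  0.0%  1 ...` and unanswered hops show `???`. The function name `last_reachable_hop` is made up here, and the layout may differ between mtr versions or when other output options such as `-p` are used, so treat the pattern as something to adjust.

```shell
#!/usr/bin/env bash
# Sketch only: last_reachable_hop is a hypothetical helper name.

# last_reachable_hop
#   Reads `mtr --report` text on stdin and prints "HOPNUMBER ADDRESS" for
#   the last hop that answered. Prints nothing if no hop answered.
last_reachable_hop() {
    awk '
        # hop lines start with "N.|--"; skip hops shown as ??? or with 100% loss
        $1 ~ /^[0-9]+\.[|]--$/ && $2 != "???" && ($3 + 0) < 100 {
            hop = $1
            sub(/\..*$/, "", hop)    # "3.|--" becomes "3"
            last = hop " " $2
        }
        END { if (last != "") print last }
    '
}
```

It could be fed directly from mtr, for example `mtr --no-dns --report --report-cycles=1 smartpet.com | last_reachable_hop`.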
@Qlemo

I don't understand why that was not clear before however. I kept saying I know I cannot record the missing hop and that I don't care to.
How about the following instead.

How can I run mtr continuously, in report mode, in a bash script, ending its process properly so that it exits while correctly printing its report?

That's really all I am trying to accomplish and have been for weeks. I have tried and tried to figure this out. I have hired help which cannot seem to get this done and I have asked here repeatedly in many different ways.

There must be at least one expert on this site that understands what I am trying to do here. It is not that complex but not once has someone either told me it can be done or this cannot be done.
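
One possible way to approach this, offered as a sketch rather than a tested solution: instead of starting one endless mtr and killing it, run short bounded reports in a loop. With `--report-cycles`, mtr exits by itself after the given number of cycles and prints its report, so there is no process to kill. `TARGET`, `CYCLES`, `mtr_once`, `handle_report` and `monitor` are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch only: all names below are hypothetical.

TARGET="${TARGET:-smartpet.com}"
CYCLES="${CYCLES:-5}"
keep_running=1

# Ask the loop to stop after the current run. A TERM sent to this script
# alone (kill -TERM PID) should be handled once the running mtr has
# finished; Ctrl-C in a terminal also reaches mtr and may cut that run short.
trap 'keep_running=0' INT TERM

# One bounded run: mtr exits on its own and prints its report.
mtr_once() {
    mtr --no-dns --report --report-cycles="$CYCLES" "$TARGET"
}

# Placeholder: replace with parsing or logging of the report text.
handle_report() {
    printf '%s\n' "$1"
}

# monitor [PROBE]
#   Calls PROBE (default: mtr_once) repeatedly and passes each complete
#   report to handle_report until keep_running is cleared.
monitor() {
    local probe="${1:-mtr_once}"
    local report
    while [ "$keep_running" -eq 1 ]; do
        report="$("$probe")" || true
        handle_report "$report"
    done
}
```

Calling `monitor` at the end of the script starts the loop. Each report covers `CYCLES` probes, so a larger value means fewer reports to process and compare.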
I've asked this question in many ways so far and each time, they turn into very long threads.
I try to close them if/when there are actual answers which might help someone else.

That said, each time I have, I've come away with a little more information or knowledge or simply, the ability to ask again, this time with better details or description.

I don't want to leave all kinds of garbage questions on the site so figure with the information garnered from one question, I can ask a better question the next time.