bash script logging missing hops during outage

I've posted this same question repeatedly but with specific tools in mind.
I think I need to ask my question in a different way in order to get the proper help I need.

Using a bash script, I want to monitor which hops went down during a network outage.

I've tried all kinds of tools and the problem is that all of them end up generating a lot of logging. Since I don't have much space on the device, I cannot log ongoing information and only want to know which hops went down with a time stamp.

Rather than log all kinds of repetitive information, I only want to log as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever is last reached
-Did any other hops go missing and if so, timestamp those changes
-Don't log anything else if nothing much has changed so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp
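
The requirements listed above (outage start, changes only, outage end) can be sketched as a small bash loop. This is only a rough outline under stated assumptions, not a drop-in replacement: `run_outage_loop`, `probe_cmd`, `up_cmd` and the CSV log layout are hypothetical names chosen for illustration. `probe_cmd` is assumed to print the last reachable hop, and `up_cmd` is assumed to return exit status 0 once the target answers again (the existing `primary_check` prints output instead, so it would need a small wrapper).

```shell
#!/usr/bin/env bash
# Sketch: log outage start, each change of the last reachable hop, and outage end.
# All names here are hypothetical; nothing is written while the state is unchanged.

run_outage_loop() {
    local probe_cmd="$1"      # prints the current last reachable hop
    local up_cmd="$2"         # returns 0 when the target is reachable again
    local logfile="$3"
    local interval="${4:-5}"  # seconds between probes
    local prev="" cur

    printf '%s,outage_start\n' "$(date '+%F %T')" >> "$logfile"
    until "$up_cmd"; do
        cur="$("$probe_cmd")"
        # Only write a line when the observed hop differs from the last one logged.
        if [ "$cur" != "$prev" ]; then
            printf '%s,last_hop,%s\n' "$(date '+%F %T')" "$cur" >> "$logfile"
            prev="$cur"
        fi
        sleep "$interval"
    done
    printf '%s,outage_end\n' "$(date '+%F %T')" >> "$logfile"
}
```

Because the previous value is kept in a shell variable, no temporary files are needed, and a long outage with no change produces only the start and end lines plus one hop line.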

Currently, my outage function is kind of nuts and mainly for static setups and is not quite what I need. I would rather send a few variables to the php app which in turn stores these things into mysql. I also would like to remove all of the tmp files and other useless stuff that seems to have become a part of this function.

function mtr_report() {
      mtr --no-dns --report --report-cycles=1 smartpet.com -p | awk '{printf "%s,%s,%s\n", $1,$2,$3}'
}
function outage()
{
    echo "Server is down looking for the problem"
    #while the network error persists check where is the error
    #this needs to be accurate in terms of timestamps

    # clean report file
    : > "$OUTAGE_REPORT"

    # start from an empty previous snapshot so the first comparison has a file to read
    : > "$FILE_TMP_OLD"

    while [ -z "$(primary_check)" ]; do
        NOW="$(date +"%F %T")#"
        mtr_report > "$FILE_TMP"
        # only log when the current snapshot differs from the previous one
        if ! cmp -s "$FILE_TMP" "$FILE_TMP_OLD"; then
          echo -n "$NOW" >> "$OUTAGE_REPORT"
          # hop down
          grep -vf "$FILE_STATIC_TRACE" "$FILE_TMP" | head -n 1 | tr '\n' '#' | sed 's/#/,0#/' >> "$OUTAGE_REPORT"
          # hop up
          grep -vf "$FILE_TMP" "$FILE_TMP_OLD" | head -n 1 | tr '\n' '#' | sed 's/#/,1#/' >> "$OUTAGE_REPORT"
          #cat ${FILE_TMP} | tr '\n' '#' >> ${OUTAGE_REPORT}
          echo >> "$OUTAGE_REPORT"
        fi
        mv "$FILE_TMP" "$FILE_TMP_OLD"
    done
    rm -f "$FILE_TMP_OLD"

    # when server is back send a report
    $CURL -F function=outage -F "level=@${OUTAGE_REPORT}" "$SERVER_URL/app.php"
}
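
The wish to send a few variables to the PHP app, rather than uploading a report file, might look roughly like the following. It is a sketch only: the function name `send_outage_event` and the form field names `event`, `hop` and `time` are made up for illustration and would have to match whatever `app.php` actually expects. It also assumes `CURL` holds just the path to the curl binary and that `SERVER_URL` is set as in the original script.

```shell
#!/usr/bin/env bash
# Sketch: post one outage event as plain form fields, no temp file involved.
# Field names (event, hop, time) are hypothetical and must match the PHP side.

send_outage_event() {
    local event="$1"   # e.g. start, change, end
    local hop="$2"     # last reachable hop, may be empty
    local when="$3"    # timestamp, e.g. from: date '+%F %T'

    "${CURL:-curl}" --silent --show-error \
        -F "function=outage" \
        -F "event=${event}" \
        -F "hop=${hop}" \
        -F "time=${when}" \
        "${SERVER_URL}/app.php"
}
```

A call such as `send_outage_event start "" "$(date '+%F %T')"` would then replace the file upload at the end of `outage()`.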
projectsAsked:
gheistCommented:
Routing globally does not work like this. You need to monitor BGP to get any sense out of routers down.
projectsAuthor Commented:
I'm posting here to get a code solution which would allow me to replace this function. I am not a programmer but am having to take care of some software problems. I can do some basic things however such as replacing functions, changing some php settings to accept different variables to be put into mysql and changing mysql tables/fields etc. I often ask for snippets on this site.

I am not wanting to monitor all of the internet, I just want to know which hop/s went down where there is a problem.

And I don't want to get into trouble by having my ICMP traffic blocked either.
gheistCommented:
Your script assumes static routes that end a couple of hops outside your house.
You can check some popular service like facebook or gmail.
projectsAuthor Commented:
I know it is meant for static, that's why I asked for help in the question needing code so that I can fix this function.
gheistCommented:
Your script is incomplete. Can you share example input files at least?
projectsAuthor Commented:
The input is the following function.

function mtr_report() {
      mtr --no-dns --report --report-cycles=1 smartpet.com -p | awk '{printf "%s,%s,%s\n", $1,$2,$3}'
}

There is no value in sharing the full script and I can't anyhow. That said, the script is useless as it stands because it relies on this function, which is why I am asking for help to replace all of this with the following:

I've posted this same question repeatedly but with specific tools in mind.
I think I need to ask my question in a different way in order to get the proper help I need.

Using a bash script, I want to monitor which hops went down during a network outage.

I've tried all kinds of tools and the problem is that all of them end up generating a lot of logging. Since I don't have much space on the device, I cannot log ongoing information and only want to know which hops went down with a time stamp.

Rather than log all kinds of repetitive information, I only want to log as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever is last reached
-Did any other hops go missing and if so, timestamp those changes
-Don't log anything else if nothing much has changed so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp
gheistCommented:
In 2-3 hops you have multiple routing paths.
No offence, but where is the example input FILE?
projectsAuthor Commented:
No offense taken, not sure what you are asking me. This example is what the last programmer left behind and it's useless. There is no input file and I am not trying to fix that mess.

I am looking for a solution which does the following:

Rather than log all kinds of repetitive information, I only want to log as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever is last reached
-Did any other hops go missing and if so, timestamp those changes
-Don't log anything else if nothing much has changed so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp
gheistCommented:
That's how it looks... do you think you can use some monitoring package for your purposes? E.g. smokeping, nagios?
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Leaving all scripting aside, the basics are wrong, as said. Unless you have full control over how packets are routed, e.g. because everything is "inside" your company's network (be it VPN, leased line or whatever kind of WAN), you never know which hop would have come next and is therefore the failing one. As has been said, packets can take different paths at different times.

Technically you would need to ask the last responding router for its specific routing info, and extract the gateway from there.
An additional issue is that some routers do not respond to the ICMP packets.

So it doesn't make sense to even consider any scripting. Of course you could record the hops when all is fine, but you would need to record all possible combinations, and then guess (most of the time) which exact route was chosen at the time of the failure.
projectsAuthor Commented:
Come on now experts... there's always an answer :).

First of all, yes, we use a number of software packages, large and small but sometimes, we just need some very basic tools such as in this case.

@Qlemo; I'm not trying to map the internet so I don't care about routing information from routers. I think you are looking at this in a more complicated way than I am asking :).
I simply want to log which hop was down and to make a note of it. Since it's down, obviously I cannot record that but I CAN record that I could reach the previous hop.
If I was mapping something, then yes, I would need a lot more detail but I don't. I just want to know which hop went down. Knowing which hop was last seen will help me to determine where the problem was/is.

The programmer who created the above code feels that it is complete, works as it should and is of value. He also thinks I asked him to write the code based on static nets which I would never ask for because on huge corporate networks, things aren't that static.
The code is useless and is overly complicated for what I need which is again, as follows.

-Start of outage - timestamp

-Which hop is missing - obviously, the one after whichever is last reached
-Did any other hops go missing and if so, timestamp those changes
-Don't log anything else if nothing much has changed so don't use single cycle checking because that simply creates tons of useless logs.

-End of outage - timestamp
Jan SpringerCommented:
The mtr command that you are running is good.

There are problems with the logic of monitoring the path:

1) a missing hop (because ICMP is blocked or the interface is RFC1918)  does not mean that you will not reach the destination.

2) if you check for a missing hop and some intermediate router re-routes, it will look like a particular hop came up when in fact another hop was used.
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Again, you can record the last answering hop before the failing one, but not the failing one, unless you know the exact route reliably. And you can't. As said above, re-routing happens all the time.

That is the reason I say "I'm reaching X as last address, which belongs to company Y, and after that I do not know what happens" if I'm troubleshooting a reachability issue. You can tell in which area a Black Hole might be from monitoring the surroundings, but never where it really is ;-).
projectsAuthor Commented:
Yes, the mtr command is fine but it's the code that isn't. The code doesn't give me anything that I need based on the requirements I posted above.

That is what I am looking for in terms of a solution to this problem.
projectsAuthor Commented:
@Qlemo; Correct, things change and are potentially in constant flux. Again however, I am not trying to map everything, I simply want to know which hop was down. I don't care if the path changes when it's up, that is not important to me. I only want to log which hop went missing, for how long, and when it came back.

Just as the requirements are above :)
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
And again, to know that, you need to compare the failure with a success, to get the difference. Without, you only know there is something missing, but not which hop.
projectsAuthor Commented:
@Qlemo, we are not communicating well and are both saying the same things :).

I already said yes, I understand that I won't know what the next hop is but I don't care. That isn't what is important to me. I simply want to know which hop I CAN reach and that will already tell me which one I cannot reach. Even if the next hop changes, in my application, I don't care, I just need to know which last hop could be reached.
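
Picking out "the last hop that could be reached" from an mtr report could be sketched as follows. This assumes the usual `mtr --report` text layout, where each hop row starts with something like `3.|--` followed by the address, and a hop that never answered is shown as `???`. The function name `last_reachable_hop` is hypothetical, and the layout may differ with other options (the original command also passes `-p`), so treat the pattern as an assumption to verify against real output.

```shell
#!/usr/bin/env bash
# Sketch: read an mtr --report listing on stdin and print "hop_number address"
# for the last hop that answered. Assumes rows like "  2.|-- 10.0.0.1 ..." and
# "???" for silent hops.

last_reachable_hop() {
    awk '
        $1 ~ /^[0-9]+\.\|--$/ && $2 != "???" {
            sub(/\..*/, "", $1)   # "2.|--" becomes "2"
            hop = $1
            addr = $2
        }
        END { if (hop != "") print hop, addr }
    '
}
```

It could be fed with something like `mtr --no-dns --report --report-cycles=3 "$TARGET" | last_reachable_hop`. Because it keeps the last answering row rather than stopping at the first `???`, a router in the middle that simply ignores ICMP is not mistaken for the break.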
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Ah, that's different and feasible. Now, since we've made that clear, the *nix shell experts can take over again. My *nix times are long gone ...
projectsAuthor Commented:
@Qlemo

I don't understand why that was not clear before however. I kept saying I know I cannot record the missing hop and that I don't care to.
projectsAuthor Commented:
How about the following instead.

How can I run mtr continuously, in report mode, in a bash script, ending its process properly so that it exits while correctly printing its report?

That's really all I am trying to accomplish and have been for weeks. I have tried and tried to figure this out. I have hired help which cannot seem to get this done and I have asked here repeatedly in many different ways.

There must be at least one expert on this site that understands what I am trying to do here. It is not that complex but not once has someone either told me it can be done or this cannot be done.
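
On running mtr repeatedly in report mode: with `--report` and `--report-cycles=N`, mtr runs N cycles, prints its report and exits on its own, so a script can simply call it again on each pass instead of starting a long-running process and killing it. A minimal wrapper, assuming that behaviour, might look like this. `mtr_snapshot` and the `MTR_BIN` override are hypothetical names added for illustration.

```shell
#!/usr/bin/env bash
# Sketch: take one bounded mtr report per call. mtr exits by itself after the
# requested number of cycles, so nothing has to be terminated by the script.
# MTR_BIN is a hypothetical override, handy for testing without mtr installed.

mtr_snapshot() {
    local target="$1"
    local cycles="${2:-3}"   # more cycles smooth out a single lost packet
    "${MTR_BIN:-mtr}" --no-dns --report --report-cycles="$cycles" "$target"
}
```

Calling `mtr_snapshot smartpet.com` inside a `while` or `until` loop with a `sleep` between passes gives the "continuous" behaviour, and each pass yields a complete report that can be compared with the previous one.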
projectsAuthor Commented:
I've asked this question in many ways so far and each time, they turn into very long threads.
I try to close them if/when there are actual answers which might help someone else.

That said, each time I have, I've come away with a little more information or knowledge or simply, the ability to ask again, this time with better details or description.

I don't want to leave all kinds of garbage questions on the site so figure with the information garnered from one question, I can ask a better question the next time.