Solved

bash script network hop logging

Posted on 2014-07-25
Last Modified: 2014-08-18
Need bash solution which logs only the pertinent results of down hops during a network outage.

-When a network outage occurs, run continuously. Check for the target IP during the outage. End once the target IP is back and the outage is over.
-Monitor hop/s down and/or loss details.

In the following example from another question posted on this site;

-An outage occurs, cannot reach the target IP anymore
-Hop 6 goes down but then comes back
-Hop 5 goes down but then comes back
-Hop 7 goes down but then comes back
-Target IP is back, outage is over, end of function/script

Script should strip everything but the following;

-id1 timestamp - outage
-id2 timestamp - 5. ae-1.ebr1.Washington1.Level3.ne (last hop reachable) or hop 6 down and/or loss%
-id3 timestamp - 4. ae-2.ebr2.Washington1.Level3.ne (last hop reachable) or hop 5 down and/or loss%
-id4 timestamp - 6. ge-3-0-0-53.gar1.Washington1.Le (last hop reachable) or hop 7 down and/or loss%
-id5 timestamp -end of outage

I only have ping and paris-traceroute available to me on the OS.

There must be someone on this site who can accomplish this. I've tried paying people, I've tried various things myself, I've tried posting here and I've been pulling my hair out.

Thank you!
Question by:projects
53 Comments
 
LVL 57

Expert Comment

by:giltjr
ID: 40221312
I can't remember, is this for a private network?  Meaning all hops are within "your network"?  Or is this for the Internet?

If this is for the Internet, you should not care which hops are up or which hops are down.  It's not your job and the ISP should already know.  In fact you should not even know which hops are there.  The Internet provider should have redundant paths and you may not take the same route every time, so you may not even know which hop is down, if one is down.  Even using paris-traceroute, the ISP can change the paths, thus the hops, at any time they want.  To know which hop is down, you need to know exactly which hops there are and in what order they are or could be in and hope they never change.

Now if this is your network, then you should know which hops are there and what order they should be in.  Then using traceroute/paris-traceroute and by using bash and sed, possibly awk, you can accomplish what you want.
0
 

Author Comment

by:projects
ID: 40221712
It is mostly private network but some segments do go over the internet.
Correct, I do have a map of the networks so all I am looking for is to know which hop/s went down.
So if hop 5 is the last hop I can reach, then I know that hop 6 is either down or is in a high loss situation.

I'd like to know which ever it is, loss or out.
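A minimal sketch of telling "loss" apart from "out", assuming ping prints the usual summary line containing "N% packet loss" (iputils and BSD ping both do; other builds may not). The function name classify_loss is made up for illustration:

```bash
#!/usr/bin/env bash
# classify_loss: read ping output on stdin and print "ok", "loss N%", "down",
# or "unknown" when no summary line is found. The name is illustrative only.
classify_loss() {
    local pct
    # Pull the number in front of "% packet loss" out of the summary line.
    pct=$(sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p' | head -n 1)
    case "$pct" in
        '')        echo "unknown" ;;
        0|0.0)     echo "ok" ;;
        100|100.0) echo "down" ;;
        *)         echo "loss ${pct}%" ;;
    esac
}

# Example (192.0.2.1 is a placeholder address):
#   ping -c 5 192.0.2.1 | classify_loss
```

Sending several probes per hop is what makes the distinction possible; a single probe can only say "answered" or "did not answer".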
0
 
LVL 37

Expert Comment

by:Gerwin Jansen
ID: 40222926
Is it that you want to record the hop numbers that are down? If the number of hops changes at some point, the route will have changed in the meantime. So recording the names of each hop would not make sense then.
0
 
LVL 61

Expert Comment

by:gheist
ID: 40223184
Why don't you ask your routers to send SNMP traps to your logging servers on up/down events?
0
 

Author Comment

by:projects
ID: 40223328
Everything is fully monitored but some things seem to slip through the cracks so I just want something simple of my own that will tell me when hops are down.

@Gerwin Jansen; I'm not following what you are asking. I can't make it any simpler than how my question explains it. I think most people that read the question end up asking too many of their own which end up making my original question look confusing.

I simply want to know when the hops were down, which hop was down. This isn't rocket science, it's just scripting to know which hops are having problems. As I've said countless times now, I don't care if the hop changes, I simply need to know if it went down and which one.
0
 
LVL 61

Expert Comment

by:gheist
ID: 40223496
You need to write SQL/Perl/Python/Prolog query against your monitoring system that collects the router events.

PS: Once you try, you will understand the currently invisible complexity of multiple routes from one end to another.
0
 
LVL 37

Expert Comment

by:Gerwin Jansen
ID: 40223747
>> As I've said countless times now, I don't care if the hop changes, I simply need to know if it went down and which one.
No problem, I won't make any more suggestions here.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 40224435
Part of the problem is you are asking to do something that, well, is not done and you are asking to do it on a platform that is very limited in what it can do.

>>As I've said countless times now, I don't care if the hop changes, I simply need to know if it went down and which one.

The problem is if there are two possible paths, thus two possible next hops, you may not know which one went down, you just know that one went down.  Say you have:
      /-------> R2 <------->\
R1 <-|                       |-> R4
      \-------> R3 <------->/

If R1 uses some mechanism to load balance traffic between the link going to R2 and R3, there will be a brief amount of time where the link between R1 and R2 or R3 could fail and R1 could try to send some traffic over the failed link while other traffic could still go over the good link.

Since the next hop from R1 is not fixed, your test could fail, but you would have no clue if R2 or R3 is the problem.  You just know there is a problem someplace.
0
 

Author Comment

by:projects
ID: 40224673
@giltjr; Yes, I understand and that is the nature of the internet. Packets could take any number of routes to get from one point to another.
Over the years however, major backbones which have been handling core traffic would typically route the shortest path as much as possible no? Over time, would there not be only so many routes?

@gheist; Yes, I have maps of most of the paths. The only ones I don't have are when packets get routed over the internet. However, I would still like to keep track of those things too. The major software packages being used do use snmp, I can't interface my little script into that however. I just need something stand alone.

@Gerwin Jansen; Didn't mean that to sound rude if it did. I just meant I've said that many times because I've posted this question many many times in various ways. The last one came the closest to my being able to explain what I am after. Thought I would use some of the last questions info to try again.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 40224830
>>Over the years however, major backbones which have been handling core traffic would typically route the shortest path as much as possible no?

Not when they have redundant parallel  paths that are equal distance, which most backbones have.

What you are attempting to do sounds simple, but as you have found, it's not when you try to implement it.

Even if you owned the whole path (private network) attempting to do what you want is still complex.  You would need to analyze the output from the ping and the traceroute and know the exact path something should take.

If you detected a not responding router on the traceroute, you would need to know if there was one or two possible next hops.  If there are two possible next hops, then you would need to see which ping failed to know if "hop option#1" or "hop option#2" failed.

I am far from a bash expert; I would not even consider myself a bash novice. However, I think it would be quite complex to analyze the output from two different commands in a simple scripting language like bash.
0
 

Author Comment

by:projects
ID: 40224860
>If you detected a not responding router on the traceroute, you would need to know if there was one or two possible next hops.  If there are two possible next hops, then you would need to see which ping failed to know if "hop option#1" or "hop option#2" failed.

I guess the question is... would there eventually be only so many routes?
0
 
LVL 61

Expert Comment

by:gheist
ID: 40224868
Bash has quite an advanced conditional language; it just lacks efficient storage for all possible routing paths.
0
 

Author Comment

by:projects
ID: 40224873
Storage is going into MySQL.
The only problem is that the database cannot be updated until the network connection is back, if that LAN cannot be reached.

The one thing I've not mentioned is that I don't much care about the entire travel of the packet and as I've said, only to know which hop might fail.

If I know that hop 5 is a problem on a very regular basis, I don't much care about anything beyond that and only that we need to check hop 5.

Again, sometimes, you can't share everything in public so it's hard to ask questions but I sure hope this helps a little.
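Since the database cannot be updated until the connection is back, one way to handle that is to spool events to a local file and flush them later. This is only a sketch: SPOOL_FILE, spool_event and flush_spool are hypothetical names, and the command that actually writes to MySQL is left to the caller because the thread does not show it.

```bash
#!/usr/bin/env bash
# Hypothetical spool location; override SPOOL_FILE to suit.
SPOOL_FILE=${SPOOL_FILE:-/tmp/outage.spool}

# Store one event locally while the database cannot be reached.
spool_event() {
    printf '%s\n' "$1" >> "$SPOOL_FILE"
}

# Try to deliver every spooled line. The arguments form the delivery command,
# which is called with the line appended as its last argument.
# Lines whose delivery fails are kept in the spool for the next attempt.
flush_spool() {
    [ -s "$SPOOL_FILE" ] || return 0
    local pending line
    pending=$(mktemp) || return 1
    while IFS= read -r line; do
        # </dev/null stops the delivery command from eating the loop's input.
        "$@" "$line" < /dev/null || printf '%s\n' "$line" >> "$pending"
    done < "$SPOOL_FILE"
    mv "$pending" "$SPOOL_FILE"
}
```

Calling flush_spool with whatever command performs the insert, once the target is reachable again, would empty the spool; anything that still fails simply waits for the next call.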
0
 
LVL 61

Expert Comment

by:gheist
ID: 40224912
Get over it. Such tracing script could be useful in 1980s, but not today.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 40224917
Yes, there would be eventually only so many routes, however unless you are only doing this for your private network, the routes could change and you would not necessarily know.

Also, some Internet providers will detect ICMP packets that are not from their monitors and drop them.  So to you it may look like a router is down when it is really just ignoring you.

As I am sure you know, typically monitoring is from a central site, or a few select sites, not from each end point which is what I believe you are trying to do.
0
 

Author Comment

by:projects
ID: 40224932
Sorry, I edited my reply above but didn't get done before two more posts showed up.
0
 

Author Comment

by:projects
ID: 40225416
@giltjr; How much could the routes change if I monitored between source and destination to know which hops show up over time?

When things go across the internet, yes, we have no way of knowing what route traffic will take if something changes. I don't much care about when things go over the internet but it could be valuable. It might not have to be 100% correct, just some idea of where the problems are: are they with us or something over the net?
0
 
LVL 57

Expert Comment

by:giltjr
ID: 40226611
It really depends if the path is fully within a single ISP or if you are crossing ISP and how the peering is setup between the ISP's you cross.

I have done some testing where I did traceroutes every 5 minutes for an hour (so 12) and out of the 12 the same path was never taken twice.  In each one at least 1 hop along the way was different.  Now this was from a site in the US to a site in Singapore.

It seems like you are trying to figure out where somebody else's problem is.
0
 

Author Comment

by:projects
ID: 40226836
Actually, I'm trying to figure out problems internally but since the problems can sometimes be over the internet and not related to us, I am trying to gather as much information as is useful in order to get a bigger picture.

For the most part, I don't much care about anything beyond say 4 or 5 hops beyond our networks but it would be good to have this info to know if problems are in fact internal and external.

I am mostly concerned with keeping an eye on hops inside of our network and perhaps a few hops externally to be safe.

This is why I am not terribly concerned with changing hops once we get too far into the internet.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 40227100
Then I would not be too concerned with trying to use traceroute, it shows the last working hop instead of the first failing hop.

You could just ping each hop in sequence, the first hop that does not respond is the hop that is down and you don't need to ping any more.  Again I don't know bash, but this should be fairly simple to do.
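The "ping each hop in sequence" idea can be sketched roughly as follows. The hop list is assumed to come from the network map; probe and first_down_hop are illustrative names, and the -c/-W options are those of iputils ping (other ping builds may use different flags).

```bash
#!/usr/bin/env bash
# Probe one address once, waiting at most 2 seconds for a reply.
# Kept as a separate function so it can be swapped out or stubbed.
probe() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# Walk the hops in path order. Print "<position> <address>" for the first hop
# that does not answer and return 1; print nothing and return 0 if all answer.
first_down_hop() {
    local i=0 hop
    for hop in "$@"; do
        i=$((i + 1))
        if ! probe "$hop"; then
            echo "$i $hop"
            return 1
        fi
    done
    return 0
}

# Example (placeholder addresses, in path order):
#   first_down_hop 10.0.0.1 10.0.0.2 10.0.0.3
```

As noted elsewhere in the thread, a hop that does not answer may simply be ignoring ICMP, so "first silent hop" is a hint rather than proof that the hop is down.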
0
 

Author Comment

by:projects
ID: 40227111
That's what I do now but what I want is to run the script continuously to check if there are any other hops which change condition. Single cycle runs generate a lot of logs and I want to extract only the useful details.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 40227201
You do realize that running this continuously will eat up bandwidth and CPU cycles?  Although it may not be much, it will still interfere with whatever function the device is performing.


I'm not sure what you mean by "check if there are any other hops which change condition".  Again once ping fails, there are no more hops to check, as all of the next hops will (should) also fail.  

That assumes the ping failed because the path between the last working ping and this ping failed, and not because the "failing" device decided not to respond to your ping.

Based on your other posts you are not responsible for the network.  I would really suggest you leave the network monitoring to the group that is responsible for it.  

I will say if you worked for the same company as I do and you attempted to do this, you would not work for it for too long.  Not trying to be mean, but I am sure you would not want somebody else from another group trying to do your job.
0
 

Author Comment

by:projects
ID: 40227212
This function has 100% priority so that is not a problem. As far as using up bandwidth, it should not use much at all because the connection will be down while this test is happening.

What I mean by 'any other hops down'; I guess you've not seen my other similar question then :)
Here it is; http://www.experts-exchange.com/Programming/Misc/Q_28482734.html

>Based on your other posts you are not responsible for the network.

Not a good assumption :). If I could tell you everything then you would understand but sometimes, you cannot share all of the little details so share what you can. When I started using this site, I knew that I would be posting many questions and that by doing so, it would be possible that someone might become interested in why I keep asking certain questions. Therefore, I have purposely made sure to leave out a number of details which I simply cannot share because they would divulge the project/s that I am working on.

Not everything is for public consumption, even when posting in public. That said, I try very hard to give as much information as I possibly can.
0
 
LVL 61

Expert Comment

by:gheist
ID: 40227283
Your measurement method is unworkable. You can check the next hop outside your premises and that's all.

traceroute -m3 www.yahoo.com
traceroute to www.yahoo.com (46.228.47.115), 3 hops max, 60 byte packets
 1  192.168.0.1 (192.168.0.1)  1.114 ms  3.532 ms  3.534 ms
 2  static-1-124-145-212.ipcom.comunitel.net (212.145.124.1)  17.437 ms  17.456 ms  19.649 ms
 3  10.4.193.155 (10.4.193.155)  17.395 ms  19.585 ms  19.570 ms <-thats two hops out and already not good...
0
 

Author Comment

by:projects
ID: 40227296
I've got a programmer working on it and he seems to think it's possible so, not sure where this is going.
0
 

Author Comment

by:projects
ID: 40227358
I've requested that this question be deleted for the following reason:

After asking this question in many different ways, the experts have not been able to give me what I seek which is a small script to show this is possible. Instead, I get more and more questions each time, telling me that I do not know what I am talking about... basically.

Therefore, time to give up this question as unanswerable.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 40227316
--> As far as using up bandwidth, it should not use much at all because the connection will be down while this test is happening.

This confuses me a LOT.  You stated you really want to do continuous testing, which to me means that you would always be sending out some type of test packets.  Then you state that during the testing the "connection" will be down.  I'm not sure which connection you are talking about, network or "application", but if the network connection is down, these tests won't work.  If the application connection is down when your test is running and your test is always running, I guess the application will always be down.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 40227329
Also, after spending a little time in IT I have found that you will never get the "correct/right/best" solution if you don't provide all of the necessary details in requirements.
0
 

Author Comment

by:projects
ID: 40227353
I've been involved in IT for a very long time but sometimes, it's good to think outside of the box and see what you can come up with. You are right of course, can't always find the perfect answer :)

Thank you to everyone who has helped however. It seemed unfair to pick any one solution since there was none to be found, on this site at least.
0
 
LVL 61

Expert Comment

by:gheist
ID: 40227359
No. Your request makes no sense. It mentions some backbone router. So if it is your network, I assure you somebody else monitors it very well. Just call them and they will help you understand how the internet works.
0
 

Author Comment

by:projects
ID: 40227371
You are arguing without knowledge of what my full intentions are. You cannot help because you continue to ask for more and more information when I have made my question as simple as it possibly can be.
I am not seeking anything all that complicated and I've explained it repeatedly. If no one on this site can help, so be it, it's time for the programmer to prove otherwise.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 40227386
-->  when I have made my question as simple as it possibly can be ...

Without providing all of the details, which can change the answer/solution.  If you provide 1/10th of your requirements, you will get a solution that satisfies 1/20 of what you need.

--> I am not seeking anything all that complicated and I've explained it repeatedly ....

Maybe based on your limited knowledge and understanding it is not complicated.  Think about this: if it was that simple, then you would not need to ask the question because a simple Google search would yield what you are looking for.

I pointed to a script that would do what you asked; you said that would not work.
0
 

Author Comment

by:projects
ID: 40227398
You might want to re-read the question a few times, then the ongoing information. I gave the scenario, then I said what I was interested in and not.

The script you pointed to doesn't even come close to what I am seeking. That script simply monitors a host then emails the person if it's down. There are countless such scripts on the net.

Thanks but unfortunately you have not answered my question, only asked an awful lot more.

As for my limited knowledge, what can I say, that's from your perspective. I can't get into that and won't.
0
 
LVL 61

Expert Comment

by:gheist
ID: 40227448
Answer to your question:
There is no correlation between remote host reachability and a random network router not answering your ping requests.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 40227462
Well, first, ping and traceroute canNOT monitor a link.  They can only monitor the accessibility of an IP address, a.k.a. a HOST.

So by using ping/traceroute ultimately you are not monitoring a link, you are monitoring the ability to get to a host.  As long as you can get to the host you don't care which path it is taking as long as you can get there.  That is what the script does, and the script can easily be modified to log instead of sending e-mail.

To truly monitor the status of a link (interface up/down) you need access to each router/hop in the path to display the status of each interface within the path, or you need to receive SNMP traps, as suggested earlier, from the router.
0
 

Author Comment

by:projects
ID: 40227494
I am not monitoring interfaces.

Not interested in random routers or the many paths that things can take.

See ID: 40226836
0
 
LVL 61

Expert Comment

by:gheist
ID: 40227640
So it boils down to ping.... Or check_http etc.
0
 

Author Comment

by:projects
ID: 40227663
It's not the tool that it boils down to, it's the results of using whatever is appropriate. Since we can't run mtr continuously in a script, killing it and still getting the results, then yes, it'll end up being ping/traceroute.

The question was my asking for a script solution to doing this. A script which only gathers the relevant information from the tests so that I don't have large logs to deal with and only the results are sent to say mysql for later review.

Again, just to help make the question that much clearer;

For the most part, I don't much care about anything beyond say 4 or 5 hops beyond our networks but it would be good to have this info to know if problems are in fact internal and external.

I am mostly concerned with keeping an eye on hops inside of our network and perhaps a few hops externally to be safe.

This is why I am not terribly concerned with changing hops once we get too far into the internet. 

I am not interested in traditional monitoring and/or tools, I simply want something slightly different but as a script which I can automatically run DURING an outage. Outage meaning that I cannot reach the target for ANY reason whatsoever.
0
 
LVL 61

Expert Comment

by:gheist
ID: 40227866
By flooding network with the requests during network stress period you are shooting yourself in the leg.
0
 

Author Comment

by:projects
ID: 40227894
I'm not flooding anything, it's down. It isn't a critical moment when I start my function to find out which hop is or are down.

The hops which are reachable will see my traffic of course but I can also use delay intervals.
I'm not worried about flooding anything but yes, that's a good call.
0
 
LVL 61

Expert Comment

by:gheist
ID: 40227925
You still have not provided information:
How will you detect that network outage has occurred?

Another food for thought:
What are you going to do if your provider switches off your favourite hop and throws it in trash bin?
0
 

Author Comment

by:projects
ID: 40227943
I believe that currently, for testing, the script simply uses curl to check for a target. If the target is unreachable, the outage function kicks in. Not necessarily a network outage, just that we cannot reach the target. This way, I'm not flooding the networks with ICMP day in and out.

Provider? So like external to us but where say Level3 changes a hop which is over the internet?
If they change something, no big deal, if it's gone, it's gone.

We don't control the internet so don't expect anything to ever be static.
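The curl-based trigger mentioned above might look something like this. The real script is not shown in the thread, so this is a guess at its shape: target_reachable, check_target and handle_outage are made-up names, the 5-second limit is an assumption, and the URL is a placeholder.

```bash
#!/usr/bin/env bash
# Succeeds when the target answers within 5 seconds. --silent, --output and
# --max-time are standard curl options; any HTTP response counts as reachable.
target_reachable() {
    curl --silent --output /dev/null --max-time 5 "$1"
}

# Run the outage handler only when the check fails, so hop probing never
# happens while the target is healthy. handle_outage is a placeholder for
# whatever diagnostic function the real script provides.
check_target() {
    if target_reachable "$1"; then
        return 0
    fi
    handle_outage "$1"
}

# Example (placeholder URL): check_target http://target.example/
```

Adding curl's --fail option would also treat HTTP error responses as "unreachable", if that matches what "cannot reach the target for any reason" is meant to cover.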
0
 
LVL 61

Expert Comment

by:gheist
ID: 40228424
Let's see.
Your site has 10000 users. Each functionally runs curl (it is called Chrome or Firefox or whatever).
Now when you stop the Apache server, everybody is entitled to flood-ping your site...
I am unmonitoring this question because you don't understand and just bash anybody trying to help you.
0
 
LVL 37

Expert Comment

by:Gerwin Jansen
ID: 40229317
>> Actually, I'm trying to figure out problems internally
Since you are not interested in external problems, why not use a list (database) of your network equipment together with your network map to create a basic 'ping' script that tells you which of your network components have failed?
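A rough sketch of that inventory-driven approach, assuming the equipment list is a plain text file with one "name address" pair per line (the file format and the names probe and failed_components are invented here; a real inventory would more likely come from the database):

```bash
#!/usr/bin/env bash
# Probe one address once (iputils ping flags; adjust for other ping builds).
probe() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# Read "name address" pairs from the file given as $1 and print the name of
# every component that does not answer. Blank lines and "#" comments are skipped.
failed_components() {
    local name addr
    while read -r name addr; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        probe "$addr" < /dev/null || echo "$name"
    done < "$1"
}

# Example: failed_components /path/to/inventory.txt
```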
0
 

Author Comment

by:projects
ID: 40229513
I bash everyone? I've explained that I am not trying to sound rude but there is only so much information I can give out at this time. Not trying to be rude, just trying to find ways of providing information so that experts can help.

@Gerwin; Yes, the engineers have all of those things.

I simply need something much lighter to help diagnose certain problems.

I'm interested in problems which are x number of hops from where ever I am testing from.
As it's not always clear which network takes what route and sometimes go over the internet, then I need to gather up what ever info I can. I have no control over internet traffic but it's always good to have as much info as possible because I cannot know how many hops will be internal or external. That is why I'd like to simply log any and all hop changes.

Anyhow, I'm about ready to stop using this site because I cannot provide everything for details and people are starting to think badly of me.
0
 
LVL 68

Accepted Solution

by:Qlemo
Earned 500 total points
ID: 40230268
I think if we leave the global view and focus on "near neighbourhood", it should be feasible.
-----------
Assuming there is already a trigger script able to fire the diagnostic script/tool you want to create/get.
Then the task is to test with a traceroute for the next 5 hops.
If we get thru - ok, not "your fault", log this and give up, we are done.

If we don't, note the last working hop (maybe including the intermediate hops before).
Set a flag that we are in "recording mode".
Wait for some time, then check again.

:loop
If we get thru in "recording mode", we are done, but need to log success.
If we don't, just wait again, then goto loop
--------
Does that sound like what you are after?
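The "note the last working hop" step could be sketched as a small parser over traceroute output. This assumes the classic traceroute layout (hop number first, then the host, with "*" for probes that got no reply); paris-traceroute output may be laid out differently, in which case the pattern would need adjusting. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Read traceroute-style output on stdin and print "<hop-number> <host>" for
# the last hop that answered at least one probe. Prints nothing if none did.
last_reachable_hop() {
    awk '
        # Only lines that start with a hop number are hop lines.
        $1 ~ /^[0-9]+$/ {
            # The first field that is not "*" is the responding host, if any.
            for (i = 2; i <= NF; i++) {
                if ($i != "*") { hop = $1; host = $i; break }
            }
        }
        END { if (hop != "") print hop, host }
    '
}

# Example (placeholder address, 5 hops as suggested above):
#   traceroute -m 5 192.0.2.9 | last_reachable_hop
```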
0
 

Author Comment

by:projects
ID: 40230367
>I think if we leave the global view and focus on "near neighbourhood", it should be feasible.

Correct.

>Assuming there is already a trigger script able to fire the diagnostic script/tool you
>want to create/get.

Yes, the script already checks for target missing and there is a trigger to go to an error condition.

>Then the task is to test with a traceroute for the next 5 hop.

I usually use 7 just to be safe but even 5 should be plenty.

>If we get thru - ok, not "your fault", log this and give up, we are done.

Correct. Outage over, end of function, back to other things.

>If we don't, note the last working hop (maybe including the intermediate hops before).
>Set a flag that we are in "recording mode".
>Wait for some time, then check again.

>:loop
>If we get thru in "recording mode", we are done, but need to log success.
>If we don't, just wait again, then goto loop
>--------
>Does that sound like what you are after?

Darn close!

Only difference would be...

-We time stamp when the event (outage mode) started.

-We are now in recording mode.

-We test to find out how many hops we can reach, if any and log the last one we could which means the next hop is down. No need to log all of the hops we did reach because I have a map of all the internal routes. The initial timestamp tells me when this happened and what hop was at fault.

-We loop again and if we can't get through, we test again. No logging is required if the same hop is down. IF however, the previous hop is now reachable but another one is not, then we want to note the new hop along with a new time stamp, because, things changed.

-We loop again and if we can't get through, we test again. No additional logging is required if nothing has changed. The second hop which changed is the last change so we log nothing.

-We keep looping, logging nothing unless something changed. If something changed, log it with a time stamp.

-We loop again and if the target is back, it means all routes are back so we time stamp that and the function ends.

In the end, instead of having a huge log if the target was unreachable for an extended period of time, we end up with a much smaller log which looks like this;

-id1 timestamp - outage
-id2 timestamp - 5. ae-1.ebr1.Washington1.Level3.ne (last hop reachable) or hop 6 down and/or loss%
-id3 timestamp - 4. ae-2.ebr2.Washington1.Level3.ne (last hop reachable) or hop 5 down and/or loss%
-id4 timestamp - 6. ge-3-0-0-53.gar1.Washington1.Le (last hop reachable) or hop 7 down and/or loss%
(no additional logging if nothing changes)
-id5 timestamp -end of outage

This is then sent to the database for storage.
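The change-only logging described above can be sketched as a loop that remembers the previous observation and writes a line only when it differs. Everything named here (LOG_FILE, log_event, record_state, monitor_outage) is hypothetical; the reachability check and the "last reachable hop" lookup are passed in as command names because the thread does not pin down how they are implemented.

```bash
#!/usr/bin/env bash
LOG_FILE=${LOG_FILE:-/tmp/outage.log}   # hypothetical log location
event_id=0
last_state=""

# Append one numbered, timestamped line to the log.
log_event() {
    event_id=$((event_id + 1))
    printf 'id%d %s - %s\n' "$event_id" "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$LOG_FILE"
}

# Log an observation only if it differs from the previous one.
record_state() {
    if [ "$1" != "$last_state" ]; then
        log_event "$1"
        last_state=$1
    fi
}

# monitor_outage CHECK_CMD HOP_CMD [INTERVAL]
#   CHECK_CMD succeeds once the target is reachable again.
#   HOP_CMD prints the last reachable hop, e.g. "5 ae-1.ebr1...".
#   INTERVAL is the delay between test cycles in seconds (default 30).
monitor_outage() {
    local check_cmd=$1 hop_cmd=$2 interval=${3:-30}
    record_state "outage"
    until "$check_cmd"; do
        record_state "$("$hop_cmd") (last hop reachable)"
        sleep "$interval"
    done
    record_state "end of outage"
}
```

With this shape, an outage that lasts hours but only moves between two hops produces four log lines rather than one per test cycle.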
0
 
LVL 68

Expert Comment

by:Qlemo
ID: 40230424
There are still some possible flaws or issues to take care of, but they just add a little more noise to the logs, so no real issue. You might have to refine later, after seeing the script doing its magic.

If you now detach the solution from requiring "bash", Experts might be able to write up something more easily.
0
 

Author Comment

by:projects
ID: 40230439
Unfortunately, it's all I have access to. I can only use bash scripting to accomplish this along with basic tools such as ping, traceroute etc.
0
 
LVL 68

Expert Comment

by:Qlemo
ID: 40230449
Reading gheist's comment in regard of the available bash features, and trying to remember bsh/sh etc., it might be difficult to accomplish that.
0
 

Author Comment

by:projects
ID: 40230578
I'm no bash expert either which is why I'm posting here :).

I figured the script would keep track of the logging, stripping the rest.
At worst, write a temp log file with the relevant info in there.
0
 

Author Comment

by:projects
ID: 40268733
I've requested that this question be deleted for the following reason:

Please delete this question. I've asked it once too many and will not get any solutions here.
0
