• Status: Solved

Should I use Full HTTP or Simple HTTP monitoring for health checks?

Hi,

We recently upgraded from a Foundry load balancer, which used simple health-script checks to keep nodes in the pool, to a Zeus load balancer, which uses ping, the Zeus "Full HTTP" monitor, and a health script. The health script is a .NET page that renders a simple "server up" message. Passive monitoring is turned OFF.

We have multiple issues, but it basically comes down to this:
1. The Zeus drops a node for whatever reason. In this morning's case, it was a timeout to the node it dropped.
2. Our CDN suddenly can't reach the origin.
3. Our sites go down until we purge our CDN.


This morning Zeus dropped one node but displayed the regrettable "no suitable nodes available to service your request" message, even though only one node was dropped.

There were no errors or problems on the box: no application or system events, no processor pegs, no memory pegs. Yet the box was unresponsive for whatever reason and the entire pool failed. Two minutes later, the pool was back online and functioning properly with 2 nodes. We of course had to flush our CDN to get rid of the cached "no suitable nodes" message.

The question is this: is "Full HTTP" too much monitoring for a simple content and registration web site?
Would Zeus' "Simple HTTP" be a better alternative that creates fewer false positives with the nodes?

Any help would be appreciated!

Thanks
Chris


Asked by: garriganlyman
2 Solutions
 
Pugglewuggle Commented:
Hi Chris,
You shouldn't need anything more than simple monitoring. I've used SolarWinds before and it sometimes does the same thing: if the ping time gets too high for just a little while, it reports the node as down and floods you with emails. NODE DOWN! NODE DOWN! This gets very annoying, because when a node really is down you can't tell whether it's a false alarm or not.
If I were you, I'd try simple monitoring for a bit and see what happens. If that doesn't solve the problem, see if there's a software upgrade for the Zeus.
Cheers! Let me know if that helps!
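
To illustrate the "false alarm" problem, here is a minimal Python sketch of one common way to dampen it: only mark a node down after several consecutive failed probes. This is a generic illustration, not Zeus's or SolarWinds' actual logic; the `NodeHealth` class and the threshold of 3 are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeHealth:
    """Tracks probe results for one node (hypothetical helper, not a Zeus API)."""

    failures_to_mark_down: int = 3  # assumed threshold, chosen for illustration
    consecutive_failures: int = 0
    is_up: bool = True

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return whether the node counts as up."""
        if probe_ok:
            # Any success resets the failure streak and restores the node.
            self.consecutive_failures = 0
            self.is_up = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failures_to_mark_down:
                self.is_up = False
        return self.is_up


node = NodeHealth(failures_to_mark_down=3)
# One slow probe surrounded by successes: the node stays in the pool.
blip = [node.record(ok) for ok in (True, False, True, True)]
# Three failures in a row: only now is the node marked down.
outage = [node.record(ok) for ok in (False, False, False)]
```

The trade-off is detection time: with a threshold of N and a fixed probe interval, a genuinely dead node keeps receiving traffic for roughly N intervals before it is removed.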
 
garriganlyman (Author) Commented:
So they have "started" by raising our connection timeouts on the health file, but I am still not convinced, and I am having trouble getting them to change to Simple HTTP. I think it's the right answer, but I have a bunch of stuff in production that I worry about if something goes wrong. I may build a test pool as a separate VIP and see how that fares with monitoring.
 
Pugglewuggle Commented:
Interesting... I would do that and see what happens. Test it and see if there's something up elsewhere.
 
garriganlyman (Author) Commented:
Pugglewuggle, I think your answer put us on the right track. It's tough to know without "soaking in it" for a while, but as a combined solution we have set 500 as an allowed return code. This way servers are not dropped from the pool when an application they call is experiencing errors and returns a 500.

We also needed to add some output caching to minimize the 500 errors. On some of our more dynamic apps, if a DB call failed, .NET would keep going back to the well to try again and again, which would throw it into a tailspin. Setting the output cache to expire once an hour means we only need one successful transaction per hour, which is then cached by the .NET page itself. This should keep the 500s at bay.
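
For anyone reading later, the two changes can be sketched in Python roughly as follows. This is an illustration of the idea only, not our actual Zeus monitor settings or .NET output-cache code; `ALLOWED_STATUSES`, `node_is_healthy`, and `CachedHealthPage` are names made up for the example.

```python
import time
from typing import Callable, Optional

# Assumed policy from this thread: a 500 from the health page no longer drops the node.
ALLOWED_STATUSES = frozenset({200, 500})


def node_is_healthy(status: Optional[int]) -> bool:
    """status is the HTTP code the probe received, or None on timeout/refusal."""
    return status is not None and status in ALLOWED_STATUSES


class CachedHealthPage:
    """Runs an expensive check at most once per TTL, similar in spirit to hourly output caching."""

    def __init__(self, check: Callable[[], str], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock
        self._body: Optional[str] = None
        self._expires_at = 0.0

    def render(self) -> str:
        now = self._clock()
        if self._body is None or now >= self._expires_at:
            # Only a successful check is cached; an exception propagates uncached.
            self._body = self._check()
            self._expires_at = now + self._ttl
        return self._body


calls = []
fake_now = [0.0]  # injectable clock so the demo does not have to wait an hour


def db_backed_check() -> str:
    calls.append(1)  # stands in for a database call
    return "server up"


page = CachedHealthPage(db_backed_check, ttl_seconds=3600.0, clock=lambda: fake_now[0])
page.render()
page.render()          # served from cache, no second database call
fake_now[0] = 3600.0
page.render()          # cache expired, the check runs again
```

Note that a timeout (`None` here) still counts as unhealthy; only an error response the server actually managed to send is tolerated.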

We also set the Zeus to serve a custom page, which can be configured in the health config section of the pool itself. It shows the same error as the 500 should both servers be dropped from the pool in a catastrophic crash.
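
The fallback behavior amounts to something like the Python sketch below. It is a simplified model, not how Zeus implements it; `ERROR_PAGE`, `handle_request`, and the node names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical custom page, standing in for the one configured on the pool.
ERROR_PAGE = "<html><body>Sorry, something went wrong. Please try again shortly.</body></html>"


def handle_request(node_up: Dict[str, bool]) -> Tuple[str, str]:
    """Return ("forward", node_name) or ("error_page", html) for one request."""
    healthy = sorted(name for name, up in node_up.items() if up)
    if healthy:
        # A real balancer would apply its load-balancing algorithm here;
        # the sketch just picks the first healthy node.
        return ("forward", healthy[0])
    # Every node has been dropped: serve the custom page, not the default message.
    return ("error_page", ERROR_PAGE)


one_down = handle_request({"web1": False, "web2": True})
all_down = handle_request({"web1": False, "web2": False})
```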

Thanks for your input. Your use of the phrase "false alarm" got us thinking differently.

Cheers!
Chris