Solved

Should I be using Full HTTP monitoring or Simple HTTP monitoring for health checks?

Posted on 2008-10-03
Last Modified: 2013-12-02
Hi,

We recently upgraded from a Foundry load balancer, which used simple health-check scripts to keep nodes in the pool, to a Zeus load balancer, which uses ping, the Zeus "Full HTTP" monitor, and a health script: a .NET page that renders a simple "server up" message. Passive monitoring is turned OFF.

We have multiple issues, but it basically comes down to this:
1. The Zeus will drop a node for whatever reason. In this morning's case, it was a timeout to the node it dropped.
2. Our CDN all of a sudden can't get to origin.
3. Our sites go down until we purge our CDN.


This morning Zeus dropped one node but displayed the regrettable "no suitable nodes available to service your request" message, even though only one node was dropped.

There were no errors or problems on the box: no application or system events, no pegged processor, no pegged memory. Yet the box was unresponsive for whatever reason, and the entire pool failed. Two minutes later, the pool was back online and functioning properly with 2 nodes. We of course had to flush our CDN to get rid of the "no suitable nodes" message.

The question is this: is "Full HTTP" too much monitoring for a simple content and registration Web site?
Would Zeus' "Simple HTTP" be a better alternative that creates fewer false positives with the nodes?

Any help would be appreciated!

Thanks
Chris


Question by:garriganlyman
LVL 12

Accepted Solution

by:Pugglewuggle
Pugglewuggle earned 500 total points
ID: 22641882
Hi Chris,
You shouldn't need anything more than simple monitoring... I've used SolarWinds before and it does the same thing sometimes... if the ping gets too high for just a little while it will say the node is down and flood you with emails: NODE DOWN! NODE DOWN! This can get very annoying, because when an alert comes in you don't know whether the node really is down or it's a false alarm.
If I were you I'd try simple monitoring for a bit and see what happens. If this doesn't solve the problem, see if there's a software upgrade for the Zeus.
Cheers! Let me know if that helps!
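
One common way to cut down on the "false alarm" drops described above is to require several consecutive failed checks before a node leaves the pool. The following is only a minimal Python sketch of that idea, not Zeus configuration; the class name `NodeHealth` and the thresholds `failures_to_drop` and `passes_to_restore` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeHealth:
    """Tracks one node. All names and thresholds here are illustrative assumptions."""
    failures_to_drop: int = 3   # consecutive failed checks before the node is dropped
    passes_to_restore: int = 2  # consecutive passed checks before it is restored
    is_up: bool = True
    _streak: int = 0            # length of the current run of results that contradict is_up

    def record(self, check_passed: bool) -> bool:
        """Feed in one health-check result and return the node's resulting state."""
        if check_passed == self.is_up:
            # The result agrees with the current state, so any contrary streak is broken.
            self._streak = 0
            return self.is_up
        self._streak += 1
        needed = self.failures_to_drop if self.is_up else self.passes_to_restore
        if self._streak >= needed:
            # Enough contrary results in a row: flip the state and start over.
            self.is_up = not self.is_up
            self._streak = 0
        return self.is_up
```

With these example thresholds, a single slow or timed-out check leaves the node in the pool; only three failures in a row drop it, and two passes in a row bring it back.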

Author Comment

by:garriganlyman
ID: 22682725
So they have "started" by raising our connection timeouts on the health file, but I am still not convinced, and I am having trouble getting them to change to Simple HTTP. I think it's the right answer, but I have a bunch of stuff in production that I worry about if something goes wrong. I may build a test pool as a separate VIP and see how that fares with monitoring.
LVL 12

Assisted Solution

by:Pugglewuggle
Pugglewuggle earned 500 total points
ID: 22682806
Interesting... I would do that and see what happens. Test it and see if there's something up elsewhere.

Author Closing Comment

by:garriganlyman
ID: 31508457
Pugglewuggle - your answer put us on the right track, I think. Tough to know without "soaking in it" for a while, but as a combination solution, we have set 500 to be an allowed return. This way servers are not dropped from the pool if an application they call is experiencing errors and returns a 500. There was also some output caching that needed to be done to minimize the 500 errors. On some of our more dynamic apps, if a db call failed, .NET would keep going back to the well to try and try again, which would throw it into a tailspin. Setting the output cache to expire once an hour means we only need one successful transaction per hour, which is then cached by the .NET page itself. This will keep the 500s at bay.
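
The "allowed return" idea above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not how Zeus implements its monitors: the names `ALLOWED_STATUSES`, `is_healthy`, and `probe` are hypothetical, and the 5-second timeout is an arbitrary example value. The point it shows is that a node that answers with a 500 is still treated as alive, while a node that does not answer at all is not.

```python
import http.client
import urllib.error
import urllib.request

# Assumption for illustration: 200 and 500 both count as "the server answered".
ALLOWED_STATUSES = frozenset({200, 500})


def is_healthy(status, allowed=ALLOWED_STATUSES):
    """Return True if the node answered with a status we are willing to accept."""
    return status is not None and status in allowed


def probe(url, timeout_s=5.0):
    """Fetch the health page; return the HTTP status, or None if there was no usable answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses, but the server did answer, so keep the code.
        return err.code
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections, DNS failures, and malformed responses.
        return None
```

A monitor loop would call something like `is_healthy(probe(health_url))`, where `health_url` stands in for the address of the health page on each node.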

We also set the Zeus to load a custom page (this can be done in the health config section of the pool itself) that shows the same error as the 500, should both servers be dropped from the pool in a catastrophic crash.

Thanks for your input; I think it put us on the right track. Your use of the phrase "false alarm" got us thinking differently.

Cheers!
Chris