
Should I be using Full HTTP or Simple HTTP monitoring for health checks?

Posted on 2008-10-03
Last Modified: 2013-12-02
Hi,

We recently moved from a Foundry load balancer, which used simple health-script checks to keep nodes in the pool, to a Zeus load balancer. The Zeus uses ping, the Zeus "Full HTTP" monitor, and a health script: a .NET page that renders a simple "server up" message. Passive monitoring is turned OFF.

We have multiple issues, but it basically comes down to this:
1. The Zeus drops a node for whatever reason. In this morning's case, it was a timeout to the node it dropped.
2. Our CDN suddenly can't reach the origin.
3. Our sites go down until we purge our CDN.


This morning Zeus dropped one node but displayed the regrettable "no suitable nodes available to service your request" message, even though only one node had been dropped.

There were no errors or problems on the box: no application or system events, and neither the processor nor the memory was pegged. Yet the box was unresponsive for some reason, and the entire pool failed. Two minutes later, the pool was back online and functioning properly with both nodes. Of course, we had to flush our CDN to get rid of the cached "no suitable nodes" message.

The question is this: is "Full HTTP" too much monitoring for a simple content and registration website?
Would Zeus's "Simple HTTP" be a better alternative that produces fewer false positives on the nodes?

Any help would be appreciated!

Thanks
Chris
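
To make the question concrete, here is a rough Python sketch of the difference between the two kinds of check. It assumes, from the monitor names rather than from Zeus documentation, that a "Simple" check looks only at the HTTP status code while a "Full" check also requires the response body to contain expected text. The function names and `EXPECTED_TEXT` are hypothetical; this is not Zeus code or configuration.

```python
import urllib.error
import urllib.request

# Hypothetical: the text the .NET health page is assumed to render.
EXPECTED_TEXT = "server up"

def simple_http_ok(status: int) -> bool:
    """Simple-style check: only the HTTP status code is examined."""
    return 200 <= status < 300

def full_http_ok(status: int, body: str, expected: str = EXPECTED_TEXT) -> bool:
    """Full-style check: the status must be OK *and* the body must contain
    the expected text, so an error page served with status 200 still fails."""
    return simple_http_ok(status) and expected in body

def probe(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """GET url and return (status, body). No response within `timeout`
    seconds, or any connection error, is reported as status 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status, body
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return err.code, ""
    except OSError:
        # URLError, socket timeouts and refused connections are all OSErrors.
        return 0, ""
```

For example, `status, body = probe("http://192.0.2.10/health.aspx", timeout=10)` followed by `full_http_ok(status, body)`; the address and path are placeholders. Note that in this sketch a node that does not answer within the timeout is reported as status 0 and fails both kinds of check, so switching check type alone would not keep an unresponsive box in the pool; the timeout value matters as much as the check type.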


Question by: garriganlyman

Accepted Solution

by: Pugglewuggle
Hi Chris,
You shouldn't need anything more than simple monitoring. I've used SolarWinds before, and it does the same thing sometimes: if the ping time gets too high for even a short while, it reports the node as down and floods you with emails (NODE DOWN! NODE DOWN!). This gets very annoying, because when a node really is down, you can't tell whether it's a false alarm.
If I were you, I'd try simple monitoring for a while and see what happens. If that doesn't solve the problem, check whether there's a software upgrade for the Zeus.
Cheers! Let me know if that helps!
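
One general way monitors damp the "false alarm" problem described above is to mark a node down only after several consecutive failed checks, so a single slow probe does not pull it from the pool. This is a generic technique, not something described in this thread; whether and how your load balancer exposes it is an assumption to check in its monitor settings. The class below is a hypothetical Python sketch.

```python
class FailureCounter:
    """Hypothetical debounce for health checks: a node is reported down
    only after `threshold` consecutive failed checks. Any passing check
    resets the count, so one transient timeout is tolerated."""

    def __init__(self, threshold: int = 3):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True if the node should stay up."""
        if check_passed:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.threshold
```

The trade-off is detection speed: with a threshold of 3 and checks every few seconds, a node that really has died keeps receiving traffic for a few check intervals before it is removed.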

Author Comment

by: garriganlyman
So they have "started" by raising the connection timeouts on the health file, but I'm still not convinced, and I'm having trouble getting them to switch to Simple HTTP. I think it's the right answer, but I have a lot running in production that I worry about if something goes wrong. I may build a test pool as a separate VIP and see how it fares with monitoring.

Assisted Solution

by: Pugglewuggle
Interesting. I would do that and see what happens. Testing it that way should also show whether something else is going on.

Author Closing Comment

by: garriganlyman
Pugglewuggle, I think your answer put us on the right track. It's tough to know without "soaking in it" for a while, but as a combined solution, we have made 500 an allowed return code. This way, servers are not dropped from the pool when an application they call is experiencing errors and returns a 500. We also needed to add some output caching to minimize the 500 errors. On some of our more dynamic apps, if a DB call failed, .NET would keep going back to the well, retrying again and again, which threw it into a tailspin. With output caching set to expire once an hour, we only need one successful transaction per hour, which the .NET page itself then caches. This should keep the 500s at bay.

We also set Zeus to load a custom page (configurable in the health config section of the pool itself) that shows the same error as the 500, in case both servers are dropped from the pool in a catastrophic crash.

Thanks for your input; I think it put us on the right track. Your use of the phrase "false alarm" got us thinking differently.

Cheers!
Chris
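
The two changes in the closing comment can be sketched in Python. `node_is_up` treats 500 as an allowed status alongside 200, as described above. `HourlyCache` imitates page output caching with a one-hour lifetime: once a render succeeds, the back end is not called again until the hour is up, and a failed render is not cached. All names here are hypothetical; in ASP.NET Web Forms the real mechanism is the `OutputCache` page directive (for example with `Duration="3600"`), and the allowed status codes are a Zeus monitor setting, not this code.

```python
import time
from typing import Callable, Optional

# Hypothetical allow-list mirroring the thread: 500 counts as "up" alongside 200.
ALLOWED_STATUSES = frozenset({200, 500})

def node_is_up(status: int, allowed: frozenset = ALLOWED_STATUSES) -> bool:
    """A node stays in the pool if its health page returned an allowed status.
    Status 0 (no response or timeout) is not in the default allow-list."""
    return status in allowed

class HourlyCache:
    """Caches a page's rendered output for `ttl` seconds, like output caching.
    Only successful renders are stored; if `render` raises, the exception
    propagates and the next request tries again."""

    def __init__(self, render: Callable[[], str], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[str] = None
        self._expires = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is not None and now < self._expires:
            return self._value      # served from cache: no back-end call
        value = self._render()      # may raise, e.g. if the DB call fails
        self._value = value
        self._expires = now + self._ttl
        return value
```

Accepting 500 trades precision for stability: a node whose application is genuinely broken also stays in the pool, so the custom error page and the caching carry more of the load.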