I'm looking for a new network monitoring system to replace our Nagios installation. Nagios is nice and free, but it lacks the detail and configurability we are looking for.
Here is some background. We have a small data center, about 82 racks and about 250 servers. 95% of the servers are HPs, with a few Dells and old Compaqs littering the place.
What I need is a monitoring system that actively monitors every server 24/7. When I say monitor, I mean more than just confirming that a server is pingable. I need to know if a server has crashed, blue screened, etc. I also want to know when drives are getting close to 20% free space. Our servers have a dedicated OS drive partition that should never get below 40% free or so, but some of our data drives can be upwards of 4-5 TB, so when one of those gets below 20% free, we are talking about 800-1000 GB free. I would also like to know, if possible, when systems are starting to overheat, when fans go out, and when drives go out. One thing Nagios does that we don't like is sending repeated reminders about tripped events over and over till the cows come home. This fills up everyone's mailbox with redundant messages, and many times people ignore them because they consider them spam.
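To make the disk threshold and the "tell me once" behavior concrete, here is a rough Python sketch of what I have in mind. The names (`is_low_on_space`, `OneShotNotifier`), the 20% default, and the in-memory state are all just illustrative assumptions on my part, not something Nagios or any other product provides.

```python
import shutil


def is_low_on_space(total_bytes, free_bytes, min_free=0.20):
    """Return True when the free fraction is below min_free (assumed 20%)."""
    return (free_bytes / total_bytes) < min_free


def path_is_low(path, min_free=0.20):
    """Check a mounted path using the standard library's disk_usage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_low_on_space(usage.total, usage.free, min_free)


class OneShotNotifier:
    """Hypothetical helper: alert once per problem, once more on recovery."""

    def __init__(self):
        self._active = set()  # keys of problems already reported

    def update(self, key, is_problem):
        """Return 'alert', 'recovery', or None if nothing new happened."""
        if is_problem and key not in self._active:
            self._active.add(key)
            return "alert"
        if not is_problem and key in self._active:
            self._active.discard(key)
            return "recovery"
        return None  # state unchanged, so no repeat mail
```

With something like this, a 5 TB data drive with 900 GB free would trip one alert, stay quiet on every later check, and send one more message when space is freed up again.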
What systems do you guys/gals have in place?