Malevolo
asked on
Best practices for a NMS solution
I've been using SolarWinds ipMonitor 8.5 for a while now to remotely monitor some servers via SNMP. However, the system isn't very efficient and in some cases doesn't give me what I'm looking for. I am going to be rolling out a new instance of ipMonitor 9.x shortly and am currently tinkering with SolarWinds Orion NPM 9. See below for additional details, and eventually my question =)
I have ~35 clients that I manage. An average client has between 3 and 5 servers. We currently have one-way VPN connections to all clients so we can remote in for work. We could set up rules for two-way communication if required.
All clients are running HP & Dell systems. All systems have their Vendor Management utilities installed. Windows servers across the board. About 20% of our servers are running VMware ESX stand-alone servers.
I'm looking for monitoring that will alert me when any physical hardware has issues (e.g. a dead physical disk or a predictive failure) in addition to Windows/software alerting (e.g. partitions running low on space, services not running).
Not to point out the obvious, but the key is to achieve a reliable system in the most network-efficient manner possible. I assume this would involve setting up some SNMP traps instead of polling every server every 5 minutes, but I'm not sure exactly how. I'm completely open to any and all ideas. Any references and/or documentation are much appreciated as well. Ultimately, if someone could draft a model for me that says "for every client you should do X, Y, Z, and then do A, B, C from your end" to get this up and running, it would be great. Thanks in advance.
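To put a rough number on the polling side of that trade-off, here is a back-of-the-envelope sketch using only the figures above (35 clients, 3 to 5 servers each, a 5-minute interval). It counts poll cycles, not packets or bytes, since one cycle may involve several SNMP requests depending on how many values are collected; the function name is just illustrative.

```python
# Rough estimate of daily poll cycles, using the figures from the question.
CLIENTS = 35
SERVERS_PER_CLIENT_LOW = 3
SERVERS_PER_CLIENT_HIGH = 5
POLL_INTERVAL_MIN = 5

MINUTES_PER_DAY = 24 * 60


def polls_per_day(servers: int, interval_min: int) -> int:
    """Number of poll cycles per day for a given server count and interval."""
    cycles_per_server = MINUTES_PER_DAY // interval_min  # 288 at 5 minutes
    return servers * cycles_per_server


low = polls_per_day(CLIENTS * SERVERS_PER_CLIENT_LOW, POLL_INTERVAL_MIN)
high = polls_per_day(CLIENTS * SERVERS_PER_CLIENT_HIGH, POLL_INTERVAL_MIN)

print(f"Poll cycles per day: between {low} and {high}")
```

Traps, by contrast, are only sent when something happens, so their volume depends on event rate rather than on the server count times the interval. The catch is that a trap can be lost or never sent if the host is down, which is why some periodic polling is usually kept alongside them.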
Hi,
I can recommend a few options for a service-based NMS:
Proprietary NMS
SolarWinds Orion www.solarwinds.com Proprietary
Smarts www.smarts.com Proprietary
WhatsUp Gold www.whatsupgold.com Proprietary
EM7 www.sciencelogic.com Proprietary
CA ca.com Proprietary
ServerAlive www.woodstone.nu Proprietary
Observer www.netinst.com Proprietary
Service Monitors
Hound-Dog www.hounddogiseasy.com Proprietary
Level Platforms www.levelplatforms.com Proprietary
Kaseya www.kaseya.com Proprietary
N-Able www.n-able.com Proprietary
Open-Source NMS
Zenoss www.zenoss.com LAMP based NMS
Nagios www.nagios.org LAMP based NMS
JFFNMS www.jffnms.org LAMP based NMS
OpenNMS www.opennms.org LAMP based NMS
Zabbix www.zabbix.com LAMP based NMS
Hyperic HQ www.hyperic.com LAMP based NMS
GroundWork www.groundworkopensource.com LAMP based NMS
Orion APM handles application monitoring, for when you need to know what's happening with your applications.
For Orion APM, go to http://www.solarwinds.com/products/orion/application_monitor/
For a demo, go to http://oriondemo.solarwinds.com/Orion/Apm/Summary.aspx
ASKER
Thanks all for the information. I am definitely moving forward with ipMonitor 9 and Orion NPM (with the APM module). My question, which admittedly was all over the place, was along the lines of: how should I structure this for a sound monitoring platform?
How should I handle SNMP traps, and should I integrate Syslog servers? To simplify, suppose I had only two clients with 5 servers each. What would be the best way to monitor the servers at the application level (Windows critical events in the event log) and at the physical level (a physical disk with a predictive failure or a RAID array in degraded status)? Do I set up one Syslog server for each client, have it collect information from all machines, and then have that Syslog server report to my Orion NPM? Or do I need to install the Syslog server on every server and have them all report to Orion NPM? How about SNMP traps? Do I have every server there set to report to me directly, or should I have them all report to one server there and then have that one report to my Orion NPM once thresholds or limits are hit? Thanks.
ASKER CERTIFIED SOLUTION
There's a lot of work to do to put a complete system together like you are looking for. I have done this and I charged the clients lots of money to a) set it up and b) keep it running and up to date. So I charge an upfront fee and a monthly recurring fee.
I have not found anything that does everything I want it to do. Zenoss is free and beats the heck out of paying for HP OpenView or HP OpenView Operations. (OpenNMS is also free, but its SNMP trap integration is a little chaotic.) HP OVO is a fairly good product and can let you know if certain processes aren't running and can restart them if you want . . . but it is pricey and requires a LOT of attention. I used it when I worked for GTE/Verizon, and the Oracle and WebLogic integration eventually defeated me, so I wrote my own Java apps that did precisely what I wanted.
One of the attractive features of Zenoss (most NMSs have this feature, but it is very simple in Zenoss) is the ability to monitor scads of devices but generate alerts only on selected devices, and then only on the things I really care about enough to be paged in the middle of the night.
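The "monitor everything, page on a short list" idea can be sketched in a few lines. This is not how Zenoss implements it and it uses no Zenoss API; it is a minimal, tool-agnostic illustration. The device names, event kinds, and severity scale below are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device: str    # hostname of the monitored server
    kind: str      # what happened, e.g. "raid_degraded"
    severity: int  # hypothetical scale: 1 (info) to 5 (critical)


# Hypothetical allow-lists: everything is collected, but only these can page.
PAGE_DEVICES = {"client01-sql01", "client02-dc01"}
PAGE_KINDS = {"raid_degraded", "disk_predictive_failure", "service_down"}
PAGE_MIN_SEVERITY = 4


def should_page(event: Event) -> bool:
    """Page only for selected devices, selected event kinds, and high severity."""
    return (
        event.device in PAGE_DEVICES
        and event.kind in PAGE_KINDS
        and event.severity >= PAGE_MIN_SEVERITY
    )


def triage(events):
    """Split events into those that page someone and those that are only logged."""
    page, log_only = [], []
    for event in events:
        (page if should_page(event) else log_only).append(event)
    return page, log_only


if __name__ == "__main__":
    sample = [
        Event("client01-sql01", "raid_degraded", 5),   # pages
        Event("client01-web02", "raid_degraded", 5),   # device not on the list
        Event("client02-dc01", "disk_space_low", 4),   # kind not on the list
        Event("client02-dc01", "service_down", 2),     # severity too low
    ]
    page, log_only = triage(sample)
    print(f"paging on {len(page)} event(s), logging {len(log_only)}")
```

Keeping the paging rules as plain allow-lists means every event is still recorded for later review, and widening or narrowing what wakes you up is a one-line change.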