VMware vCenter Alarms - Periodically receive alarms but nothing seems amiss, best practice needed

We're running vCenter 5.5 on a physical server and have most of the default alarms configured. Every few days (there is no set pattern) we get two alarms back to back but there does not appear to be anything wrong. The first is as follows:

[VMware vCenter - Alarm alarm.HostConnectionStateAlarm] alarm.HostConnectionStateAlarm changed status from Green to Red

Target: fqhostname
Previous Status: Green
New Status: Red
 
Alarm Definition:
([Red state Is equal to notResponding] AND [Red state Not equal to standBy])
 
Current values for metric/state:
 State = Not responding AND State = Unknown
 
Description:
Alarm 'Host connection and power state' on fqhostname changed from Green to Red

The next is:

[VMware vCenter - Alarm alarm.HostConnectivityAlarm] Host fqhostname in TPS is not responding

Target: fqhostname
Stateless event alarm
 
Alarm Definition:
([Event alarm expression: Cannot connect host -  incorrect Ccagent] OR [Event alarm expression: Cannot connect host - network error] OR [Event alarm expression: Cannot connect host - time-out] OR [Event alarm expression: Cannot connect host - time-out] OR [Event alarm expression: Host connection lost])
 
Event details:
Host fqhostname in TPS is not responding

Now, I'm sure I can go in and disable these alarms but is that a good thing? I'd like to know if any of my hosts disconnect and are not online, but these "false" warnings are doing little more than desensitizing us to vCenter warnings.

Going into Tasks & Events on the vCenter server reveals that the host shows as disconnected for about 4 seconds and then is back online, which I assume is what triggers the alarms. We do have other backup (I/O) operations running on this vCenter server at various times, and it's possible that is causing an issue. I did not see an option to add a duration (> 5 seconds) to the alarms that are being triggered. I should also note that it is not always the same host or hosts that trigger these alarms.

Should I disable these alarms? Or is there a better way?
robklubsAsked:

Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Do you have any flapping network ports that could be causing the disconnected / not responding state?
robklubsAuthor Commented:
Good question. I don't believe so. There are four ports (split between two adapters) plus an iDRAC port (it's a Dell server). One port from each adapter is combined into a team (via Windows 2012 teaming). One team is for the LAN, the other for the SAN for direct-to-SAN backups via Veeam.

All seems fine, but I'm going to check out the event viewer for any network oddities.
robklubsAuthor Commented:
Nothing of note in the event viewer. Checked specifically before and after the times the alerts were generated.
compdigit44Commented:
I am interested in what you find out. We are running vCenter 5.1 GA with 110 hosts and randomly get a host disconnect, yet everything seems OK. I have turned up logging in vCenter and found that when this happens the host drops a high number of heartbeats. Our network team states there are no problems, but I have my reservations.
piedthepiperCommented:
Check whether the ports on the host and switch are set to auto/auto. If they are, force them to a fixed value.

It could also be a DNS issue at that point in time. Is there anything else going on in the environment during that time frame?
robklubsAuthor Commented:
All network adapters are set to 1000 Mbps, full duplex.

And yes, there are other things going on throughout the day on the server hosting vCenter. It also hosts Veeam, which is facilitating backups, plus we have offsite backups running through it at various times. However, the alarm emails do not line up with this activity.

Is there any logging that I could enable to see if these alarms are even being triggered on the host side?
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Check the logs under /var/log on the host; you will see any disconnections of the network interfaces there.
robklubsAuthor Commented:
Thanks, I will check that.
compdigit44Commented:
I never thought about DNS !!! Thanks piedthepiper for the great tip..

Would DNS resolution problems appear in the host log?
piedthepiperCommented:
Well, it looks like you are using the FQDN, and if there is any issue with DNS at that time, or with the pointer file etc., the host could drop off the network.

A bit of a long shot but worth a look, maybe event logs around that time etc
robklubsAuthor Commented:
I've checked the log files and don't see any network disconnections, apart from, of course, the vCenter alarms saying the host is disconnected. That alarm lasts for 1-2 seconds and then vCenter shows everything back online. This alarm occurred again within the past hour, but I just can't find anything wrong.

Re-reading piedthepiper's comments on DNS, I'm wondering how best to investigate that. The internal DNS server we're using is older, but no errors are appearing on it. Would there be any harm in temporarily switching one of our hosts to a Google DNS server and monitoring from there?
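
One low-risk way to investigate the DNS theory without changing any host settings is to probe name resolution from the vCenter server on a short interval and compare the timestamps against the alarm emails. The following is only a sketch under stated assumptions: the function names are made up for illustration, the host name passed to it would be your own FQDN, and a local resolver cache on the machine running it may hide short DNS outages.

```python
import socket
import time
from datetime import datetime, timezone


def probe_dns(hostname, resolver=socket.getaddrinfo, clock=time.perf_counter):
    """Resolve hostname once and return (ok, elapsed_seconds, detail).

    resolver and clock are injectable so the function can be tested offline.
    """
    start = clock()
    try:
        results = resolver(hostname, None)
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string for both IPv4 and IPv6.
        addresses = sorted({r[4][0] for r in results})
        return True, clock() - start, ",".join(addresses)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, clock() - start, str(exc)


def monitor(hostname, samples, interval_s=5.0):
    """Print one timestamped line per probe (hypothetical helper)."""
    for _ in range(samples):
        ok, elapsed, detail = probe_dns(hostname)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        status = "OK" if ok else "FAIL"
        print(f"{stamp} {hostname} {status} {elapsed * 1000:.1f}ms {detail}", flush=True)
        time.sleep(interval_s)
```

Calling something like monitor("fqhostname", samples=720, interval_s=5) would cover an hour. A FAIL line, or a sudden jump in resolution time, at the same moment as an alarm would support the DNS theory; a clean log would point elsewhere.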
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
If those are the only alarms you've got, I would not be worried about them. If this environment is new, you'll see many more in the months to come!

VMware HA failover alerts, VDP alerts if you use it, IPMI alerts, and so on. It's about knowing which ones are important to act on.

If you are on the latest patches, it's possible you've got a missed heartbeat, possibly network related, but as long as you are not losing packets from VMs, I would not be concerned.
robklubsAuthor Commented:
We don't use HA or VDP, but we do have an alarm set for IPMI alerts. We do have the latest patches.

You hit it on the head. I just want to know which alerts we need to act on, and ideally only get alerted when something is important enough to warrant it. I wish this particular alarm had a setting that let you specify how long heartbeats must be missing before it fires.

For now, I'm going to disable the 'Host connection and power state' alarm. Most of the default alarms are still there and active, so hopefully we get notified if there really is something going on.
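
As an alternative to disabling the alarm outright, the missing duration threshold can be approximated outside vCenter by filtering the connect/disconnect events after the fact. The sketch below assumes the events have already been exported from Tasks & Events into (timestamp, host, state) tuples sorted by time; the tuple layout, the state strings, and the 30-second threshold are assumptions for illustration, not anything vCenter provides.

```python
from datetime import datetime, timedelta


def long_outages(events, min_duration=timedelta(seconds=30)):
    """Return (host, start, duration) for outages lasting at least min_duration.

    events: iterable of (timestamp, host, state), sorted by timestamp, where
    state is "disconnected" or "connected" (hypothetical labels).
    Hosts that are still down when the events end are not reported here.
    """
    down_since = {}
    outages = []
    for ts, host, state in events:
        if state == "disconnected":
            # Keep the first disconnect if several arrive back to back.
            down_since.setdefault(host, ts)
        elif state == "connected" and host in down_since:
            start = down_since.pop(host)
            duration = ts - start
            if duration >= min_duration:
                outages.append((host, start, duration))
    return outages
```

With a filter like this, a 4-second blip is dropped, while a host that stays disconnected for minutes is still reported.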
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
This may help, and you might want to consider it for your operations staff, service desk, etc. We use SolarWinds Alert Central (and it's free):

http://www.solarwinds.com/alertcentral.aspx

It's an appliance, so you just import it and then start configuring. The problem we have, like all IT departments, is that we get so many alerts from vSphere, SANs, network switches, and all the Linux, Windows, and Apple servers that our inboxes are full of them, and it's difficult to know which ones are important and really need looking at. Enter Alert Central.

It allows you to send all alerts to it and then configure some workflow around them: send to different teams, ignore, action, etc.

It's brilliant, and free, takes a little configuration to get it to work for you....

As I type, we currently have 155 alerts in the Solar queue, and I've been sent two alerts which need further investigation immediately: a snapshot alert for a VM, and a disk failure in a Synology NAS. The other 153 alerts I can check at my leisure; they are mostly noise, e.g. CPU and memory alerts for VMs.
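
The general idea of rule-based triage can be sketched in a few lines. This is not how Alert Central is configured or implemented; it is only an illustration of the "act now vs. check later" split described above, and the patterns, action names, and sample subjects are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical rules, checked in order; the first matching pattern wins.
RULES = [
    (re.compile(r"disk fail|snapshot", re.IGNORECASE), "act_now"),
    (re.compile(r"cpu usage|memory usage", re.IGNORECASE), "review_later"),
]


def triage(subject, rules=RULES, default="unclassified"):
    """Return the action for an alert subject line."""
    for pattern, action in rules:
        if pattern.search(subject):
            return action
    # Anything no rule recognises is surfaced rather than silently dropped.
    return default


def summarize(subjects):
    """Count how many alerts fall into each action bucket."""
    return Counter(triage(s) for s in subjects)
```

Leaving unmatched alerts as "unclassified" rather than ignoring them is a deliberate choice in this sketch: a new kind of alert gets looked at once and then assigned a rule.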

robklubsAuthor Commented:
Thanks for the info on SolarWinds. It looks like that is the most elegant solution. Better to have a system sift through all the alert emails, where you can specify what needs attention and what does not, than to continue down this path.