ESX servers rebooted at the same time

Two of our three ESX servers were rebooted last night at almost exactly the same time (1:30 AM).
I thought it might be a power outage, but I just checked and they are on different power outlets.

Is there any built-in auto-upgrade or other process that could cause this? How can I figure out the root cause?

The log file shows that the host enters maintenance mode:

VpxdMoHost::UpdateConnectionStateInt] <ESX IP > Marked  as dirty.
akhalighi asked:
 
Danny McDaniel (Clinical Systems Analyst) commented:
The "marked as dirty" message just means that something has changed on the host and data needs to be pulled from it so the database can be updated with the changes.

There is no auto-update within VMware. You can schedule an update with Update Manager, but a user has to set up the job.

Check the event/tasks tabs on the hosts to see if there was some kind of operation that caused the reboot.
You should also check the logs on the hosts: /var/log/messages, /var/log/vmkernel and /var/log/vmware/hostd. Remember that the reboot would have caused the log files to roll, so you will need to look at the .# versions of those logs. With 4.x they are compressed, so you will need to use 'gzip -d' to uncompress them.
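
As a rough sketch of that log check (Python, standard library only): the helper below reads a rotated log whether or not it is gzip-compressed and returns its last few lines, which are the entries written just before the log rolled. The function name `tail_rotated_log` is made up for illustration, and the path in the usage note is only an example of a ".#" file; use whatever rotated files actually exist on your host.

```python
import gzip
from collections import deque
from pathlib import Path


def tail_rotated_log(path, lines=50):
    """Return the last `lines` lines of a plain or gzip-compressed log file."""
    path = Path(path)
    # Compressed rotated logs end in .gz; anything else is read as plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", errors="replace") as handle:
        # A deque with maxlen keeps only the final entries, so memory stays small.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
```

For example, `print("\n".join(tail_rotated_log("/var/log/vmkernel.1")))` would show the tail of the previous vmkernel log, assuming a file with that name exists.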
 
akhalighi (Author) commented:
It happened again today :( I downloaded a report bundle and checked the log files you mentioned, but all of them only cover the period AFTER the reboot.

I am getting these errors in the vpxd logs:

[2010-11-30 08:04:01.021 04012 error 'App'] [VpxdVmomi] Got vmacore exception: The semaphore timeout period has expired.

[2010-11-30 08:04:01.021 04012 error 'App'] [VpxdVmomi] Backtrace:
backtrace[00] eip 0x01a1c66d ?AbortProcess@System@Vmacore@@YAXXZ

backtrace[01] eip 0x01a1d0a7 ?CreateQuickBacktrace@SystemFactoryImpl@System@Vmacore@@UAEXAAV?$Ref@VBacktrace@System@Vmacore@@@3@@Z

backtrace[02] eip 0x0195f300 ??0Throwable@Vmacore@@QAE@ABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@Z

,.....


That seems to be the root cause. Any clues?
 
Danny McDaniel (Clinical Systems Analyst) commented:
The backtraces in the vpxd log just indicate potential problems with the vCenter service. http://communities.vmware.com/community/vmtn/vsphere/automationtools/vima has a download for a management appliance that you could run on the ESXi host that isn't rebooting. You can set it up as your syslog server so that all logging information goes to it and doesn't depend on the local storage of the ESXi hosts.
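
To illustrate the remote-logging idea only (this is not the management appliance itself), here is a minimal Python sketch of a UDP syslog collector that appends every datagram it receives to a file on a separate machine. The names `format_entry`, `SyslogHandler`, `serve` and the output file `esx-syslog.log` are assumptions made up for this example, and the hosts would still need to be configured to forward their syslog to this machine, which is not shown here.

```python
import socketserver
from datetime import datetime

LOG_PATH = "esx-syslog.log"  # hypothetical output file on the collector


def format_entry(data, sender_ip, now=None):
    """Build one log line: receive time, sender address, raw syslog text."""
    now = now or datetime.now()
    text = data.decode("utf-8", errors="replace").strip()
    return f"{now.isoformat(timespec='seconds')} {sender_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        line = format_entry(data, self.client_address[0])
        with open(LOG_PATH, "a", encoding="utf-8") as out:
            out.write(line + "\n")


def serve(host="0.0.0.0", port=514):
    """Listen for syslog datagrams until interrupted."""
    # 514 is the standard syslog UDP port; binding to it usually needs admin rights.
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on the collector starts the listener. Because the entries are stamped with the collector's own clock and stored off the hosts, the last messages before a reboot survive even if the host's local logs do not.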

Since it is two of the hosts at the same time, twice (or more), I would suspect something external to the hosts to be the cause, though.  Is there any hardware logging on the servers?
 
akhalighi (Author) commented:
No, we don't have hardware logging on the ESX servers, but if it were hardware it shouldn't occur at the very same moment on both servers. These two servers use different power outlets: one of them is connected to the UPS (along with the healthy ESX server) and the other is connected to a different power outlet.

If for some reason the ESX servers get disconnected from vCenter, it shouldn't stop VMs, right? In my case it shut down some VMs and rebooted some others...
 
Danny McDaniel (Clinical Systems Analyst) commented:
Nope, getting disconnected from vCenter won't cause VMs to go down and/or hosts to reboot. ESX won't even power down VMs when licenses or evaluations expire... it'll just prevent VMs from being powered on. The only thing I can think of that would cause hosts to power down is the DPM functionality, but that would put them into a sleep mode and not reboot them.