ESX 4.0 random reboots

I have an ESX 4 server which is rebooting daily, and I'm having a hard time figuring out what is going on with it. I've tried to export the logs so I can open a case with VMware, but the diagnostic bundle never generates. It always fails with a message saying log files are missing.

What I believe triggered all this is a bad hard drive. I have a RAID 1 set and one of the drives failed. I've since replaced the failed drive and monitored the array rebuilding, but now this server reboots every day.

I'm obviously not a VM expert so any help would be appreciated.  This is a BL460c G1 which is currently in maintenance mode.
PCVIC Asked:
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Okay, so all VMs are on the SAN?

Is this the only ESX 4.0 server?

Does it reboot daily?

Personally, if the server reboots daily, I would do the following:

1. I would install ESXi 4.0 U3 on a USB flash drive, plug it into the internal USB connector inside the blade, and temporarily remove the two disks. ESXi and ESX are compatible, so it will connect to your existing SAN and run the VMs with no issues and no changes.

This will show whether the problem is ESX 4.0, the disk drives, or both.
 
IanTh Commented:
Do you get a purple screen of death?
Why aren't you using 4.1?
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
ESX can reboot for many reasons, including bad memory, a bad processor, or overheating.

Check the memory with http://www.memtest.org/
 
d33m Commented:
You can also check the hardware/system logs on the Integrated Lights-Out (iLO, if available) for this blade server.

You may find them useful.
 
PCVIC Author Commented:
No purple screen. For now, all of our ESX systems will stay at 4.0.

I've used iLO and reviewed the IML log, and it's been clean since the recovery of the array.

I've run hardware diagnostics and they found nothing. I've gone as far as removing the blade and reseating all the hardware.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Check the memory with http://www.memtest.org/
 
PCVIC Author Commented:
I will try the memory test.
 
PCVIC Author Commented:
Looks like it's going to take forever, but I'll let it finish. It's been running for 52 minutes and is at 12%.
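
For a rough sense of how long that is, here is a minimal sketch that extrapolates from those two numbers. It assumes progress is linear, which the memory test does not guarantee, and the function name is purely illustrative.

```python
def estimate_total_minutes(elapsed_minutes: float, percent_done: float) -> float:
    """Linearly extrapolate total run time from elapsed time and percent complete.

    Assumption: progress is roughly linear, which may not hold for a memory test.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    return elapsed_minutes * 100.0 / percent_done


elapsed = 52   # minutes, from the post above
percent = 12   # percent complete, from the post above

total = estimate_total_minutes(elapsed, percent)
remaining = total - elapsed
print(f"Estimated total: {total / 60:.1f} h, remaining: {remaining / 60:.1f} h")
```

On those figures the estimate comes out at a little over seven hours in total, so leaving it to run overnight or longer is reasonable.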
 
PCVIC Author Commented:
I let it run over the weekend and it found no memory issues. The whole time it was booted from the memtest ISO, it never rebooted. I think we can safely rule out hardware as the issue.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Hardware is clearly okay, unless a VMware driver is causing an issue with the hardware, or there is a software fault in ESX 4. What build are you running?

Are you running the HP version of ESX, e.g. have you got HP Agents installed?

It might be worth stopping or uninstalling the HP Agents temporarily.

Is ESX 4 installed on the internal hard drives - RAID 1?

Any SAN attached?
 
PCVIC Author Commented:
Running ESX 4.0.0, build 261974.

There are no hardware agents installed. ESX is installed on the internal drives, RAID 1.

It is attached to the SAN.
 
IanTh Commented:
Does your VI Client get any logs?
 
PCVIC Author Commented:
I've rebuilt the system onto two new drives.  So far the system has been up and running for 1 day and 20 hrs.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Very good. Duff disks, maybe?
 
PCVIC Author Commented:
I have a feeling it had to do with the array, since a failed drive is what triggered the symptoms. There were no logs anywhere that could confirm it, though.

Anyhow, the issue appears to be resolved. I just hate the paperwork that comes along with reintroducing the host into production.

Thanks, hanccocka
Question has a verified solution.