Thor2923


Powered down a VM host to reseat memory modules; after powering back up, I can no longer migrate VMs from other hosts on the farm (HA errors)

We have a VMware farm with 4 hosts. I recently powered down one of the hosts to reseat some memory modules. I first migrated all VMs to other hosts on the farm, then put the host into maintenance mode, then powered down. I reseated my memory modules and rebooted a couple of times while I was on the phone with IBM, then powered back up. Everything looked normal. I opened vSphere and took the host out of maintenance mode; this seemed to take longer than normal. I also saw something about an HA error, or HA not starting. All the hosts in the cluster now have a red ! over them. I tried to start migrating VMs back to my IBM host server and I get this error: "the host is reporting errors in an attempt to provide HA support". I have been through this procedure several times:
1. Migrating VMs to other hosts
2. Putting the host into maintenance mode
3. Powering down
4. Adding memory or working with the hardware
5. Powering the host back up manually
6. Logging into vSphere and connecting
7. Exiting maintenance mode
8. Powering on
9. Migrating VMs back to the original host

I have never had this issue before. I did not touch any of the ESXi configuration. I can still log into the ESXi host and see that the OS version is ESXi 4.1.0 build 260247.

I just need to get HA running again so I can migrate VMs back to this host. All advice and comments are welcome. Thanks.
IanTh

Please post the logs from the vCenter Server / VI Client.
ASKER CERTIFIED SOLUTION
Luciano Patrão

Thor2923

ASKER

Thanks for all the info. My situation began over a week ago. Our IBM series x only had 32GB of memory and I purchased sixteen 4GB modules to upgrade to 64GB. I went through the procedure I mentioned above to power the IBM server down, manually open the server up and replace the memory. When I booted back up I noticed the server only saw 60GB of the memory, so I called IBM.

We tried to go into the BIOS, but realized a previous engineer had set an admin password and IBM had no "free" way for me to bypass it, so I downloaded some diagnostics software at IBM's direction and booted from it. I ended up creating a file from that software that I emailed to IBM. IBM reported back that "DIMM slot 1 was being reported as unknown", explaining why 4GB was not being seen. The diagnosis was that I had installed a bad memory module, and I went back to my vendor for a replacement.

A week later the replacement arrived and I went to install it yesterday. I went through the same procedure that I had done before and powered up the server again, and I noticed it still only displayed 60GB of memory. I contacted IBM and, after speaking with a couple of people, it was finally determined that DIMM slot 1 would have to be "reenabled since the IBM server would have disabled it upon seeing a bad memory module". I do not have the BIOS admin password and, after contacting higher levels of IBM support, I was told the "board inside had to be replaced to reset the password, there were no switches, jumpers or tools that could help me". That was going to cost over 2000.00 plus labor, so I put everything back together and powered it back up with 60GB.

I did not really make any changes other than replacing one of the memory sticks with the same brand of Kingston memory from the same vendor. The server ran properly on our VMware farm for a week with 60GB of the new memory. That is pretty much my story.

As you can see, most of it is hardware and IBM, not VMware, so I left some details out in my original post, but if this gives anyone any clues as to what happened, I would appreciate it. Thanks, and I will check out your links.

IanTh, where do I get those logs exactly? Do I log into the host server and browse to the path you provided, /viclient?
Hi

You can extract the logs from your vSphere Client.

In the vSphere Client choose "File", then "Export".

But the log for HA is the file /var/log/vmware/vpx/vpxa.log

Jail
Well, I went to vSphere and did the export of system logs. I got 3 files, two of them zip files. I see a zip file for our vCenter, and under that there is a log folder with 233 MB of data. I see lots of subfolders, but no /var.

Should I unzip everything and search for a specific file??
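
One way to avoid unzipping everything by hand is to list the archive contents and filter by name. The following is a minimal Python sketch, assuming the exported bundle is an ordinary .zip file; the function name `find_log_members` and the example file name are hypothetical, and the search term "vpxa" is simply taken from the log path mentioned above.

```python
import zipfile


def find_log_members(bundle_path, needle="vpxa"):
    """Return the names of archive members whose path contains `needle`.

    Only the zip index is read, so nothing is extracted to disk.
    """
    with zipfile.ZipFile(bundle_path) as bundle:
        return [name for name in bundle.namelist() if needle in name.lower()]


# Example call (hypothetical file name, replace with the exported bundle):
# for name in find_log_members("vcenter-export.zip"):
#     print(name)
```

If a match turns up, `zipfile.ZipFile.extract` can pull out just that one member instead of the whole 233 MB.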

I exported the events (warnings and errors) from yesterday and got the output below. Does this mean anything to anyone?

Type      Time                    User                    Description
error     10/08/11 12:44:41 PM                            Host 10.3.2.17 in PTI-Peak10 is not responding
error     10/08/11 3:17:41 PM                             Host 10.3.2.17 in PTI-Peak10 is not responding
error     10/08/11 6:56:27 PM                             Unable to contact a primary HA agent in cluster PTI-Cluster in PTI-Peak10
error     10/08/11 7:30:51 PM                             Host 10.3.2.17 in PTI-Peak10 is not responding
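
For a longer export, a short script can group the events and pull out the HA-related ones. This is only a sketch in Python: it assumes the column layout shown in the export above (type, timestamp, an empty User column, then the description) and assumes the dates are month/day/year. The names `parse_events` and `LINE_RE` are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Event lines copied from the export above.
EVENTS = """\
error     10/08/11 12:44:41 PM                            Host 10.3.2.17 in PTI-Peak10 is not responding
error     10/08/11 3:17:41 PM                             Host 10.3.2.17 in PTI-Peak10 is not responding
error     10/08/11 6:56:27 PM                             Unable to contact a primary HA agent in cluster PTI-Cluster in PTI-Peak10
error     10/08/11 7:30:51 PM                             Host 10.3.2.17 in PTI-Peak10 is not responding
"""

# Assumed layout: type, "MM/DD/YY H:MM:SS AM|PM", then the description.
# If the User column were filled in, it would end up inside the description.
LINE_RE = re.compile(
    r"^(\w+)\s+(\d{2}/\d{2}/\d{2} \d{1,2}:\d{2}:\d{2} [AP]M)\s+(.*)$"
)


def parse_events(text):
    """Return a list of (type, datetime, description) tuples."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip the header row and blank lines
        kind, stamp, description = match.groups()
        when = datetime.strptime(stamp, "%m/%d/%y %I:%M:%S %p")
        events.append((kind, when, description.strip()))
    return events


events = parse_events(EVENTS)

# How often does each message occur?
counts = Counter(description for _, _, description in events)
for description, n in counts.most_common():
    print(n, description)

# Events that mention the HA agent specifically.
ha_events = [event for event in events if "HA agent" in event[2]]
```

On the sample above, the repeated "not responding" message for 10.3.2.17 stands out next to the single HA agent error, which points at host connectivity rather than at HA itself.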

I think I have it! I was doing some reading and went to the cluster's Edit Settings and unchecked HA, let it reconfigure, then rechecked it, and HA reconfigured on all four hosts. I saw the alarm on the host in question and it was a network connection issue. I think with all my reboots with IBM I did some without the network cables connected, or maybe not all of them connected. After HA was back on, the red ! went away on all the hosts and the cluster, and when I changed the status to green on the host in question the ! went away from that one also. I have already migrated 2 VMs back to the host that was causing the issue. Thanks guys, you have all been wonderful sticking it out with me today.
Hi

Glad that you managed. It's always a good thing :)

Jail