Solved

Powered down VM host to reseat memory modules; after powering back up, I can no longer migrate VMs from other hosts on the farm, HA errors

Posted on 2011-10-09
Medium Priority
Last Modified: 2012-05-12
We have a VMware farm with 4 hosts. I recently powered down one of the hosts to reseat some memory modules. I first migrated all VMs to other hosts on the farm, then put the host into maintenance mode, then powered down. I reseated my memory modules and rebooted a couple of times while I was on the phone with IBM, then powered back up. Everything looked normal. I opened vSphere and took the host out of maintenance mode; this seemed to take longer than normal. I also saw something about an HA error or HA not starting. All the hosts in the cluster now have a red ! over them. I tried to start migrating VMs back to my IBM host server and I get this error: "the host is reporting errors in an attempt to provide HA support". I have been through this procedure several times:
1. Migrating VMs to other hosts
2. Putting the host into maintenance mode
3. Powering down
4. Adding memory or working with the hardware
5. Powering the host back up manually
6. Logging into vSphere and connecting
7. Exiting maintenance mode
8. Powering on
9. Migrating VMs back to the original host

I have never had this issue before. I did not touch any of the ESXi configuration. I can still log into the ESXi host and see the OS version is ESXi 4.1.0 build 260247.

I just need to get HA running again so I can migrate devices back to this host. All advice and comments are welcome. Thanks.
Question by:Thor2923
10 Comments
 
LVL 30

Expert Comment

by:IanTh
ID: 36939114
Please post the logs from vCenter Server / the VI Client.
 
LVL 24

Accepted Solution

by:
Luciano Patrão earned 1200 total points
ID: 36939120
Hi

What did you change regarding the memory? Did you add more memory to it?

Have you tested your memory to see if everything is OK? When I put an important VMware host into production I always run a memory test to catch any possible problems. I use Memtest86+ http://www.memtest.org/

With this test I have avoided some problems that can occur with faulty memory.

Also check, when the server is powered on, that the memory shown in the BIOS matches the memory that you have added to the server.

Also, did you reserve any memory/CPU in any resource pool or VMs? This may also cause problems when enabling HA, such as the error "Insufficient resources to satisfy HA failover".

Any changes regarding IP, host name, etc.?

If there is no problem with the memory, check all the HA resources and see if everything is OK.

See the HA checks here:

http://kb.vmware.com/kb/1001596

Hope this helps

Jail
 
LVL 1

Author Comment

by:Thor2923
ID: 36939163
Thanks for all the info. My situation began over a week ago. Our IBM series x only had 32GB of memory and I purchased 16 4GB modules to upgrade to 64GB. I went through the procedure I mentioned above to power the IBM server down, open the server up, and replace the memory. When I booted back up I noticed the server only saw 60GB of the memory, so I called IBM. We tried to go into the BIOS, but realized a previous engineer had set an admin password and IBM had no "free" way for me to bypass it, so I downloaded some diagnostics software at IBM's direction and booted from it. I ended up creating a file from that software that I emailed to IBM. IBM reported back that "DIMM slot 1 was being reported as unknown", explaining why 4GB was not being seen. The diagnosis was that I had installed a bad memory module, and I went back to my vendor for a replacement.

A week later the replacement arrived and I went to install it yesterday. I went through the same procedure as before, powered up the server again, and noticed it still only displayed 60GB of memory. I contacted IBM, and after speaking with a couple of people it was finally determined that DIMM slot 1 would have to be "reenabled since the IBM server would have disabled it upon seeing a bad memory module". I do not have the BIOS admin password, and after contacting higher levels of IBM support I was told the "board inside had to be replaced to reset the password, there were no switches, jumpers or tools that could help me". That was going to cost over 2000.00 plus labor, so I put everything back together and powered it back up with 60GB. I did not really make any changes other than replacing one of the memory sticks with the same brand of Kingston memory from the same vendor. The server ran properly on our VMware farm for a week with 60GB of the new memory. That is pretty much my story.

As you can see, most of it is hardware and IBM, not VMware, so I left some details out of my original post, but if this gives anyone any clues as to what happened, I would appreciate it. Thanks, and I will check out your links.
 
LVL 1

Author Comment

by:Thor2923
ID: 36939169
IanTh, where exactly do I get those logs? Do I log into the host server and browse to the path you provided, /viclient?
 
LVL 24

Expert Comment

by:Luciano Patrão
ID: 36939208
Hi

You can extract the logs from your vSphere Client.

In the vSphere Client, choose "File", then "Export".

But the log for HA is the file /var/log/vmware/vpx/vpxa.log

Jail
 
LVL 1

Author Comment

by:Thor2923
ID: 36939304
Well, I went to vSphere and did the export of system logs. I got 3 files, two of which are zip files. I see a zip file for our vCenter, and under that there is a log folder with 233 MB of data. I see lots of subfolders, but no /var.

Should I unzip everything and search for a specific file?
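
One way to avoid unzipping everything is to list the contents of each exported zip bundle and look for the log by name. The following is only a minimal sketch in Python, assuming the export folder contains ordinary `.zip` files; the function name `find_logs_in_bundles` and the default search term `vpxa` are illustrative choices, not part of any VMware tool, and bundles in other formats are simply skipped.

```python
import zipfile
from pathlib import Path


def find_logs_in_bundles(export_dir, needle="vpxa"):
    """Return (bundle name, member path) pairs whose file name contains `needle`.

    Only .zip bundles directly inside `export_dir` are inspected; nothing is
    extracted to disk. The match is case-insensitive and applies to the file
    name only, not to the folders above it.
    """
    hits = []
    for bundle in sorted(Path(export_dir).glob("*.zip")):
        with zipfile.ZipFile(bundle) as zf:
            for member in zf.namelist():
                file_name = member.rsplit("/", 1)[-1]  # empty for directory entries
                if needle.lower() in file_name.lower():
                    hits.append((bundle.name, member))
    return hits
```

Calling `find_logs_in_bundles("C:/exports")` (a hypothetical folder) would return the path of each matching log inside its bundle, so only that one file needs to be extracted and read.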
 
LVL 1

Author Comment

by:Thor2923
ID: 36939349

I exported the events, warnings, and errors from yesterday and got the output below. Does this mean anything to anyone?

Type      Time                    User                    Description        
error     10/08/11 12:44:41 PM                            Host 10.3.2.17 in PTI-Peak10 is not responding
error     10/08/11 3:17:41 PM                             Host 10.3.2.17 in PTI-Peak10 is not responding
error     10/08/11 6:56:27 PM                             Unable to contact a primary HA agent in cluster PTI-Cluster in PTI-Peak10
error     10/08/11 7:30:51 PM                             Host 10.3.2.17 in PTI-Peak10 is not responding
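
With a longer export, a quick way to see which message dominates is to parse the rows and count descriptions. This is a rough sketch in Python, assuming rows look like the ones above: a type word, a month/day/year timestamp with AM/PM, and then the description. The User column is empty in this export, so the sketch treats everything after the timestamp as the description; the names `EVENT_RE`, `parse_events`, and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed row layout: <type> <MM/DD/YY h:mm:ss AM|PM> <description>
EVENT_RE = re.compile(
    r"^(?P<type>\w+)\s+"
    r"(?P<time>\d{2}/\d{2}/\d{2} \d{1,2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<desc>.+)$"
)

SAMPLE = """\
Type      Time                    User                    Description
error     10/08/11 12:44:41 PM                            Host 10.3.2.17 in PTI-Peak10 is not responding
error     10/08/11 3:17:41 PM                             Host 10.3.2.17 in PTI-Peak10 is not responding
error     10/08/11 6:56:27 PM                             Unable to contact a primary HA agent in cluster PTI-Cluster in PTI-Peak10
error     10/08/11 7:30:51 PM                             Host 10.3.2.17 in PTI-Peak10 is not responding
"""


def parse_events(text):
    """Return a list of (type, datetime, description); unparseable lines are skipped."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            when = datetime.strptime(m["time"], "%m/%d/%y %I:%M:%S %p")
            events.append((m["type"], when, m["desc"].strip()))
    return events


def count_descriptions(events):
    """Count how often each description occurs."""
    return Counter(desc for _, _, desc in events)
```

Running `count_descriptions(parse_events(SAMPLE))` on the four rows above shows the "not responding" message for host 10.3.2.17 three times and the HA agent message once, which points at connectivity to that one host rather than at the cluster as a whole.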

 
LVL 1

Author Comment

by:Thor2923
ID: 36939403
I think I have it! I was doing some reading, went to the cluster's Edit Settings, and unchecked HA, let it reconfigure, then rechecked it, and HA reconfigured on all four hosts. I saw the alarm on the host in question and it was a network connection issue. I think with all my reboots with IBM I did some without the network cables connected, or maybe with not all of them connected. After HA was back on, the red ! went away on all the hosts and the cluster, and when I changed the status to green on the host in question, the ! went away from that also. I have already migrated 2 VMs back to the host that was causing the issue. Thanks guys, you have all been wonderful sticking it out with me today.
 
LVL 24

Expert Comment

by:Luciano Patrão
ID: 36939437
Hi

Glad that you managed. It's always a good thing :)

Jail
 
LVL 30

Assisted Solution

by:IanTh
IanTh earned 800 total points
ID: 36941365
Learning to use the vSphere / VI Client logs properly is the key to diagnosing VMware errors.