operationsIT
VM servers were acting weird this morning: some were not pingable, others were. Where to start?
They were not all on the same host.
Eventually they all came back.
I am trying to determine the root cause.
I sampled 10 of my 50 VMs across my 4 hosts, and 6 would time out and not reply, though they didn't all stay timed out for the same length of time.
Others could be pinged at the same time.
I could open the console, but all had a black screen.
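To pin down when each VM stopped replying and for how long, the ping results can be collapsed into outage windows and then compared against the host logs. This is only a minimal sketch: the function name `outage_windows` and the `(timestamp, replied)` sample format are assumptions made for illustration, and how the samples are collected (a scheduled ping, a monitoring tool) is left open.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Collapse (timestamp, replied) ping samples into (start, end) outage windows.

    `samples` must be sorted by time. An outage starts at the first failed
    sample and ends at the next successful one, or at the last sample seen
    if the VM never came back within the sampled period.
    """
    windows = []
    start = None
    last = None
    for ts, replied in samples:
        if not replied and start is None:
            start = ts                    # first failure opens a window
        elif replied and start is not None:
            windows.append((start, ts))   # first success closes it
            start = None
        last = ts
    if start is not None:
        windows.append((start, last))     # still down at end of sampling
    return windows
```

Running this per VM gives a list of windows that can be lined up by host and datastore to see what the affected VMs had in common.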
I would look at your networking logs, physical switches, trunks, VLANs, network configuration.
ASKER
Are there any logs on the vSphere side? I could ping the network gear as a benchmark, and physical servers too.
There are many logs in /var/log on the host.
Check them to see if networking went down between the host and the switches.
Can you give us more detail on your setup?
1) What version of ESXi and vCenter?
2) Make and model of the hosts?
3) In VMware, are you using standard or distributed switches?
4) Any recent changes to your network or infrastructure lately?
5) In VMware, are any errors/alerts listed for the affected VMs?
6) What third-party app are you using to back up your VMs?
7) What type of storage are you using: SAN (iSCSI/FC/FCoE), NFS, local, etc.?
8) On the affected hosts, are there any errors in /var/log/vmkwarning.log or /var/log/vmkernel.log? (You can upload both files so others can help you review them.)
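For item 8, it helps to narrow the logs down to the time of the incident before reading them. The following is a rough sketch, not a VMware tool: the function name `lines_near` is made up for illustration, and it assumes each log line begins with a UTC timestamp in the form 2016-12-15T14:15:45.210Z. Lines without a leading timestamp are skipped.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed line prefix: an ISO 8601 UTC timestamp with milliseconds.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")


def lines_near(lines, incident, minutes=15):
    """Return log lines stamped within +/- `minutes` of `incident` (aware UTC datetime)."""
    delta = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # continuation line or no timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        ts = ts.replace(tzinfo=timezone.utc)
        if abs(ts - incident) <= delta:
            hits.append(line.rstrip("\n"))
    return hits
```

A copy of vmkernel.log or vmkwarning.log pulled from the host could be passed in as `open(path)`, with `incident` set to roughly when the pings started failing.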
ASKER
1) What version of ESXi and vCenter? 5.5
2) Make and model of the hosts? Cisco UCS
3) In VMware, are you using standard or distributed switches? Standard
4) Any recent changes to your network or infrastructure lately? No
5) In VMware, are any errors/alerts listed for the affected VMs? Nothing
6) What third-party app are you using to back up your VMs? EMC Networker
7) What type of storage are you using: SAN (iSCSI/FC/FCoE), NFS, local, etc.? NetApp Fiber
8) Log error:
2016-12-15T14:15:45.210Z cpu10:33020)VSCSI: 2736: handle 9840(vscsi0:0):Reset [Retries: 0/0] from (vmm0:server1)
1. Networking
2. CPU and Memory out of resources.
3. Missing SAN Paths
Any of the above can cause the issues you have seen, and each can easily be reproduced.
Without more access to the environment, or all the logs, it is difficult to find the cause.
Would it be possible for you to upload the vmkwarning.log?
ASKER
Here is one reference for one of the VMs. Again, it wasn't all the VMs, and the ones impacted were on different datastores and different hosts.
_ds_01, /vmfs/volumes/30366138-56f910a3;
2016-12-15T14:39:20.570Z [37BC2B70 warning 'Vmsvc.vm:/vmfs/volumes/30366138-56f910a3/server1/server1.vmx'] UpdateStorageAccessibilityStatusInt: The datastore 192.168.99.236:/vol/DS_ds_01 is not accessible
2016-12-15T14:39:20.570Z [37BC2B70 info 'Vmsvc.vm:/vmfs/volumes/30366138-56f910a3/server1/server1.vmx'] UpdateStorageAccessibilityStatusInt: Vm's storage accessibility status changed to false
2016-12-15T14:39:20.570Z [37BC2B70 info 'Vmsvc.vm:/vmfs/volumes/30366138-56f910a3/server1/server1.vmx'] VM config backing gone -- marking VM _postAct to MARKBADLOAD.
2016-12-15T14:39:50.508Z [37BC2B70 verbose 'Vmsvc.vm:/vmfs/volumes/30366138-56f910a3/server1/server1.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE: 0x37dc8c90] UPDATE-NOW-DISCONNECTED, 10.100.8.236:/vol/USGUSA00012_SATA_DS, /vmfs/volumes/0c568d2b-133e7f64
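Since the impacted VMs were on different datastores and hosts, it may be worth tallying which datastores were reported as not accessible, and when, across all hosts. The sketch below is one way to do that under stated assumptions: the function name `inaccessible_datastores` is hypothetical, it matches only the "The datastore ... is not accessible" wording seen in this excerpt, and it expects intact log lines (no stray spaces inside the datastore name).

```python
import re
from collections import defaultdict

# Matches the message wording from the excerpt; other wordings are ignored.
NOT_ACCESSIBLE = re.compile(
    r"^(?P<ts>\S+) .*The datastore (?P<ds>\S+) is not accessible"
)


def inaccessible_datastores(lines):
    """Map each datastore reported as not accessible to the timestamps it was seen at."""
    events = defaultdict(list)
    for line in lines:
        m = NOT_ACCESSIBLE.search(line)
        if m:
            events[m.group("ds")].append(m.group("ts"))
    return dict(events)
```

If the same datastores show up on several hosts within the same minute, that points away from a single host and toward whatever sits between the hosts and the storage.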
ASKER CERTIFIED SOLUTION
ASKER
Not that I am aware of, but I will look through the logs. Thank you for the information!