
VM servers were acting weird this morning: some were not pingable, others were. Where to start?

The affected VMs were not all on the same host.
Eventually they all came back.
I am trying to determine the root cause.
I sampled 10 of my 50 VMs across my 4 hosts, and 6 would time out and not reply, though they did not all stay timed out for the same length of time.
Other VMs were pingable at the same time.
I could open the console on the affected VMs, but all of them showed a black screen.
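
Since the VMs did not all stay timed out for the same length of time, it can help to turn raw ping results into per-VM outage windows that can later be lined up against log timestamps. The following is a minimal sketch, not a finished tool: the function name `outage_windows` is hypothetical, and the sample timestamps are invented purely for illustration.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Collapse (timestamp, replied) samples into (start, end) outage windows.

    samples: iterable of (datetime, bool) pairs sorted by time, where the
    bool is True when the ping got a reply. Each window runs from the first
    failed sample to the last failed sample of a consecutive run.
    """
    windows = []
    start = None
    last_failed = None
    for ts, replied in samples:
        if not replied:
            if start is None:
                start = ts          # first failure of a new run
            last_failed = ts
        elif start is not None:
            windows.append((start, last_failed))
            start = None
    if start is not None:           # still failing at the end of the data
        windows.append((start, last_failed))
    return windows


# Invented sample data: one ping every 10 seconds (illustration only).
t0 = datetime(2016, 12, 15, 14, 0, 0)
results = [True, False, False, True, False, True]
samples = [(t0 + timedelta(seconds=10 * i), ok) for i, ok in enumerate(results)]

for start, end in outage_windows(samples):
    print(start.time(), "->", end.time())
```

Comparing the windows for several VMs shows whether the outages overlap in time, which helps decide between a shared cause (network, storage) and a per-VM one.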

Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

I would look at your networking logs, physical switches, trunks, VLANs, network configuration.

operationsIT

ASKER

Are there any logs on the vSphere side? I could ping the network gear (as a benchmark) and the physical servers too.

Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

There are many logs in /var/log on the ESXi host.

They will show if networking went down between the host and the switches.

compdigit44

Can you give us more detail on your setup?

1) What version of ESXi and vCenter?
2) Make and model of the hosts?
3) In VMware, are you using standard or distributed switches?
4) Any recent changes to your network or infrastructure?
5) In VMware, are any errors/alerts listed for the affected VMs?
6) What third-party app are you using to back up your VMs?
7) What type of storage are you using: SAN (iSCSI/FC/FCoE), NFS, local, etc.?
8) On the affected hosts, are there any errors in /var/log/vmkwarning.log or /var/log/vmkernel.log? (You can upload both files so others can help you review them.)

operationsIT

ASKER

1) What version of ESXi and vCenter? 5.5
2) Make and model of the hosts? Cisco UCS
3) In VMware, are you using standard or distributed switches? Standard
4) Any recent changes to your network or infrastructure? No
5) In VMware, are any errors/alerts listed for the affected VMs? Nothing
6) What third-party app are you using to back up your VMs? EMC Networker
7) What type of storage are you using: SAN (iSCSI/FC/FCoE), NFS, local, etc.? NetApp Fiber
8) Log error:
2016-12-15T14:15:45.210Z cpu10:33020)VSCSI: 2736: handle 9840(vscsi0:0):Reset [Retries: 0/0] from (vmm0:server1)
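
To see whether resets like this one cluster around particular VMs, the vmkernel log can be scanned for the same message. The sketch below is only a rough aid: the regular expression is modelled on the single line quoted above, so it is an assumption that other lines in the log follow the same layout, and the names `parse_reset` and `count_resets` are hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the one vmkernel line quoted in this post.
# Assumption: other VSCSI reset lines share this layout.
RESET_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)VSCSI: \d+: handle (?P<handle>\d+)"
    r"\((?P<device>vscsi\d+:\d+)\):Reset .* from \(vmm\d+:(?P<vm>[^)]+)\)"
)


def parse_reset(line):
    """Return the fields of a VSCSI reset line as a dict, or None if no match."""
    match = RESET_RE.match(line.strip())
    return match.groupdict() if match else None


def count_resets(lines):
    """Count VSCSI reset lines per VM name across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        fields = parse_reset(line)
        if fields:
            counts[fields["vm"]] += 1
    return counts


SAMPLE = (
    "2016-12-15T14:15:45.210Z cpu10:33020)VSCSI: 2736: handle 9840(vscsi0:0):"
    "Reset [Retries: 0/0] from (vmm0:server1)"
)

print(parse_reset(SAMPLE))
print(count_resets([SAMPLE, "some unrelated log line"]))
```

To run it against a real file, pass an open file handle to `count_resets`, for example a copy of vmkernel.log pulled from the host.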

Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

1. Networking
2. CPU or memory resource exhaustion
3. Missing SAN paths

Any of the above can cause the issues you have seen, and each is easy to reproduce.

Without more access to the environment, or all of the logs, it would be difficult to find the cause.

compdigit44

Would it be possible for you to upload the vmkwarning.log?

operationsIT

ASKER

Here is one reference for one of the VMs. Again, it wasn't all the VMs, and the ones impacted were on different datastores and different hosts.

_ds_01, /vmfs/volumes/30366138-56f910a3;
2016-12-15T14:39:20.570Z [37BC2B70 warning 'Vmsvc.vm:/vmfs/volumes/30366138-56f910a3/server1/server1.vmx'] UpdateStorageAccessibilityStatusInt: The datastore 192.168.99.236:/vol/DS_ds_01 is not accessible
2016-12-15T14:39:20.570Z [37BC2B70 info 'Vmsvc.vm:/vmfs/volumes/30366138-56f910a3/server1/server1.vmx'] UpdateStorageAccessibilityStatusInt: Vm's storage accessibility status changed to false
2016-12-15T14:39:20.570Z [37BC2B70 info 'Vmsvc.vm:/vmfs/volumes/30366138-56f910a3/server1/server1.vmx'] VM config backing gone -- marking VM _postAct to MARKBADLOAD.
2016-12-15T14:39:50.508Z [37BC2B70 verbose 'Vmsvc.vm:/vmfs/volumes/30366138-56f910a3/server1/server1.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x37dc8c90]UPDATE-NOW-DISCONNECTED, 10.100.8.236:/vol/USGUSA00012_SATA_DS, /vmfs/volumes/0c568d2b-133e7f64
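
To check whether the impacted VMs lost their datastores at the same moment, the "is not accessible" warnings can be grouped by datastore. This is a minimal sketch that assumes every such warning has the same layout as the one in the excerpt above; `inaccessible_events` is a hypothetical helper name, not part of any VMware tooling.

```python
import re
from collections import defaultdict
from datetime import datetime

# Pattern modelled on the "is not accessible" warning in the excerpt above.
# Assumption: other hostd warnings of this kind use the same layout.
INACCESSIBLE_RE = re.compile(
    r"^(?P<ts>\S+) \[\S+ \w+ 'Vmsvc\.vm:(?P<vmx>[^']+)'\] "
    r"UpdateStorageAccessibilityStatusInt: "
    r"The datastore (?P<datastore>\S+) is not accessible"
)


def inaccessible_events(lines):
    """Map each datastore to a list of (timestamp, vmx path) warnings."""
    events = defaultdict(list)
    for line in lines:
        match = INACCESSIBLE_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
            events[match.group("datastore")].append((when, match.group("vmx")))
    return dict(events)


SAMPLE = (
    "2016-12-15T14:39:20.570Z [37BC2B70 warning "
    "'Vmsvc.vm:/vmfs/volumes/30366138-56f910a3/server1/server1.vmx'] "
    "UpdateStorageAccessibilityStatusInt: "
    "The datastore 192.168.99.236:/vol/DS_ds_01 is not accessible"
)

for datastore, hits in inaccessible_events([SAMPLE]).items():
    for when, vmx in hits:
        print(datastore, when.isoformat(), vmx)
```

If several datastores show warnings within the same few seconds on different hosts, that points toward something shared between them (storage network or the storage array) rather than an individual VM or host.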

ASKER CERTIFIED SOLUTION

Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)


operationsIT

ASKER

Not that I am aware of, but I will look through the logs. Thank you for the information!