
Weird ESXi host: some VMs on it unpingable, narrowed down to vmnic

Posted on 2013-05-23
Last Modified: 2013-05-24
I have five IBM x3650 M3 servers (let's call them s4, s5, s6, s7 and s8) with
identical hardware specs in one cluster, all running ESXi 5.0 Update 1.

Every ESXi host has the same vmnic layout, i.e.:
vmnic0 ==> Management VLAN
vmnic1 ==> vMotion
Quad NIC 1 has vmnic4,5 connected for Prod VLANs
Quad NIC 2 has vmnic8,9 connected for the same Prod VLANs


They had been running fine since the middle of last year until about
a month ago, when 1-3 VMs on s4 suddenly started becoming
unpingable (reported by Tivoli; when I got the Tivoli alert and tried
to RDP into them, I couldn't connect).

Troubleshooting done so far:
a) From vCenter I could still open a console on all the VMs on s4
    (affected and unaffected), but the affected VMs can't ping
    their gateway IP address, even though "ipconfig" still shows
    their IP addresses and the gateway IP address. The affected
    VMs on s4 can ping other VMs on s4 in the same VLAN/subnet,
    but not VMs on s4 in different VLANs/subnets. The affected
    VMs also can't ping VMs in the same VLAN that sit on
    s5-s8.

b) While in the affected VMs' console, I noticed that
    under Windows 2008 R2 Standard x64 the affected
    NIC shows as "Unidentified network". On the unaffected
    VMs still on s4, it shows as "corp.local".

c) From vCenter's "Edit Settings" I deleted the NIC
    adapter and recreated it: still the same issue.

d) Rebooted the affected VMs; after they booted
    up (still on s4), still no joy.

e) The moment I vMotioned the affected VMs
    to another host (s5-s8), they became
    pingable again.

f) From vCenter, we selected the vSwitch, "Managed Hosts",
    ticked s4 and disabled vmnic4/5: all VMs on s4 (including
    those that had been pingable so far) immediately became
    unpingable. Then we re-enabled vmnic4/5 and disabled
    vmnic8/9 (the pair of ports on the other quad NIC), and
    all VMs on s4 became pingable again. We got IBM to
    replace this 'suspected' quad NIC, but still no joy.
    On the pair of stacked Cisco C3750 switches that vmnic8/9
    are connected to, the ports showed high packet drops with
    zero payload (i.e. input rate = output rate = 0 kbps).

g) All LEDs on the pair of switches are green, and all LEDs
    on s4's NICs are green.

h) I moved vmnic8's cable to a free port, vmnic1 (the
    onboard NIC), then used vCenter ("Managed Hosts") to
    disable vmnic4, 5, 8 and 9 and enable vmnic1, and all VMs
    on s4 became unpingable. I swapped this cable for a
    tested working cable, and still no joy.
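
The symptom in (f), a switch port that stays up/up (green LEDs, per (g)) while passing nothing and piling up drops, can be screened for across many ports by parsing `show interfaces` output. The sketch below is a hypothetical helper, not anything from the thread: the sample text follows the usual IOS layout for these lines, but the port names and counter values are made up. The heuristic is deliberately crude; an idle port with old drop counts would also match, so compare two samples taken a few minutes apart (or clear the counters first) to confirm the drops are actually increasing.

```python
import re

# Hypothetical `show interfaces` excerpts; the counter values are invented,
# only the line layout follows the usual IOS output.
SUSPECT = """GigabitEthernet1/0/19 is up, line protocol is up (connected)
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 48213
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
"""

HEALTHY = """GigabitEthernet1/0/20 is up, line protocol is up (connected)
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  5 minute input rate 812000 bits/sec, 95 packets/sec
  5 minute output rate 1204000 bits/sec, 130 packets/sec
"""


def parse_interface(text):
    """Pull link state, drop counters and 5-minute rates out of one port's output."""
    header = re.search(r"^(\S+) is ([^,]+), line protocol is (\S+)", text, re.M)
    queue = re.search(r"Input queue: \d+/\d+/(\d+)/\d+", text)
    out_drops = re.search(r"Total output drops: (\d+)", text)
    in_rate = re.search(r"input rate (\d+) bits/sec", text)
    out_rate = re.search(r"output rate (\d+) bits/sec", text)
    return {
        "name": header.group(1),
        "status": header.group(2),
        "protocol": header.group(3),
        "in_drops": int(queue.group(1)),
        "out_drops": int(out_drops.group(1)),
        "in_bps": int(in_rate.group(1)),
        "out_bps": int(out_rate.group(1)),
    }


def looks_blackholed(stats):
    """Heuristic: link reports up/up, yet nothing flows and drops are non-zero."""
    link_up = stats["status"] == "up" and stats["protocol"] == "up"
    idle = stats["in_bps"] == 0 and stats["out_bps"] == 0
    dropping = stats["in_drops"] > 0 or stats["out_drops"] > 0
    return link_up and idle and dropping


for sample in (SUSPECT, HEALTHY):
    s = parse_interface(sample)
    print(s["name"], "suspect" if looks_blackholed(s) else "ok")
```

Run against the two samples, it flags the first port and passes the second, which is the same distinction the network engineer made by eye in (f).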

Management wants ESXi on s4 to be completely reinstalled.

Any other suggestions?

I'll attach s4's log bundle shortly.

Question by:sunhux

Author Comment

by:sunhux
ID: 39190297
The log bundle exported from vCenter is too large (about
50 MB zipped). If needed, please let me know which specific
log file you need and I'll attach it here.

Assisted Solution

by:Andrew Hancock (VMware vExpert / EE MVE^2)
ID: 39190307
Have you checked the configuration of the switch ports s4 is connected to?

Have you swapped network connections between s4 and the other servers?

How are these configured?

Quad NIC 1 has vmnic4,5 connected for Prod VLANs
Quad NIC 2 has vmnic8,9 connected for the same Prod VLANs

Teaming policy, load balancing, physical switch config?

Can you check which physical NICs the VMs are actually using? Run esxtop and press n for network mode.

It will show which VM is using which physical NIC for data transfer. Or have you already done this, and it's all NICs: 4, 5, 8 and 9?

Author Comment

by:sunhux
ID: 39190412
Once my access is granted in about an hour, I'll get the esxtop output.

Btw, how do I copy the esxtop output to a USB thumb drive?
We're not allowed to enable the SSH server on our ESX servers for security
reasons.

interface GigabitEthernet2/0/19
 description *** S2 ***
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 48,49,70,129,130,132,133,137-145,161,162,169,171
 switchport trunk allowed vlan add 173,174,185,186,189,191,410-412,421,422,424
 switchport trunk allowed vlan add 425,452,454
 switchport mode trunk
 switchport nonegotiate
 speed 1000
 duplex full
 spanning-tree bpdufilter enable
end

The above is the config of a port on the Cisco switch that currently works.

I'll post the configs of the two Cisco switch ports that the
suspect vmnic8/9 are connected to shortly.
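
One quick check once the suspect ports' configs are posted is whether their trunks allow exactly the same VLANs as the working port above. The sketch below is a hypothetical helper that expands the `switchport trunk allowed vlan` lines (a plain line replaces the list, an `add` line extends it) and diffs two ports. The working port's VLAN lines are copied from the config above; the suspect config is invented for illustration. Keyword forms such as `all`, `none`, `remove` and `except` are not handled.

```python
import re

# VLAN lines of the working port, as posted above.
WORKING = """interface GigabitEthernet2/0/19
 switchport trunk allowed vlan 48,49,70,129,130,132,133,137-145,161,162,169,171
 switchport trunk allowed vlan add 173,174,185,186,189,191,410-412,421,422,424
 switchport trunk allowed vlan add 425,452,454
"""

# Hypothetical suspect port whose last "add" line is incomplete.
SUSPECT = """interface GigabitEthernet1/0/20
 switchport trunk allowed vlan 48,49,70,129,130,132,133,137-145,161,162,169,171
 switchport trunk allowed vlan add 173,174,185,186,189,191,410-412,421,422,424
 switchport trunk allowed vlan add 425
"""

ALLOWED = re.compile(r"^\s*switchport trunk allowed vlan (add )?([\d,-]+)\s*$")


def parse_vlan_list(spec):
    """Expand '48,49,137-145' into {48, 49, 137, ..., 145}."""
    vlans = set()
    for item in spec.split(","):
        if "-" in item:
            lo, hi = (int(x) for x in item.split("-"))
            vlans.update(range(lo, hi + 1))
        else:
            vlans.add(int(item))
    return vlans


def allowed_vlans(config):
    """Return the set of VLANs a trunk port allows, following IOS add/replace rules."""
    vlans = set()
    for line in config.splitlines():
        m = ALLOWED.match(line)
        if not m:
            continue  # other config lines, or keyword forms (all/none/remove/except)
        listed = parse_vlan_list(m.group(2))
        # "add" extends the current list; a plain line replaces it.
        vlans = vlans | listed if m.group(1) else listed
    return vlans


missing = allowed_vlans(WORKING) - allowed_vlans(SUSPECT)
extra = allowed_vlans(SUSPECT) - allowed_vlans(WORKING)
print("missing on suspect port:", sorted(missing))
print("extra on suspect port:", sorted(extra))
```

With these inputs it reports VLANs 452 and 454 as missing and nothing extra. An identical VLAN set on both sides would point away from the trunk config and towards the ports themselves, which is where this thread ended up.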

Accepted Solution

by:Andrew Hancock (VMware vExpert / EE MVE^2)
ID: 39190433
That is difficult; you can only take a picture of the screen. But you should still be able to check which VMs are using which ports.

I would swap working ports with "non-working" ports to check whether the issue is the physical switch or the server.

Also check for errors on the physical switch ports.

Author Comment

by:sunhux
ID: 39194961
>I would swap working ports for "non-working ports" as a check if the issue
> is physical switch or server.

I did the swapping and isolated it to the ports on the Cisco switches
(a pair of C3750X-48s stacked together): I got the network engineer
to provision two other ports, gi1/0/12 and gi2/0/12, on the same pair of
switches, and vmnic8/9 now work. I verified this by disconnecting all ports
and connecting only vmnic8 to gi1/0/12, then disconnecting it and
connecting only vmnic9 to gi2/0/12: all VMs on s4 were pingable each time.


Just one last question:
with 2 ports working and the other 2 not working, shouldn't VMware
reroute all traffic to the 2 working ports (i.e. vmnic4 and vmnic5)? This
is an LACP dot1q trunk of the four ports vmnic4/5/8/9, so I'd expect
that with Cisco CDP in use (as shown in vCenter), ESXi would be
smart enough to send all traffic over the 2 remaining usable ports,
shouldn't it?

Assisted Solution

by:Andrew Hancock (VMware vExpert / EE MVE^2)
ID: 39195190
How would the ESXi server know your ports are duff? It doesn't!

So the traffic gets distributed across all four, and any traffic sent up the duff ports might as well go into a bucket of water! ESXi won't stop using a port while it is up and linked!
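
The answer can be illustrated with a toy model. It assumes the team uses the vSwitch defaults, "Route based on originating virtual port ID" load balancing and "Link status only" failure detection (the thread never says which policy s4 actually uses), and it replaces ESXi's real port-to-uplink assignment with a simple modulo, so the exact mapping and the port IDs are hypothetical. What it does capture is the key behaviour: an uplink leaves the team only when its link goes down, so a switch port that is up but dropping everything keeps receiving its share of VMs.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    name: str
    link_up: bool   # the only thing "link status only" detection looks at
    forwards: bool  # whether the upstream switch port really passes frames


def active_team(uplinks):
    # An uplink is removed from the team only when its physical link is down.
    return [u for u in uplinks if u.link_up]


def pick_uplink(port_id, uplinks):
    # Toy stand-in for "route based on originating virtual port ID":
    # each VM's virtual port is pinned to one active uplink (modulo is an assumption).
    team = active_team(uplinks)
    return team[port_id % len(team)]


def unreachable(port_ids, uplinks):
    return [p for p in port_ids if not pick_uplink(p, uplinks).forwards]


# s4 as described in the thread: vmnic8/9's switch ports pass nothing,
# but their links stay up (green LEDs), so they stay in the team.
s4 = [
    Uplink("vmnic4", link_up=True, forwards=True),
    Uplink("vmnic5", link_up=True, forwards=True),
    Uplink("vmnic8", link_up=True, forwards=False),
    Uplink("vmnic9", link_up=True, forwards=False),
]
vm_ports = range(8)  # hypothetical VM virtual port IDs

# Only the VMs pinned to vmnic8/9 are cut off: [2, 3, 6, 7]
print(unreachable(vm_ports, s4))

# Step (f): administratively disabling vmnic4/5 leaves only the bad pair.
disabled = {"vmnic4", "vmnic5"}
s4_f = [Uplink(u.name, u.link_up and u.name not in disabled, u.forwards) for u in s4]
print(unreachable(vm_ports, s4_f))  # [0, 1, 2, 3, 4, 5, 6, 7]

# If the bad ports had actually dropped link, the team would shrink and
# every VM would fail over to vmnic4/5.
s4_down = [Uplink(u.name, u.forwards, u.forwards) for u in s4]
print(unreachable(vm_ports, s4_down))  # []
```

The three runs line up with what the thread reports: a few VMs unreachable in normal operation, every VM cut off once vmnic4/5 were disabled, and no outage had the faulty ports actually gone link-down.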