Solved

VMware ESX4 link aggregation with Juniper EX8208 causes NFS failure

Posted on 2011-02-18
Last Modified: 2012-05-11
We have four ESX4 nodes: two connected to an EX8208 in our primary location and two to another EX8208 in our secondary location. The two sites are directly connected. Each VMware node has three aggregated port groups: one for management and vMotion, one for virtual machine traffic, and one dedicated to NAS traffic using NFS. We have two NAS units: one IBM N6040 in the primary location and one IBM N3600 in the secondary location.

Each VMware node can mount and access the NFS volumes; existing VMs show up, and it is possible to SSH to the node and read/write files under /vmfs/volumes.

However, if we try to "Edit Configuration", power up, or create a new virtual machine, the operation times out with a "general system error" and the relevant .vmx file becomes corrupted. Rebooting the VMware node and the NAS confirms that the file has indeed been truncated, so this is not a cache issue. Both NAS units have been updated to the recommended ONTAP version.

The problem occurs even if the aggregated link has only one member link. If I then reconfigure that link to be a plain access port, everything works as expected. This aids troubleshooting but is not a viable solution.


On the VMware side, the virtual switch settings are as recommended by VMware: load balancing="route based on ip hash", failover detection="link status only", notify switches="yes", failback="no". All member links are configured as "active adapters".

On the Juniper side, the member links and aggregated interfaces are configured as follows:

    ge-0/0/8 {                
        ether-options {                
            802.3ad ae10;              
        }                              
    }            

    ae10 {
        traceoptions {
            flag all;
        }
        mtu 1500;
        aggregated-ether-options {
            link-speed 1g;
        }
        unit 0 {
            family ethernet-switching {
                port-mode access;
                vlan {
                    members 130;
                }
            }
        }
    }

Notice that we are not using LACP since VMware only supports static link aggregation.

The NAS vendor, the switch vendor, and VMware have all been trying to solve this problem for weeks, and we're getting nowhere. I'm looking for others with a similar setup (port aggregation, Juniper switches, and NFS).
Question by:FloydATC
 
LVL 18

Expert Comment

by:deimark
ID: 34926030
What does the Juniper say about the aggregated links? Please post the output of:

show interfaces terse
show interfaces ae10 extensive
show interfaces ge-0/0/8 extensive
 

Author Comment

by:FloydATC
ID: 34950571
Sorry for the late response. The output of "show interfaces terse" is lengthy and doesn't offer any surprises, so I won't include it in full here.

oikt@FERADH-SW001> show interfaces ae10 extensive 
Physical interface: ae10, Enabled, Physical link is Up
  Interface index: 138, SNMP ifIndex: 685, Generation: 141
  Description: FERADH-VM01-SAN
  Link-level type: Ethernet, MTU: 1500, Speed: 1Gbps, BPDU Error: None, MAC-REWRITE Error: None,
  Loopback: Disabled, Source filtering: Disabled, Flow control: Disabled, Minimum links needed: 1,
  Minimum bandwidth needed: 0
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x0
  Current address: b0:c6:9a:cd:c2:0b, Hardware address: b0:c6:9a:cd:c2:0b
  Last flapped   : 2011-02-18 11:14:53 UTC (4d 00:31 ago)
  Statistics last cleared: Never
  Traffic statistics:
   Input  bytes  :            109617954                 2176 bps
   Output bytes  :          39723756226                 1488 bps
   Input  packets:               792144                    2 pps
   Output packets:             30080101                    1 pps
   IPv6 transit statistics:
    Input  bytes  :                   0 
    Output bytes  :                   0
    Input  packets:                   0
    Output packets:                   0
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Giants: 0, Policed discards: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 153, Errors: 0, Drops: 0, MTU errors: 0, Resource errors: 0

  Logical interface ae10.0 (Index 66) (SNMP ifIndex 699) (Generation 131)
    Flags: SNMP-Traps 0x0 Encapsulation: ENET2
    Statistics        Packets        pps         Bytes          bps
    Bundle:
        Input :             0          0             0            0
        Output:       1308515          0     140007933            0
    Marker Statistics:   Marker Rx     Resp Tx   Unknown Rx   Illegal Rx
      ge-0/0/8.0                 0           0            0            0
    Protocol eth-switch, Generation: 145, Route table: 0
      Flags: None

oikt@FERADH-SW001> show interfaces ge-0/0/8 extensive 
Physical interface: ge-0/0/8, Enabled, Physical link is Up
  Interface index: 161, SNMP ifIndex: 575, Generation: 164
  Description: FERADH-VM01-SAN
  Link-level type: Ethernet, MTU: 1500, Speed: 1000mbps, Duplex: Auto, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled, Flow control: Disabled,
  Auto-negotiation: Enabled, Remote fault: Online
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x0
  Link flags     : None
  CoS queues     : 8 supported, 8 maximum usable queues
  Hold-times     : Up 0 ms, Down 0 ms
  Current address: b0:c6:9a:cd:c2:0b, Hardware address: b0:c6:9a:cd:c2:09
  Last flapped   : 2011-02-18 11:14:53 UTC (4d 00:31 ago)
  Statistics last cleared: Never
  Traffic statistics:
   Input  bytes  :            109618966                    0 bps
   Output bytes  :          39723758744                  512 bps
   Input  packets:               792151                    0 pps
   Output packets:             30080130                    1 pps
   IPv6 transit statistics:
    Input  bytes  :                   0 
    Output bytes  :                   0
    Input  packets:                   0
    Output packets:                   0
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0, L3 incompletes: 0,
    L2 channel errors: 0, L2 mismatch timeouts: 0, FIFO errors: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 153, Errors: 0, Drops: 0, Collisions: 0, Aged packets: 0, FIFO errors: 0,
    HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
  Egress queues: 8 supported, 7 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0 best-effort                    0             28878173                    0
    1 assured-forw                   0                    0                    0
    2 mcast-be                       0                    0                    0
    4 mcast-ef                       0                    0                    0
    5 expedited-fo                   0                    0                    0
    6 mcast-af                       0                    0                    0
    7 network-cont                   0              1128679                    0
  Active alarms  : None
  Active defects : None
  MAC statistics:                      Receive         Transmit
    Total octets                     109618966      39723758744
    Total packets                       792151         30080130
    Unicast packets                     746050         28880707
    Broadcast packets                      633            24181
    Multicast packets                    45468          1175242
    CRC/Align errors                         0                0
    FIFO errors                              0                0
    MAC control frames                       0                0
    MAC pause frames                         0                0
    Oversized frames                       169
    Jabber frames                            0
    Fragment frames                          0
    Code violations                          0
  Autonegotiation information:
    Negotiation status: Complete
    Link partner:
        Link mode: Full-duplex, Flow control: Symmetric/Asymmetric, Remote fault: OK,
        Link partner Speed: 1000 Mbps   
    Local resolution:
        Flow control: None, Remote fault: Link OK
  Packet Forwarding Engine configuration:
    Destination slot: 0
  CoS information:
    Direction : Output 
    CoS transmit queue               Bandwidth               Buffer Priority   Limit
                              %            bps     %           usec
    0 best-effort            75      750000000    75              0      low    none
    2 mcast-be               20      200000000    20              0      low    none
    7 network-control         5       50000000     5              0      low    none

  Logical interface ge-0/0/8.0 (Index 95) (SNMP ifIndex 722) (Generation 262)
    Flags: 0x0 Encapsulation: ENET2
    Local statistics:
     Input  bytes  :                    0
     Output bytes  :             21169374
     Input  packets:                    0
     Output packets:               185178
    Transit statistics:
     Input  bytes  :                    0                    0 bps
     Output bytes  :                    0                    0 bps
     Input  packets:                    0                    0 pps
     Output packets:                    0                    0 pps
    Protocol aenet, AE bundle: ae10.0, Generation: 275, Route table: 0



Also, I have configured a syslog file:

    file interface-log {
        any any;
        match ifOperStatus;
    }

and I'm not seeing any activity there.
 

Accepted Solution

by:
FloydATC earned 0 total points
ID: 34959930
It seems we finally found the solution on our own: remove the MTU setting on the aggregated interface ae10. The virtual switch is set to MTU 1500, but this must be matched with an MTU setting of at least 1514 on the Juniper.
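
For anyone wondering where the 1514 comes from, here is a minimal sketch of the arithmetic in Python. It assumes that the Juniper interface MTU counts the 14-byte Ethernet header, while the vSwitch MTU counts only the IP packet. The function names are made up for illustration and are not part of any VMware or Juniper tool.

```python
# Assumption: the switch-side MTU includes the untagged Ethernet II header
# (6-byte destination MAC + 6-byte source MAC + 2-byte EtherType),
# while the vSwitch MTU describes only the IP packet carried inside the frame.
ETHERNET_HEADER_BYTES = 14


def frame_size(ip_mtu: int) -> int:
    """Size of a full-size untagged frame (without FCS) carrying an ip_mtu-byte packet."""
    return ip_mtu + ETHERNET_HEADER_BYTES


def largest_ip_packet(switch_mtu: int) -> int:
    """Largest IP packet that still fits when the switch MTU includes the header."""
    return switch_mtu - ETHERNET_HEADER_BYTES


def fits(ip_mtu: int, switch_mtu: int) -> bool:
    """True if a full-size packet from the host fits within the switch-side MTU."""
    return frame_size(ip_mtu) <= switch_mtu


if __name__ == "__main__":
    host_mtu = 1500  # vSwitch setting from this thread
    for switch_mtu in (1500, 1514):
        print(
            f"switch MTU {switch_mtu}: "
            f"largest IP packet {largest_ip_packet(switch_mtu)}, "
            f"full-size host packets fit: {fits(host_mtu, switch_mtu)}"
        )
```

Under that assumption, "mtu 1500" on ae10 leaves room for IP packets of at most 1486 bytes, so small packets get through while full-size 1500-byte packets from the host do not. That may also explain the non-zero "Oversized frames" counter in the ge-0/0/8 output earlier in the thread.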
 
LVL 18

Expert Comment

by:deimark
ID: 34960054
Hiya bud

Sorry, missed your 1st reply to this.

Yes, that would make sense now, just never really used link aggregation without LACP so was doing a bit of digging.

Thanks for letting us know.
 

Author Closing Comment

by:FloydATC
ID: 34995314
Solved
Question has a verified solution.
