Netgear GS724TS & GS748TS - Unresponsive to pings every 1000 minutes!!
Posted on 2009-04-23
We have a number of Netgear GS724TS & GS748TS Smart switches deployed in our production environment. Every minute they are queried via SNMP for their status, interface utilisation etc. - effectively the same as a continuous ping.
Exactly every 1000 minutes (16 hours & 40 mins) each switch stops responding to pings & an alert is sent. I have set up constant pings to & through the switches from a number of places within the network - the switch IP is unavailable for approx. 1 minute - it cannot be pinged & I cannot manage the switch. Traffic still goes through the switch however - I can ping devices on the other side - so the switch is not down; for some reason it just does not respond to pings. Each switch is on its own 1000 minute cycle (i.e. they do not all go off together, they go independently every 1000 mins @ different times throughout the day).
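For anyone wanting to check the same thing against their own alert history, here is a minimal Python sketch of the arithmetic. It assumes you can export the outage start times from your monitoring tool; the timestamps below are made up for illustration, and `outage_intervals` and `matches_period` are hypothetical helper names, not part of any tool mentioned here.

```python
from datetime import datetime, timedelta

# 1000 minutes, i.e. 16 hours & 40 mins
PERIOD = timedelta(minutes=1000)


def outage_intervals(starts):
    """Return the gaps between consecutive outage start times."""
    ordered = sorted(starts)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def matches_period(starts, period=PERIOD, tolerance=timedelta(minutes=1)):
    """True if every gap between outages is within `tolerance` of `period`."""
    gaps = outage_intervals(starts)
    return bool(gaps) and all(abs(gap - period) <= tolerance for gap in gaps)


# Hypothetical outage start times for one switch (not real log data).
first = datetime(2009, 4, 20, 8, 0)
sample = [first + n * PERIOD for n in range(4)]

print(PERIOD)                  # 16:40:00
print(matches_period(sample))  # True
```

The one-minute tolerance is an assumption to allow for the one-minute polling interval; tighten it if your timestamps come from a constant ping.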
I have deployed a GS724T v2 smart switch (an earlier model) with exactly the same config & it does not alarm, leading me to suspect a firmware bug - all switches are running the newest version of the firmware, version 3101.
I cannot downgrade the firmware as the earlier version does not allow you to change the management VLAN, so you cannot monitor by SNMP at all!
Netgear are not aware of the issue - I'm wondering if anyone else has experienced anything like it before - is anyone monitoring any Netgear GS724TS or GS748TS switches running the latest firmware, or has had a similar issue when monitoring any other type of device?
Config on the switch is as follows:
Single VLAN (e.g. VLAN101).
2 x LAGs (aggregated links - e.g. interfaces 1&2 & 3&4)
All interfaces are in VLAN101 except those making up the LAGs (requirement)
LAG interfaces themselves are in VLAN101
All interfaces have a PVID of 101 (including LAGs & interfaces making up the LAGs)
Spanning tree is running with all interfaces running Portfast except the LAGs & those making up the LAGs
Each switch connects to 2 x Cisco 3560E switches - 1 LAG to one & 1 LAG to the other; a 2-port EtherChannel is configured on each Cisco switch. The links autonegotiate speed (at both the Netgear & Cisco ends) - I am just about to hard-set the duplex & speed @ both ends to see if this helps. I have also just connected a GS724TS with a single cable to one of the Cisco switches.
Default gateway on the Netgear switches is the HSRP address on the relevant VLAN on the Cisco switches.
Netgear are not aware of the issue; they advise they will attempt to replicate the issue by Monday.
Please help - the spurious alerts mean people have set up filters for the alert e-mails & are now ignoring them, even when there is a legitimate alert!
We are using SolarWinds Orion as the monitoring tool, but as above, if I set up a constant ping from my machine it loses connection to the various switches @ exactly the same time as the Orion server.
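If anyone wants to reproduce the constant ping independently of Orion, a rough Python sketch along these lines would log each time the switch goes down or comes back. It is only an illustration, not what I am actually running: `ping_once` and `log_transitions` are hypothetical helper names, it shells out to the operating system's `ping` command (`-c 1` on Linux/macOS, `-n 1` on Windows), and it assumes an exit code of 0 means a reply was received, which is not reliable on every platform.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2.0):
    """Send one ICMP echo with the system ping; True if it exits with code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # give up if ping itself hangs
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def log_transitions(probe, samples, interval_s=1.0,
                    sleep=time.sleep, now=datetime.now):
    """Call probe() `samples` times; return (timestamp, is_up) per state change."""
    transitions = []
    previous = None
    for _ in range(samples):
        is_up = probe()
        if is_up != previous:
            # Record only the moments the state flips (plus the initial state).
            transitions.append((now(), is_up))
            previous = is_up
        sleep(interval_s)
    return transitions
```

For example, `log_transitions(lambda: ping_once("192.0.2.10"), samples=3600)` would watch a switch for about an hour at one probe per second (192.0.2.10 is a placeholder address). Running it from several machines at once makes it easy to compare the down/up timestamps with the times of the Orion alerts.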
Any further info required please ask! Many thanks in advance!