Failing core switch or DHCP issue?
Posted on 2011-02-10
I'm desperate for help.
I'm still a new network admin.
I've run this by two consultants and two Dell support staff members.
A quick synopsis of the problem: we run a physical star, with the MDF in the middle of campus (the core is a Dell PC 6024F) and fiber connections to the IDFs (Dell PC 3448 switches) at 7 other buildings.
We have a management VLAN (255), a server VLAN (10), and building VLANs (100, 101, 102, etc.).
Building A's vlan is 101, port number 5 on core switch
Building B's vlan is 102, port number 6 on core switch
Building C... etc
Building G's vlan is 105, port number 9 on core switch
Building H's vlan is 107, port number 11 on core switch
Yesterday one building (G, VLAN 105), with 40 workstations (mainly Dell GX620s or Dell OptiPlex 760s), dropped completely: no pings and no connection to the network. It was followed by other buildings (B, Z, and H) losing their connection to the network. I can ping the switches in the IDFs but cannot ping any nodes on those VLANs (G, B, Z, or H). Those nodes cannot ping other devices, including their own switch. Eventually buildings B, Z, and H came back up, while G is still down. In G, users cannot log into the domain, but when I log in locally I see an IP address of 169.254..., which signals that the workstation can't reach the DHCP server... right?? That switch carries VLAN 105 and VLAN 20 (a VLAN for our printers). You might think we have a bad cable, fiber, or GBIC... nope. We have a printer on VLAN 20 that I can ping and print to from another VLAN across campus.
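For anyone double-checking the 169.254 reasoning: 169.254.0.0/16 is the IPv4 link-local range that a Windows client assigns itself when no DHCP server answers. Below is a minimal Python sketch of that check. The function name and the sample addresses are made up for illustration and are not taken from our workstations.

```python
import ipaddress

# 169.254.0.0/16 is the IPv4 link-local range (RFC 3927); Windows calls it APIPA.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(addr: str) -> bool:
    """Return True if addr is a self-assigned IPv4 link-local address."""
    return ipaddress.ip_address(addr) in LINK_LOCAL


# Hypothetical sample addresses, for illustration only.
for sample in ("169.254.17.42", "10.1.105.23"):
    verdict = "self-assigned (no DHCP answer)" if is_apipa(sample) else "leased or static"
    print(sample, "->", verdict)
```

A self-assigned address only says the client got no DHCP reply. It does not say whether the DHCP server, a relay, or the path in between is at fault.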
This morning buildings G, H, and A are down completely. From the core, I can ping their IDF switches (Dell PC 3448) but cannot ping nodes on those VLANs. Nodes on those VLANs cannot ping their building's IDF switch or the core switch. Again, they only get 169.254... addresses.
Today I could get 6 workstations on VLAN 20 to come up through the switch in building G and connect to the network. Any more workstations we try to add on VLAN 20 won’t connect… they just get a 169.254 address.
The problem is not electrical. No other device has been installed on the network, and no other changes were made on the network.
On the core switch we are seeing a ton of activity in the Statistics/RMON, Table Views, Utilization, Counter, Interface and Etherlike:
Utilization shows 3 ports with 100% Non Unicast Packets Received, while all the other ports look fine, showing 100% Unicast Packets Received.
Counter Summary shows significantly more Received Non Unicast Packets on the bad ports than on the good ports, which report much lower counts.
Interface Statistics: I've cleared the counters, and on the bad ports I'm seeing significantly more broadcast packets than unicast or multicast packets. The good ports show unicast packets in the tens of thousands and very few broadcast or multicast packets.
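To put numbers on the good-port versus bad-port comparison, the share of non-unicast traffic can be computed from the per-port counters. This is a rough Python sketch. The counter values, port labels, and the 50% threshold are all hypothetical placeholders, not readings from our switch.

```python
def non_unicast_share(unicast: int, broadcast: int, multicast: int) -> float:
    """Fraction of received packets that are broadcast or multicast."""
    total = unicast + broadcast + multicast
    if total == 0:
        return 0.0  # no traffic since the counters were cleared
    return (broadcast + multicast) / total


# Hypothetical counters as (unicast, broadcast, multicast); not real readings.
ports = {
    "bad port (example)": (120, 48000, 300),
    "good port (example)": (52000, 150, 40),
}

THRESHOLD = 0.5  # arbitrary cut-off chosen for illustration

for name, counters in ports.items():
    share = non_unicast_share(*counters)
    flag = "SUSPECT" if share > THRESHOLD else "ok"
    print(f"{name}: {share:.1%} non-unicast [{flag}]")
```

Reading the counters twice, a few minutes apart, and comparing the deltas would show whether the broadcast count on a bad port is still climbing.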
There are zero reports under the Etherlike Stats for bad or good ports... all looks good there.
Port 9 on the core switch connects to the IDF in building G, vlan 105. I even moved the connection to port 12 thinking it could be a bad port. Same issue.
I considered a broadcast storm, but since some buildings are up and some are down, I cannot isolate it to a particular building. At one point I disconnected all nodes from building G and rebooted everything, hoping to break a loop if there was one. No luck.
I have bounced every switch at least 5 times.
Is my core switch failing, or do I have some weird DHCP issue?
Thanks for any help you can give me.