imayjustdriveoffintothesunset

failing core switch or dhcp issue?

I'm desperate for help.
I'm still a new network admin.
I've run this by two consultants and two Dell support staff members.

A quick synopsis of the problem: we run a physical star, with the MDF in the middle of campus (the core is a Dell PC 6024F) and fiber connections to the IDFs (Dell PC 3448 switches) at 7 other buildings.

We have a management vlan 255, a server vlan of 10, and building vlans of 100, 101, 102, etc.

Building A's vlan is 101, port number 5 on core switch
Building B's vlan is 102, port number 6 on core switch
Building C... etc

Building G's vlan is 105, port number 9 on core switch
Building H's vlan is 107, port number 11 on core switch

Yesterday one building (G, vlan 105), with 40 workstations (mainly Dell GX620s or Dell Optiplex 760s), dropped completely... no pings or connection to the network.  This was followed by two other buildings (B, Z and H) losing connection to the network.  I can ping the switches in the IDFs but cannot ping any nodes on those particular vlans (G, B, Z, or H).  Those nodes cannot ping other devices, including their own switch.  Eventually buildings B, Z, and H came back up while G is still down.  In G they cannot log into the domain, but logging in locally I see an IP of 169.254... a sign that the workstation can't get to the DHCP server... right??  That switch carries vlan 105 and vlan 20 (a vlan for our printers).  You might think we have a bad cable/fiber/GBIC... nope... we have a printer on vlan 20 that I can ping and print to from another vlan across campus.

This morning buildings G, H, and A are down completely.  From the core, I can ping their switches in the IDFs (Dell PC 3448) but cannot ping nodes on those VLANs.  Nodes on those VLANs cannot ping the IDF switch of their building and cannot ping the core switch.  Again... 169.254...
Today I could get 6 workstations on vlan 20 to come across the switch in building G and connect to the network.  Any more workstations we try to add on vlan 20 won't connect... just a 169.254 address.
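
A 169.254.x.x address is an IPv4 link-local (APIPA) address, which a workstation assigns itself when no DHCP server answers. As a minimal sketch, the following Python snippet flags such addresses in a list of IPs read off workstations. The sample addresses are made up for illustration and are not from this network.

```python
import ipaddress

# 169.254.0.0/16 is the IPv4 link-local range used for self-assigned addresses.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(addr: str) -> bool:
    """Return True if addr is a self-assigned 169.254.x.x address."""
    return ipaddress.ip_address(addr) in LINK_LOCAL

# Hypothetical addresses collected from workstations (placeholders only).
samples = ["169.254.17.3", "192.0.2.25", "169.254.200.1"]
no_lease = [a for a in samples if is_apipa(a)]
print(no_lease)
```

Any workstation that shows up in `no_lease` did not receive a DHCP lease, which says the DHCP exchange failed but not why it failed.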

The problem is not electrical, and no new devices have been installed on the network.  No other changes were made on the network.

On the core switch we are seeing a ton of activity in the Statistics/RMON, Table Views, Utilization, Counter, Interface and Etherlike:

Utilization shows 3 ports with 100% Non Unicast Packets Received, while all the other ports are working fine, showing 100% Unicast received.

Counter Summary shows significantly more Received Non Unicast Packets on the bad ports than on the good ports, which report much lower numbers.

Interface Statistics: I've cleared the counters, and on the bad ports I'm seeing significantly more broadcast packets than unicast or multicast packets.  The good ports show unicast packets in the tens of thousands and very few broadcast or multicast packets.
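
The comparison described above (bad ports dominated by non-unicast traffic, good ports dominated by unicast) can be sketched as a small calculation over per-port counters. This is only an illustration: the counter values below are invented, the `PortCounters` type is hypothetical, and the 50% threshold is an arbitrary assumption rather than a vendor recommendation.

```python
from dataclasses import dataclass

@dataclass
class PortCounters:
    port: int
    unicast: int
    multicast: int
    broadcast: int

def non_unicast_share(c: PortCounters) -> float:
    """Fraction of received packets that were broadcast or multicast."""
    total = c.unicast + c.multicast + c.broadcast
    return 0.0 if total == 0 else (c.multicast + c.broadcast) / total

def suspect_ports(counters, threshold=0.5):
    """Ports whose non-unicast share meets the (assumed) threshold."""
    return [c.port for c in counters if non_unicast_share(c) >= threshold]

# Invented readings taken after clearing the counters (illustration only).
readings = [
    PortCounters(port=5, unicast=42000, multicast=120, broadcast=300),
    PortCounters(port=9, unicast=15, multicast=40, broadcast=9800),
    PortCounters(port=11, unicast=0, multicast=10, broadcast=7600),
]
print(suspect_ports(readings))
```

A port that lands in the suspect list is a candidate for a broadcast storm or a loop somewhere behind it, though the source of the broadcasts may sit on a different port that shares the same VLAN.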

There are zero reports under the Etherlike Stats for bad or good ports... all looks good there.

Port 9 on the core switch connects to the IDF in building G, vlan 105.  I even moved the connection to port 12 thinking it could be a bad port.  Same issue.

I considered a broadcast storm, but since we have some buildings up and some buildings down I cannot isolate a particular building.  At one point I disconnected all nodes from building G and rebooted everything, hoping to break a loop if there was one.  No luck.

I have bounced every switch at least 5 times.

Is my core switch failing or do I have some weird dhcp issue?

Thanks for any help you can give me.
Aaron Tomosky

Is there wifi in the buildings? Could someone have plugged into one building and connected to wifi in another and crossed vlans?
Ok, I just thought of a much better reason. Someone could have plugged the LAN side of a dhcp enabled router into a port thinking they can just use it as a switch.
imayjustdriveoffintothesunset


Thank you for these two ideas.   No on the first.   And no on the second. Our switches are all in locked boxes and only I have the key.
Rick_O_Shay

This solution is only available to members.
I mean in building G, someone could have plugged in a rogue access point (consumer router), thereby adding another DHCP server to the network. I could still be totally wrong; I just wanted to clarify.
manni78

If you can ping all your switches from the core switch, then I suggest you start troubleshooting by giving a static IP to the PCs that are not picking up an IP from DHCP.
If you can ping nodes in other buildings from the server VLAN, then it's not an issue with physical connectivity.
Check the configuration of the uplink ports between the core and access layer switches. Each uplink port should be a trunk port; make sure the server VLAN is allowed on the trunk.
One more thing: check whether there is an option to define a DHCP relay server IP on the core/access switch.

I can't tell you how appreciative I am to have all these suggestions...

aarontomosky:  I went to all 9 rooms and physically inspected and unplugged each port... thinking that same thing.  No rogue AP.  We do use some palm switches so I thought maybe a teacher or kid messed with the cords and caused a loop... nothing.

Rick O Shay:  DHCP looks okay.  The buildings that are down (the scopes are assigned per building) have no leases... they expired yesterday.  The buildings that are up are showing leases ending at various times today or tomorrow.

I manually config'd a few workstations in building G with a vlan address...   Vlan 105 ip is: so I manually config'd or, etc.  subnet of gateway of: and nothing... can't even ping the IDF switch.

All uplinks are properly config'd for spanning tree.  This network has been running properly for years this way but I did go in with Dell support to confirm it's correct.

The 4th thing you mentioned, I'd like a bit more info to help me out...  which vlan would you suggest I monitor traffic on?  I have nothing coming through at all on vlan 105 (building G), 101 (building A), or 107 (building H).  Should I look for DHCP Offers or ACKs from any DHCP server other than the one I know is valid?
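
One way to frame that check: every DHCP Offer and ACK carries the server's address (the Server Identifier, DHCP option 54), so any Offer or ACK from a server outside the known list points to a second DHCP server. The sketch below assumes the message type and server address have already been pulled out of a capture by hand or by another tool. The function name and all addresses are hypothetical placeholders.

```python
def unexpected_dhcp_servers(observed, known_servers):
    """Return server addresses seen in OFFER/ACK messages that are not known.

    observed: iterable of (message_type, server_ip) tuples taken from a capture.
    known_servers: addresses of the DHCP servers that are supposed to exist.
    """
    known = set(known_servers)
    return sorted(
        {ip for msg, ip in observed if msg in ("OFFER", "ACK") and ip not in known}
    )

# Placeholder data using documentation addresses, not the real network.
known = ["192.0.2.10"]
capture = [
    ("DISCOVER", "0.0.0.0"),
    ("OFFER", "192.0.2.10"),
    ("ACK", "192.0.2.10"),
    ("OFFER", "192.0.2.77"),
]
print(unexpected_dhcp_servers(capture, known))
```

An empty result does not prove DHCP is healthy; it only rules out a second server answering on the VLAN being captured.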

manni78:  I can ping all the switches from the core... even the ones whose buildings are down.  I can't ping even with a static IP.  For a 6 hour period yesterday our printer vlan 20 was able to ping out to all other vlans and nodes across its IDF switch and across the core switch.  The uplink ports were working fine for years before now and nothing has changed.  They are trunk ports and the vlan is allowed on the trunk.  Yes, our core is config'd to define the DHCP relay.

could this be it?  I've used wireshark to watch the traffic on and off for 30 minutes now.  The core switch continues to send ARP to some servers.  

I see about 200 ARP requests to in 3 seconds of capture
I see no response

I see about 200 ARP requests to in 3 seconds of capture
I see no response

I see 50 ARP requests to in 3 seconds of capture (a server) responds back 13 responses
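
A tally like the one above can be automated once the ARP frames are exported from the capture. The following is a rough sketch that assumes each frame has been reduced to a `(kind, ip)` pair, where `ip` is the address being asked for in a request or the address that answered in a reply. The request and reply counts mirror the figures in this post, but the addresses are placeholders because the real ones are not shown.

```python
from collections import Counter

def unanswered_arp_targets(frames):
    """Map each requested IP that never replied to its request count.

    frames: iterable of (kind, ip) pairs, kind is "request" or "reply".
    """
    requests = Counter(ip for kind, ip in frames if kind == "request")
    replies = Counter(ip for kind, ip in frames if kind == "reply")
    return {ip: n for ip, n in requests.items() if replies[ip] == 0}

# Counts follow the 3-second capture described above; IPs are placeholders.
frames = (
    [("request", "192.0.2.1")] * 200
    + [("request", "192.0.2.3")] * 50
    + [("reply", "192.0.2.3")] * 13
)
silent = unanswered_arp_targets(frames)
print(silent)
print(round(200 / 3, 1), "requests per second to the silent target")
```

Roughly 67 requests per second to a host that never answers is far above normal ARP chatter, which fits heavy broadcast traffic on the segment rather than a DHCP fault.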

When I look at the switch I see the ARP cache.  All these IPs and MACs are already listed.

super bad news.

I configured a new core (Dell 6024F) and put it on the network, and it didn't fix my issue.

core switch issue is out.

could this be dhcp?  services are running.  leases look fine.  Can the program code get messed up???  What could be interfering with DHCP?
I like the arp path you were on for a second but I don't know how to troubleshoot that.
Would it be fruitless to put dhcp on another server and take the old one down?

The fact that you put in a static IP and still couldn't ping the switch from the building makes me think it's not DHCP as the root cause.
The only reason I can't let the DHCP idea go is that for a 6 hour period I could get workstations on the printer vlan using static and dynamic addresses, and that I'm getting a 169.254 address.
You could be right. I'm just leaning toward all traffic being jacked possibly because of arp and dhcp is just a side effect. But no way to be sure yet.
thanks for all the back and forth aarontomosky... it gives me more brain power.  Another thought after 3 hours of reading.  could STP be the culprit?  It seems to get nasty on Dells?

As aarontomosky said, if you can't ping with a static IP it can't be a DHCP issue. But yes, it could be STP.

Have you made any changes on the core switch? If yes, could you check that the ports that interconnect switches are not configured with "spanning-tree portfast"? Do you have any logs from the core and access switches?
Well... it was a broadcast storm started from a small switch in a teacher's classroom in a building that wasn't taken down.  It took down 4 other buildings!