Avatar of urbanstyles

Switch Upgrade Gone Horribly Awry

So my project last night was to replace two aging and ailing 3Com 10/100 network switches with new NetGear Gigabit Switches.  The old switches were a 48-port and 24-port respectively.  The new switches we purchased were NetGear GS748T.  I THOUGHT the procedure would be as easy as physically swapping the equipment and giving them enough time to sort out the tables to allow for traffic.  I was mistaken, however.  After initial setup of the NetGear switches I decided to try and configure a LAG connection combining 4 of the ports on each of the switches in order to establish a 4 gigabit 'trunk' between the two devices to allow for increased bandwidth between the switches.  I configured this through the web interface and then set about plugging all 96 devices into the new switches.  However, once powered up, I had next to no network connectivity, and what was there was intermittent at best.  I even went so far as to reset the switches back to factory defaults in hope of establishing some sort of functionality.  The whole objective in upgrading to gigabit across the board was to allow for some of our heavy bandwidth applications.  After struggling most of the night with this I was forced to swap back to the 3Com switches in order to have things somewhat functional for staff arriving in the morning.  Swapped everything back and network speeds were back to what they were.

Does anyone have any idea what I've done wrong, or am I trying to use these NetGear switches in an environment they do not support (though I don't really see how)?

If it adds any clarity, we have two file servers shared to the staff, that's it.  My objective is simply to share the access to the two servers and our internet connection to the 60-ish staff computers and various printers and such. I separated devices based on ones that supported a gig connection on one switch and devices that are perhaps a little older and only support 100 base on the other switch.  The two servers and the router were connected to the first switch with all the other 'power users'.  If any more clarity as to what I've tried is necessary, I'll be happy to provide.

Any help is desperately appreciated as now I am under the gun to get these gig switches in place.

Paul MacDonald

You might try establishing the LAG before you connect any other devices to the switches.  Another issue may be with port speed.  Are the switch ports set to negotiate a data rate?  How about the client NICs?  Is the wiring infrastructure suitable for gigabit speed?
Avatar of urbanstyles


I did in fact establish the LAG prior to attaching the other devices.  I tested connectivity with a couple of laptops on each of the switches just to make sure everything was talking properly.  

All devices, switch ports and NICs are set for autonegotiation.  Should I be disabling this?  I have no reason to believe the wiring won't support gigabit.  The building is recently renovated, and I have no option but to trust that the electricians did as they say they did.  Would this cause no connectivity, or just slightly less than gigabit?

I agree with paulmacd.  The cabling infrastructure would be the most obvious issue.  The number 1 cardinal rule of deploying mission critical equipment like this is to configure it off-line and make sure it all works before you deploy it.  You will need cat 5e or 6 to run the Gig connections to your servers.  The best practice is to lock-down the port speed/duplex settings for servers, routers and other network gear (do not let them auto-negotiate).  This is a two-way street, so your servers will have to match, which could be the source of your issue.  Check out the port settings and make adjustments as needed for speed and duplex.
I would agree with Paul. Most likely an LAG issue. In addition to establishing the LAG before bringing up the other devices, make absolutely certain that all ports in the LAG are completely identical (speed, duplex, VLAN membership, tagging, etc.). If they aren't, all bets are off.
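To make the "completely identical" check concrete, here is a minimal Python sketch that compares the settings of the LAG member ports and reports any field that differs. The `PortConfig` structure, the port numbers, and the sample values are hypothetical and would have to be filled in by hand from the switch's web interface; this is not a NetGear API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """Hypothetical snapshot of one switch port's settings."""
    port: int
    speed_mbps: int
    duplex: str        # "full" or "half"
    vlans: frozenset   # VLAN IDs the port belongs to
    tagged: bool


def lag_mismatches(ports):
    """Return {field: {value: [port numbers]}} for every field that differs."""
    mismatches = {}
    for field in ("speed_mbps", "duplex", "vlans", "tagged"):
        groups = {}
        for p in ports:
            groups.setdefault(getattr(p, field), []).append(p.port)
        if len(groups) > 1:  # more than one distinct value means a mismatch
            mismatches[field] = groups
    return mismatches


# Made-up example: port 47 has linked at 100 Mb/s instead of 1000 Mb/s.
members = [
    PortConfig(45, 1000, "full", frozenset({1}), False),
    PortConfig(46, 1000, "full", frozenset({1}), False),
    PortConfig(47, 100, "full", frozenset({1}), False),
    PortConfig(48, 1000, "full", frozenset({1}), False),
]
print(lag_mismatches(members))
```

An empty result means the member ports agree on every field checked; anything else points at the port to look at first.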
The servers are in close proximity to the rack, so I know that cabling is good.  The runs to the workstations are a different story, I suppose there is a possibility there.  However, I'm experiencing the issue with a laptop plugged directly into the switch, two feet away from the servers.

Everything works fine, with only a few devices attached.  The problem comes once all ports are filled.  I'd love to be able to do this offline, but unfortunately, after hours is my only option.  I will try to force the speed and duplex and see how that goes.
Cabling can be a factor, but it generally shouldn't prevent communication.  Those are all just troubleshooting steps I would take in a similar situation.  It sounds very much like you've done everything right.

If you stage the switches again and notice intermittent connectivity, try bypassing the wiring plant by running spare cables between a couple machines and the switches to see if connectivity improves (or becomes more reliable).  Make sure all your cables are well manufactured too.  By and large, I would expect faulty cabling to simply result in slow transfer speeds, but EMI/RFI can do funny things at gigabit speeds.

Of course, another option would be to set aside the LAG for now and see if that helps.
The issue persists even after blowing away the LAG configuration altogether, but yes they are identical.  Basically fresh out of the box, set 4 ports to LAG membership and that was it.  I've done this on ProCurves and 3Coms before, which is why I'm so frustrated.  I fear I'm missing something obvious.  Forcing speed and duplex on 90-some devices is going to take a boatload of time, but I'll give it a shot.
I wouldn't specify a data rate on the ports or the NICs.  I think that will just be a waste of time and something you'll have to undo later.  

I'd point out that it's not impossible the switches are bad.  Are they running the latest firmware?
To be honest I haven't checked.  I guess unpacking new equipment and expecting it to work without upgrades isn't always possible.

I'll check the firmware.
I would not worry about negotiation on the workstations, just your servers, other switches and routers.  The workstations should be fine. I believe the key is to set this up off-line and get things working before you try to deploy it to production.  Cabling can be an issue if you are trying to use Cat 4 for GB speed.  This could be an issue if auto-negotiate is turned off on that port.  Check out the port speed settings on your servers and make sure they match the new switch.  Your laptop should be able to function through the new switch without issues, if all is working as it should.  If you have a port available on the old switches, plug in the new one, with only your laptop on them so you can complete your setup and testing.
Certainly take paulmacd's advice and make sure the new switches have the latest firmware/patches installed.  This should not prevent them from working, but you do want them as up to date as possible.
Once connected everything shows successfully linked at 1000M FD.. the wiring is most definitely cat5e throughout the building.  I suppose even though the connection SHOWS as properly auto negotiated, I'll try hard coding the server NICs to make double sure.

It would appear from the activity lights (which I know can be misleading) that a broadcast storm seems to be occurring, but that is just a thought.
With only 60 users and a couple servers, I would not recommend using VLANS, unless the business is required for security reasons.  You should be fine with a flat network schema until such time as the business expands.
I'm not using VLANs.  I simply would like an aggregate link between the switches for increased bandwidth on an 'inter-switch' basis.  Having 40+ devices on one switch restricted to a single gig link to the other switch that contains the servers may be limiting at some point.  One of our applications in particular is very inefficient in its bandwidth use.
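As a rough illustration of that concern, the sketch below computes the worst-case oversubscription ratio of the inter-switch link: total edge bandwidth on the second switch divided by uplink capacity. The figures are assumptions for the sake of the example (40 devices, each linking at 100 Mb/s as described for the older machines), and the function name is made up.

```python
def oversubscription(device_count, device_mbps, uplink_count, uplink_mbps=1000):
    """Ratio of worst-case edge demand to inter-switch capacity."""
    demand = device_count * device_mbps
    capacity = uplink_count * uplink_mbps
    return demand / capacity


# Assumed figures: 40 devices at 100 Mb/s on the second switch.
single = oversubscription(40, 100, uplink_count=1)
lag4 = oversubscription(40, 100, uplink_count=4)
print(f"single uplink: {single:.1f}:1, 4-port LAG: {lag4:.1f}:1")
```

Keep in mind that a LAG typically balances traffic per conversation rather than per frame, so a single transfer would still be limited to the speed of one member link; the aggregate only helps when many conversations run at once.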
You know, a broadcast storm is not outside the realm of possibility; Paul mentioned that the switches themselves could have problems, and I agree - you could have a "chatty" port on one of the switches.

I would recommend that you configure a monitor port on one of the switches, connect a laptop to that switch, and install WireShark on that laptop.  Then monitor the network traffic with your servers and a few workstations connected, and see what's talking.

Relevant links:
Switch Manual (for info on configuring a monitor port, if you need it)
WireShark (free to download, in case you're not familiar with it)
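
Once you have a capture, "see what's talking" can be as simple as counting broadcast and multicast frames per source address. Here is a minimal Python sketch of that tally. It assumes you have already exported the source and destination MAC of each frame into a list of pairs; the function names and the sample addresses are made up for illustration.

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_multicast(mac):
    """True when the group bit (lowest bit of the first octet) is set."""
    return int(mac.split(":")[0], 16) & 1 == 1


def summarize(frames):
    """frames: iterable of (src_mac, dst_mac) pairs exported from a capture."""
    broadcast_by_src = Counter()
    multicast_by_src = Counter()
    total = 0
    for src, dst in frames:
        total += 1
        dst = dst.lower()
        if dst == BROADCAST:
            broadcast_by_src[src.lower()] += 1
        elif is_multicast(dst):
            multicast_by_src[src.lower()] += 1
    return total, broadcast_by_src, multicast_by_src


# Made-up sample: one host sending most of the broadcasts.
frames = [
    ("00:11:22:33:44:55", "ff:ff:ff:ff:ff:ff"),
    ("00:11:22:33:44:55", "ff:ff:ff:ff:ff:ff"),
    ("00:11:22:33:44:55", "ff:ff:ff:ff:ff:ff"),
    ("00:aa:bb:cc:dd:ee", "00:11:22:33:44:55"),
    ("00:aa:bb:cc:dd:ee", "01:00:5e:00:00:fb"),
]
total, bcast, mcast = summarize(frames)
print(f"broadcast share: {sum(bcast.values()) / total:.0%}")
print("top broadcaster:", bcast.most_common(1))
```

If one source address dominates the broadcast count, that is the "chatty" device or port to chase. If the same broadcasts show up over and over from many sources, a loop between the switches is the more likely suspect.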

I'm not sure if I'm working my way to a solution or not, but I reconfigured the LAG, then enabled broadcast control in the ports involved and have slowly moved a few devices over from the old switches to the new ones.. So far so good.  Everyone is talking.  I'll have to wait until this evening to move the servers.. But at least it is looking promising.
Hey, great!   Good luck.
Good that you're making progress.

I would have done this in a different order:

Upgrade firmware.
Stack the switches using dedicated stacking cable.
If no stacking cable, then use single patch cable for uplink.
Swap out switches.
Reconnect all.
Test for problems.

If you're working here, then it's easy to troubleshoot a LAG.  If you do it all at once, or in a random order, how do you know what's truly failing?

Important question:
Any reason you bought new switches without stacking cables?  If it's not too late, return them.

One 10Gb or 24Gb stacking cable is faster than sucking up ports to make 4x1Gb LAG.

The GS748TS can stack up to 6 tall, and uses a 20Gb (10Gb duplex) stacking cable.  Two ports on the back of each so you can cross-link or make a ring for redundancy.
I've checked.. and the firmware is current.  These are the non-stacking switches.  I agree they would have been a better option, but the local vendor had these in stock and I purchased them before realizing that a stackable option was available.

What is it they say of hindsight?
Hindsight = swift returns with receipt in-hand.  If they'll take it, send it back.  No shame.

I've returned a lot of stuff and taken a hit on shipping and re-stock fees.  Cheaper in the long run to get the right equipment.

The stacking models are $200-300 more, but will be much easier to maintain.  If you're paying for Netgear tech support (highly recommended), then they'll spend less time troubleshooting and re-creating your LAG whenever you call.

CDW carries the GS748TS for $862.  Amazon for $865, and you can have them by Thursday with overnight shipping.  Check on the stacking cables...but you can always use 1-port uplink with a standard patch cable until stacking cables arrive.

There is a separate SKU for the extended warranty, but IIRC, you can add that in the first 30 days (double-check that info though).
Avatar of urbanstyles

Was able to troubleshoot my own error in connections.