Dejayy

Possible Network Bottleneck

My network is experiencing some strange events right now, and I need some help finding out what is happening. First, we run one main application that is pretty network intensive; if there is a hiccup on the network, everyone crashes instantly. I have roughly 100 users in 4 different locations. Every location is connected directly to the server room by a gigabit port on its local switch to a gigabit switch in the server room. Altogether we have about 150 devices connected to the network.

I am not sure exactly what I am experiencing, and I was hoping someone could clear a couple things up and maybe point me in the right direction.

1st  Everyone is experiencing network lag.
- I ran a test where I copied a 500 MB file to and from the server. It should take roughly 50 sec, but it can take anywhere from 1 to 5 min. This happens even on weekends when no one is logged on to the network, not just during the work day.

2nd  I believe that I have isolated the problem with the network.
- I have determined that the device causing the problems/bottleneck is a Netgear GS724T. This switch is located in our server room. All servers are connected to it, and every location is connected to it via a gigabit port on the local switch in each suite.
- I can reset the switch to factory defaults and everything will work fine for a couple of hours, and then the network is slow again.

Mainly I want to know: is there something I can do to test whether the switch is just inferior?

I have talked to a network specialist, and it sounds like he just wants to sell me a $6000 Cisco switch that will apparently fix everything.

I am hoping to get some feedback. What I want to do is eliminate all other options before I just change out the switch. Does anyone have ideas for tests I can run to see why this is happening, or how I can go about this?
Does anyone recommend that I replace the switch? I don't want to unless I can determine that the switch is the problem and not something else that is causing the switch to be overloaded.

The switch I am currently using says it has
Bandwidth: 48 Gbps

I have been looking at a switch that claims
Bandwidth: 96 Gbps (non-blocking)

The first switch is a 24 port and the second is a 48 port. Is the bandwidth the main thing I should be looking at, and will it make a substantial difference?
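
As a rough sanity check on those two spec figures, here is a minimal sketch of the arithmetic. It assumes the quoted "bandwidth" counts both directions of every full-duplex gigabit port, which is the usual convention on spec sheets but is not confirmed here for either switch. The function name is made up for illustration.

```python
def nonblocking_capacity_gbps(ports: int, port_speed_gbps: float = 1.0) -> float:
    """Capacity needed for every port to send and receive at line rate at once."""
    # Full duplex: each port can transmit and receive at the same time,
    # so each port counts twice (assumption about how the spec is quoted).
    return ports * port_speed_gbps * 2


for ports in (24, 48):
    print(f"{ports} gigabit ports -> {nonblocking_capacity_gbps(ports):g} Gbps")
```

Under that assumption, 48 Gbps on a 24 port switch and 96 Gbps on a 48 port switch describe the same per-port capability, so the larger number by itself would not indicate a faster switch.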
First, when you say it should take 50 sec, is this from previous experience or just what you have calculated?
E.g. are you sure it is worse than usual?
But first I would get Wireshark. Just search Google and download it; it's a free packet capture program.

If you can, mirror the gig ports so you can monitor them through Wireshark. This will show you exactly what traffic is going across the link and what is happening. (You can use a hub to mirror a port if you simply plug it in along the link and then have one cable coming off to your PC. Make sure you turn off TCP/IP in your network card settings so your monitor PC can't talk back onto the network.)

By looking at the data you should have a better idea of how much traffic is going across the network and what is going on.
I'd be of the same opinion as you in that I'd want to be pretty sure that the problem is actually with the switch before replacing it. Have you checked the performance of the switch when the data is running slow? How is the CPU/memory utilization? At 48 Gbps it should be fine based on your description of the network. Are you sure that the problem is not just on the server itself? For example, rebooting the switch causes the network adapter on the server to reset, which may actually be the reason the problem goes away. Have you tried any data transfers to other devices connected to the same switch?
50 sec is from previous experience. We were having problems with software we use, and their tech support told us that our network needed to be able to move a 500 MB file in 50 sec. Every computer in our network was able to.

Now they are not.
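
To put those copy times in terms of link speed, here is a small sketch of the conversion. It assumes a decimal 500 MB file and a 1000 Mbit/s link, and ignores protocol and disk overhead; the function name is hypothetical.

```python
def throughput_mbit_per_s(size_mb: float, seconds: float) -> float:
    """Average throughput of a transfer, in megabits per second."""
    # 1 byte = 8 bits; size_mb is taken as decimal megabytes (assumption).
    return size_mb * 8 / seconds


LINK_MBIT = 1000.0  # nominal gigabit link

# Times mentioned in this thread: the 50 sec target, then 1, 3 and 5 minutes.
for seconds in (50, 60, 180, 300):
    rate = throughput_mbit_per_s(500, seconds)
    print(f"{seconds:>3} sec -> {rate:5.1f} Mbit/s ({rate / LINK_MBIT:.1%} of the link)")
```

The 50 sec target works out to 80 Mbit/s, about 8% of a gigabit link, so even the "good" case is nowhere near saturating the switch ports.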

I have already been running Wireshark on my network. I have run it on all the servers and workstations, looking for anything out of the ordinary.
The main problem is that unless something is extremely out of the ordinary, I am not really sure what I am looking for. I don't know what acceptable levels of traffic are.

Any tips on using Wireshark?
I would run it on the core links. Just see what % utilisation they are running at.

You can also look at the data from just that one server and look for errors and retransmissions, as well as the amount of data over time. Does it drop? Do the problems start happening when network usage is high, or at any time?

And as davy said, check the server NIC. Is it a proper server NIC or a desktop NIC? Cheaper desktop NICs can't keep up with gig speeds for any period of time; they are not built to do it! On my monitoring PCs I have fried a number of NICs simply because they can't put up with continuous high usage. A reboot often fixes them for a while.

Look at the NIC on the server and see what it is doing in terms of packets in and out when there are problems.
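
One rough way to turn NIC or switch-port counters into the numbers discussed above (% utilisation and error rate) is to take two readings some seconds apart and work from the difference. This is only a sketch: `CounterSample` and `link_stats` are made-up names, and the counter values would have to come from wherever your NIC or switch exposes them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CounterSample:
    """One reading of cumulative counters for a single direction of a port."""
    timestamp_s: float
    bytes_count: int
    packets_count: int
    errors_count: int


def link_stats(first: CounterSample, second: CounterSample,
               link_mbit: float = 1000.0) -> Tuple[float, float]:
    """Return (utilisation %, error %) between two counter readings."""
    elapsed = second.timestamp_s - first.timestamp_s
    if elapsed <= 0:
        raise ValueError("second sample must be later than the first")
    bits = (second.bytes_count - first.bytes_count) * 8
    packets = second.packets_count - first.packets_count
    errors = second.errors_count - first.errors_count
    link_bits_per_s = link_mbit * 1_000_000
    utilisation_pct = bits * 100 / (elapsed * link_bits_per_s)
    # Avoid dividing by zero on an idle port.
    error_pct = errors * 100 / packets if packets else 0.0
    return utilisation_pct, error_pct
```

Low utilisation combined with a non-trivial error percentage during a slow period would point towards a faulty port, cable or NIC rather than a lack of bandwidth.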
I have just set up monitoring on the switch and have begun that process.

The servers all seem to be reacting fine. Processor, memory, and network utilization are not stressed at all.
All the key servers are Dell servers with gigabit network cards in them. I have monitored them and they are never pushed very hard.
This is not a problem that only happens when people are working. I was in this past weekend when no one was here; I was the only person logged on to the network and I was still experiencing problems.

I have also tried copying the file to different servers and I received the same result.

One other thing to test, if possible, is to stop all the network activity you would expect to that server (e.g. close down the application that you run and don't have any web browsers or email clients open). Basically you want to see only background traffic. Then run your monitor on the server NIC and see what the background traffic is like. Make sure there is not another machine on the network that for some reason is sending lots of data at that server's NIC.
I did that last weekend. No one was here and no applications were open; I even shut down the SQL server and Exchange server because I figured they probably create the most traffic of any servers connected to the switch. I still had the problem copying the file to two different servers. Then, as soon as I reset the switch, it worked fine. This was on Saturday. I came in again on Sunday and it was the same situation. I thought I had fixed the problem and just wanted to see it working again. I logged on and was the only one logged on, and I know that no one was using the network overnight, yet when I copied the file the time had shot back up to 3 min from 45 sec the day before.

Not a single person used the network besides me, and 8 to 12 hours later the same problems were happening again.
Different port on the switch?

Have you tried a different switch, just to connect directly through to another PC/server to see if you still have the bottleneck?
I cannot take the server offline right now, but I will try to copy the same file between two computers so that it will not use the gigabit switch at all. It should copy directly through the local switch.

Is that correct?
I just copied it up and down between two different computers in the same LAN segment so they would only use the local switch. I copied it up and down 4 times; the worst time was 52 sec and the best time was 42 sec.

I am going to try this in every segment of the LAN just to rule it out.
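
For repeating the same copy test in each segment, a small script can make the timings comparable. This is a minimal sketch using only the Python standard library; `timed_copy` is a made-up helper, and the paths in the usage comment are placeholders. Note that file caching on either machine can flatter repeat runs, so the first copy of a file is usually the most honest number.

```python
import shutil
import time
from pathlib import Path
from typing import Tuple


def timed_copy(src: Path, dst: Path) -> Tuple[float, float]:
    """Copy src to dst and return (elapsed seconds, megabytes per second)."""
    size_mb = src.stat().st_size / 1_000_000  # decimal megabytes
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    # Guard against a zero reading on a very small file.
    seconds = max(time.perf_counter() - start, 1e-9)
    return seconds, size_mb / seconds


# Example use (placeholder paths, e.g. a mapped share on the other machine):
#   seconds, rate = timed_copy(Path("test500.bin"), Path(r"Z:\test500.bin"))
#   print(f"{seconds:.1f} sec, {rate:.1f} MB/s")
```

A 500 MB file that copies in 50 sec would show up here as about 10 MB/s.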
Aaron Street
Thank you. I am still working on this, but I will start a new post if I am having problems discovering what the issue is. It is taking some time to look through all these logs, but hopefully I can find the problem.

Thank you.