Slow (LAN) Network Performance

We’re experiencing some speed issues on our LAN.  Intermittently throughout the day, when a user opens a mapped network drive, it can take up to 30 seconds or longer for the contents of the directory to appear.  Once the folder’s been opened, I can close & re-open that drive in Windows Explorer without a problem.  But then if I try to browse the same share a while later, it stalls again.  This appears to be happening with multiple users accessing various shares located on either one of our domain controllers/file servers.

We moved less than a year ago & the issue became extremely apparent once we were in our new location.  We had a consultant set up & configure our switches at our old location, but due to tight budgets, I didn’t have the luxury of using the consultant & personally set these back up after we moved.  I’m afraid that, due to my lack of knowledge of the switches, I may have set them back up incorrectly, but I don’t know how to confirm that.  I did check my notes & noticed that I didn’t have the switches set up like the consultant originally did, but thought I had resolved it since it seemed to be better for a while…but apparently not.

Basically, the consultant set up switch “A” with aggregated pairs of ports for each of the other switches to connect from/to.

1.      Switch “A” ports 39 & 40 to Switch “B” ports 47 & 48
2.      Switch “A” ports 41 & 42 to Switch “C” ports 47 & 48
3.      Switch “A” ports 43 & 44 to Switch “D” ports 47 & 48
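One way to confirm the re-cabling against the notes is to write the aggregated pairs down as data and compare them with what is actually patched. The sketch below is only an illustration: the group table comes from the list above, while the function name and the cable list format are hypothetical and would be filled in by hand after tracing each uplink.

```python
# Aggregated pairs as listed above: (near switch, near ports, far switch, far ports).
EXPECTED_GROUPS = [
    ("A", {39, 40}, "B", {47, 48}),
    ("A", {41, 42}, "C", {47, 48}),
    ("A", {43, 44}, "D", {47, 48}),
]

def check_cables(cables, groups=EXPECTED_GROUPS):
    """cables: iterable of ((switch, port), (switch, port)), switch A side first.

    Returns a list of messages for cables that do not match the notes.
    """
    problems = []
    for (near_sw, near_port), (far_sw, far_port) in cables:
        match = [g for g in groups if g[0] == near_sw and near_port in g[1]]
        if not match:
            problems.append(f"{near_sw}{near_port} is not in any aggregated pair")
            continue
        _, _, want_sw, want_ports = match[0]
        if far_sw != want_sw or far_port not in want_ports:
            problems.append(
                f"{near_sw}{near_port} lands on {far_sw}{far_port}, "
                f"expected switch {want_sw} ports {sorted(want_ports)}"
            )
    return problems

# Hypothetical example: one uplink from the B pair patched into switch C by mistake.
print(check_cables([(("A", 39), ("B", 47)), (("A", 40), ("C", 47))]))
```

This only checks the physical patching; whether each pair is also configured as an aggregation group on both switches still has to be verified in the switch management interface.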

Half of the servers are directly connected to Switch “A” & the others to Switch “D”

Here are some basics regarding our network setup:

1.      The shared network drives are mapped at logon via a KiXtart script.
2.      2 Domain Controllers running Windows 2003 Standard: DC1 (DNS & DHCP) & DC2 (WSUS)
3.      1 Exchange 2003 server
4.      1 Windows 2003 Standard running IceWarp & Exchange 2003 (for smartphone email access & spam filtering)
5.      1 SonicWALL NSA 2400
6.      4 3Com Baseline 2948 switches
7.     33 Windows XP Pro workstations

Thanks in advance for the help!
blugirlAuthor Commented:
I think the culprit has finally been identified!  I've been monitoring Task Manager on a few workstations for processes that were spiking the CPU.  On one, I saw that the CPU usage was high, but no process stood out.  I did some research & found the Process Explorer utility...AWESOME!!  It showed that Interrupts was spiking.  More research indicated that it was a problem with a sound card driver, so I disabled the C-Media High Def. Audio Device in Device Manager & that took care of at least one cause of the overall slow performance.  (I only mention this because "slow performance" issues have been popping up all over our network, & they have turned out to have various causes.)

Then, back to the slower network issue...more research, monitoring & tweaking led me to find a post to disable "Scan within archives during on-access scans (e.g., .zip, .rar, .tat, .tgz)" in McAfee settings.  So far, this has made the most difference & things have been flying again!  I made this change 5 days ago...and just today, McAfee released 5.2.1.  I haven't seen much difference with that update yet, but I believe that it was the last setting tweak that helped the most.

Thanks again all for the help!
Have you checked the firmware on the NIC of the server that the shares are on?  Have you updated the firmware of the switches?  Have you updated the network drivers on the server?  What happens if you put in \\servername without a share; does it take a while to come up?  How about the cabling; do you have a bad cable on the server?  Can you put a workstation on the same switch as the server, and does it work OK then?  Just some things to try.
What it sounds like is potentially:
1. Bad cabling. If the servers and switches are Gigabit capable, they are much more sensitive to the quality of the in-wall cabling. Was it certified as CAT5e?
2. Duplex mismatch between either workstations or server(s) and the switch. Very common if the switch ports are set up as 100/full and the server NIC is set to auto.
3. MAC-address hopping between the channel-group ports on the switches. Maybe one port group is not set up correctly and the switch is seeing the same server MAC address on two different ports instead of on the one port-group channel it expects.
I would certainly start with Layer 1 physical cabling, then layer 2 duplex negotiation and port groups.
Not sure how to tell you to check any error conditions on 3Com switches, but I have to assume they have a web interface where you should be able to get some statistics and look at error counters.
blugirlAuthor Commented:
@steinmto & @lrmoore:

Since this is intermittent, I can’t always reproduce the slow drive access issue in Windows Explorer on demand.  So I did some ping tests, then connected my computer to Switch “A” & pinged some more, & found some really wild numbers!  I’ve never used ping in this manner & am not sure if it helps with troubleshooting this type of issue, but I noticed that if I ping the servers from the servers, they all average 0 ms.  When I do the same test from some other workstations…same thing.  But when I ping the servers from my computer or the other workstations, I get the attached results.  I also get similar results when connecting my computer directly to switch “A” & bypassing the in-wall wiring.

Also, I've never really looked at the managed switch app before, but I looked at the statistics for all ports that were in use.  About 1% of the ports each showed about 50 or fewer collisions or fragments.  Then I think I may have found a culprit!  One port had over 200,000 collisions, 6,000 fragments & 51,000 oversized packets.  So possibly a bad NIC?  I’m going to investigate this one when the user heads to lunch in an hour or so.
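To make this kind of review repeatable, the per-port counters can be typed into a small script that flags anything above a chosen limit. This is just a sketch: the port numbers and the threshold of 50 are hypothetical, the large values are the approximate figures quoted above, and nothing here reads from the switch itself.

```python
def flag_noisy_ports(counters, threshold=50):
    """Return {port: {counter_name: value}} for counters above the threshold."""
    flagged = {}
    for port, stats in counters.items():
        high = {name: value for name, value in stats.items() if value > threshold}
        if high:
            flagged[port] = high
    return flagged

# Hypothetical port numbers; counters copied by hand from the switch statistics page.
sample = {
    12: {"collisions": 200_000, "fragments": 6_000, "oversized": 51_000},
    7: {"collisions": 40, "fragments": 3, "oversized": 0},
    30: {"collisions": 0, "fragments": 0, "oversized": 0},
}

for port, stats in flag_noisy_ports(sample).items():
    print(f"port {port}: {stats}")
```

Resetting the counters and re-running the comparison after a few hours shows whether the errors are still accumulating or were left over from an earlier device.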

I also verified that these 2 workstations are set for DHCP & receiving the same DNS IPs.

I haven't checked the firmware on the server NICs.  It looks like firmware updates for the 3Com switches aren’t available since 3Com was taken over by HP…or maybe it's because I don’t have an active support contract for those devices?

Accessing the \\servername seems to go ok.  When I tried opening Windows Explorer this morning, it stalled for about 30 seconds.  Then when I tried to open the mapped network drive, that stalled too.

Since the same problem occurs when attempting to access shares on 2 different servers that are directly connected to 2 different switches, I’m kind of guessing that it’s not a cable?  At this point, I know that the issue is happening to at least 2 different users, too.

Regarding in-wall cabling…there was existing CAT5 or CAT5e going to all the offices & our cabling guy wired all the new cubicle areas with CAT6.  We’ve used him for years & never certify.  Of course, there are occasions where there’s a bad jack, cable or termination, but we usually just have him come back out & fix those.

All the switches are set to auto for speed & duplex.  My NIC & the other user’s NIC are set for full auto negotiation.

I’m not sure how to check for the MAC-address hopping?

I'm still checking all the switches to ensure that a switch hasn't been patched back into itself, but wanted to update y'all regarding my findings so far in case it helped.

Thanks again
blugirlAuthor Commented:
Oops...sorry...forgot to attach the ping results...
blugirlAuthor Commented:
I believe that I've resolved the negative ping response time issue (but the overall speed of browsing issue still exists).  Per this forum post, it appears that it was an issue with AMD dual processors.  I downloaded & installed the AMD Dual Core Optimizer & now my ping responses are back to normal on my workstation (I haven't tested it on the other yet)...but I just tried opening the network share via the UNC path...stalling still.
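For anyone wanting to check other workstations for the same symptom, saved ping output can be scanned for negative reply times. A minimal sketch, assuming the usual Windows reply format ("time=12ms" or "time<1ms"); the sample lines and address are made up for illustration.

```python
import re

# Matches the time field of a Windows ping reply, e.g. "time=12ms" or "time<1ms".
TIME_FIELD = re.compile(r"time([=<])(-?\d+)ms")

def reply_times_ms(ping_output):
    """Return the reply times in ms; a 'time<1ms' reply is recorded as 0."""
    times = []
    for sign, value in TIME_FIELD.findall(ping_output):
        times.append(0 if sign == "<" else int(value))
    return times

def summarize(times):
    """Return (average in ms or None, number of negative replies)."""
    negatives = sum(1 for t in times if t < 0)
    average = sum(times) / len(times) if times else None
    return average, negatives

# Hypothetical captured output.
sample = (
    "Reply from 10.0.0.5: bytes=32 time=-7ms TTL=128\n"
    "Reply from 10.0.0.5: bytes=32 time<1ms TTL=128\n"
    "Reply from 10.0.0.5: bytes=32 time=3ms TTL=128\n"
)
average, negatives = summarize(reply_times_ms(sample))
print(f"average={average} ms, negative replies={negatives}")
```

Any negative count points at a timing problem on the machine doing the pinging rather than at the network path.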

Also, I identified that there was a 3COM phone connected to the port that was showing excessive collisions, fragments & oversized packets on the switch.  I disconnected the phone & reset the switch statistics about 2 hours ago.  I made some test calls & there were no new collisions, fragments or oversized packets on that port.  So I'm not sure if those stats were from the phone or a previous device that was plugged into that port?  Especially since I'm not sure if stats reset when rebooting the switch.  So maybe this isn't a culprit after all?
Are any of the servers dual-homed with 2 NICs?
Check DNS servers. Check the forwarders. I've seen two servers pointing to each other as forwarders causing issues.
How about CPU utilization on the servers? I've seen servers 100% cpu bound and just can't handle any more requests.
I've also seen servers hang up doing AV scans in the middle of the day. Check the server event logs for any kind of errors.
Although it sounds like a network issue, if you don't find a loop, then it could be server / dns related.
Make sure the servers have the latest NIC drivers from the manufacturer, not windows updates.
What OS on the servers?
What type of switches do you have?

You have two ports from switch A to each of the other switches.  Are these ports set up as a "team" so that they logically appear as a single path?  This is called EtherChannel on Cisco switches and trunking on some other switches.  Cisco uses the term trunk for something else.

Are you using multiple VLANs?

Collisions on a specific port are not really a bad thing if the device on that port is half duplex only and the switch port correctly goes into half-duplex.

However, sometimes you can get a duplex mismatch, with the switch port in one duplex mode while the device is in the other mode.  This will cause serious problems, but normally only with the device on that port.  It should not affect communications with devices on other ports.
As giltjr suggested, once you confirm there are no VLAN configs on the 3Coms and assuming the switches are all in one place, I'd recommend choosing one of the switches for your servers and SonicWALL. Then, connect each of the other switches to that switch. It seems you currently have your servers split between two switches and possibly your switches are daisy-chained to each other. I wouldn't go beyond two of these in this configuration, and since you have four, you may be experiencing slow LAN connectivity because of this.
blugirlAuthor Commented:

All 4 switches are 3Com Baseline Switch 2948-SFP Plus (3CBLSG48) units.  Our consultant set up aggregated pairs for each of the switches & I’m fairly certain that I have them correctly attached now. (see attached image)  If I didn’t, wouldn’t I see errors or something on those ports?

We are not using VLANs.

Thanks for the clarification regarding potential collision issues being isolated to that port & not affecting other communication.

One thing I did notice is that on all 4 switches, flow control is disabled EXCEPT for ports 45-48 which have flow control enabled.  I’m not certain if this was intentional, and I’m not familiar with what flow control does…but I did start researching the topic & it appears that it can cause issues, so I thought I’d mention it.

Link aggregation is an important part of this. I must have missed it. My suggestion regarding the physical connectivity is moot at this point.
Sorry, somehow I missed where you had already stated what type of switches you have.

How many files are in the directory on the mapped drive?

The way the directory listing is built, there are a lot of little network exchanges.  The more files, the longer this takes and so the slower things appear.  Now, when I say more, I am talking thousands of files in the same folder.  Building a directory list starts with "give me all file names".  Once the client has a list of file names, it then asks "what are the attributes for file #1", where attributes are file permissions, ACL, and all dates relating to the file.  Then it does this for file #2, then #3, and so on until it has done this for every file in that folder.  So 1,000 files means 1,000 queries.
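The pattern described above can be sketched in a few lines: one call for the names, then one attribute lookup per file. This is a simplified model of the description, not a statement of exactly what the Windows client sends on the wire, and the round-trip figure in the estimate is a number you would have to measure yourself.

```python
import os
import time

def list_with_attributes(folder):
    """Return (entries, seconds): fetch the names first, then stat each file."""
    start = time.perf_counter()
    names = os.listdir(folder)                  # "give me all file names"
    entries = []
    for name in names:                          # one attribute query per file
        info = os.stat(os.path.join(folder, name))
        entries.append((name, info.st_size, info.st_mtime))
    return entries, time.perf_counter() - start

def estimated_listing_seconds(file_count, round_trip_ms):
    """Rough model: one request for the names plus one request per file."""
    return (1 + file_count) * round_trip_ms / 1000.0

# Assumed 2 ms per round trip: 1,000 files would take roughly 2 seconds to list.
print(estimated_listing_seconds(1000, 2.0))
```

Running `list_with_attributes` against a folder on the mapped drive and against a local folder of similar size gives a quick feel for how much of the delay scales with the number of files.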

Are the mapped drives always on the same server?

If so, can you map a drive on a different server/computer and see if it is slow also?

I would have to look at the 3COM switches and their flow control, but typically flow control should not cause a problem unless you have a lot of traffic.
blugirlAuthor Commented:

The 3 main network shares are:

M:\ (job files)
-558 GB
-626,000 files
-93,000 folders
-34 objects in the root directory
-on server DC1

U:\ (user files)
-216 GB
-165,000 files
-16,000 folders
-48 objects in the root directory
-on server DC2

S:\ (install files)
-129 GB
-328,000 files
-24,000 folders
-240 objects in the root directory
-on server DC2

I was doing some further testing with Task Manager & noticed that when I access network resources, the McShield.exe process jumps up to 50%-98% CPU.  This typically just happens the first time I access the network share, or if I haven't accessed it for a while.  This still occurs intermittently...almost like McAfee doesn't scan it each & every time.  If I disable McAfee, everything flies!!
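A simple way to put numbers on the "slow the first time, fast afterwards" behaviour is to time several listings of the same share in a row, once with the scanner enabled and once with it disabled. The sketch below is only a timing harness; the function names are hypothetical and the path passed in is whatever share you want to test.

```python
import os
import time

def time_listings(path, repeats=3, pause_seconds=0.0):
    """Time `repeats` consecutive listings of `path`; returns seconds per pass."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        with os.scandir(path) as entries:
            list(entries)                       # force the full listing
        durations.append(time.perf_counter() - start)
        time.sleep(pause_seconds)
    return durations

def cold_vs_warm(durations):
    """Ratio of the first pass to the fastest later pass (None if undefined)."""
    later = durations[1:]
    if not later or min(later) == 0:
        return None
    return durations[0] / min(later)

# Example: the current directory stands in for a mapped drive such as M:\
passes = time_listings(".", repeats=3)
print(passes, cold_vs_warm(passes))
```

A large ratio that disappears when on-access scanning is turned off would support the antivirus theory; a ratio that stays large either way points back at the server or the network.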

If McAfee happens to REALLY be the culprit, does anyone have suggestions for corporate AV (approx 35 users), or at least something that I can demo to see if the performance is better?

Our typical workstations are Windows XP Pro SP2, 3 GB RAM, Athlon dual core 2.0 GHz.  Our servers are Windows 2003 with two Exchange 2003 servers.
Are you running the latest version of McAfee? Perhaps it's a bug? I don't have my clients scan network shares. I have McAfee on the file servers doing that.
blugirlAuthor Commented:

It's set up to auto-update & I'm currently on version 5.0.0 Patch 6, last updated today, DAT 6348.000, Scan engine 5400.1158.  There is a known issue regarding high memory usage during updates that's supposed to be fixed in version 5.2, which is due for release this month per my last conversation with SonicWALL about a month ago.  If I'm not mistaken, they'd said that it'd been released for the regular McAfee products, but due to some issues with SonicWALL, they had to hold off on the release for the rest of us.

For testing purposes, I disabled scanning mapped drives during on-access scans, but it didn't improve the performance (see attached)
Isn't the bug that it scans the network drives despite being disabled? What if you get a workstation that doesn't have the client? What happens then?
blugirlAuthor Commented:

I'm not sure of all the bugs that the current release includes, but SonicWALL said that 5.2.1 has many fixes for these types of issues.  Since it's due out this week or early next week, I'm going to back off on further detective work in case this helps.

"What if you get a workstation that doesn't have the client? What happens then?"

If I disable McAfee on one of the workstations experiencing the slowness, it flies! O_o
Curious. Can't wait to see what the updates net you.
Glad you discovered the issue. I think that may have been it. Of course, any virus hiding in one of the compressed files won't be found out until they are unzipped. Shouldn't be an issue or any different if they weren't compressed. Come to think of it, I'm not sure what our Symantec settings are regarding this.
blugirlAuthor Commented:
There were multiple causes for the slow performance issue that I discovered while troubleshooting & wanted to point out the cause to help others.
Question has a verified solution.
