Drakin030

Major network slowdown.

Just recently we have been experiencing some major performance issues with our computers and network. Here is what I got...

We have a 25+ server Windows environment with 2003 and 2000 servers. We have a DHCP server, DNS, Domain Controller, Exchange 2003, file server, SAN, multiple Cisco switches and so on. All our computers run Windows XP Pro.

The issue we are having is this....

In the morning when we all go to work, our computers just lock up and freeze. It's not just one person, it's everyone. For instance, on my computer, if I open up "My Computer" it will just sit there and do nothing for about 5 minutes. Or if I click Start > Run it will just sit there. Or if I try to open the Task Manager, nothing.

This only seems to happen in the morning for about an hour or two.

We do have mapped drives on our computers. I checked the performance of all the servers that the mapped drives point to, and everything seems to be okay. Also, during this problem, when I can get the Task Manager open, CPU usage is low, networking usage is low, and memory usage is low.

We do not run any roaming profiles on the network.

I have checked the GC's event log, and could not find anything that stands out.

What should I check? I'm trying to think of the cause of this issue, but I'm not really sure where to start.
Darius Ghassem

What AV do you have installed on your network?
Drakin030

ASKER

We currently use Symantec Corporate edition. I want to say it's version 10.1.5.5000
The issue you are having is just like one that I experienced with Symantec Endpoint. The network would slow down to a crawl and then seize the clients.
What did you do to fix it? What exactly was causing it? I'm logged into our Symantec server now just poking around.
We had to fully remove SEP from the servers and installed McAfee Total Protection to get the network to run correctly. Symantec wasn't any help and MS wouldn't troubleshoot until we fully removed it.
Hello,
With symantec endpoint you have to uninstall the network performance portion that is "integrated" into the software.  It is an option on each client.
Thanks,
Kelly W.
Kelly,

Even when you don't install the network threat protection, it still installs the registry keys and service. This didn't fix the issue for me.
When you checked the performance of the servers you have mapped drives to, were you just looking at the CPU or were you looking at the disk activity as well?

If just the CPU, take a look at how busy the disks are on the servers.  If they are pinned, that could be causing your problem (because of the mapped drives).  Take a look and see if any of the workstations are accessing a large number of files by using the Computer Management MMC.

A good way to test the theory is to remove the mapped drives from one computer and see if it responds appropriately as the others hang up.

We had this problem with people putting PST files on the file server.  When they all started work in the morning, the load on the file server and network caused the entire network to lock up.

Got any PSTs on the network?

Microsoft does not support PSTs on the network.
http://support.microsoft.com/kb/297019
Well I don't think it is Symantec, and the reason why is this...

My computer is the only computer in the office that does not run Symantec.

The reason why is because I'm just good and I don't get viruses. (I know I know....)

But I still experience the same problem as everyone else.
RPPreacher:

Users do store their archives on the file server. Perhaps when they open Outlook it tries to access this PST.

But again though, when checking the Performance of that file server everything appears to be fine. I may need to check disk performance though.
Do you have Symantec on the servers? The problem wasn't the clients but the servers having SEP installed on them. The clients work perfectly fine.
I checked the open shares on our file server, and there are only a handful of users who have archives open. Not as many as I thought, so perhaps it's not PSTs.
Hello,
Okay you mentioned that there is a slow down when everyone comes in and logs on first thing in the morning.
If you reboot your computer during the day is there still a slow down or is it faster than greased lightning?
Thanks,
Kelly W.
Rebooting does not seem to help. Some users report rebooting up to 5 times.
Oh, and even now we're still having the issue. It's about 10am and I clicked Start > Programs and nothing came up for about 3 minutes.
Disconnect your mapped drives and then try it again.
Just did. I'll wait to see if it happens again.
Hello,
I know that this is basic but have you looked at the physical layer of the OSI model?
What if you do a continuous ping to the server for 10 minutes?  How many failed packets do you have?  What is the average ms time and the longest ms time?
It also could be that a computer with a marginal card has gone rogue, or that some device is causing a broadcast storm (but the Cisco switches are supposed to take care of that).
Thanks,
Kelly W.
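The ping test suggested above comes down to three numbers: lost packets, average round-trip time, and worst round-trip time. Here is a minimal Python sketch of that summary, assuming the round-trip times have already been collected into a list. The function name and the use of None to mark a lost reply are illustrative assumptions, not part of any tool mentioned in this thread.

```python
from typing import Optional, Sequence


def summarize_ping(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize ping samples in milliseconds; None marks a lost packet."""
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    return {
        "sent": len(rtts_ms),
        "lost": lost,
        # Guard against an empty sample list to avoid dividing by zero.
        "loss_pct": 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0,
        "avg_ms": sum(replies) / len(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }
```

A long maximum with a low average would point to intermittent stalls rather than steady congestion, which is the pattern described in this question.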
Well during the time I would run a ping and the response time was <1ms. Also in terms of dropped packets I don't think I had a single one.
I would suggest disconnecting the PSTs.  It doesn't take many to muck things up royally.
Hello,
Do you only have your DNS server as the DNS for each computer?
Thanks,
Kelly W.
All computers point to two different servers here in the office for DNS. (Primary and secondary)
Hello,
I am just wondering if the two different DNS servers are clashing with each other.
When the users logon are there any errors in the event viewer on either the server side or the client side?
Thanks,
Kelly W.
I don't see any DNS errors on my local computer. I can check the server to see if anything stands out.
Hello,
I would not only check DNS errors but any other types of errors in the event viewer during the logon process when it is really slow.
Thanks,
Kelly W.
Well I removed all mapped drives but one, and I had a lock up again. Now I removed that one, so if it doesn't happen again, I think we know the culprit.
Actually I take that back.

I removed all mapped drives, and it just locked up again. So now I know it's not a mapped drive problem. Perhaps not even a share problem.
Also (sorry for all the posts), during the lock up, nothing showed up in my local computer's event log.
Hello,
Again I just wonder if you have a network card in one of the PCs that is doing a broadcast storm.
When you mentioned that you removed all the mappings and it locked up again, what were you doing when it locked up?  Internet, in Office, in Windows Explorer, etc?
Thanks,
Kelly W.
At the time I had Outlook open, Firefox, and a RDP session.
I do have a port on our main switch that forwards ALL traffic on the network out that one port. I use this with Ethereal to monitor traffic.

If it was a broadcast storm, what should I look for? Also would I not be able to monitor this on one computer if it's going to all computers?
Hello,
Just to throw this one out there, on the servers how much room is on each C drive?  Could the domain controllers not have enough room on the C drives?
Thanks,
Kelly W.
Hello,
Yes you should be able to monitor it on one computer.
Are you able to see any reports at all from your switches?
Thanks,
Kelly W.
I checked all major servers, and they all have about 40% or more in disk space available on the C drive.

We do have a graph I can look at through Cisco's network assistant. I can go ahead and run that.

I have Ethereal running on my computer now. Right now I have maybe 1-5 packets a second while sitting idle. If I surf a web page it goes up. I figure when my computer locks up again, maybe I can start the capture process and see what's coming in.

Hello,
Do you have anyone that is listening to music over the internet or has setup themselves to share music through Napster, Pandora, etc..?
Thanks,
Kelly W.
OK, I had Ethereal running, but could not get the switch monitor running when my computer locked up again.

It seems like this happens when I access Windows Explorer. For instance, I can surf the internet during the lock-up but I can't open My Computer or anything. But when it catches up, all the windows I had previously tried to open appear at once.

The lock up lasted about 3 minutes, and I had about 400 packets, most of which were ARP (About 40%)

The destination was Broadcast, but the source was different. There was one computer that had many requests, but I don't know how many is too many.
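To see which device accounts for the ARP traffic, the captured packets can be tallied by source. Below is a minimal Python sketch, assuming the capture has been exported to rows of (source, destination, protocol). That row layout and the function name are hypothetical, chosen only for illustration, and are not the export format of any particular sniffer.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def arp_broadcasts_by_source(
    packets: Iterable[Tuple[str, str, str]]
) -> List[Tuple[str, int]]:
    """Rank sources by how many ARP broadcast packets each one sent.

    Each packet is a (source, destination, protocol) row.
    """
    counts = Counter(
        src
        for src, dst, proto in packets
        if proto.upper() == "ARP" and dst.lower() == "broadcast"
    )
    # most_common() returns (source, count) pairs, highest count first.
    return counts.most_common()
```

The sketch only ranks the talkers. It does not decide how many requests are too many, which depends on the size of the network and the length of the capture.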
I use Pandora which is music streaming, but other users who have problems do not.
Hello,
Okay that is interesting that one had many ARP requests.  This is used to announce the MAC of the card to different requests.  
Is it the same card that is answering the ARP requests?
If it is the server then that is fine, BUT if it is another card that is in a regular PC then that PC has supplanted itself as one of the domain controllers in the server world.
I would check to find out what IP or MAC it is that is sending the ARP requests.
Thanks,
Kelly W.
It very well may be a printer. I'm watching now and my computer is fine but it's still submitting ARP requests. Not many, but still a few. For instance, over 5 1/2 minutes I've had 331 ARP requests.
The device that is sending the ARP requests is a server. It's just an old Terminal Server that a few people use, but for the most part it was decommissioned.
Hello,
Is anyone using it right now?
I just wonder if you downed that server if things would be better?
Thanks,
Kelly
Let me check.

Also I just had a lock up, and it's weird because it happens for everyone at the same time.

I did some monitoring again of the packets, but I had very little coming in.
I do have one user on that server. I'm going to ask them to log off and shut the server down for a bit and see if the problem continues.
Hello,
Okay I am going back a ways on this.
What type of building are you in?
What is behind the wall of where the main servers and/or switches are?
Why I am asking is that I had the same problem when I worked at the hospital.  There was no rhyme nor reason of why things would lock up at different periods during the day, until we found out that a new MRI machine was located on the other side of the main wiring closet in the hospital.  It was creating a heavy EMI interference.  We moved the closet and life was good.
So is there anything on the other side of any of the walls that has EMI on it?
Thanks,
Kelly W.
Hmm, well we have 3 floors. The server room sits on the middle floor and I sit right next to it.

The building is pretty much solid concrete.

My office is literally right next to the server room. So I have a cable about 25-40 feet long running through the ceiling to the server room switch. The switches connect to the servers with a 25 foot CAT6 cable that runs down a ladder rack. Nothing really in-between. So my computer is the closest to the server room with nothing in the way but 2 walls.

I do have lights in my office, but I keep them off because the sunlight through the window is more than enough.
Hello,
Okay that is off the list.
Not knowing what your setup was, I just had to ask.
Will do some thinking on this one.
Thanks,
Kelly W.
Yeah this is a tough one.

I'm still monitoring packets when the issue occurs. I also still need to shut down that server.

Hello,
Are you doing any network scanning on your antivirus on each computer?
Thanks,
Kelly W.
Each computer runs its own scan locally, but as for file servers, I'll run a scan manually once a month or so because it takes so long.
Hello,
On each PC it is just the local drives and not the network drives that it is scanning, right?
Thanks,
Kelly W.
Just the local drives.
Hello,
Do you have any routers that are connecting a different company to you?
Some medical facilities have to do this.
Just inquiring of whether this is a closed network to your company only with just a router out to the internet.
Thanks,
Kelly W.
We do have a remote location that has a Cisco ASA5510 that I had set up this past summer.

That network contains about 20 clients. (Ours has about 60) and it connects via tunnel from the ASA to a VPN Concentrator to our office.

We also have another location that connects to our office from overseas. Its connection is VPN based as well. They have been set up with us for about 2+ years.
Hello,
The reason that I am asking is that one of my clients had their internal network but was also connected to the hospital.  One of the hospital's Ciscos started to give off a broadcast storm and for some reason it affected the medical office and not the hospital.
Something to keep in the back of your mind.
Thanks,
Kelly W.
Hello,
To test this out, what if you uninstall Symantec from one of the PCs and see if that works.  I know that you don't have Symantec on yours, but what about another PC?
I have seen nothing but bad stuff from the latest Symantec and it all deals with the slowness if not freezing of the network.
Thanks,
Kelly W.
Well I can try it on my other IT guy's computer. Let me try that out.
Remove Symantec off the server as well. This is exactly what happened to me and I had to remove Symantec off the servers for it to totally stop freezing the network.
Well I'm a little reluctant to remove Symantec off the server unless that's the only option. I'd rather turn off the server or something first.
Turning it off or disabling didn't work we had to fully remove the product.
Are we sure this would affect computers that did not even have Symantec installed?
Yes, when accessing the server the clients froze or were slow.
Just as an update.

This morning things have been running okay so far.

I had turned off the server submitting the ARP broadcast traffic, but I'd still be very shocked if that was what was causing the problems.

I was doing some packet monitoring on our primary file server and found that even if I disconnect all my mapped drives, traffic is still being sent to the file server, so I didn't want to rule out this being the issue.

I did see something interesting with the packets though and I wanted to know if this might be the cause.

While surfing the file server via my computer, I noticed this type of traffic come up in the packet monitor...

NT Create Andx Request, Path: \<filedirectory>\Somedirectory:$DATA
NT Create Andx Response, Error: STATUS_OBJECT_NAME_NOT_FOUND

There is just a ton of these "Status object not found" packets in the packet sniffer. I tried to google it but didn't come up with much.

Could this be a potential issue?
Hello,
Do you have any linux or OS/2 clients on the network at all?
Thanks,
Kelly W.
I have a couple of Linux servers for our VOIP, and secure email solution, but no desktop clients. We do not have any OS/2 clients. We do have like....2 Mac users.
Hello,
Why I ask is that this is normal if you have linux or OS/2 clients on your network.
If you do not then it is not normal and you very possibly could have the Iraqi Worm:
http://www.mynetwatchman.com/kb/security/articles/iraqiworm/iraqitrace.htm
OR
You have Veritas Backup Exec running with the remote agents running on other servers
OR
You are running network printing with TCP/IP cards and SNMP or DLC is running on these cards and they are advertising at an alarming rate
OR
The old server was a domain controller and the Active Directory is now having problems.  This was a previous thread (https://www.experts-exchange.com/questions/23608404/Slow-logon-to-Windows-machines-if-old-demoted-DC-is-powered-off.html) that had been solved.
Thanks,
Kelly W.
I'll check both of these out. If nothing goes wrong today I'll assume it was the ARP Broadcast traffic, and will award the points for that.
Ok just locked up again.

I think I now know that it is with our file server. Every time it locks up, I can't remote into the file server, but I can remote into any other server.

I'm going to monitor the traffic and see if I can look at what's going on during this.
I found that during the lock up if I open My Computer it would typically send out traffic to our file share for Mapped drive information.

Instead, during the lock up when I opened "My Computer" the traffic I see was this....

http://i219.photobucket.com/albums/cc252/Drakin030/PacketSniffer.jpg

Could it be an STP problem?
Hello,
Port 3052 is used for APC UPS's.  Port 60692 is being used for unix or hacking.
Could it be that the server that locks up is running some main network software for your UPSs (I know that the software is based on a flavor of linux and uses SNMP)?
Thanks,
Kelly W.
Not to my knowledge. The UPS is pretty much independent from the other servers, and the file server basically sits there, logs file changes, and hosts files.

I did log into our main Cisco 4506 Catalyst switch, and when I got to it the CPU was 82%, then at the next poll it went back down to 28%. Not sure if that helps or not.
Hello,
Looking at your screen print you have some IPX going on.  Do you have any machines that are attaching to Novell servers that are only IPX?
Or do you have any network based printers?  If it is the printers, I would turn IPX off on them since an IPX packet is about 4 times the size of an ethernet packet.
IPX packets can become chattier than ARP packets.
Thanks,
Kelly W.
No Novell servers, but we do have network printers which are on a different VLAN. We also have a single print server which shares all these printers.
I thought you had disconnected your mapped drives.

Have you had a chance to run perfmon on the server and see how busy the disks are during the slow downs?  That won't show up in task manager.

I did, but when I ran the packet sniffer again, it still showed traffic going out to the file server.

The file server is connected to a SAN. I've monitored that, and have had no issues with that at all.
More interesting stuff.

When the lock up happens, the Cisco switch that controls the network goes up to 100% CPU, and then shoots back down to 12%. (It typically stays around 29%.)
Hello,
I wonder if something is not coming from the server itself to shoot up your bandwidth.
I tend to navigate back towards the comment from yesterday to get rid of the Norton on the server.
For my clients, I will not let them have Nortons or McAfee on any of their PCs or servers since they are resource hogs and just do some very bad things in slowing down the network.
Thanks,
Kelly W.
Well I can try removing Symantec.

Do I just need to uninstall it from that server, or is there more I should do?
Hello,
Just uninstall it from the server.
More than likely you will have to reboot the server though.
Thanks,
Kelly W.
If it's not the problem, can I just re-install it and have it pick up all the clients again, or will I have to re-install the AV on everyone's computer again?
Hello,
In theory you should be able to pick up the clients again.
I don't use Norton but in previous years you could just pick up the clients again.
Thanks,
Kelly W.
Alrighty, I removed the Reporting Agents, and Symantec system center from the server.

I'll be shocked if this works, but at this point...I just want SOMETHING to work.
On the server that everyone is mapping drives to... when you experience the slow down take a look at the Open Files list in the Computer Management MMC.  Do any of your users have an extremely large number of files open?
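The check described above amounts to counting open files per user and looking for an outlier. Here is a minimal Python sketch, assuming the Open Files list has been copied out as (user, path) rows. The row layout, the function name, and the default threshold of 100 are arbitrary assumptions for illustration, not values from this thread.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def heavy_file_users(
    open_files: Iterable[Tuple[str, str]], threshold: int = 100
) -> List[Tuple[str, int]]:
    """Return (user, open_file_count) pairs at or above the threshold.

    Each entry in open_files is a (user, path) row; results are sorted
    with the heaviest user first.
    """
    counts = Counter(user for user, _path in open_files)
    return [(user, n) for user, n in counts.most_common() if n >= threshold]
```

Running this on a snapshot taken during a slowdown and on one taken during normal operation would show whether any single user stands out only when the lock-ups occur.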

I want to say no, but I'm going to leave that open in case we have this happen again.

For instance right now I have like....20 files currently opened by users. I'll watch this and see if it changes.
The issue is still not fixed sadly.

Even after removing Symantec, it still locked up on me this morning. What's weird though is this happened right when my file server went offline due to some upgrades.
Hello,
So your server went down and you locked up?
Have you locked up while the server is up, since you removed Symantec?
Anything in the event viewer of the PC when you locked up when the server was down?
Thanks,
Kelly W.
Again... remove the PSTs from your file server.

Seriously...

I had brought the server down for some upgrades in hopes of fixing some of these issues. As SOON as it went down, I locked up. As SOON as it came back up, I was back up.

Nothing was showing up in my event log.

Here's my thought.....

When I run something like "My Computer" on my local PC, it sends a packet out like a query to the file server. We have multiple file servers, but this is the biggest. Anytime that server is offline, it's like the computer is still requesting this information and locks up until it gets what it needs.

Now the network utilization on this file server runs high at times, so maybe the reason for the lock up is that it's too busy handling other requests.

I guess my question now is...Why does the computer lock up if it can't reach the file server?
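One way to test the theory above is to time a plain TCP connection to the file server's SMB port (445) while a lock-up is in progress. The following is a minimal Python sketch under stated assumptions: the function name is made up for illustration, the host name in the usage note is a placeholder, and a TCP connect only shows whether the server answers at the network level. It does not reproduce whatever Explorer itself is waiting on.

```python
import socket
import time
from typing import Tuple


def probe_tcp(host: str, port: int = 445, timeout: float = 2.0) -> Tuple[bool, float]:
    """Try one TCP connection and return (reachable, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        reachable = False
    return reachable, time.monotonic() - start
```

For example, calling probe_tcp("fileserver01") in a loop every few seconds (with "fileserver01" standing in for the real server name) and logging the results would show whether the client freezes line up with failed or slow connections to that one server.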

RPPreacher:

If you suggest I remove the PSTs (archive PSTs) from the file server, then where do you suggest putting them? Users can't just archive to their local PCs because if a hard drive fails, there is no redundancy for their archived information.
We purchased an email archive solution.  Or purchase more storage space on Exchange.  Or set a retention policy.  It doesn't matter.

The fact is that if you remove the PSTs, the issue will go away.

Then figure out what to do with the PSTs.  Go to management and say "gosh, we have been putting these on the file server and this is what is causing our problems.  Here are the options."

Or you can keep trying everyone else's solutions that won't work.
What email archive solution did you purchase?

We can't just get more space on Exchange because users' mailbox stores would be breaking 20 gigs. We can't delete information because we're required to maintain all emails. The archive has to go somewhere.
Outside of my last statement, I think the issue is fixed now, and the result was the file server.

I'm not sure who to award the points to on this one.
We used EAS Email Archiving.

http://www.symantec.com/business/theme.jsp?themeid=globalsem_enterprisevault&header=0&footer=1&depthpath=0&tab=1&om_sem_eid=Google&om_sem_cid=biz_sem_Enterprise_Vault_US_English&om_sem_adid=EV_-_Platinum_Terms&om_sem_kw=email+archiving

But I would recommend disconnecting ALL PSTs first from all mailboxes.  Then MOVE them off the file server (just in case you missed someone).  Then see if the problem goes away.  (Who knows, maybe I'm full of it).

If the problem goes away, then worry about an archiving solution (there are a lot of them).

If the problem doesn't go away, then put them all back.
I think we knew the problem for the most part was your file server.  Moving the PSTs may help, but I don't think that is your problem.

Did you get a chance to see if there were any excessive file accessing during slowdowns?  Obviously you couldn't do this when the server was offline.

If you have a user that is bottlenecking your disk performance, you can see these problems with the mapped drives.

Well I was logged into the file server during one of the lock ups, and the server was actually locked up.

So in my mind...The user will access My Computer for instance, which sends a request off to the file server. (Packet sniffing showed just a couple of queries perhaps due to mapped drives or something else)

Well, if the file server cannot respond then the computer locks up.

Well when the server locks up first, it cannot handle any requests, so that's what causes the lock up.

Now the biggy is...When the server locks up, CPU usage is <5%, Memory usage is minimal, network usage is minimal as well.

Now users access their files over the file server, which connects to a SAN via a 2Gb fiber connection. The disk performance on the SAN has been monitored, and shows little to no activity. Perhaps the local disks on the file server are choking up, but there is no shared data on that server, so it doesn't make sense.

But the biggest mystery is why computers lock up if they cannot access a file server. I can bring down one of the other file servers and the users would not lock up. It's something about this one.
Hello,
Could it be you have a flaky network card on the server that is marginal and under a heavy load it puts itself to sleep or temporarily gives up?  OR maybe it is going into a hibernation mode on the card itself?
Just some thoughts.
Thanks,
Kelly W.
Actually that upgrade I made on the server this morning was a new Gigabit card.

Like right now the server is just locked up, and the CPU usage is 2% and the system.exe goes to 2%...then 0...then 2...That's the only pattern I can find.
Hello,
Possibly a different port on your switch?
Reaching for straws here.
Thanks,
Kelly W.
Could be bad or misconfigured switch port, but if the server itself is locking up when you are on it, it seems that would be more of the cause for your other problems, making them the symptoms.

Besides a file server, what else is that server doing?  Are you running the latest patches, drivers and firmware?  

What type of SAN do you have?

I am currently patching the server now with a few Windows updates.

The file server does nothing much other than share files and log what changes are made to these files through File Site Pro.

The SAN is an EMC CX300, I believe.
We have a CLARiiON as well and had similar problems.  We had users that were compiling data through mapped drives.  Whenever they compiled, it hung everyone else up.  The only way we tracked it down was that the users that were causing the problem had MANY files open at once, which is why I previously mentioned the mapped drives and what kind of file accesses you were seeing.

Could be a coincidence, but the fact that you have the same SAN makes me suspicious that this could be a vendor driver problem.




Any update this morning?
ASKER CERTIFIED SOLUTION
Drakin030

I would say no rush on the points.  Why don't you wait and make sure it's actually resolved.  That way nobody loses interest in the question (sometimes people do).

Well, it's worse in the mornings, and it's about noon with no reports of problems at all. I can ride it out for a bit.
I agree, let it run for a couple of days to see what happens.
So far still good.
Excellent.

As far as grading the question, I would say you got good feedback from everyone, but your answer is probably the one that should be selected as the Accepted Solution.  It's up to you if you want to mark other comments as assisted solutions and to award points.