mazur


Network freeze for 30-40 seconds

We have a network of about 25 Macs connected through a router to a large PC WAN. Several times a day, I find that all of our Macs seem to freeze for about 30 seconds. During this time the double-arrow icon appears steadily in the upper left, as if some network traffic is going on. The cursor does move, but all other actions are just queued up until the freeze goes away. P.S. We are also using MS Mail.
Any ideas?
TheHub

Is there a local Mac server for the group of Macs?
If so, what OS is on the server and what version of AS?
What MacOS is running on the Macs?
Are the Macs PPC, 68k or a mixture?
What version of AppleShare is running on each Mac?
Is there a Hub or Switch between the Router and the Macs?
My first guess is that file sharing is turned on (System Folder:Control Panels:Sharing Setup). Unless you need it, turn file sharing off. It is very rude and will occasionally steal all of the CPU time to check network status. This isn't as big a deal with System 7.6 - 8.0 and Open Transport. If MS Mail is running through mounted servers, you still don't need file sharing to use it. Hope this helps!
We can only guess without more info. See my previous comment.
Avatar of mazur

ASKER

Yes, file sharing is on; but we need it. So that is not an option. As for TheHub's comments - there are no AppleShare servers running, we have a mix of both 68K and PPC Macs (both experience the freeze-ups) running various versions of MacOS from 7.1 to 7.6.1, and all of the Macs connect to Asante Hubs (and one Asante switch). One hub is 100Base-T; the rest are 10Base-T, as is the connection to the PC Router.
Upgrade the 68k machines to 7.5.5, OT 1.1.2, AppleShare 3.7. The most important part of the upgrade is AppleShare. All kinds of weird things happen when there are different versions of AppleShare running on the same network. AppleShare 3.7 allows for TCP/IP connections, which greatly improve performance. AppleShare 3.7 requires Open Transport (1.1.2 strongly recommended). Open Transport 1.1.2 requires 7.5.x. Please note that one motherboard design will not run on 7.5.5, but works on 7.5.3 or 7.6.1. It is either Tanzania or Alchemy (PPC), I forget. I will add a comment after I locate the info. The MacOS upgrade to 7.5 is not free, but the others are available for download from:
ftp://ftp.info.apple.com

If any 68k machines use the 68020, 68LC020 or 68000, you already know it is time to upgrade the machines. If you have an external drive, assemble a MacOS 7.5.5 system on it (custom install, for any Mac), save it in a StuffIt file (for future use) and test boot all the machines from it.
-Good luck, TheHub
Further speed improvements can be attained with the Shareway IP Gateway from OpenDoor Networks:
http://www2.opendoor.com/asip/
Avatar of mazur

ASKER

Well I tried updating a few machines and even left the 68K computers off for a while. Problem still exists. Of course, I have no control over other Macs on the WAN so I don't know their status. But I wouldn't think it would matter unless one was mounted. For that matter, the problem has shown up on machines which only have the PC server mounted (the one we use for MS Mail access). I haven't been able to see if that connection is directly related to the problem.
I have heard that MS Mail is a problem child.
We had some Macs connected through a gateway that converted AppleTalk to Ethernet. It was way overloaded, and we would get a similar problem. This points to a problem with the router. See if you can get hold of a replacement router.

I put this as a comment as it isn't really an answer, but to fix this sort of network problem you need to get network experts in to try lots of stuff. Good luck!
Sounds to me like too much traffic on the network. I'd look into some traffic analysis applications. Perhaps your network needs to be segmented. If the router is not smart enough, it could be letting PC traffic onto your net that shouldn't be there, or it could be letting your traffic onto the PC network. Are they experiencing any problems? Is the network 10BT or Thin Net?
Avatar of mazur

ASKER

Our network is divided into four segments and none of them carry a large amount of traffic. When the freeze occurs, I've looked at the "% utilization" and "% collision" lights on all of the hubs - nothing unusual seems to be happening, at least traffic-wise.

The PC side of the LAN had experienced a similar problem a while back when certain docking stations were turned on. Seems the NIC cards in these stations would "put out DC" on the network causing freezes for 30-40 seconds. But they tell me all of these cards have been replaced and they no longer have this problem on their side.
Sounds like a device is causing a broadcast storm. Try pulling the wires to a couple of computers at a time to see if you can narrow it down to a particular computer. Also pull the connection to the WAN and see if it still happens. The Macs may be responding to some sort of broadcast message, and that is taking up their resources. If all else fails, you're going to have to sniff the wire and look at packets to find the offending device. You may need to bring in a network engineer, unless you are familiar with protocol sniffers and packet-level data.
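
If you do end up sniffing the wire, the first thing to look for is which station sends the most broadcast frames. Here is a minimal Python sketch of that count. It assumes the capture has been exported as (timestamp, source address, destination address) tuples; that layout, the function name and the sample addresses are made up for illustration and are not the format of any particular analyzer. It also only counts frames sent to the all-ones Ethernet broadcast address.

```python
from collections import Counter

# Assumption: only the all-ones Ethernet broadcast address is counted.
BROADCAST = "ff:ff:ff:ff:ff:ff"


def top_broadcasters(frames, limit=3):
    """Return the `limit` source addresses that sent the most broadcast frames.

    `frames` is an iterable of (timestamp_seconds, src_mac, dst_mac) tuples,
    a hypothetical export layout rather than any analyzer's native format.
    """
    counts = Counter(src for _ts, src, dst in frames if dst.lower() == BROADCAST)
    return counts.most_common(limit)


# Made-up sample data: station ...:01 broadcasts three times, ...:03 once.
sample = [
    (0.00, "00:00:94:aa:00:01", "ff:ff:ff:ff:ff:ff"),
    (0.01, "00:00:94:aa:00:02", "00:00:94:aa:00:01"),
    (0.02, "00:00:94:aa:00:01", "FF:FF:FF:FF:FF:FF"),
    (0.03, "00:00:94:aa:00:03", "ff:ff:ff:ff:ff:ff"),
    (0.04, "00:00:94:aa:00:01", "ff:ff:ff:ff:ff:ff"),
]

print(top_broadcasters(sample))
```

A station that dominates this list during a freeze is the one to unplug first.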
Avatar of mazur

ASKER

This is a comment and an answer.

Comment is that the type of PC LAN you are using could be an important factor in the network as a whole.

AppleTalk is a very slow protocol, and I have had a couple of sites where a similar event takes place, with only 5 Macs on the LAN.

If you are using Novell for the WAN, you can pull the IPX client off the Novell web site.  I have tested it on the PPC machines, both laptop and desktop.  Refreshingly enough, the access time just screams compared to standard AppleTalk.

Of course, this will only work if you are on a Novell network.

I would also make certain that the IP software on the Macs is up to date.  I am using 7.5.5 on my laptop and have had far fewer problems with IP than on previous revs of the OS.

Also check the version of the print manager.  If you are using the LaserWriter 8 drivers (mostly for PostScript), I have found that you need 8.3 or newer.  This made a huge difference in a multi-platform site I have.

Last pessimistic item: make certain that none of your cabling runs near photocopiers, fridges, or other large electrical appliances.  The RF from these can cause a network to have heart attacks, which it does recover from, but it can take a minute or so to clear collisions, etc.

Good Luck!

To cemaylor: Thanks for the suggestions.

The server that we connect to is running Windows NT and we use it only for mail. We sometimes use IP (e.g. Netscape) through the WAN, but almost everything on the Macs is via EtherTalk. Because of the range of computers we have (SEs through Power Macs), there are various versions of system software running. And, once again, when the disturbance occurs, I see no unusual traffic on the hubs' status indicators.

SO... you mentioned that you saw similar events on a couple of other networks. Did you eventually track them down and eliminate them?
Avatar of mazur

ASKER

This is probably a red herring, but it just so happens that the
PAP (Printer Access Protocol) retry timeout is 30 seconds.
Is it possible that you have a printer on the network that is
periodically going offline? I have seen similar network delays
on our network when someone has tried getting PAP Status from
an offline printer. The PAPStatus() call in the LW driver
is synchronous, and will certainly freeze the local Mac for 30
seconds if the printer is not responding. I don't know how much
traffic it puts on the network when it does this, but EtherPeek
would tell you.
To sidewinder,

Interesting possibility... although I don't know why a printer would go offline so often (adding paper perhaps?). Also, if I'm not printing at the time, would the LW driver be active (using Desktop Printing)? Finally, why would this freeze-up affect several machines simultaneously?
Avatar of mazur

ASKER

You're quite right. Adding paper would not stop a LaserWriter responding to status requests, in fact turning the power off is the only sure way that I know of doing this. Also I can't imagine why it could hit several machines at once unless the LW driver is bombing the network during futile status requests. I think not.

It sounds unlikely that a LaserWriter is the problem. Hmmmm...

Do you have a copy of EtherPeek? There is also a little utility on Apple's developer CDs called DMZ ( i Don't know what it MeanZ ) which lists everything which is registered on AppleTalk. I'm sure that it's available on the Web somewhere. I'll have a look.
This problem is most likely due to very heavy network traffic over the WAN.  If you are connected to any file servers, the MacOS will periodically attempt to refetch all the information about them.  Also, there is a constant monitoring of the ethernet port going on.  If there is a major surge of traffic which causes an overflow, it could easily cause the Macs to drop off the network momentarily (a self-protection device), and then in order to rejoin the network, they must pull off all the network info, a process which usually takes from 15-45 seconds from disconnect to reconnect.

Ask your network manager to run a network activity monitor to determine where the surges are coming from.  It could easily be a misconfigured NT server or the like.  Alternatively, it could just be that your company needs to expand its bandwidth.

I hope that I answered all your questions.
-Curtis
OK. I finally downloaded the demo version of EtherPeek. And fortunately was able to capture the "event" within the demo's 5 min window. So here is what appears to be happening:

Since the MS Mail Notifier needs to see its server, we always have this server mounted (it is an NT server on the PC side of the router). For what it's worth, the server is mounted but no Finder windows are opened.

Thus, as caeisenb noted, the MacOS is constantly polling the server. In fact, EtherPeek showed these transactions occurring anywhere from 1-10 seconds apart. Each transaction consists of three packets. First, the Mac sends an ASP Cmd to GetVolParms. The server responds with an ATP TRsp packet, which is then followed by the Mac sending an ATP TRel to end the transaction. Normally, of course, this entire transaction takes place in less than 1 second. Usually this transaction is followed by another trio of packets with the command GetFileDirParms.

So far OK. We have all of our Macs sending these queries to the server every few seconds. Then something happens! All of a sudden, the server no longer responds to the ASP Cmds from the Macs. No ATP TRsp packets. So the Macs continue to resend the ASP Cmds every couple of seconds or so. Sometimes the server will send back a rash of ASP Tckl packets (one to each Mac) letting them know that it's still there. But it is not until 30-45 seconds later that the TRsp packets come back (followed by each Mac's TRel). After this happens, the Macs then return to normal and start reacting again, executing any keystrokes, mousedowns, etc. that have been queued up.

So now we know what's happening.... but why? Why does the MacOS wait for these packets before it can go on? It must know that it may never get a response. And the timeout (also 30 seconds) should not be synchronous, right? I'm still trying to find out why the server stops responding as well. Any comments?
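
For anyone who wants to look for the same pattern in their own capture, here is a rough Python sketch of the check described above: for each Mac, measure the time from its first unanswered Cmd to the TRsp that finally answers it, and report the long waits. It assumes the capture has already been reduced to time-ordered (timestamp, client, kind) records; that record layout, the function name, the 5-second threshold and the sample values are all assumptions for illustration, not EtherPeek's export format or real capture data.

```python
def find_stalls(packets, threshold=5.0):
    """Return (client, first_request_time, wait_seconds) for each slow reply.

    `packets` is a time-ordered iterable of (timestamp, client, kind) records,
    where kind is "Cmd" (client request), "TRsp" (server response),
    "TRel" (client release) or "Tckl" (tickle). Hypothetical layout.
    """
    pending = {}  # client -> timestamp of its oldest unanswered Cmd
    stalls = []
    for ts, client, kind in packets:
        if kind == "Cmd":
            # A retransmitted Cmd keeps the time of the original request.
            pending.setdefault(client, ts)
        elif kind == "TRsp" and client in pending:
            started = pending.pop(client)
            waited = ts - started
            if waited >= threshold:
                stalls.append((client, started, waited))
        # TRel and Tckl packets do not answer a request, so they are ignored.
    return stalls


# Made-up records: one normal transaction, then both Macs wait 33 seconds.
capture = [
    (0.0, "mac1", "Cmd"), (0.1, "mac1", "TRsp"), (0.2, "mac1", "TRel"),
    (5.0, "mac1", "Cmd"), (5.5, "mac2", "Cmd"),
    (7.0, "mac1", "Cmd"), (7.5, "mac2", "Cmd"),
    (20.0, "mac1", "Tckl"), (20.0, "mac2", "Tckl"),
    (38.0, "mac1", "TRsp"), (38.5, "mac2", "TRsp"),
    (38.6, "mac1", "TRel"), (38.7, "mac2", "TRel"),
]

for client, started, waited in find_stalls(capture):
    print(f"{client}: request at {started:.1f}s answered after {waited:.1f}s")
```

If several clients show long waits that begin and end at about the same time, that points at the server (or the path to it) rather than at any one Mac.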
Avatar of mazur

ASKER

This is a known limitation of the current MacOS.  Until Rhapsody comes around, or until MacOS becomes fully multithreaded with preemptive multitasking, this network freeze will continue to plague your machines until you fix the network or the server.
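
To illustrate why one synchronous wait can stall everything under cooperative scheduling, here is a small analogy using Python's asyncio. It is not Mac OS code. The task names and timings are invented, and the "server query" is just a sleep. The point is only that a call which never yields holds up every other task sharing the loop, while a call that yields during its wait does not.

```python
import asyncio
import time


async def ui_task(log, ticks, interval):
    """Stand-in for the user interface: note the time every `interval` seconds."""
    for _ in range(ticks):
        log.append(time.monotonic())
        await asyncio.sleep(interval)


async def blocking_poll(wait):
    """Stand-in for a synchronous server query: never yields while waiting."""
    time.sleep(wait)  # blocks the whole loop until the "reply" arrives


async def yielding_poll(wait):
    """Same wait, but control returns to the loop while the reply is pending."""
    await asyncio.sleep(wait)


async def longest_gap(poll, wait=0.3):
    """Run the UI task next to one poll; return the longest pause between ticks."""
    log = []
    await asyncio.gather(ui_task(log, ticks=5, interval=0.05), poll(wait))
    return max(later - earlier for earlier, later in zip(log, log[1:]))


print("blocking poll, longest UI pause:", round(asyncio.run(longest_gap(blocking_poll)), 2))
print("yielding poll, longest UI pause:", round(asyncio.run(longest_gap(yielding_poll)), 2))
```

With the blocking poll, the longest pause in the UI task is at least as long as the poll's wait. With the yielding poll, the pauses stay close to the tick interval.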

My recommendation in the meantime is to update your Macs to OS 8.  The new OS makes much more use of threading and multitasking, and the freezes (if they happen at all) should clear up more quickly.  Besides, OS 8 is a terrific speed enhancer for Power Macs and improves stability all around!

I hope I answered your questions.

-Curtis
ASKER CERTIFIED SOLUTION
caeisenb