DonGyton


workstations going offline

I am a computer technician in a high school where none of us are experts but everyone expects us to be!! I currently have a problem I cannot solve. We have a domain server running Windows Server 2003 with Active Directory (SP1 installed) plus two other application servers running student programmes (also Server 2003). The network consists of about 200 workstations, mostly running Windows XP with SP2. For some time we have had a problem with workstations unexpectedly going offline, but this has not been a great problem (it was worst with laptops using a wireless LAN, but this improved with better NICs installed). In the last week, however, nearly every time someone logs on (student, staff or administrator, it makes no difference), the workstation goes offline after a few moments. A window pops up saying "No longer connected to chsserv - working offline". (The domain server is called chsserv.) Users can log on okay and use the internet (via a proxy server which is also the DHCP server) but cannot access any share, in particular their own home drive. The problem exists over the whole network, so I have ruled out faulty switches, NICs etc. Offline machines can still ping one another using IP addresses or computer names. The only change made to the network is a new file/applications server, and apart from a few recent updates nothing else has changed. As far as I know we have sufficient licences for all workstations, but in any case the problem has only just arisen. Netmon, Dcdiag etc. offer no obvious solutions! We have Sophos virus software installed on all machines with auto/remote updating. I have spent all today on this problem with no progress. Any help much appreciated.
Don Gyton
ECNSSMT

Sounds like an excellent opportunity for on the job training.  Don't worry about the blood stains; they'll dry up eventually.  Quick organizational suggestion:
1. per person; list the area of expertise
2. list the disciplines needed to keep the school running; e.g. PC, LAN, WAN, SERVERS, etc
3. according to expertise, assign the disciplines to the resident expert in a nice and even manner; voluntary ownership is also good; these guys will keep track of all solutions; they may take point on solving a problem but it's a team effort to solve.  OK?
You will either have great teamwork and solutions...  or... it's status quo...

As for the network issues: what are the error codes / messages associated with the event? What do you see on the workstations when this happens, and what errors are seen on the servers?

What is your network topology like?  Is your really great LAN going thru a single 4 port 10 Mbps hub to get to your really great server farm?  HOW ARE THE CLIENTS REACHING THE SERVER; what is this path like?  Was this problem noticed before the new server was put in?
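
Since ping already works, one further way to probe the client-to-server path is to check whether the file-sharing ports on the server answer at all. A minimal Python sketch, assuming the standard SMB ports (445, and 139 for NetBIOS sessions); the helper name `tcp_port_open` is made up for illustration and is not part of any tool mentioned in this thread:

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name-resolution failures
        return False
```

Run from an affected workstation, something like `tcp_port_open("chsserv", 445)` would show whether the share service is reachable at the moment the "working offline" message appears. It only tests reachability; it says nothing about why a share is being refused.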

Regards,

 

I am leaning towards your network being overly utilized; sight unseen.  Or should I say, SITE unseen.  You can probably use MS's perfmon on the server to see what network traffic is like.

Regards,
DonGyton

ASKER

Thanks for the advice about on the job training - would be nice to have the resources to do it but not in little ol' England!! There is only me as full-time technician/engineer/dogsbody/shoulder to cry on, plus my boss who has other admin responsibilities and an assistant likewise otherwise disposed. Now you have dried your eyes I will continue!

There are no error codes as such, just a message on each workstation saying you are no longer connected to the server but you can continue working normally. The connection comes and goes for some users but not for others, and some workstations are worse than others. Trouble is, students use a variety of computers so it is necessary to have files located centrally. I have stopped synchronisation and redirected My Documents to the server but have temporarily undone these policy settings as a work-around. Files on either of the file servers are accessible, as are the programmes, and it is possible to ping machines from one to another.
Network traffic is heavyish but nothing has really changed in the past week or so to give rise to the problem which has only recently arisen.
The topology is, briefly: each 'area' of the site is connected to a switch via copper and the switches back to the systems office mostly via fibre. There is a central HP ProCurve switch 'block' to which the domain server and remote links are connected. I have swapped ports for the server but this has made no difference. The proxy server is connected to this also and works fine as the internet is always available. As users can log in okay I am inclined to think that, whilst maybe not perfect, the network itself is not the problem. I think the fault is in the server itself. Could the server NIC give these symptoms if it is faulty?
Sorry to run on without the usual high tech lingo!
ECNSSMT

I'm thinking about this, but it is possible that the NIC could be misconfigured or defective.  The easy solution here would be to see if the configuration of the server's NIC matches the port configuration of the HP ProCurve.  The most common issue is that one device is set up for a specific speed and duplex, i.e. 10 Mb at half duplex, and the other device for something like 100 Mb at full duplex (or auto).  Best thing to do is check, but get both settings to something like 100 Mb at half duplex.  Hopefully that will be the setting of the switch interface.  

You may also want to check the amount of traffic going into the suspect server with perfmon; just set the built-in app to graph incoming and outgoing network traffic; ideally the traffic pattern should be anywhere between 0% and 30% with an occasional spike above 50%.  If it's a flat line at 100%; well... it's time to consider having 2 NICs teamed for network traffic (if the NICs are able to do so; work with the server vendor on this; otherwise I'd be writing a couple more paragraphs).
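
If perfmon readings are logged as bytes per second (for example from the Network Interface "Bytes Total/sec" counter), the percentages above can be worked out by hand or with a few lines of code. A rough Python sketch; the function names and the 95% "flat line" threshold are assumptions for illustration, not anything perfmon itself defines:

```python
def utilization_percent(bytes_per_sec, link_bits_per_sec):
    """Convert a bytes/sec reading into a percentage of link capacity."""
    return bytes_per_sec * 8 * 100 / link_bits_per_sec


def looks_saturated(samples_bytes_per_sec, link_bits_per_sec, threshold=95.0):
    """True if every sample is at or above threshold percent (a 'flat line').

    The 95% default is an arbitrary illustrative cut-off.
    """
    samples = list(samples_bytes_per_sec)
    if not samples:
        return False  # no data, so nothing can be concluded
    return all(
        utilization_percent(s, link_bits_per_sec) >= threshold for s in samples
    )
```

For a 100 Mb link, a sustained reading of 12,500,000 bytes/sec works out to 100%; occasional spikes with lower readings in between would not count as saturated under this check.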

I doubt that the NIC is defective, but it's a possibility. Check the system event logs; if it is, they should be blaring out at you with red errors.

If it's only the two of you, one should handle the customer relations and prioritize the work while the other just concentrates on getting the work done.  Tackle the problems that affect the entire school or are the most visible first; it'll set the idea that you care about the well-being of your customer (apparently solving a problem, and solving a problem and then having the customer know about it, are two entirely different things).

Regards,  
DonGyton

ASKER

Thanks, I value your comments. I have checked system logs and there is nothing to suggest any hardware fault. In fact, apart from occasional blips plus an enduring but random problem with Active Directory, there are very few error reports anywhere. However, I have run perfmon and the network traffic is, at least for some of the time, way off the scale. I am developing a strategy at present to offload some of the traffic from this server as it is clearly overloaded, so we may be homing in on the problem here somewhere. Incidentally, the server is connected to the HP ProCurve at 1000 Mb full duplex and shows up as such. However, I will look into upgrading to two NICs if possible. For what it's worth, I changed a NIC on a workstation today from the on-board one to a separate card. The workstation didn't go offline once. Network traffic rears its head again perhaps. I can think of no reason, though, why this problem should arise so suddenly unless something has gone wrong somewhere which is bringing the whole network down. We plan to have this tested in the next few days.

At present we are fortunate in that the 'customers' think we are wonderful and working really hard (we help out with a lot of other things around the school!) so we hope to keep it that way. But then there is no one else who knows anything much about IT so we are fortunate!!

Thanks again for your time. If any other thoughts spring to mind I would be grateful to hear.
ASKER CERTIFIED SOLUTION
ECNSSMT

This solution is only available to members.
DonGyton

ASKER

You're not far off the mark re the network traffic, but it appears the problem lay with a Group Policy we had not quite correctly set. I had set folder redirection to the home files on the server and disabled offline files for students (which stops the infernal synchronisation at every log-on/off). This policy is apparently better set per machine rather than per user group. Because we had built up a huge cache on machines, one thing led to another and workstations were unable to access the files on the server, so they went offline for a few minutes before trying again. Modifying the relevant Group Policy and re-arranging the OUs in Active Directory seem to have resolved the problem. Clearing the cached files in the CSC folder in Windows also speeds up the local machine.
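
To get a feel for how large the offline-files cache has grown on a given machine before clearing it, the folder size can be totalled with a short script. A minimal Python sketch; the function name is made up for illustration, and the default XP location `C:\WINDOWS\CSC` is an assumption about a standard install (the folder is normally hidden and may need administrative rights to read):

```python
import os


def folder_size_bytes(root):
    """Sum the sizes of all files under root, skipping any that cannot be read."""
    total = 0
    # os.walk silently skips directories it cannot list (default onerror=None)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished or access was denied
    return total
```

Calling `folder_size_bytes(r"C:\WINDOWS\CSC")` on a few workstations would show which machines carry the biggest caches. This only measures; it does not clear anything.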

I'm blaming Microsoft's default settings! XP seems clearly designed for an office environment with possibly only one or two users of a particular computer. In a school each computer has between 100 and 200 users, so local files get huge and sync takes ages - no good when another class is waiting to log on. Trying to put this right caused our problem as files etc. built up.

We live and learn!!

Thanks for your help - it was not wasted at all. I will have a look at the sniffer programme as I am sure it will prove useful at some stage.

Best wishes.
ECNSSMT

Thanks for the points.  I couldn't see that scenario coming but I am glad there were some positives in this.

Regards,