Workstations going offline

Posted on 2006-06-19
Last Modified: 2013-12-23
I am a computer technician in a high school where none of us are experts but everyone expects us to be!! I currently have a problem I cannot solve. We have a domain server running Windows Server 2003 with Active Directory (SP1 installed) plus two other application servers running student programmes (also Server 2003). The network consists of about 200 workstations, mostly running Windows XP with SP2.

For some time we have had a problem with workstations unexpectedly going offline, but this has not been a great problem (it was worst with laptops on the wireless LAN, and this improved once better NICs were installed). In the last week, however, nearly every time someone logs on (student, staff or administrator, it makes no difference) the workstation goes offline after a few moments. A window pops up saying "No longer connected to chsserv - working offline" (the domain server is called chsserv). Users can log on okay and use the internet (via a proxy server, which is also the DHCP server) but cannot access any share, in particular their own home drive.

The problem exists over the whole network, so I have ruled out faulty switches, NICs etc. You can ping offline machines from one to another using IP addresses or computer names. The only change made to the network is a new file/applications server, and apart from a few recent updates nothing else has changed. As far as I know we have sufficient licences for all workstations, and anyway the problem has only just arisen. Netmon, Dcdiag etc. offer no obvious solutions! We have Sophos anti-virus software installed on all machines with automatic/remote updating. I have spent all today on this problem with no progress. Any help much appreciated.
Don Gyton
Question by:DonGyton

Expert Comment

by:ECNSSMT
ID: 16939049
Sounds like an excellent opportunity for on-the-job training. Don't worry about the blood stains; they'll dry up eventually. A quick organizational suggestion:
1. For each person, list their area of expertise.
2. List the disciplines needed to keep the school running, e.g. PC, LAN, WAN, servers, etc.
3. According to expertise, assign the disciplines to the resident experts in a nice and even manner; voluntary ownership is also good. These people will keep track of all solutions; they may take point on solving a problem, but solving it is a team effort. OK?
You will either have great teamwork and solutions... or... it's the status quo...

As for the network issues: what are the error codes / messages associated with the event? What do you see on the workstations when this happens, and what errors are seen on the servers?

What is your network topology like? Is your really great LAN going through a single 4-port 10 Mb hub to get to your really great server farm? HOW ARE THE CLIENTS REACHING THE SERVER; what is this path like? Was this problem noticed before the new server was put in?

Regards,

Expert Comment

by:ECNSSMT
ID: 16939225
I am leaning towards your network being overutilized, sight unseen. Or should I say, SITE unseen. You can probably use Microsoft's Performance Monitor (perfmon) on the server to see what network traffic is like.

Regards,

Author Comment

by:DonGyton
ID: 16941070
Thanks for the advice about on-the-job training - it would be nice to have the resources to do it, but not in little ol' England!! There is only me as full-time technician/engineer/dogsbody/shoulder to cry on, plus my boss, who has other admin responsibilities, and an assistant likewise otherwise disposed. Now you have dried your eyes I will continue!

There are no error codes as such, just a message on each workstation saying you are no longer connected to the server but can continue working normally. The connection comes and goes for some users but not for others, and some workstations are worse than others. Trouble is, students use a variety of computers, so it is necessary to have files located centrally. I had stopped synchronisation and redirected My Documents to the server, but have temporarily undone these policy settings as a work-around. Files on either of the file servers are accessible, as are the programmes, and it is possible to ping machines from one to another.
Network traffic is heavyish, but nothing has really changed in the past week or so to give rise to the problem, which has only recently arisen.
The topology is, briefly: each 'area' of the site is connected to a switch via copper, and the switches connect back to the systems office, mostly via fibre. There is a central HP ProCurve switch 'block' to which the domain server and remote links are connected. I have swapped ports for the server, but this has made no difference. The proxy server is connected to this also and works fine, as the internet is always available. As users can log in okay, I am inclined to think that, whilst maybe not perfect, the network itself is not the problem. I think the fault is in the server itself. Could the server NIC give these symptoms if it is faulty?
Sorry to run on without the usual high tech lingo!

Expert Comment

by:ECNSSMT
ID: 16942526
I'm thinking about this, but it is possible that the NIC is misconfigured or defective. The easy first step is to check whether the configuration of the server's NIC matches the port configuration on the HP ProCurve. The most common issue is that one device is set up for a specific speed and duplex, e.g. 10 Mbps at half duplex, and the other device for something like 100 Mbps at full duplex (or auto). Best thing to do is check, but get both settings to something like 100 Mbps at half duplex. Hopefully that will be the setting of the switch interface.
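
The matching rules described above can be written down as a small check. This is a minimal Python sketch, not a vendor tool: the `PortConfig` model and `link_warnings` function are hypothetical, it covers 10/100 Mbps Ethernet only, and it encodes the standard behaviour that a port left on auto-negotiation, facing a port with auto-negotiation turned off, can sense the speed but falls back to half duplex.

```python
from dataclasses import dataclass


@dataclass
class PortConfig:
    """Hypothetical model of one end of a 10/100 Mbps Ethernet link."""
    auto: bool            # True if speed/duplex auto-negotiation is enabled
    speed_mbps: int = 0   # forced speed; ignored when auto is True
    duplex: str = ""      # forced duplex, "full" or "half"; ignored when auto is True


def link_warnings(a: PortConfig, b: PortConfig) -> list[str]:
    """Return warnings about speed/duplex settings that do not match."""
    if a.auto and b.auto:
        return []  # both ends negotiate, so they agree on speed and duplex
    if not a.auto and not b.auto:
        warnings = []
        if a.speed_mbps != b.speed_mbps:
            warnings.append(f"speed mismatch: {a.speed_mbps} vs {b.speed_mbps} Mbps")
        if a.duplex != b.duplex:
            warnings.append(f"duplex mismatch: {a.duplex} vs {b.duplex}")
        return warnings
    # One end forced, the other on auto: the auto end can sense the speed but
    # receives no duplex information, so it falls back to half duplex.
    forced = b if a.auto else a
    if forced.duplex == "full":
        return ["duplex mismatch: forced full duplex vs auto end falling back to half"]
    return []


# Example: server NIC forced to 100/full, switch port left on auto.
print(link_warnings(PortConfig(False, 100, "full"), PortConfig(auto=True)))
# prints ['duplex mismatch: forced full duplex vs auto end falling back to half']
```

The simplest way to stay out of all three warning cases is to set both ends identically, whether that is both on auto or both forced to the same speed and duplex.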

You may also want to check the amount of traffic going into the suspect server with perfmon; just set the built-in app to graph incoming and outgoing network traffic. Ideally utilization should sit anywhere between 0% and 30%, with an occasional spike above 50%. If it's a flat line at 100%, well... it's time to consider having two NICs teamed for network traffic (if the NICs are able to do so; work with the server vendor on this, otherwise I'd be writing a couple more paragraphs).
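
The percentage perfmon graphs can also be worked out by hand from the Network Interface counters Bytes Total/sec and Current Bandwidth (the latter in bits per second): utilization is roughly bytes per second × 8 ÷ bandwidth × 100. The Python sketch below applies that arithmetic and the 0-30% rule of thumb; the function names and the thresholds used to call something a "flat line" are assumptions for illustration. On a full-duplex link, send and receive each have the full bandwidth, so treat the figure as approximate.

```python
def utilization_pct(bytes_total_per_sec: float, current_bandwidth_bps: float) -> float:
    """Approximate link utilization (%) from two perfmon counters:
    Network Interface\\Bytes Total/sec and Network Interface\\Current Bandwidth (bits/sec)."""
    return bytes_total_per_sec * 8 * 100 / current_bandwidth_bps


def assess(samples_pct: list[float]) -> str:
    """Apply the 0-30% rule of thumb to a series of utilization samples (%).

    The 95% 'flat line' level and the 'more than half the samples' cut-off
    are assumptions chosen for illustration, not figures from the thread.
    """
    if not samples_pct:
        raise ValueError("need at least one sample")
    average = sum(samples_pct) / len(samples_pct)
    near_full = sum(1 for s in samples_pct if s >= 95) / len(samples_pct)
    if near_full > 0.5:
        return "flat-lining near 100%: consider more NIC capacity (e.g. teaming)"
    if average <= 30:
        return "normal: averaging 30% or less, occasional spikes are fine"
    return "busy: averaging above the 0-30% rule of thumb"


# 12.5 MB/s on a 1 Gbps link is about 10% utilization.
print(utilization_pct(12_500_000, 1_000_000_000))  # 10.0
```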

I doubt that the NIC is defective, but it's a possibility. Check the system event logs; if it is faulty, they should be blaring out at you with red errors.

If it's only the two of you, one should handle customer relations and prioritize the work while the other concentrates on getting the work done. Tackle the problems that affect the entire school, or are the most visible, first; it'll show that you care about the well-being of your customers (apparently, solving a problem and solving a problem and then having the customer know about it are two entirely different things).

Regards,  

Author Comment

by:DonGyton
ID: 16945383
Thanks, I value your comments. I have checked the system logs and there is nothing to suggest any hardware fault. In fact, apart from occasional blips plus an enduring but random problem with Active Directory, there are very few error reports anywhere. However, I have run perfmon and the network traffic is, at least for some of the time, way off the scale. I am developing a strategy at present to offload some of the traffic going to this server, as it is clearly overloaded, so we may be homing in on the problem here somewhere. Incidentally, the server is connected to the HP ProCurve at 1000 Mbps full duplex and shows up as such. However, I will look into upgrading to two NICs if possible. For what it's worth, I changed a NIC on a workstation today from the on-board one to a separate card, and the workstation didn't go offline once. Network traffic rears its head again, perhaps. I can think of no reason, though, why this problem should arise so suddenly unless something has gone wrong somewhere which is bringing the whole network down. We plan to have this tested in the next few days.

At present we are fortunate in that the 'customers' think we are wonderful and working really hard (we help out with a lot of other things around the school!), so we hope to keep it that way. But then there is no one else who knows much about IT, so we are fortunate!!

Thanks again for your time. If any other thoughts spring to mind I would be grateful to hear.

Accepted Solution

by:ECNSSMT (earned 250 total points)
ID: 16976626
Hmm, a 1 Gb NIC and it still flatlines occasionally, wow...

Well, it sounds like the server agrees with the switch. It's kind of interesting that you mentioned that replacing a NIC on a workstation alleviated possibly one drop issue; I wonder if you have other bad NICs on your network. I can't imagine 200-250 devices collectively doing anything that would legitimately use up the full capacity of a 1 Gb NIC, unless it was 'EVERYONE' logging in when they first get to school, or a similar scenario.

If you have the time and the hardware capability, you may want to eyeball what kind of traffic is hitting the server when network services flatline. If you have an available PC, and the HP ProCurve can do port mirroring, you can use a sniffer program like Ethereal (www.ethereal.com) to view the traffic. Mirror the port that supports the target server, and whatever traffic hits the server will also hit Ethereal. If only a few specific MAC addresses are hitting the server, those PCs should be examined to see what is causing the traffic. If it's logon traffic, or you have roaming profiles, then it could be legitimate traffic using tons of bandwidth; but if it's oddball broadcasts, then you may have one or more bad NICs causing a broadcast storm.
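
Once a capture has been saved and exported as source/destination MAC pairs, a short script can tally who is talking to the server and how much of the traffic is broadcast. This is a minimal Python sketch under stated assumptions: the export format, the `summarise` function and the sample addresses are all hypothetical, and it only counts frames rather than bytes.

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"


def summarise(frames: list[tuple[str, str]], server_mac: str, top: int = 5):
    """Summarise (source MAC, destination MAC) pairs from a capture export.

    Returns (busiest senders to the server, broadcast share, busiest broadcasters).
    """
    server = server_mac.lower()
    to_server = Counter(src.lower() for src, dst in frames if dst.lower() == server)
    broadcasters = Counter(src.lower() for src, dst in frames if dst.lower() == BROADCAST)
    share = sum(broadcasters.values()) / len(frames) if frames else 0.0
    return to_server.most_common(top), share, broadcasters.most_common(top)


# Hypothetical sample: two PCs talking to the server, one PC broadcasting.
SERVER = "00:11:22:33:44:55"
sample = [
    ("00:aa:00:00:00:01", SERVER),
    ("00:aa:00:00:00:01", SERVER),
    ("00:aa:00:00:00:02", SERVER),
    ("00:aa:00:00:00:03", BROADCAST),
]
print(summarise(sample, SERVER))
```

A handful of MAC addresses dominating the first list points at specific PCs to examine; a single address dominating the broadcaster list is the pattern you would expect from a faulty NIC flooding the network.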

BTW, Ethereal may be a no-cost option for your situation; I use it at home and it compares rather well with the paid-for Sniffers that I have at work.

Good luck...

Regards,

Author Comment

by:DonGyton
ID: 16978338
You're not far off the mark regarding the network traffic, but it appears the problem lay with a Group Policy we had not quite set correctly. I had set folder redirection to the users' home folders on the server and disabled offline files for students (which stops the infernal synchronisation at every log-on/log-off). This policy is apparently better applied per machine rather than to a user group. Because we had built up a huge cache on the machines, one thing led to another and workstations became unable to access the files on the server, so they went offline for a few minutes before trying again. Modifying the relevant Group Policy and rearranging the OUs in Active Directory seem to have resolved the problem. Clearing the cached files in the CSC folder in Windows also speeds up the local machine.
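
Since the trouble came down to an oversized offline files (CSC) cache, it can help to measure how large the cache has grown on a workstation before and after clearing it. A minimal, read-only Python sketch; the function name is hypothetical, it only reads file sizes and clears nothing, and reading the CSC folder normally needs administrator rights.

```python
import os


def folder_size_bytes(path: str) -> int:
    """Total size in bytes of all files under path (read-only; nothing is deleted)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # skip files that are locked or disappear mid-scan
    return total


if __name__ == "__main__":
    # On Windows XP the offline files cache lives in %SystemRoot%\CSC.
    csc = os.path.join(os.environ.get("SystemRoot", r"C:\WINDOWS"), "CSC")
    print(f"{csc}: {folder_size_bytes(csc) / 2**20:.1f} MB")
```

Running it on a few machines that many classes share gives a rough idea of how much data each log-on was trying to synchronise.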

I'm blaming Microsoft's default settings! XP seems clearly designed for an office environment with perhaps only one or two users of a particular computer. In a school each computer has between 100 and 200 users, so the local files get huge and synchronisation takes ages - no good when another class is waiting to log on. Trying to put this right caused our problem as files etc. built up.

We live and learn!!

Thanks for your help - it was not wasted at all. I will have a look at the sniffer programme, as I am sure it will prove useful at some stage.

Best wishes.

Expert Comment

by:ECNSSMT
ID: 16978806
Thanks for the points. I couldn't see that scenario coming, but I am glad there were some positives in this.

Regards,