Server locks up

We have a Gateway 980 Series server running RAID 5 and Windows 2000 Server with 2 GB RAM.  The server started locking up about once a week.  Sometimes it would freeze on the desktop and the mouse and keyboard would not respond; other times the monitor would go black.  All of the system fans continued to run, the tape drive would still eject its tape, and the CD drive would still open its tray.  Our only option was to power off and start back up.  We had not added any new hardware or installed any new programs, and Windows Automatic Updates was not turned on.  There are no errors recorded in the BIOS Event Log, no information in the Memory.dmp file, and the only error recorded in the Event Logs is Event ID 1000 - "The reason for the unexpected shutdown is unknown".

The lockups then became more frequent, up to once or twice a day.  At times, after we powered off the server and started it back up, it would not even reach the POST screen until it had sat powered off for a few hours.  We have replaced the power supplies and the power distribution board.  We have tested the RAM and used CPU Tester Pro 4 to check the motherboard for errors, and both came up clean.

We formatted the drives and installed Windows Server 2003 but we continue to experience the same problems. Could a bad motherboard be creating the lockups or is there some other test we can run?

Thanks for your input.
Jbirk1 Commented:
You definitely have a motherboard problem.  The motherboard is unstable, and although it checks out okay, the tests you ran don't catch this kind of fault.

I have seen this on a different kind of server and on several workstations.  BTW, does the motherboard have any bulging capacitors?  Just thought I would ask.

Regardless, since you have replaced the power supplies and the problem stays the same, your memory tests good, and CPUs almost never fail, it is the motherboard.

You do not have a software problem; otherwise the system would always POST, and you would have had a memory dump or something in the event log.  You may wish to set the server to show the blue screen and write a memory dump instead of automatically restarting after a system failure (the Startup and Recovery settings).  However, I don't think it has bluescreened or has any software problem.

It is definitely hardware.  Contact Gateway and replace the motherboard.  The problem will be gone.
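If you prefer to set the no-auto-restart option outside the GUI, here is a minimal sketch in Python that only builds the text of a .reg file; it does not touch the registry itself.  It assumes the usual CrashControl key and its AutoReboot and CrashDumpEnabled values (0 = no restart, 1 = complete memory dump); the function name is just made up for illustration, so check the values against Microsoft's documentation before importing anything.

```python
# Sketch only: builds .reg file text, nothing is written to the registry.
# Assumed key and value names: CrashControl, AutoReboot, CrashDumpEnabled.
CRASH_CONTROL_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"


def build_reg_file(auto_reboot=False, dump_type=1):
    """Return .reg text that disables auto-restart and selects a dump type.

    dump_type: 0 = none, 1 = complete memory dump, 2 = kernel, 3 = small.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{CRASH_CONTROL_KEY}]",
        f'"AutoReboot"=dword:{int(auto_reboot):08x}',
        f'"CrashDumpEnabled"=dword:{dump_type:08x}',
        "",
    ]
    return "\r\n".join(lines)


reg_text = build_reg_file()
print(reg_text)
```

Save the printed text to a file, review it, and import it only if it matches what the Startup and Recovery dialog would set.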

cyarborough (Author) Commented:
Thanks Justin.  The server is out of warranty but I can buy a new motherboard for about $200.00.  Our next step was to replace the motherboard but just wanted a confirmation from someone who has been through this before that we are making the right move.

Yes, you are making the right move.  I would change the motherboard.

I asked one of my friends, who is very good at what he does, and he seconds my opinion that it is the motherboard.

We both agree that it is not a software problem; therefore, it must be a hardware problem.  Since you have replaced the server's power supply, it isn't likely a power problem.  Therefore, we think it is either a memory problem or a motherboard problem.  CPUs just don't fail often.  At work, our entire department has replaced 2 CPUs in the last 3 years, and we have over 10,000 computers and at least 80 servers.  Even then, one of those CPUs was replaced because someone bent a pin.  CPUs are just very reliable and not suspect.  A bad CPU usually keeps the computer or server from POSTing at all.  You said you tested the memory.  We assume you did this with Memtest 86, Memtest 86+, or something like it.

If the memory is bad, it usually turns up something in the memory test.  My friend and I also think you should use an Uninterruptible Power Supply for really good measure, because it could be just a sensitivity to power.  However, we really think the only solution is to spend $200 for a motherboard.

It cannot be the CPU or the new power supply (at least not reasonably).  This leaves you with very little other hardware.  The only other hardware is memory, RAID, NICs, and probably a video card.  The memory is most likely Registered/ECC, so in all honesty you cannot test it in a non-server motherboard.  We also don't think it is an issue with the RAID or a NIC; it could be, but it is unlikely.  If it were our department, we both agree we would risk the $200 for a motherboard.

We figure it is 99% likely to fix your issue.  If it doesn't, it will be a waste of $200 unless you can return it, and even then you may lose a restocking fee.  We think it is a good idea to buy from a vendor that allows returns, just to cover the 1% chance of being out $200.  However, in the scheme of things $200 probably doesn't matter much.  It looks like the point count for this question was 250, meaning it was about $3.15 to confirm the motherboard.
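To put rough numbers on that risk, here is a minimal Python sketch of the expected loss.  The 99% figure and the $200 price come from the discussion above; the 15% restocking fee is purely a hypothetical value for illustration, since we do not know any vendor's actual terms.

```python
def expected_loss(price, p_fix, restocking_rate=None):
    """Expected money lost on a part that might not fix the problem.

    restocking_rate=None means the part cannot be returned, so the
    full price is lost whenever the fix fails.
    """
    p_fail = 1.0 - p_fix
    if restocking_rate is None:
        loss_if_fail = price
    else:
        loss_if_fail = price * restocking_rate
    return p_fail * loss_if_fail


# $200 board, 99% chance it fixes the issue, no returns allowed.
no_return = expected_loss(200.0, 0.99)

# Same board from a vendor with a hypothetical 15% restocking fee.
with_return = expected_loss(200.0, 0.99, restocking_rate=0.15)

print(f"Expected loss without returns: ${no_return:.2f}")
print(f"Expected loss with returns:    ${with_return:.2f}")
```

Either way the expected loss is a few dollars at most, which is why we would take the gamble.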

Justin and Chris
cyarborough (Author) Commented:
The server is running on a UPS.

I checked the capacitors and none of them are bulging or leaking.  I am going to follow through with your recommendation to order the new motherboard.

Cool, keep me apprised.

I am glad I could help and will be very happy when the new board solves your problem.

Question has a verified solution.
