random power down / Tried everything I can think of

I have never had this problem before in 15 years... and just can't figure it out...

I have a server I built that powers down at random intervals.

This is NOT I repeat NOT an OS issue.
This occurs in the BIOS setup, at a DOS prompt, inside win2k, etc.

Here's what I have done:
1. replaced MB
2. replaced power supply [two different ones]
3. tried with all combinations of 1/2 CPUs in both sockets
4. tried with all combinations of DIMMs 1,2,3,4 in all slots [possible anyway]
5. set up with the very BARE MINIMUM of components to start [MB+PS+1 DIMM + 1 CPU] and NOTHING else.
6. flashed each of the last three versions of the BIOS
7. monitored heat/fan speed/etc. of all components via BIOS and OS Tyan hardware monitors and found nothing O.O.O...

Based on 3 & 4, I find it hard to believe that BOTH CPUs and all four DIMMs could have a problem that could cause this...
Based on 1, 2 and 5, I'm not sure what else to try.
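
The swap-test reasoning in steps 3 and 4 can be written down as a small sketch. This is only an illustration: the part and socket labels (`cpu_a`, `socket_1`, and so on) and the helper names are made up for the example. It lists every one-part-in-one-slot run and then asks which part, if any, is present in every run that failed.

```python
from itertools import permutations


def single_part_placements(parts, slots):
    """Every way to run with exactly one part installed in exactly one slot."""
    return [(part, slot) for part in parts for slot in slots]


def full_placements(parts, slots):
    """Every way to fill all slots at once, one part per slot."""
    return [dict(zip(slots, order)) for order in permutations(parts, len(slots))]


def suspects(failing_runs):
    """Parts that were installed in every failing single-part run.

    An empty result means the fault does not follow any one part.
    """
    part_sets = [{part} for part, _slot in failing_runs]
    return set.intersection(*part_sets) if part_sets else set()


# Hypothetical labels, not taken from the board's silkscreen.
cpus = ["cpu_a", "cpu_b"]
sockets = ["socket_1", "socket_2"]

cpu_runs = single_part_placements(cpus, sockets)
for run in cpu_runs:
    print(run)

# If every single-CPU run powers down, no single CPU is common to all failures.
print(suspects(cpu_runs))
```

With two CPUs and two sockets this gives the four single-CPU runs; the same helper applied to four DIMMs and four slots gives sixteen. If every run fails, `suspects` comes back empty, which is the argument for looking at something outside the swapped parts.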

Here's what the system has [built]:
1. Tyan Tiger MP 2468 MB
2. 2x AMD Athlon MP 1800+ CPUs
3. 4x Corsair Registered ECC 512MB DDR266 DIMMs
4. Ci Designs 2100 2U server enclosure built for dual AMD + Tyan MB
5. EMACS P2G-6460P 450W PS for Ci2100/24 pin dual Athlon MBs [Tyan approved]
6. Mylex AcceleRAID 170 PCI
7. (4) Seagate Cheetah 36GB 10k rpm SCA U160 HD in SCA trays on backplane
8. [not hardware, but also tried] plugging power into different sources [different UPSes, different outlets, panels, etc.]

Any ideas????

500 points if you gimme something I haven't tried yet that works!!!!

Adrian Dobrota, Networking Engineer, Commented:
All that comes to mind right now is a short against the case, probably at the motherboard. You'll have to check carefully for that.
Have you ruled out the power cord, and the power switch, which may be shorting the power pins by mistake?
Have you checked your site power?  Perhaps your AC power is not steady and the machine is shutting down when the power fails momentarily.  Try a UPS.
I have seen issues where one bad component "kills" others, which then cause failures. Assuming you made the changes you mentioned in order, your first power supply could have fried something on both mobos, making them both unstable.  You could try replacing both the power supply and the motherboard simultaneously.

If that doesn't work, you could try replacing the case.  It sounds like you tried everything else already.
BrianBaley, Author, Commented:
More details in regard to your posts:

jhance: I have tried multiple UPSes and surge protectors. The power is coming from the same rack of UPSes that all 9 other servers are running off of [including swapping with the UPS outlet of known good systems], and I have also tried other small 450VA standalones, etc.

terageek: When I got the RMA'd new MB, I plugged it into my spare {unused} PS, not the old one, as the PS was my very first suspicion after seeing it power down even in the BIOS, etc.

kronostm: I have done this each time I have removed the board and was careful to make sure nothing fell down in there etc... the weird thing is that it was running for quite a time before.... at any rate, I think I will remove the board and PS from the rack case and try running it outside of the box.

I wish I had a backup case ;-) but $750 is hard to come by! I'll find out soon...

Is sabotage a possibility?  Perhaps some disgruntled person is shutting this thing down, either directly or remotely just to torque you off.
Not giving up yet...

I noticed above you mentioned that you built a bare system with only MB+PS+1 DIMM + 1 CPU.  Did you have a HD in that setup?  How about case fans?  Any single electrical device could have a short, draw too much power and cause a "brownout" in the system which could power it down.

Alternatively, I might think that it is a flaky power switch, but that should cause the system to randomly turn on as well as off.
BrianBaley, Author, Commented:
jhance: no. I have sat there and stared at the fan RPMs + CPU temps and watched it shut down before my eyes while in the BIOS.

terageek: let's see... I have tried it with each CPU separately [and no other fans etc.], in both sockets alternately [4 combinations], so both fans would have to be shorted...

As far as the brownout situation, yeah, maybe, but it would be pretty strange for it to do it both after 2 minutes OR after 5 hours... if it was a component that was shorted, one would think it would take relatively the same amount of time to fail, overheat, saturate, etc.
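
That timing argument can be put into rough numbers. The following is a minimal sketch, assuming the uptime before each power-down has been noted by hand in minutes; the function name and the steady-failure comparison list are invented for illustration, and only the 2-minute and 5-hour figures come from this thread. It computes the spread of the uptimes relative to their mean: a part that overheats or saturates would be expected to give a small spread, while an intermittent contact would give a large one.

```python
from statistics import mean, pstdev


def uptime_spread(uptimes_minutes):
    """Coefficient of variation: standard deviation divided by the mean.

    Near 0 means the failures come after about the same uptime each time;
    values near or above 1 mean the uptimes are all over the place.
    """
    avg = mean(uptimes_minutes)
    return pstdev(uptimes_minutes) / avg


# The two uptimes mentioned above: 2 minutes and 5 hours.
observed = [2, 300]

# Hypothetical comparison: a fault that trips after roughly an hour each time.
steady = [60, 62, 58]

print(f"observed spread: {uptime_spread(observed):.2f}")
print(f"steady spread:   {uptime_spread(steady):.2f}")
```

Two data points are far too few to prove anything; the point is only that uptimes this scattered look more like something intermittent than like a part that warms up and fails.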

power switch: hhhmmm... this is one of those "membrane" switches.... but it seems to act normal when in use, i.e., it seems to take exactly the same amount of pressure and distance during pressing to make contact, etc... if the physical [membrane] switch were flaky you'd think it would power on randomly too....  

I'll remove it [switch] and short the green PS wire to ground and see what happens......

Adrian Dobrota, Networking Engineer, Commented:
That's getting weirder, dude...
I know RAM usually doesn't cause reboots, but try replacing it with some brand other than Corsair. Maybe you can borrow a Kingston or something. I'm not saying you have a bad RAM stick, since you swapped them, but maybe there is an incompatibility between the Corsair memory and your mobo.

BrianBaley, Author, Commented:
kronostm: the server ran as is with these 4 sticks for weeks before this happened....
BrianBaley, Author, Commented:
I promise to let you guys know what I find and give you some points regardless....

I have to finish a fresh Linux install, then I'll get back to it.....
Here is a question for you....  In your server BIOS, do you have options for the different power-save functions?  If so, have you checked them to see if maybe your hard drives are set to shut down at a certain time, etc.? This does not sound like an issue where you have bad hardware components at all.  I also thought of external power fluctuations, but I do not think that would be the cause either.  I would strongly suggest you look into your BIOS settings. I would personally test this by setting some power-save settings when the OS is up and seeing whether they perform as you set them.  If they do not, then it is definitely a setting in your BIOS.
Adrian Dobrota, Networking Engineer, Commented:
Brian ... can you provide some details too, please?
BrianBaley, Author, Commented:
Wish I could.... but I had to back-burner the project....

I have yet to take it out of the case [rackmount] and run it without the switch [shorting the power-on pin at the connector]...

When I get the chance I will forward info. It's the only thing it could be....
BrianBaley, Author, Commented:

Well, I removed the power switch and shorted the connection, and it has been running for two days with no shutdown....

For reference, the case is a Ci Designs 2100. It uses a small "membrane" on-off switch. The switch is mounted on a small 1"x2" PCB behind the faceplate, and the only other components on the board are three LEDs, a small momentary [reset] switch and two resistors....

My guess is the switch was shorting....

Maybe that's why no one has used them since the T.Sinclair ;-)

Adrian Dobrota, Networking Engineer, Commented:
Probably. So my guess was probably right. You should have accepted my first post as the answer, since that is probably what led you to the solution. I'm saying this because others may use this question in the future to find solutions to similar problems, and the accepted answer is usually the one that points toward a good suggestion. Please remember this if you post other questions in the future.

Good luck


