Solved

random power down / Tried everything I can think of

Posted on 2003-11-19
Last Modified: 2010-04-26
Hey,
I have never had this problem before in 15 years... and just can't figure it out...

I have a server I built that powers down at random intervals.

This is NOT I repeat NOT an OS issue.
This occurs in the BIOS setup, at a DOS prompt, inside win2k, etc.

Here's what I have done:
1. Replaced the MB
2. Replaced the power supply [two different ones]
3. Tried all combinations of 1/2 CPUs in both sockets
4. Tried all [possible] combinations of DIMMs 1, 2, 3, 4 in all slots
5. Set up with the very BARE MINIMUM of components to start [MB + PS + 1 DIMM + 1 CPU] and NOTHING else
6. Flashed each of the last three BIOS versions
7. Monitored heat, fan speed, etc. of all components via the BIOS and the Tyan hardware monitor in the OS, and found nothing out of the ordinary

Based on 3 & 4, I find it hard to believe that BOTH CPUs and all four DIMMs could have a problem that would cause this...
Based on 1, 2 and 5, I'm not sure what else to try.

Here's what the system has [as built]:
1. Tyan Tiger MP 2468 MB
2. 2x AMD Athlon MP 1800+ CPUs
3. 4x Corsair Registered ECC 512MB DDR266 DIMMs
4. Ci Designs 2100 2U server enclosure built for dual AMD + Tyan MB
5. EMACS P2G-6460P 450W PS for Ci2100/24-pin dual Athlon MBs [Tyan approved]
6. Mylex AcceleRAID 170 PCI
7. (4) Seagate Cheetah 36GB 10k rpm SCA U160 HDs in SCA trays on a backplane

I have also tried plugging power into different sources [different UPSs, different outlets, panels, etc.].

Any ideas????

500 points if you gimme something I haven't tried yet that works!!!

Question by:BrianBaley
16 Comments
 
LVL 14

Expert Comment

by:kronostm
ID: 9785154
All that comes to mind right now is a short to the case, probably at the motherboard. You'll have to check carefully for that.
Have you ruled out the power cord, and the power switch, which could be shorting the power pins by mistake?
 
LVL 32

Expert Comment

by:jhance
ID: 9786081
Have you checked your site power?  Perhaps your AC power is not steady and the machine is shutting down when the power fails momentarily.  Try a UPS.
 
LVL 3

Expert Comment

by:terageek
ID: 9792303
I have seen cases where one bad component "kills" others, which then cause failures. Assuming you made the changes you mentioned in order, your first power supply could have fried something on both mobos, making them both unstable.  You could try replacing both the power supply and the motherboard simultaneously.

If that doesn't work, you could try replacing the case.  It sounds like you have tried everything else already.
 
LVL 1

Author Comment

by:BrianBaley
ID: 9797882
More details in regard to your posts:

jhance: I have tried multiple UPSs and surge protectors. The power comes from the same rack of UPSs that the other 9 servers run off [including swapping with the UPS outlet of known good systems], and I have also tried other small 450VA standalones, etc.

terageek: When I got the RMA'd new MB, I plugged it into my spare {unused} PS, not the old one, as the PS was my very first suspicion after seeing it power down even in the BIOS, etc.

kronostm: I have checked for this each time I removed the board, and was careful to make sure nothing fell down in there, etc... The weird thing is that it was running for quite a while before.... At any rate, I think I will remove the board and PS from the rack case and try running it outside of the box.

I wish I had a backup case ;-) but 750 bucks is hard to come by! I'll find out soon...

 
LVL 32

Expert Comment

by:jhance
ID: 9798128
Is sabotage a possibility?  Perhaps some disgruntled person is shutting this thing down, either directly or remotely just to torque you off.
 
LVL 3

Expert Comment

by:terageek
ID: 9798200
Not giving up yet...

I noticed above you mentioned that you built a bare system with only MB+PS+1 DIMM + 1 CPU.  Did you have a HD in that setup?  How about case fans?  Any single electrical device could have a short, draw too much power and cause a "brownout" in the system which could power it down.

Alternatively, I might think that it is a flaky power switch, but that should cause the system to randomly turn on as well as off.
 
LVL 1

Author Comment

by:BrianBaley
ID: 9799154
jhance: no. I have sat there and stared at the fan rpms + cpu temps and watched it shutdown before my eyes while in the BIOS.

terageek: let's see... I have tried it with each CPU separately [and no other fans, etc.], in both sockets alternately [4 combinations], so both CPU fans would have to be shorted...

As far as the brownout situation, yeah, maybe, but it would be pretty strange for it to happen after 2 minutes one time and after 5 hours the next... If a component were shorted, one would think it would take roughly the same amount of time to fail, overheat, saturate, etc.

power switch: hhhmmm... this is one of those "membrane" switches.... but it seems to act normal when in use, i.e., it seems to take exactly the same amount of pressure and distance during pressing to make contact, etc... if the physical [membrane] switch were flaky you'd think it would power on randomly too....  

I'll remove it [switch] and short the green PS wire to ground and see what happens......
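
For anyone chasing a similar fault, the variable time-to-failure mentioned above (anywhere from 2 minutes to 5 hours) can be measured rather than eyeballed. What follows is a minimal Python sketch and not something used in this thread: the function names, the 10-second heartbeat interval and the log file are all assumptions. It only helps once an OS is running, so it would not capture shutdowns that happen in the BIOS setup.

```python
import os
from datetime import datetime, timedelta


def append_heartbeat(path):
    """Append the current time to a log file and force it to disk.

    Hypothetical helper: call it every few seconds from a loop or a
    scheduler so the last line survives a sudden power loss.
    """
    with open(path, "a") as f:
        f.write(datetime.now().isoformat() + "\n")
        f.flush()
        os.fsync(f.fileno())


def load_heartbeats(path):
    """Read the log back as a list of datetime objects."""
    with open(path) as f:
        return [datetime.fromisoformat(line.strip()) for line in f if line.strip()]


def session_uptimes(heartbeats, interval_s=10, slack=3):
    """Split heartbeats into sessions and return each session's length in seconds.

    A gap longer than interval_s * slack between two consecutive
    heartbeats is treated as a power-down followed by a restart.
    The final session is included even if the machine is still up.
    """
    if not heartbeats:
        return []
    max_gap = timedelta(seconds=interval_s * slack)
    uptimes = []
    start = prev = heartbeats[0]
    for ts in heartbeats[1:]:
        if ts - prev > max_gap:
            # The previous session ended at its last recorded heartbeat.
            uptimes.append((prev - start).total_seconds())
            start = ts
        prev = ts
    uptimes.append((prev - start).total_seconds())
    return uptimes
```

After a few unexpected power-downs, session_uptimes(load_heartbeats("heartbeat.log")) gives one duration per run. Durations scattered over a wide range would be consistent with the point made above: a component that overheats or saturates would be expected to fail after roughly the same time on each run.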

 
LVL 14

Accepted Solution

by:kronostm (earned 500 total points)
ID: 9802412
That's getting weirder, dude...
I know RAM usually doesn't cause reboots, but try replacing it with some brand other than Corsair. Maybe you can borrow a Kingston or something. I'm not saying you have a bad RAM stick, since you swapped them, but maybe there is an incompatibility between the Corsair memory and your mobo.

 
LVL 1

Author Comment

by:BrianBaley
ID: 9802750
kronostm: the server ran as is with these 4 sticks for weeks before this happened....
 
LVL 14

Expert Comment

by:kronostm
ID: 9803049
AARRGGHH  ! ! !
 
LVL 1

Author Comment

by:BrianBaley
ID: 9803217
I promise to let you guys know what I find and give you some points regardless....

I have to finish a fresh Linux install, then I'll get back to it.....
 

Expert Comment

by:nealbing
ID: 9808651
Here is a question for you....  In your server BIOS, do you have options for the different power-save functions?  If so, have you checked them to see whether, for example, your hard drives are set to shut down after a certain time?  This does not sound like an issue with bad hardware components at all.  I also thought of external power fluctuations, but I do not think that would be the cause either.  I would strongly suggest you look into your BIOS settings.  I would personally test this by setting some power-save options while the OS is up and seeing whether they behave as you set them.  If they do not, then it is definitely a setting in your BIOS.
 
LVL 14

Expert Comment

by:kronostm
ID: 9887508
Brian ... can you provide some details too, please?
 
LVL 1

Author Comment

by:BrianBaley
ID: 9888550
Wish I could.... but I had to back-burner the project....

I have yet to take it out of the [rackmount] case and run it without the switch [shorting the power-on pin at the connector]...

When I get the chance I will forward info. It's the only thing it could be....
 
LVL 1

Author Comment

by:BrianBaley
ID: 9899668
kronostm:

Well, I removed the power switch and shorted the connection, and it has been running for two days with no shutdown....
Bizarre.

For reference, the case is a Ci Designs 2100. It uses a small "membrane" on-off switch. The switch is mounted on a small 1"x2" PCB behind the faceplate, and the only other components on the board are three LEDs, a small momentary [reset] switch and two resistors....

My guess is the switch was shorting....

Maybe that's why no one has used them since the T.Sinclair ;-)

 
LVL 14

Expert Comment

by:kronostm
ID: 9903350
Probably. So my guess was probably right. You should have accepted my first post as the answer, since that is probably what led you to the solution. I'm saying this because this Q may be used by others in the future to find solutions to similar problems, and the accepted answer is usually the one that points toward a good suggestion. Please remember this if you post other Qs in the future.

Good luck

Kronos
