Server Freezing

I have a Compaq DL380 with SCSI RAID 5, a 1 Gb NIC, and a SCSI adapter added for a tape library. The server is running Veritas as its only application.
I did have Virtual Server 2005 installed but removed it during troubleshooting.
The server will suddenly freeze, and when it does I have to do a hard boot to get it back up. When it freezes nothing responds: no keyboard, no mouse, no Ctrl Alt Delete, and I cannot ping the box (no network connection). It is completely unresponsive. The event logs are clean, with no errors at the time of shutdown. After I perform the hard reboot there is no telling how long it will stay responsive. Sometimes it will be up for 1 week and sometimes 15 minutes. Usually no errors pop up and it is still at the Ctrl Alt Delete screen, but this morning it started giving me this error.

***Hardware Malfunction***
Call your vendor for support
NMI: Parity Check / Memory Parity Error
***The System Has Halted***

At bootup I get this alert as it is running its checks.
No SCSI Bios installed
I do not know what this means

This morning I swapped out all the RAM, and if it freezes again I will remove the gigabit card. I am guessing that this is a hardware malfunction, so I am going to start swapping out hardware to pinpoint the issue.

I have no clue why this has started and am not sure what I should be trying to get it resolved.
Where do I start with issues like this?
Any advice is greatly appreciated.
Asked by: UnifiedIT
 
Lee W, MVP, Technology and Business Process Advisor, commented:
I would say you start with what you did.  Swap out the memory first.  But if that doesn't work and you are certain the replacement memory is good, then it's probably the motherboard. That's where the memory plugs in, and if the board's memory subsystem is damaged, you could get error messages suggesting the memory is bad.

I would also be calling HP/Compaq, especially if the system is under warranty.
 
LucF, EMEA Server Engineer, commented:
Hi UnifiedIT,
>>***Hardware Malfunction***
>>Call your vendor for support
>>NMI: Parity Check / Memory Parity Error
>>***The System Has Halted***
This means that the system detected a parity problem in your RAM, so exchanging the RAM like you did is the best thing to start with. Bad RAM can give all sorts of errors, from minor problems to BSODs and freezing.
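
To make "parity problem" a little more concrete, here is a minimal Python sketch of how a single parity bit catches a flipped bit. It is only an illustration under simplified assumptions: the function names are made up for this example, it uses even parity over one byte, and server memory in machines of this class generally uses ECC, which is more elaborate than a single parity bit.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the number of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte: int) -> tuple:
    """Simulate writing a byte to memory together with its parity bit."""
    return (byte & 0xFF, parity_bit(byte))


def check(stored: tuple) -> bool:
    """Simulate reading back: True if data and parity bit still agree."""
    data, parity = stored
    return parity_bit(data) == parity


def flip_bits(stored: tuple, mask: int) -> tuple:
    """Simulate a memory fault by flipping the data bits selected by mask."""
    data, parity = stored
    return (data ^ (mask & 0xFF), parity)


word = store(0b10110010)
print(check(word))                          # intact data passes the check
print(check(flip_bits(word, 0b00000100)))   # one flipped bit is detected
print(check(flip_bits(word, 0b00000110)))   # two flipped bits go unnoticed
```

A single parity bit can only tell that something changed, not which bit, which is why the system halts instead of correcting the error.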

>>At bootup I get this alert as it is running its checks.
>>No SCSI Bios installed
>>I do not know what this means
Most likely you have more than one SCSI card installed, one of which doesn't have any devices attached. When the SCSI card detects this, it unloads its BIOS to save system resources. Nothing to worry about.

Greetings,

LucF
 
LucF, EMEA Server Engineer, commented:
Sorry, missed your comment leew...

 
sciwriter commented:
Many possibilities.  It could be that the RAM is too slow for the system.  Make sure the RAM in the system is at least CAS 2.5 -- CAS 3 is too slow, and this MB might require CAS 2 to run right.  Check the BIOS for the RAM setting -- make sure the refresh is set to standard, not too fast.  If need be, do a "load power on defaults" in the BIOS, which will set the RAM refresh to the "safest speed".

After you have done all that, go through the other options.  Don't discount the possibility that when Windows was installed, something was written to the registry incorrectly.  This could be something as innocuous as the ACPI power management state of the motherboard, but unless that is corrected, it will not work right, no matter what.
 
Promethyl commented:
I have an answer for this one and you're going to love it.

I had an HP (Compaq ProLiant) DL380 Generation 3 which hung when I started doing network I/O.

In fact, I had three of them do the same thing. I worked it from all ends.

I called my guys. They didn't help.

I installed firmware drivers. HBA, BIOS, PERC 3i... Ran diagnostics til I was blue in the face.

Finally I called HP. They told me to call Microsoft because it was an OS issue.

I explained to him it was a hardware issue. Cited empirical evidence. Told him to ship me a motherboard.

He told me he could get fired if he did that. I called his bluff. He said he could at least get a demerit.

So I made him a bet. I bet him $25 (in the form of a thinkgeek.com gift cert) the problem was the motherboard. I told him if I was right he owed me nothing.

He shipped the board, it worked, I claimed victory.

It's a manufacturing defect. Call HP. Demand a system board under warranty. I bet you $25 that's it. <g>

 
sciwriter commented:
No surprise, HP and Dell MBs are full of these kinds of problems.  If you are right, it saves the questioner a lot of problem, as there are about 25 other possibilities, but it sounds like a poker game to get a MB??
 
Promethyl commented:
Eh. Welcome to the cost saving world of call centers. I got Canada when I called. I'm 90% positive. The secondary problem also indicates motherboard / bios to me. So... It's worth a shot.
 
UnifiedIT (Author) commented:
I put 4 GB of new RAM in the server and it has been running smoothly for 8 days. Easy fix compared to a motherboard.
Thanks for all the comments.
UnifiedIT
 
Promethyl commented:
Oh well, it could have been the mobo. =) Was in my case x3.

Just goes to show you -- you can't rely on machines being brand new to be perfect.

Keep in mind the G4s just rolled out. It's been better, IMHO.
 
LucF, EMEA Server Engineer, commented:
Good to hear UnifiedIT ;)

LucF
Question has a verified solution.