homerslmpson asked:

Help with HP ProLiant ML350 G5 constant BSOD

Hi all.
We've had this server as a domain controller in a remote branch for a few years now, and it has had problems with rebooting at random.
Each reboot logs a BSOD error in the event log.
I've looked at the minidump using BlueScreenView and it always states the same thing:
"The problem seems to be caused by the following file: ntoskrnl.exe
PAGE_FAULT_IN_NONPAGED_AREA
*** STOP: 0x00000050 (0xe2474000, 0x00000000, 0xbfab34ec, 0x00000000)
*** ntoskrnl.exe - Address 0x8087c4a0 base at 0x80800000 DateStamp 0x4b27c5b8
"
Some info was omitted because it doesn't seem relevant.
If you need more details from the crash, just let me know.
This server is running Windows Server 2003 R2 SP2 (32-bit).
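
For anyone who wants to read the STOP line rather than just quote it, here is a minimal Python sketch. It assumes the classic parameter layout documented for bug check 0x50 on this generation of Windows (address referenced, read/write flag, address of the referencing instruction, reserved); the function and variable names are illustrative only and do not come from any tool.

```python
# Values copied from the STOP line and the ntoskrnl.exe line quoted above.
PARAMS = (0xE2474000, 0x00000000, 0xBFAB34EC, 0x00000000)
NTOSKRNL_ADDRESS = 0x8087C4A0
NTOSKRNL_BASE = 0x80800000


def decode_stop_0x50(params):
    """Label the four parameters using the classic 0x50 layout (an assumption)."""
    referenced, access, instruction, reserved = params
    return {
        "referenced_address": f"0x{referenced:08x}",
        # Classic layout: 0 means a read, 1 means a write.
        "access": {0: "read", 1: "write"}.get(access, "unknown"),
        "instruction_address": f"0x{instruction:08x}",
        "reserved": f"0x{reserved:08x}",
    }


decoded = decode_stop_0x50(PARAMS)
# Offset of the reported address from the start of the ntoskrnl.exe image.
offset = NTOSKRNL_ADDRESS - NTOSKRNL_BASE

for name, value in decoded.items():
    print(f"{name:>20}: {value}")
print(f"{'ntoskrnl offset':>20}: 0x{offset:x}")
```

With these values it reports a read of 0xe2474000 by an instruction at 0xbfab34ec, and an offset of 0x7c4a0 into ntoskrnl.exe.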

The same events also repeat in the event log every xx minutes:
1) HP NC373i: The network link is down. Check to make sure the network cable is properly connected.
2) The power subsystem is now in a non-redundant state.
3) Power supply 2 has failed.
4) Power supply 2 is now operating correctly.

I figured these are related to the HP Management agents.

I tried updating the server to the newest HP PSP at the time (8.60) but it didn't seem to help.
I've since downloaded the newest one (8.70), but I doubt it's going to solve the problem.

Are the blue screen reboots and the HP agent warnings related?

This has been an issue for quite a while now and I honestly don't know how to handle it.

Any help would be appreciated.

SOLUTION
rindi

homerslmpson

ASKER

I downloaded the Windows version of UBCD but I'm unsure if I should have done that.  It looks like they have a different one on the main page.  Does it matter?

The antivirus is up to date.

This may actually be the first time I got the warning about the power supply.

The one that usually shows up is below:
"System Information Agent: Health: The Fan Sub-system has lost redundancy.  Replace any failed or missing fans.
 Chassis: '0'
[SNMP TRAP: 6037 in CPQHLTH.MIB]
"

That will immediately be followed up with this one:
"System Information Agent: Health: The Fan Sub-system has returned to a redundant state.
 Chassis: '0'
[SNMP TRAP: 6055 in CPQHLTH.MIB]
"

Perhaps I can have someone in that branch take a look at the back of the server to see if there are any warning lights, etc.
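
The paired fault and recovery events quoted above can be picked out of an exported event log mechanically. This is a minimal Python sketch, assuming the events have been exported as (timestamp, message) rows. The timestamps below are invented for illustration, and `find_flaps`, `RECOVERY_FOR` and `max_gap` are hypothetical names.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows: the timestamps are invented for illustration.
# The message text is taken from the events quoted in this thread.
EVENTS = [
    ("2011-01-10 03:15:02", "The Fan Sub-system has lost redundancy."),
    ("2011-01-10 03:15:04", "The Fan Sub-system has returned to a redundant state."),
    ("2011-01-10 09:40:11", "Power supply 2 has failed."),
    ("2011-01-10 09:40:12", "Power supply 2 is now operating correctly."),
]

# Each fault message mapped to the message that clears it.
RECOVERY_FOR = {
    "The Fan Sub-system has lost redundancy.":
        "The Fan Sub-system has returned to a redundant state.",
    "Power supply 2 has failed.":
        "Power supply 2 is now operating correctly.",
}


def find_flaps(events, max_gap=timedelta(seconds=30)):
    """Return (fault, start_time, duration) for each fault cleared within max_gap."""
    open_faults = {}  # awaited recovery message -> (fault message, start time)
    flaps = []
    for stamp, message in events:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if message in RECOVERY_FOR:
            open_faults[RECOVERY_FOR[message]] = (message, when)
        elif message in open_faults:
            fault, started = open_faults.pop(message)
            if when - started <= max_gap:
                flaps.append((fault, started, when - started))
    return flaps


flaps = find_flaps(EVENTS)
for fault, started, duration in flaps:
    print(f"{started}  {fault}  cleared after {duration.total_seconds():.0f}s")
```

Lining the resulting start times up against the BSOD timestamps would show whether the hardware warnings and the crashes coincide.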

The memtest needs to be done before Windows loads, yes?

Any idea how long I should run it?

Sorry for all the questions and thanks in advance.
rindi

I meant the other UBCD, but as memtest is on both it should be OK. The catch is that you have to build the Windows version first, using an XP CD or probably also a 2003 CD (I've never used that, though). The standard UBCD is just an ISO which you burn to a CD. Both have to be booted from, so you need someone at the server's location to put the CD in the server and boot from it, and yes, you need to run memtest by booting from the CD without Windows running. Three passes should be fine (how long that takes is difficult to say, as it depends on the CPU speed and the size of the RAM).

You will need someone there anyway so he can clean out the dust and look at the fans. If they cause errors like the ones you just posted, that could certainly be a reason for the crashes.
homerslmpson

ASKER

Hmm.  I see.
I will download the other UBCD as that sounds a lot easier to work with.
Looks like I'm going to have to coordinate with someone in that branch to assist me (this is going to be fun).
Well, thanks for your help.  I guess we're going to have to put this thread on hold for a while as I'll need to have the memory tested and all that jazz.
We'll be in touch!!
Thanks again.
Dusty Thurman

Most ntoskrnl.exe dumps I have seen come back to hardware. That's not to say hardware is the only thing that can cause this, just the most common. If the memory checks out, be sure to go to HP's website and find any bootable hardware diagnostics they offer. With a mix of fan errors, power supply errors, and BSODs, this could also be motherboard or chipset related.
homerslmpson

ASKER

OK, well, I'm sending the UBCD to the manager of that branch and have given him clear instructions on what to do in order to test the RAM.
He's going to let it run overnight and then send me a picture of the screen the next morning.
Guess we'll take it from there.
homerslmpson

ASKER

Wow.  After almost 2 months I finally got someone in that branch to run the memory test.
After running the test overnight, the test showed there were no errors.  That's kind of a bummer.
I was hoping that was the issue.  Now I'm not sure what the next step is.
Any ideas?
homerslmpson

ASKER

I ran the newest version of the HP Insight Diagnostics Online Edition software, and on the Diagnostics tab the only thing you can run is "Logical Drive 1, Storage Controller in Slot 0".
Power Supply 1 and 2 are greyed out and are "not diagnosable".

So I ran the diagnostics for the logical drive and get the following:

Hard drive 1:
Error: F155: The read/write hard error rate recorded in the monitor and performance log is above the acceptable threshold.

Hard drive 2:
OK

Hard drive 3:
Error: F155: The read/write hard error rate recorded in the monitor and performance log is above the acceptable threshold.

Do I take this to be the truth and replace the drives?  Or is this something likely to do with the agents reporting inaccurate information?

Thanks.
Member_2_4984608

I would advise pulling the drives one at a time and running the manufacturer's diagnostics on them in another PC.
homerslmpson

ASKER

Hmmm.  I see.
The thing is they are 2.5" SAS drives.
I'd need to find a server that accepts these drives, which is unlikely in my company.
Any other options?
What if we order one replacement drive, swap it in for one of the bad drives, and then run the diagnostics on the server again? If the error doesn't show up, we can assume the old drive was bad. If the new drive also shows up as bad, we can assume it's not the drive and it's a different problem altogether.
homerslmpson

ASKER

Well, I'm at the point now where the HP tools are confusing me.
HP Insight Diagnostics shows the errors listed 3 posts up, but if I run the HP Array Diagnostic Utility (8.12.1.0) it shows no errors at all.
Do I need to replace these drives or not?
The server already shows one spare drive, though I don't know if that's of any use.
Any help would be appreciated.
ASKER CERTIFIED SOLUTION
David
