  • Status: Solved
  • Priority: Medium
  • Security: Public

Windows Server 2003 BSOD

Hardware is a Dell PowerEdge 2600
PERC 4e D1 controller, Maxtor Ultra 320 SCSI drives

Windows Server 2003 SP2

Stop Code = 0x000000F4 (0x00000003,0x08590C280,0x8590C3E4,0x8967C6C)

Any help would be great.
0
itpro365
Asked:
1 Solution
 
PowerEdgeTechIT ConsultantCommented:
Is the amber light on your system on (the one that is usually blue)?
Does it boot up and you just get this every so often?
Or does it not start and this is the error message you get?
Did you try Last Known Good Configuration and/or Safe Mode?
Do you have your OS CDs handy?
0
 
itpro365Author Commented:
Yes the amber light is flashing.
Yes it boots, but then eventually crashes.
No to LKGC
Yes to the CD
I have the minidump: Mini030811-01---Copy.txt
0
 
itpro365Author Commented:
Sorry, wrong file.
 Mini030811-01.dmp
0
 
PowerEdgeTechIT ConsultantCommented:
Amber light = you have a hardware problem.  
Are there any other amber lights on the server - on the drives, power supplies, etc.?
As the system is going through its BIOS/POST screens, look at everything that scrolls on the screen.  What messages do you see?
0
 
itpro365Author Commented:
More Info
On Wed 3/9/2011 7:19:15 AM GMT your computer crashed
crash dump file: C:\Windows\Minidump\Mini030811-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x7C4A0)
Bugcheck code: 0xF4 (0x3, 0xFFFFFFFF85959D88, 0xFFFFFFFF85959EEC, 0xFFFFFFFF80967CEC)
Error: CRITICAL_OBJECT_TERMINATION
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that a process or thread crucial to system operation has unexpectedly exited or been terminated.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver which cannot be identified at this time.
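
For anyone reading the bug check line above by hand, here is a minimal Python sketch that pulls the code and its four parameters out of a WhoCrashed-style line. It assumes the documented meaning of parameter 1 for bug check 0xF4 (0x3 = process, 0x6 = thread) and assumes a 32-bit system, so the sign-extended address is masked to 32 bits. The function names are made up for illustration and are not part of WhoCrashed.

```python
import re

# Parameter 1 values for bug check 0xF4 (CRITICAL_OBJECT_TERMINATION).
F4_OBJECT_TYPES = {0x3: "process", 0x6: "thread"}

# Matches e.g. "Bugcheck code: 0xF4 (0x3, 0x..., 0x..., 0x...)"
BUGCHECK_RE = re.compile(r"Bugcheck code:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")


def parse_bugcheck(line):
    """Return (code, [param1, ..., param4]) as integers."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        raise ValueError("no bug check found in line")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def describe_f4(code, params):
    """Summarise a 0xF4 bug check; other codes are not interpreted."""
    if code != 0xF4:
        return "not a 0xF4 bug check"
    kind = F4_OBJECT_TYPES.get(params[0], "unknown object")
    # Assumption: 32-bit OS, so drop the sign-extended upper half.
    address = params[1] & 0xFFFFFFFF
    return f"critical {kind} terminated; object at 0x{address:08X}"


line = ("Bugcheck code: 0xF4 (0x3, 0xFFFFFFFF85959D88, "
        "0xFFFFFFFF85959EEC, 0xFFFFFFFF80967CEC)")
code, params = parse_bugcheck(line)
print(describe_f4(code, params))
```

With parameter 1 equal to 0x3, the terminated object is a process rather than a thread. That alone does not say whether the root cause is software or hardware; a critical process can also die because the disk under it stopped responding.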
0
 
pc-cytCommented:
0
 
itpro365Author Commented:
The WhoCrashed application provided that information in my last post.
0
 
itpro365Author Commented:
I pulled out all the drives and the power supply, re-seated them, and now the amber light is gone. I will see if we still get a dump.
0
 
PowerEdgeTechIT ConsultantCommented:
The PE2600 is kind of stupid, because it lacks the LCD panel that practically every other PE has to tell you what is wrong.  The best thing to do is to run diagnostics to see if that will tell you the faulty part:
http://support.dell.com/support/downloads/download.aspx?c=us&cs=04&l=en&s=bsd&releaseid=R206154&SystemID=PWE_FOS_XEO_2600&servicetag=&os=WNET&osl=en&deviceid=196&devlib=0&typecnt=0&vercnt=17&catid=-1&impid=-1&formatcnt=0&libid=13&typeid=-1&dateid=-1&formatid=-1&source=-1&fileid=288046

You could also try booting to an OMSA Live CD ... if you can, you can check the Hardware Log for the exact error:
http://linux.dell.com/files/openmanage-contributions/omsa-54-live/omsa-54-040308.iso
0
 
PowerEdgeTechIT ConsultantCommented:
If you get back into Windows, check the Hardware Logs in OMSA.  If you don't already have it installed, here it is:
ftp://ftp.dell.com/sysman/OM_5.5.0_ManNode_A00.exe

Download and run to extract, then run C:\Openmanage\windows\setup.exe
0
 
itpro365Author Commented:
Well, the amber light is now gone, but the BSOD is still happening. I will try OpenManage.
0
 
pc-cytCommented:
Here are some things I'd try:

Test the RAM

http://www.memtest.org/
http://www.ultimatebootcd.com/

Boot a Linux boot disk and torture-test the system parts such as RAM/CPU/disk?

http://fedoraproject.org/en/get-fedora
0
 
itpro365Author Commented:
So here are the results of the first test.


photo.JPG
0
 
itpro365Author Commented:
Well I cleared the log files as instructed. Rebooted. Replaced the memory with RAM from an identical box.  Ran the test again and got the same exact results.
0
 
PowerEdgeTechIT ConsultantCommented:
Did you clear the Hardware logs (OMSA, System, Logs, Clear)?  After clearing them, it should not have shown the error from 2007 ... are you saying that it failed DIMM_1A on the actual test (not the Event Log Scan)?
0
 
itpro365Author Commented:
I cleared all the log files in Event Viewer - application, hardware, system, etc. I just re-read the error; it is failing the pre-test only at this point. But I don't see any other logs to clear. I do not have OpenManage installed.
0
 
PowerEdgeTechIT ConsultantCommented:
The Hardware Log has nothing to do with Windows Event Logs ... it is kept by the ESM/BMC for hardware-related errors and warnings.  You can view these logs and clear the logs in OMSA (see post above for link and installation instructions).

Or ...

You can clear it with DSET - Run and Clear option:
ftp://ftp.dell.com/sysman/Dell_DSET_1.6.0.131_A01.msi

0
 
itpro365Author Commented:
Well, I tried to install OpenManage, and during the setup I received multiple errors stating there was a delayed write... and then a BSOD.
0
 
PowerEdgeTechIT ConsultantCommented:
Then try DSET - you can run that option without actually installing anything ... that should reduce the load on the machine.  If that doesn't work, then you might consider the OMSA Live CD (Linux-based) to run OMSA to view the logs (find out what the errors are) and clear the logs (so you can get a clean test if the hardware log is not enough).
0
 
itpro365Author Commented:
Ok - I will first try DSET and then I will try the OMSA Live CD. Just a note. The drives seem to just go quiet for long periods of time, just before the BSOD. Also, they seem to take forever to spin up during POST. POST takes approximately 9 mins.
0
 
PowerEdgeTechIT ConsultantCommented:
Very likely a failed drive - if you're lucky - or a controller/motherboard problem.  If it is a failed drive, it could cause further problems (if it hasn't already).  Any amber lights on your drives?
0
 
itpro365Author Commented:
OK, ran DSET with the Run and Clear option. The ESM file that was created only has the following in it:
Embedded System Management (ESM) Log

Health : Ok

Embedded System Management Log contains...

Severity      : Ok
Date and Time : Fri Mar 11 15:23:35 2011
Description   : Log cleared
Embedded System Management (ESM) Log

Health : Ok

Embedded System Management Log contains...

Severity      : Ok
Date and Time : Wed Jul 26 06:31:40 2006
Description   : Log cleared

Severity      : Critical
Date and Time : Thu Jul 27 05:55:52 2006
Description   : Bezel Intrusion sensor detected an intrusion

Severity      : Ok
Date and Time : Thu Jul 27 05:56:12 2006
Description   : Bezel Intrusion sensor return to normal
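
If the log gets longer than this, a short script can pick out the non-Ok entries. The following is a minimal Python sketch that assumes the text layout shown above (repeating Severity / Date and Time / Description fields) and an English locale for the day and month names. The function name and the sample text are illustrative only; this is not a Dell tool.

```python
import re
from datetime import datetime

FIELD_RE = re.compile(r"^(Severity|Date and Time|Description)\s*:\s*(.*)$")


def parse_esm_log(text):
    """Split ESM log text into a list of dicts, one per log entry."""
    entries = []
    current = {}
    for raw in text.splitlines():
        match = FIELD_RE.match(raw.strip())
        if not match:
            continue  # skips headers such as "Health : Ok"
        key, value = match.group(1), match.group(2).strip()
        if key == "Severity" and current:
            # A new Severity line starts the next entry.
            entries.append(current)
            current = {}
        current[key] = value
    if current:
        entries.append(current)
    for entry in entries:
        if "Date and Time" in entry:
            # Assumes English day/month abbreviations, as in the log above.
            entry["when"] = datetime.strptime(
                entry["Date and Time"], "%a %b %d %H:%M:%S %Y")
    return entries


sample = """\
Embedded System Management (ESM) Log

Health : Ok

Severity      : Ok
Date and Time : Wed Jul 26 06:31:40 2006
Description   : Log cleared

Severity      : Critical
Date and Time : Thu Jul 27 05:55:52 2006
Description   : Bezel Intrusion sensor detected an intrusion

Severity      : Ok
Date and Time : Thu Jul 27 05:56:12 2006
Description   : Bezel Intrusion sensor return to normal
"""

entries = parse_esm_log(sample)
problems = [e for e in entries if e["Severity"] != "Ok"]
for e in problems:
    print(e["when"].isoformat(), e["Severity"], e["Description"])
```

On the sample above, the only non-Ok entry is the bezel intrusion from 2006, which is old and unrelated to the current crashes.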
0
 
PowerEdgeTechIT ConsultantCommented:
Did it create a DSET Report (DSETsomethingsomethingservicetag.zip) on your Desktop?  If so, try to attach it here so we can see the hardware logs.
0
 
itpro365Author Commented:
I ran the 3rd option that just clears the log. I can run the first option to create the report, but won't it be useless now that I cleared the logs?
0
 
PowerEdgeTechIT ConsultantCommented:
Right.  Best now to run diagnostics (now that the log contains no errors).
0
 
itpro365Author Commented:
Here is the DSET Report
0
 
itpro365Author Commented:
File is blocked because of OSX and HTA
0
 
PowerEdgeTechIT ConsultantCommented:
You can send it to poweredgetech@gmail.com, if you want (that is, if Gmail takes password-protected ZIPs).

Another option ... call Dell Tech Support.  They have a "dropbox" you can upload it to so they can review it.  Support is always free, whether in or out of warranty.
0
 
itpro365Author Commented:
I went ahead and ran the tool and did find this:
Degraded
0
 
itpro365Author Commented:
Thanks PowerEdgeTech - I just sent you the zip without a password.
0
 
itpro365Author Commented:
0
 
itpro365Author Commented:
Or if that link doesn't work, then you can use this one:
http://dl.dropbox.com/u/18479552/DSETReport.zip
0
 
PowerEdgeTechIT ConsultantCommented:
Well...

The Windows Event Log logged some RAID controller events, but since OpenManage was not installed to tell Windows what they meant, the entries don't contain much information.

Since OMSA was not installed, DSET could not pull the RAID controller log to take a look at the array.

Power Supply 2 is probably unplugged?  If not, it could be bad.

The RAID battery should be addressed but it's probably not responsible.  You can clear the "Recharge Count" in the controller BIOS (CTRL-M, Objects) ... resetting it will put it back under the "error" threshold and your system will stop telling you about it until it reaches that 1100 count threshold again.  At least that way, you can confirm or refute the idea that the amber light is on only because of the battery.

Check the BIOS, under Integrated Devices, and make sure the connector going to the tape drive is set to SCSI rather than RAID (sometimes this is not reported correctly in the DSET, particularly without OMSA installed).  A tape device should not be connected to a RAID controller.

Other than that, based on what we have so far, nothing else really jumped out at me.

I would run the diagnostics to see what comes up.  You can run another DSET should it happen again so we can see the Hardware entry for it.
0
 
itpro365Author Commented:
So is it possible the degraded state of the raid is causing the bald?
0
 
itpro365Author Commented:
Sorry not bald but bsod.
0
 
pc-cytCommented:
If you are running out of options, why not try the 3rd party memory test?  You say you changed the memory 'sticks/chips', but you still have the error, right?
The RAM chips themselves might not be bad, but the RAM I/O controller or some other part could be.   If the memtest86 test runs for a few hours without any problem then you will have learnt something...  at least one part of the system is 'good'.   If you find any errors, test each RAM 'stick' one at a time in the first slot.

Why not take it a step further and label up all drives, connectors etc. and REMOVE any hardware that is not required to perform the memory tests, i.e. unplug the RAID controller from the main board (if it is a separate device) or disconnect the drives at a minimum.  Please be careful and take lots of photos of the system and label everything!  If you're not confident doing this, please call someone that can help.   - Do you have a backup of this server??

I would also advise making a physical check of the boards inside the machine.. can you see any bad capacitors, for example?  Or seized cooling fans? Take a look here for an example:

http://www.mikerepairscomputers.com/blog/wp-content/uploads/2009/07/Bad_Capacitor_01-200x150.jpg

Apologies if this all seems a bit basic.
0
 
PowerEdgeTechIT ConsultantCommented:
It is only the RAID battery that is "degraded" - and only because it has exceeded the pre-defined number of recharges.  Even if the battery charge is too low or the battery dies altogether, the controller simply turns off the cache that the battery is for.  So, in a word: no, I don't believe it is causing the blue screen.

I would start with diagnostics from here to find out what the failed hardware is (if any).  As pc-cyt suggests, memory is a good place to start, and now that we have cleared the log we actually can run a clean test.  I would also test the rest of the machine - not just memory.

Remember that the amber light means there is a hardware error, but each of the following conditions, all of which apply to your system, will also turn on the amber light:
- Chassis being open and/or the front bezel is removed.
- Second power supply is not plugged in (but still in the machine).
- RAID battery is "degraded".

So, your "hardware" error may simply be one of the above ... easily fixed.  If that is all the light is on for and your system (including memory) passes diagnostics, then we're looking at a deeper OS issue.
0
 
itpro365Author Commented:
Ok,
sorry for the delay.
The memory test ran perfectly fine.
I cleared the battery count.
I was able to actually run several OS updates and driver updates last night. It looked like I was in the clear, but this morning the system is down again.
Now the drives don't seem to want to spin up, even though all lights are green on the drives. I can't even get into the RAID controller by using CTRL-M.
0
 
PowerEdgeTechIT ConsultantCommented:
Is it "hanging" at the CTRL-M waiting for the drives to spin up?  

With the system powered off and unplugged, try reseating the backplane cables (power and data) and drives:
http://support.dell.com/support/edocs/systems/pe2600/en/sm/remove.htm#wp1039152

You might also try clearing the NVRAM using the jumpers on the motherboard.  After doing this, you will need to go to the BIOS Setup (F2) to re-enable RAID (go to the BIOS first, before clearing, to see which channels are set to RAID and which to SCSI).  You can ignore the scary messages about data loss when switching from SCSI to RAID.
http://support.dell.com/support/edocs/systems/pe2600/en/sm/jumpers.htm#1045573

If you have a backplane-splitting daughterboard, reseat it as well:
http://support.dell.com/support/edocs/systems/pe2600/en/sm/remove.htm#1101823

At this point, with the system down, I would advise you to call Dell Technical Support ... it is much easier to work through these types of things with a real person on the phone.  (800) 822-8965.  Support is always free whether in or out of warranty.
0
 
itpro365Author Commented:
Thanks PowerEdge.
I took the drives out and placed them into a spare 2600 and I have been up and running for 2 hours now without any issues.
0
 
PowerEdgeTechIT ConsultantCommented:
That works too :)  If it was hardware-related, then it shouldn't BSOD again.  If it is the OS, I would assume it would.
0
