Solved

Windows Server 2003 BSOD

Posted on 2011-03-11
Last Modified: 2012-05-11
Hardware is a Dell PowerEdge 2600
Perc 4e D1 SCSI Ultra 320 Drives Maxtor

Windows Server 2003 SP2

Stop Code = 0x000000F4 (0x00000003,0x08590C280,0x8590C3E4,0x8967C6C)

Any help would be great.
0
Question by:itpro365
41 Comments
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35112534
Is the amber light on your system on (the one that is usually blue)?
Does it boot up and you just get this every so often?
Or does it not start and this is the error message you get?
Did you try Last Known Good Configuration and/or Safe Mode?
Do you have your OS CDs handy?
0
 

Author Comment

by:itpro365
ID: 35112562
Yes the amber light is flashing.
Yes it boots, but then eventually crashes.
No to LKGC
Yes to the CD
I have the minidump  Mini030811-01---Copy.txt
0
 

Author Comment

by:itpro365
ID: 35112566
Sorry wrong file.
 Mini030811-01.dmp
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35112609
Amber light = you have a hardware problem.  
Are there any other amber lights on the server - on the drives, power supplies, etc.?
As the system is going through its BIOS/POST screens, look at everything that scrolls on the screen.  What messages do you see?
0
 

Author Comment

by:itpro365
ID: 35112617
More Info
On Wed 3/9/2011 7:19:15 AM GMT your computer crashed
crash dump file: C:\Windows\Minidump\Mini030811-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x7C4A0)
Bugcheck code: 0xF4 (0x3, 0xFFFFFFFF85959D88, 0xFFFFFFFF85959EEC, 0xFFFFFFFF80967CEC)
Error: CRITICAL_OBJECT_TERMINATION
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that a process or thread crucial to system operation has unexpectedly exited or been terminated.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver which cannot be identified at this time.
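
For anyone decoding a line like the one above by hand, here is a minimal Python sketch. It assumes the parameter meanings from Microsoft's bug check reference for 0xF4, where the first parameter is 0x3 for a terminated process and 0x6 for a terminated thread. The names `parse_bugcheck` and `F4_OBJECT_TYPES` are made up for illustration and are not part of WhoCrashed or any Microsoft tool.

```python
import re

# Parameter 1 meanings for bug check 0xF4 (CRITICAL_OBJECT_TERMINATION),
# as listed in Microsoft's bug check reference. Hypothetical lookup table.
F4_OBJECT_TYPES = {0x3: "process", 0x6: "thread"}


def parse_bugcheck(line):
    """Split a '0xF4 (p1, p2, p3, p4)' style line into a code and a list of ints."""
    match = re.search(r"(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)", line)
    if match is None:
        raise ValueError("no bug check code found in: %r" % line)
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


# The line reported by WhoCrashed in the post above.
line = ("Bugcheck code: 0xF4 (0x3, 0xFFFFFFFF85959D88, "
        "0xFFFFFFFF85959EEC, 0xFFFFFFFF80967CEC)")
code, params = parse_bugcheck(line)
kind = F4_OBJECT_TYPES.get(params[0], "unknown object")
if code == 0xF4:
    print("0xF4: a critical %s terminated" % kind)
```

With a first parameter of 0x3, the terminated object is a process rather than a thread. That alone does not say whether hardware or software is to blame.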
0
 
LVL 1

Expert Comment

by:pc-cyt
ID: 35112634
0
 

Author Comment

by:itpro365
ID: 35112662
The WhoCrashed application provided that information in my last post.
0
 

Author Comment

by:itpro365
ID: 35112686
I pulled out all the drives and the power supply, reseated them, and now the amber light is gone. I will see if we still get a dump.
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35112690
The PE2600 is kind of stupid, because it lacks the LCD panel that practically every other PE has to tell you what is wrong.  The best thing to do is run diagnostics to see if that will identify the faulty part:
http://support.dell.com/support/downloads/download.aspx?c=us&cs=04&l=en&s=bsd&releaseid=R206154&SystemID=PWE_FOS_XEO_2600&servicetag=&os=WNET&osl=en&deviceid=196&devlib=0&typecnt=0&vercnt=17&catid=-1&impid=-1&formatcnt=0&libid=13&typeid=-1&dateid=-1&formatid=-1&source=-1&fileid=288046

You could also try booting to an OMSA Live CD ... if you can, you can check the Hardware Log for the exact error:
http://linux.dell.com/files/openmanage-contributions/omsa-54-live/omsa-54-040308.iso
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35112704
If you get back into Windows, check the Hardware Logs in OMSA.  If you don't already have it installed, here it is:
ftp://ftp.dell.com/sysman/OM_5.5.0_ManNode_A00.exe

Download and run to extract, then run C:\Openmanage\windows\setup.exe
0
 

Author Comment

by:itpro365
ID: 35112810
Well amber light is now gone, but BSOD is still happening. I will try the Open Manage.
0
 
LVL 1

Expert Comment

by:pc-cyt
ID: 35112885
Here are some things I'd try:

Test the RAM

http://www.memtest.org/
http://www.ultimatebootcd.com/

Boot a Linux boot disk and torture-test system components such as RAM, CPU, and disk.

http://fedoraproject.org/en/get-fedora
0
 

Author Comment

by:itpro365
ID: 35112991
So here are the results of the first test.


photo.JPG
0
 

Author Comment

by:itpro365
ID: 35113126
Well I cleared the log files as instructed. Rebooted. Replaced the memory with RAM from an identical box.  Ran the test again and got the same exact results.
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35113153
Did you clear the Hardware logs (OMSA, System, Logs, Clear)?  After clearing them, it should not have shown the error from 2007 ... are you saying that it failed DIMM_1A on the actual test (not the Event Log Scan)?
0
 

Author Comment

by:itpro365
ID: 35113222
I cleared all the log files in Event Viewer - application, hardware, system, etc. I just re-read the error; it is failing the pre-test only at this point. But I don't see any other logs to clear. I do not have OpenManage installed.
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35113298
The Hardware Log has nothing to do with Windows Event Logs ... it is kept by the ESM/BMC for hardware-related errors and warnings.  You can view these logs and clear the logs in OMSA (see post above for link and installation instructions).

Or ...

You can clear it with DSET - Run and Clear option:
ftp://ftp.dell.com/sysman/Dell_DSET_1.6.0.131_A01.msi

0
 

Author Comment

by:itpro365
ID: 35113311
Well I tried to install Open Manage and during the setup I received multiple errors stating there was a delayed write... And then BSOD
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35113330
Then try DSET - you can run that option without actually installing it, which should put less load on the machine.  If that doesn't work, then you might consider the OMSA Live CD (Linux-based) to run OMSA to view the logs (find out what the errors are) and clear the logs (so you can get a clean test if the hardware log is not enough).
0
 

Author Comment

by:itpro365
ID: 35113358
Ok - I will first try DSET and then I will try the OMSA Live CD. Just a note. The drives seem to just go quiet for long periods of time, just before the BSOD. Also, they seem to take forever to spin up during POST. POST takes approximately 9 mins.
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35113376
Very likely a failed drive - if you're lucky - or a controller/motherboard problem.  If it is a failed drive, it could start causing problems (if it hasn't already).  Any amber lights on your drives?
0
 

Author Comment

by:itpro365
ID: 35113394
OK, I ran the DSET Run and Clear option. The ESM file that was created only has the following in it:
Embedded System Management (ESM) Log

Health : Ok

Embedded System Management Log contains...

Severity      : Ok
Date and Time : Fri Mar 11 15:23:35 2011
Description   : Log cleared
Embedded System Management (ESM) Log

Health : Ok

Embedded System Management Log contains...

Severity      : Ok
Date and Time : Wed Jul 26 06:31:40 2006
Description   : Log cleared

Severity      : Critical
Date and Time : Thu Jul 27 05:55:52 2006
Description   : Bezel Intrusion sensor detected an intrusion

Severity      : Ok
Date and Time : Thu Jul 27 05:56:12 2006
Description   : Bezel Intrusion sensor return to normal
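
If the ESM log is longer than this one, a short script can pull out just the entries that are not "Ok". The following is a rough Python sketch that assumes the log is plain text in the Severity / Date and Time / Description layout pasted above. The function name `parse_esm_log` is hypothetical and is not part of DSET or OMSA.

```python
def parse_esm_log(text):
    """Collect Severity / Date and Time / Description triples from ESM log text."""
    wanted = ("Severity", "Date and Time", "Description")
    entries, current = [], {}
    for raw in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = raw.partition(":")
        key = key.strip()
        if not sep or key not in wanted:
            continue  # skip headers such as "Health : Ok" and blank lines
        current[key] = value.strip()
        if key == "Description":  # Description closes each entry
            entries.append(current)
            current = {}
    return entries


# Entries copied from the log in the post above.
sample = """\
Severity      : Ok
Date and Time : Wed Jul 26 06:31:40 2006
Description   : Log cleared

Severity      : Critical
Date and Time : Thu Jul 27 05:55:52 2006
Description   : Bezel Intrusion sensor detected an intrusion

Severity      : Ok
Date and Time : Thu Jul 27 05:56:12 2006
Description   : Bezel Intrusion sensor return to normal
"""

entries = parse_esm_log(sample)
problems = [e for e in entries if e["Severity"] != "Ok"]
for e in problems:
    print(e["Date and Time"], "-", e["Description"])
```

For this log the only non-Ok entry is the 2006 bezel intrusion, which fits the point that the hardware log had nothing recent in it after being cleared.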
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35113422
Did it create a DSET Report (DSETsomethingsomethingservicetag.zip) on your Desktop?  If so, try to attach it here so we can see the hardware logs.
0
 

Author Comment

by:itpro365
ID: 35113463
I ran the 3rd option that just clears the log. I can run the first option to create the report, but won't it be useless now that I cleared the logs?
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35113470
Right.  Best now to run diagnostics (now that the log contains no errors).
0
 

Author Comment

by:itpro365
ID: 35113518
Here is the DSET Report
0
 

Author Comment

by:itpro365
ID: 35113529
File is blocked because of OSX and HTA
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35113561
You can send it to poweredgetech@gmail.com, if you want (that is, if Gmail takes password-protected ZIPs).

Another option ... call Dell Tech Support.  They have a "dropbox" you can upload it to so they can review it.  Support is always free, whether in or out of warranty.
0
 

Author Comment

by:itpro365
ID: 35113565
I went ahead and ran the tool and did find this:
Degraded
0
 

Author Comment

by:itpro365
ID: 35113597
Thanks PowerEdgeTech - I just sent you the zip without a password.
0
 

Author Comment

by:itpro365
ID: 35113650
0
 

Author Comment

by:itpro365
ID: 35113692
Or if that link doesn't work, then you can use this one:
http://dl.dropbox.com/u/18479552/DSETReport.zip
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35114035
Well...

The Windows Event Log logged some RAID controller events, but since OpenManage was not installed to tell Windows what they meant, they don't contain much information.

Since OMSA was not installed, DSET could not pull the RAID controller log to take a look at the array.

Power Supply 2 is probably unplugged?  If not, it could be bad.

The RAID battery should be addressed but it's probably not responsible.  You can clear the "Recharge Count" in the controller BIOS (CTRL-M, Objects) ... resetting it will put it back under the "error" threshold and your system will stop telling you about it until it reaches that 1100 count threshold again.  At least that way, you can confirm or refute the idea that the amber light is on only because of the battery.

Check the BIOS, under Integrated Devices, and make sure the connector going to the tape drive is set to SCSI rather than RAID (sometimes this is not reported correctly in the DSET, particularly without OMSA installed, but a tape device should not be connected to a RAID controller).

Other than that, based on what we have so far, nothing else really jumped out at me.

I would run the diagnostics to see what comes up.  You can run another DSET should it happen again so we can see the Hardware entry for it.
0
 

Author Comment

by:itpro365
ID: 35114687
So is it possible the degraded state of the raid is causing the bald?
0
 

Author Comment

by:itpro365
ID: 35114710
Sorry not bald but bsod.
0
 
LVL 1

Expert Comment

by:pc-cyt
ID: 35114950
If you are running out of options, why not try the 3rd party memory test?  You say you changed the memory 'sticks/chips', but you still have the error right?
The RAM chips themselves might not be bad, but the RAM I/O controller or some other part could be.   If the memtest86 tests run for a few hours without any problem then you will have learnt something...  at least one part of the system is 'good'.   If you find any errors, test each RAM 'stick' one at a time in the first slot.

Why not take it a step further and label all drives, connectors, etc. and REMOVE any hardware that is not required to perform the memory tests, i.e. unplug the RAID controller from the main board (if it is a separate device) or disconnect the drives at a minimum.  Please be careful and take lots of photos of the system and label everything!  If you're not confident doing this, please call someone who can help.   - Do you have a backup of this server??

I would also advise making a physical check of the boards inside the machine. Can you see any bad capacitors, for example, or seized cooling fans? Take a look here for an example:

http://www.mikerepairscomputers.com/blog/wp-content/uploads/2009/07/Bad_Capacitor_01-200x150.jpg

Apologies if this seems all a bit basic,
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35116305
It is only the RAID battery that is "degraded" - and only because it has exceeded the pre-defined number of recharges.  Even if the battery charge is too low or the battery dies altogether, the controller simply turns off the cache that the battery is for.  So, in a word: no, I don't believe it is causing the blue screen.

I would start with diagnostics from here to find out what the failed hardware is (if any).  As pc-cyt suggests, memory is a good place to start, and now that we have cleared the log we actually can run a clean test.  I would also test the rest of the machine - not just memory.

Remember, the amber light means there is a hardware error, but the following conditions, all of which apply to your system, will also turn on the amber light:
- Chassis being open and/or the front bezel is removed.
- Second power supply is not plugged in (but still in the machine).
- RAID battery is "degraded".

So, your "hardware" error may simply be one of the above ... easily fixed.  If that is all the light is on for and your system (including memory) passes diagnostics, then we're looking at a deeper OS issue.
0
 

Author Comment

by:itpro365
ID: 35150424
Ok,
sorry for the delay.
The memory test ran perfectly fine.
I cleared the battery count.
I was able to actually run several OS updates and driver updates last night. It looked like I was in the clear, but this morning the system is down again.
Now the drives don't seem to want to spin up, even though all lights are green on the drives. I can't even get into the RAID controller by using CTRL-M.
0
 
LVL 32

Accepted Solution

by:
PowerEdgeTech earned 500 total points
ID: 35150948
Is it "hanging" at the CTRL-M waiting for the drives to spin up?  

With the system powered off and unplugged, try reseating the backplane cables (power and data) and drives:
http://support.dell.com/support/edocs/systems/pe2600/en/sm/remove.htm#wp1039152

You might also try clearing the NVRAM using the jumpers on the motherboard.  After doing this, you will need to go to the BIOS Setup (F2) to re-enable RAID (go to BIOS first to see which channels are set to RAID and which to SCSI).  You can ignore the scary messages about data loss when switching from SCSI to RAID.
http://support.dell.com/support/edocs/systems/pe2600/en/sm/jumpers.htm#1045573

If you have a backplane-splitting daughterboard, reseat it as well:
http://support.dell.com/support/edocs/systems/pe2600/en/sm/remove.htm#1101823

At this point, with the system down, I would advise you to call Dell Technical Support ... it is much easier to work through these types of things with a real person on the phone.  (800) 822-8965.  Support is always free whether in or out of warranty.
0
 

Author Comment

by:itpro365
ID: 35151195
Thanks PowerEdge.
I took the drives out and placed them into a spare 2600 and I have been up and running for 2 hours now without any issues.
0
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 35151261
That works too :)  If it was hardware-related, then it shouldn't BSOD again.  If it is the OS, I would assume it would.
0
