Marcus N

HP, Proliant ML150 G3, Server with MS SBS 2003 R2 Premium, Core Memory Dump Blue Screen

I get irregular yet frequent blue screen crashes, each with a core memory dump, on my HP Proliant ML150 G3 running MS SBS 2003 R2 Premium with SQL and ISA Server installed. Everything worked fine until about 10 days ago.

I get a memory.dmp file and have downloaded the latest analysis and debugging tools from the MS website but I don't know where to start or what to do.

Other than MS updates and patches, there have been no system reconfigurations of any sort, no new hardware or software, nothing. My usually very reliable server is now rubbish!
Keith Alabaster

Are you also running SP2?
Can you tell us the error code you get on the blue screen?

I'd also test the RAM with memtest86+, make sure all the fans are running, and test the disks using the manufacturer's utility. The tools are on the UBCD.

http://ultimatebootcd.com
Marcus N (ASKER)

Keith asked, "am I running SP2?"
Yes

Rindi asked, "what is the blue screen error code?"
How do I find it? Is it logged somewhere?

Rindi asked whether I've tested the RAM and disks.
I've not checked these but the message in the event log is as follows.

Event Type:      Error
Event Source:      System Error
Event Category:      (102)
Event ID:      1003
Date:            22/11/2007
Time:            19:58:26
User:            N/A
Computer:      SATURN
Description:
Error code 0000000a, parameter1 00004074, parameter2 d000001b, parameter3 00000001, parameter4 808312bd.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 53 79 73 74 65 6d 20 45   System E
0008: 72 72 6f 72 20 20 45 72   rror  Er
0010: 72 6f 72 20 63 6f 64 65   ror code
0018: 20 30 30 30 30 30 30 30    0000000
0020: 61 20 20 50 61 72 61 6d   a  Param
0028: 65 74 65 72 73 20 30 30   eters 00
0030: 30 30 34 30 37 34 2c 20   004074,
0038: 64 30 30 30 30 30 31 62   d000001b
0040: 2c 20 30 30 30 30 30 30   , 000000
0048: 30 31 2c 20 38 30 38 33   01, 8083
0050: 31 32 62 64               12bd  
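
The Data block above carries no extra information: it is the Description text again, stored as ASCII bytes. A minimal Python sketch, assuming the Data lines are pasted in as a string (the helper name `decode_event_data` is made up for illustration), shows the decoding:

```python
import re

# "Data:" lines copied from the Event ID 1003 entry above.
EVENT_DATA = """\
0000: 53 79 73 74 65 6d 20 45   System E
0008: 72 72 6f 72 20 20 45 72   rror  Er
0010: 72 6f 72 20 63 6f 64 65   ror code
0018: 20 30 30 30 30 30 30 30    0000000
0020: 61 20 20 50 61 72 61 6d   a  Param
0028: 65 74 65 72 73 20 30 30   eters 00
0030: 30 30 34 30 37 34 2c 20   004074,
0038: 64 30 30 30 30 30 31 62   d000001b
0040: 2c 20 30 30 30 30 30 30   , 000000
0048: 30 31 2c 20 38 30 38 33   01, 8083
0050: 31 32 62 64               12bd
"""

def decode_event_data(dump):
    """Join the hex byte columns of an event-log Data block into ASCII text."""
    out = bytearray()
    for line in dump.splitlines():
        # Each line is "<offset>: <up to 8 hex bytes>   <ASCII preview>";
        # only the hex byte columns are read, the preview is ignored.
        match = re.match(r"^[0-9a-fA-F]{4}:((?: [0-9a-fA-F]{2}){1,8})", line)
        if match:
            out.extend(bytes.fromhex(match.group(1)))
    return out.decode("ascii")

print(decode_event_data(EVENT_DATA))
```

The printed text is the stop code followed by its four parameters, the same values as in the Description line.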
ASKER CERTIFIED SOLUTION
Keith Alabaster
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Keith mentioned a problem stemming from TCP receive side scaling.

Hmm, this is interesting on two counts.
a) The article makes no mention of core memory dump system crashes, so I don't see how it is really relevant to my problem.
b) If it is relevant, and I confess to not understanding it, could it have anything to do with my Intel PRO 1000/MT Dual Port NIC (whose firmware I recently updated to version 8.9.1.0) or my Adaptec AAR-2420SA 4 channel SATA RAID controller (whose driver I recently updated to version 5.2.0.11737)?

Is there anything that the file C:\WINDOWS\Memory.dmp can help to identify?
SOLUTION
The error code is shown on the blue screen itself; it is usually the first number.
Keith queried my apparently contradictory statement about no changes.

Yes, the two changes I mention pre-date the start of the system instability by several weeks. They are the only two changes in the past year that I have consciously made that affect either the NIC or the RAID controller. That's why I mentioned them.

Rindi asked about the numbers on the Blue Screen.

Looking at the event log they would appear to be:
0000000a, parameter1 00004074, parameter2 d000001b, parameter3 00000001, parameter4 808312bd.

Interestingly, the event log entry for the crash that preceded the one I provided data for in an earlier post is as follows. It looks a little different.

Event Type:      Error
Event Source:      System Error
Event Category:      (102)
Event ID:      1003
Date:            21/11/2007
Time:            08:42:55
User:            N/A
Computer:      SATURN
Description:
Error code 00000019, parameter1 00000020, parameter2 88119ed0, parameter3 88119f40, parameter4 0a0e0005.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 53 79 73 74 65 6d 20 45   System E
0008: 72 72 6f 72 20 20 45 72   rror  Er
0010: 72 6f 72 20 63 6f 64 65   ror code
0018: 20 30 30 30 30 30 30 31    0000001
0020: 39 20 20 50 61 72 61 6d   9  Param
0028: 65 74 65 72 73 20 30 30   eters 00
0030: 30 30 30 30 32 30 2c 20   000020,
0038: 38 38 31 31 39 65 64 30   88119ed0
0040: 2c 20 38 38 31 31 39 66   , 88119f
0048: 34 30 2c 20 30 61 30 65   40, 0a0e
0050: 30 30 30 35               0005  
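
The two events carry different stop codes, which may be easier to compare once the Description lines are split into a code and its four parameters. The sketch below is illustrative only: the function names are hypothetical, and the lookup table covers just the two codes seen in this thread, with names taken from Microsoft's bug check reference (0x0A is IRQL_NOT_LESS_OR_EQUAL, 0x19 is BAD_POOL_HEADER).

```python
import re

# Names for the two stop codes seen in this thread; extend as needed.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x19: "BAD_POOL_HEADER",
}

def parse_stop(description):
    """Return (code, [p1, p2, p3, p4]) from an Event ID 1003 description."""
    code = re.search(r"Error code ([0-9a-fA-F]{8})", description)
    params = re.findall(r"parameter\d ([0-9a-fA-F]{8})", description)
    if code is None or len(params) != 4:
        raise ValueError("not a recognised stop-error description")
    return int(code.group(1), 16), [int(p, 16) for p in params]

def summarise(description):
    """Format the stop code, its name (if known) and its parameters."""
    code, params = parse_stop(description)
    name = BUGCHECK_NAMES.get(code, "unknown bug check")
    args = ", ".join(f"0x{p:08X}" for p in params)
    return f"STOP 0x{code:08X} ({name}): {args}"

# Description lines from the events of 22/11/2007 and 21/11/2007.
EVENTS = [
    "Error code 0000000a, parameter1 00004074, parameter2 d000001b, "
    "parameter3 00000001, parameter4 808312bd.",
    "Error code 00000019, parameter1 00000020, parameter2 88119ed0, "
    "parameter3 88119f40, parameter4 0a0e0005.",
]

for event in EVENTS:
    print(summarise(event))
```

Both stop codes are commonly associated with a faulty driver or bad memory, so the different numbers do not necessarily mean two separate problems.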


SOLUTION
Rindi recommends a barebones re-configuration.

Hmm, I would like to narrow down this activity, firstly by analysing the minidump files that I have (lots of them now) and by analysing the current and next MEMORY.DMP file. What I need to know is how to do that.

As I said, I have downloaded the tools from the MS website but don't know what to do with them (and I have tried a fair bit of surfing to seek ideas). I want to narrow down the drivers which relate to the devices which are failing and then remove those to test, rather than remove a stack of things unnecessarily.

Are there any easy-to-understand references that would help me to work with windbg.exe and the other debugging tools?
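
One possible starting point, offered as a sketch rather than a definitive procedure: the Debugging Tools include a console debugger, cdb.exe, which can open a dump with `-z`, take a symbol path with `-y`, write a log with `-logo`, and run commands given with `-c`. The script below runs `!analyze -v` over every dump in a folder. The install path, the symbol cache folder, the report folder and the function names are all assumptions to adjust for your machine.

```python
import subprocess
from pathlib import Path

# ASSUMPTION: install location of the Debugging Tools; adjust as needed.
CDB = r"C:\Program Files\Debugging Tools for Windows\cdb.exe"
# Microsoft public symbol server; the local cache folder is an assumption.
SYMBOLS = r"srv*C:\Symbols*http://msdl.microsoft.com/download/symbols"

def cdb_command(dump, log):
    """Build a cdb command line that analyses one dump and then quits."""
    return [
        CDB,
        "-z", dump,              # open this crash dump file
        "-y", SYMBOLS,           # where to fetch debugging symbols
        "-logo", log,            # write all debugger output to this file
        "-c", "!analyze -v; q",  # run the automatic analysis, then exit
    ]

def analyse_all(dump_dir, out_dir):
    """Run cdb over every .dmp file in dump_dir, one report per dump."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for dump in sorted(Path(dump_dir).glob("*.dmp")):
        log = out / (dump.stem + ".txt")
        subprocess.run(cdb_command(str(dump), str(log)), check=False)
```

Calling `analyse_all(r"C:\WINDOWS\Minidump", r"C:\DumpReports")` would process the minidumps, assuming they are in the default folder. In each report, the "Probably caused by" line and the IMAGE_NAME field are the first things to check, since they name the driver the debugger suspects. The same `!analyze -v` command can be typed into windbg.exe after opening MEMORY.DMP via File > Open Crash Dump.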