RyanIrish
asked on
SBS2011 crashing/rebooting with Event 41 Kernel-Power error
Hi,
In the past 6 days my SBS2011 server has crashed 5 times with the following error:
- System
- Provider
[ Name] Microsoft-Windows-Kernel-Power
[ Guid] {331C3B3A-2005-44C2-AC5E-77220C37D6B4}
EventID 41
Version 2
Level 1
Task 63
Opcode 0
Keywords 0x8000000000000002
- TimeCreated
[ SystemTime] 2020-03-05T12:34:15.827218100Z
EventRecordID 49526557
Correlation
- Execution
[ ProcessID] 4
[ ThreadID] 8
Channel System
Computer SBS2011.tg3.local
- Security
[ UserID] S-1-5-18
- EventData
BugcheckCode 209
BugcheckParameter1 0x0
BugcheckParameter2 0x2
BugcheckParameter3 0x0
BugcheckParameter4 0xfffff880063e9006
SleepInProgress false
PowerButtonTimestamp 0
The BugCheckCode is the only piece that seems to change with each crash, as I have events showing 80, 10, 59, and the 209 shown above.
I've tried some of the simple stuff...making sure the system is relatively dust free and not overheating, I've bypassed the UPS and powered straight into the wall, and verified that no new updates were installed.
No hardware changes have been made, nor has any new software been installed recently.
There is a redundant power supply on the server and I've forced the server to run solely on each psu for a short period with no issues.
A few google searches have suggested updating drivers, etc. but that doesn't seem to fit my scenario as nothing has changed on the hard/soft side.
The %SystemRoot%\MEMORY.DMP is empty...that is where I would expect to find the dump file from a crash, no?
My only guess is something power related, based on the missing dump files, but I've still had a crash after changing how the server is powered...so I'm at a loss.
Does anyone have any ideas based on the info I've been able to provide?
It's always fun when the only DC/file/exchange server starts flaking out...
ASKER
Hi Andy,
Thank you for the link. I'm able to now put names to the bug check codes, but that's about where I'm now stuck.
If I had to guess, and that's all I'm able to do, I would think I have a memory issue as a few of the parameters seem to suggest read errors...but I could be completely wrong.
I ran the native Windows memory test last night and it reported no errors...but I'm not sure I can trust that test. Should I try a different test? I don't have too much faith in mem testers as I've had ram sticks that were bad pass mem tests before.
I put together a list with each crash and the corresponding crash info.
Does the additional info help at all?
I guess the initial suggestion of a driver issue is a good start, but unfortunately doesn't really get me that far as I still have no idea what driver could be giving me issues. All of these error codes still seem extremely vague to me.
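For what it's worth, the parameters on the 209 (0xD1, DRIVER_IRQL_NOT_LESS_OR_EQUAL) event can be read field by field. The sketch below is only an illustration based on my reading of Microsoft's description of that bug check: parameter 1 is the memory address referenced, parameter 2 the IRQL at the time, parameter 3 the access type, and parameter 4 the address of the code that made the reference. The function name `decode_d1` and the field labels are made up for this example. Note that "read" here describes the kind of memory access the code attempted, which is not the same thing as a hardware read error.

```python
# Access-type values for bug check 0xD1, parameter 3, as I understand
# Microsoft's documentation (assumption: 0 = read, 1 = write, 8 = execute).
ACCESS_TYPES = {0: "read", 1: "write", 8: "execute"}


def decode_d1(p1: int, p2: int, p3: int, p4: int) -> dict:
    """Label the four parameters of a 0xD1 bug check (hypothetical helper)."""
    return {
        "memory_referenced": hex(p1),
        "irql": p2,
        "access": ACCESS_TYPES.get(p3, f"unknown ({p3})"),
        "referencing_code_address": hex(p4),
    }


# Values copied from the Event 41 record posted at the top of the thread.
decoded = decode_d1(0x0, 0x2, 0x0, 0xFFFFF880063E9006)
for field, value in decoded.items():
    print(f"{field}: {value}")
```

Read this way, the event says that code at 0xfffff880063e9006 tried to read address 0x0 while running at IRQL 2. That pattern is usually associated with a driver bug, though faulty RAM can't be ruled out from the event alone.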
What's the hardware platform? Most major manufacturers keep a log of hardware faults like memory corrections on the motherboard.
ASKER
It's a Dell PowerEdge R510.
Either OMSA or iDRAC will get you to the SEL (System Event Log) on a Dell; any RAM faults would be logged there.
https://www.dell.com/support/article/en-hk/sln292270/poweredge-server-error-messages-in-system-event-log-and-how-they-can-be-viewed?lang=en
ASKER
I'll check it out, thank you again.
ASKER
Well I was hoping to see more in the logs...is this all I'm going to get from OpenManage?
This is from the hardware section of the logs:
The alert section is no better, it shows no signs of a crash, just startup info.
I can't shut the server down again in order to look for additional logs on startup, as suggested in the link above, so for now, this is the best I can do for logs.
If you were me, and this was the only server keeping the business running and you were no closer to a fix than when you started a week ago, what might you do? I can't quit my job, which would be the easiest fix...
ASKER CERTIFIED SOLUTION
ASKER
Alright...one vote for toss it in the dumpster.
Any second opinions out there?
The other 3 are normally driver related as well. Event 41 reports the bug check code in decimal, so convert each one to hex and look it up at https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-code-reference2
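As a quick illustration of that conversion, here is a minimal Python sketch for the four codes mentioned in this thread. The name table is filled in by hand from the bug check reference and only covers these four values; `describe_bugcheck` is a hypothetical helper name, not part of any tool.

```python
# Names for the four bug checks seen in this thread, keyed by hex value.
# Hand-copied from the bug check code reference; not a complete table.
KNOWN_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def describe_bugcheck(decimal_code: int) -> str:
    """Turn the decimal BugcheckCode from Event 41 into hex plus a name."""
    name = KNOWN_NAMES.get(decimal_code, "not in table, check the reference")
    return f"{decimal_code} -> 0x{decimal_code:08X} ({name})"


# The decimal values reported in the Event 41 records.
for code in (80, 10, 59, 209):
    print(describe_bugcheck(code))
```

The hex form (for example 0x000000D1 for 209) is what the reference page is indexed by.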