PowerEdge R710 BSOD

I have an R710 server with fully updated firmware running Windows Server 2012 R2, and I get periodic unexpected system crashes (BSODs).
I would like some help diagnosing this issue, as I'm not sure what could be causing it. I'm pasting info from Event Viewer and OMSA below for clues:

Log Name:      System
Source:        Microsoft-Windows-Kernel-Power
Date:          4/29/2015 9:50:53 PM
Event ID:      41
Task Category: (63)
Level:         Critical
Keywords:      (2)
User:          SYSTEM
Computer:      server.GDS.lan
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331C3B3A-2005-44C2-AC5E-77220C37D6B4}" />
    <EventID>41</EventID>
    <Version>3</Version>
    <Level>1</Level>
    <Task>63</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000002</Keywords>
    <TimeCreated SystemTime="2015-04-30T04:50:53.256791400Z" />
    <EventRecordID>79845</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="8" />
    <Channel>System</Channel>
    <Computer>server.GSV.lan</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="BugcheckCode">313</Data>
    <Data Name="BugcheckParameter1">0x3</Data>
    <Data Name="BugcheckParameter2">0xffffd00023d4a2f0</Data>
    <Data Name="BugcheckParameter3">0xffffd00023d4a248</Data>
    <Data Name="BugcheckParameter4">0x0</Data>
    <Data Name="SleepInProgress">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
    <Data Name="BootAppStatus">0</Data>
  </EventData>
</Event>

Error      4/29/2015 9:51:13 PM      BugCheck      1001      None
The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000139 (0x0000000000000003, 0xffffd00023d4a2f0, 0xffffd00023d4a248, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 042915-20937-01.
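
The two log entries above describe the same crash: event 41 records BugcheckCode in decimal (313), while the BugCheck 1001 message prints it in hex (0x00000139). Here is a minimal sketch of that conversion in Python, assuming the event XML has been copied out of Event Viewer as text. The function names `bugcheck_from_event` and `format_bugcheck` are made up for illustration and are not part of any Windows or Dell tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Kernel-Power event 41 XML from the post above.
EVENT_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="BugcheckCode">313</Data>
    <Data Name="BugcheckParameter1">0x3</Data>
    <Data Name="BugcheckParameter2">0xffffd00023d4a2f0</Data>
    <Data Name="BugcheckParameter3">0xffffd00023d4a248</Data>
    <Data Name="BugcheckParameter4">0x0</Data>
  </EventData>
</Event>"""

# Namespace declared on the <Event> element; needed for ElementTree lookups.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def bugcheck_from_event(xml_text):
    """Return (code, [param1..param4]) as integers (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    data = {d.get("Name"): d.text for d in root.findall("e:EventData/e:Data", NS)}
    code = int(data["BugcheckCode"])  # event 41 logs the code in decimal
    params = [int(data[f"BugcheckParameter{i}"], 16) for i in range(1, 5)]
    return code, params


def format_bugcheck(code, params):
    """Format the values the way the BugCheck 1001 message prints them."""
    joined = ", ".join(f"0x{p:016x}" for p in params)
    return f"0x{code:08x} ({joined})"


code, params = bugcheck_from_event(EVENT_XML)
print(format_bugcheck(code, params))
```

The printed string should match the BugCheck 1001 line. Microsoft documents bug check 0x139 as KERNEL_SECURITY_CHECK_FAILURE, with a first parameter of 3 indicating a corrupted LIST_ENTRY; on its own that does not identify the root cause.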

OMSA errors:

A runtime critical stop occurred. (frequent)
CPU 2 has an internal error (IERR). (frequent)

whocrashed report attached as well.
WhoCrashedOutput.htm
Anthony H. Asked:

arnold Commented:
According to the OMSA error, you have a faulty CPU 2.

If the server is still under warranty, contact Dell to get a replacement.

Since the crashes are intermittent, you would have to reproduce the conditions under which they occur in order to test further.
Changing the advanced settings to dump the complete memory may ....

But I think contacting Dell and getting a replacement is the quickest resolution. Any tests you run would likely lead to the same conclusion: the CPU.
Anthony H. (Author) Commented:
Unfortunately, the server is out of warranty.

I don't need two CPUs. Is it possible to disable CPU 2?
arnold Commented:
You have to get a CPU blank, Dell part number WK640.
Note that you will lose access to all memory in bank B when CPU 2 is removed.
http://en.community.dell.com/support-forums/servers/f/956/t/19411630
Anthony H. (Author) Commented:
Do I also need a CPU blank?
arnold Commented:
I saw that heat sink blank, but I think you need the CPU blank.

The link from the forum includes the Dell part number; check with them.
You can also find on eBay CPU pairs that ....
Mr Tortur (System Engineer) Commented:
Hi,
I think you should have a look at your BIOS settings, because on a lot of servers you can disable a CPU there.
andyalder Commented:
Removing the CPU and putting the heatsink back should preserve airflow. Don't forget to move all the RAM over to the CPU 1 slots, as it can't be accessed in the CPU 2 slots once that CPU is removed.
Windows Server 2012

Question has a verified solution.