Server with S5000VSA motherboard gets a blue screen after several hours of normal operation

I use a custom-built server with an Intel S5000VSA motherboard and two quad-core Xeon X5460 processors to encode several video channels. I have 5 such servers. On one of them, after approximately 18 hours (it varies between 12 and 24 hours), I get a blue screen and the computer resets. I'm running Windows Server 2003 R2 SP2 with all updates installed. I reinstalled Windows twice and it didn't change anything. I have 2 Western Digital RE3 500 GB HDDs in a RAID 1 configuration and 4 GB of Kingston RAM that Intel recommends for this board. I checked the memory with Memtest86 and it came up clean. I checked the HDDs with Drive Fitness Test and they came up clean too. I also ran the Intel diagnostics that came with the board, and it passed all tests. This is a brand-new server. It is located in a datacenter, and the CPU temperature while encoding 4 channels simultaneously reaches around 60 degrees Celsius, with the processor load at around 60 percent.

This is the minidump report, and I have also attached the minidump itself to this message. Change the extension from TXT to DMP to analyze it.
Here is the WinDbg output:


FAULTING_MODULE: 80800000 nt


BUGCHECK_STR:  0x9C_GenuineIntel




LAST_CONTROL_TRANSFER:  from 80a64154 to 80827c83

WARNING: Stack unwind information not available. Following frames may be wrong.
f773d280 80a64154 0000009c 00000000 f773d2b0 nt+0x27c83
f773d3b4 80a5b86f f7737fe0 00000000 00000000 hal+0xa154
00000000 00000000 00000000 00000000 00000000 hal+0x186f


80827c83 5d              pop     ebp


SYMBOL_NAME:  nt+27c83

FOLLOWUP_NAME:  MachineOwner

IMAGE_NAME:  ntkrnlpa.exe


Followup: MachineOwner

Please help me. I don't know what the problem could be. It is annoying because it works well for several hours before crashing. It has done this 3 times now. I suspect a temperature problem, but I think around 60 degrees Celsius for an X5460 at 3.16 GHz should be OK. Please advise.

Thank you.


manav08 Commented:
1. Have you run "chkdsk /f /r" on it yet? If not, try that first to make sure any disk corruption is fixed.
2. If that doesn't help, try isolating the RAM modules and running the test with only one installed at a time. (Yes, I know you have already done a RAM test.)
3. Try replacing the SATA/SAS data cables, as this might be caused by an input/output error. I have seen this caused by a faulty data cable or a faulty RAID controller card.

Try these 3 steps first and let me know how it goes.
60 degrees should be OK for a data center environment.
cosminpop77 (Author) Commented:
I'm using the motherboard's embedded RAID controller. If that's faulty, I'll have to replace the motherboard. I'll try changing the SATA cables tomorrow.

I used to encode 4 channels on this encoder. The other identical server I have only encodes 3 channels, and it has been working for a couple of days without problems. I have now swapped their roles and left only one channel on the problematic server. I'll have to wait approximately 24 hours to see if anything happens. I suspected the encoding software, but as I said, there was no problem running 3 instances simultaneously, so I see no reason why 4 should be a problem; the program is built to run multiple instances. I'll see what results I get tomorrow, and then decide whether to replace some parts or reduce the number of channels per encoder to 3.

One more remark about this encoder. It was the first of the 5 I built. When I installed the CPU heatsink, I didn't realize I had forgotten to install the support plate underneath the board that holds the heatsink, so I had to remove the heatsink after it had already made contact with the CPU. I didn't have any other thermal paste, so I just put it back with whatever Intel paste was left on it. I'm wondering whether that paste is still properly distributed, or whether some areas of the CPU are much hotter than reported.

As I said, I'm now encoding only one channel on this encoder, and the first core is at 42-45 degrees Celsius while the last one is at 60. That's a 15-18 degree difference, on both processors. It's true that only the last core of each processor has any activity. I have 7 fans in this server (including the CPU fans), and I think the temperature inside should be lower. The datacenter is at about 21-23 degrees Celsius. What is the maximum acceptable temperature for the X5460?

Thanks.
If your other servers with the same configuration are running at 23 C, then this one should as well, provided they are all in the same environment, unless of course its processors are under heavier load.

Also refer to

At the end of the day, it is the arrangement of the fans that really matters, not how many there are. If your CPU is heating up, you will see its fan run at a higher speed. Did you compare it with your other servers?
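The comparison with the other servers can be made a little more systematic. Below is a minimal Python sketch, not something anyone in this thread actually used: it assumes you have already collected the hottest core temperature of each server under the same encoding load (with whatever monitoring tool you have), and it flags any server running more than a chosen margin above the fleet median. The server names, the readings and the 8 degree margin are all hypothetical.

```python
from statistics import median


def find_outliers(max_temps, margin=8.0):
    """Return servers whose hottest core is more than `margin` deg C above the fleet median.

    max_temps maps a server name to its hottest core temperature (deg C),
    measured under comparable load. The margin is an arbitrary assumption.
    """
    fleet_median = median(max_temps.values())
    return sorted(name for name, temp in max_temps.items()
                  if temp - fleet_median > margin)


if __name__ == "__main__":
    # Hypothetical readings: hottest core per server at the same encoding load.
    readings = {"enc1": 70.0, "enc2": 58.0, "enc3": 57.0, "enc4": 59.0, "enc5": 58.0}
    print(find_outliers(readings))  # prints ['enc1']
```

With these sample values the fleet median is 58, so only enc1 (12 degrees above it) is flagged. Using the median rather than the mean keeps a single hot server from pulling the baseline up and hiding itself.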
cosminpop77 (Author) Commented:
The problem was overheating, and possibly old Intel drivers and firmware. When I installed the system I used Intel's deployment CD, which has an option to download the latest firmware and drivers. It did that, but it didn't download the latest versions; it downloaded drivers and firmware from 2007. I relied on that and didn't check the versions in Windows. When I did check them, I realized my mistake, so I updated all the firmware, the BIOS and the drivers to the latest versions.

I also replaced the CPU thermal paste with Arctic Silver. The temperature dropped by approximately 10 degrees Celsius under the same load conditions, so Intel should eat their thermal paste. Not only is it worse, there also wasn't enough of it on the heatsink: it came with only 3 lines of paste, and when I removed the heatsink from the CPU I saw that only about 75% of the CPU had paste on it. The rest of the CPU was clean. No wonder the temperature reached close to 70: it was 60 on 2 cores and 70 on the other 2. Probably the area that had no paste on it was the one at 70. I retested the memory and got no errors, and since I did all of the above, everything seems to work OK.