• Status: Solved
  • Priority: Medium
  • Security: Public

System randomly hangs and becomes unresponsive

Greetings EE!

I have 2 identical boxes that we initially built to use as a sandbox, but we recently moved one into production use (about 2 months ago) and users were complaining that the server was constantly offline.  Because I had two identical boxes, I was fortunate to have one to test solutions on before implementing those solutions on the production machine.

What is happening is the system would randomly hang.  Not crash or BSOD, just lock up completely.

This is the build:

Hardware Specs:
	BIOS updated on Unit 1 from 3206 to 5109
Configured with 2 separate RAID Groups.
Group0 - RAID 1 - OS installed here
Group1 - RAID 5 - VMs stored here

(RAID 5 shows as "Intel Raid 5 Volume SCSI Disk Device")

CPU: 2x Intel Xeon E5-2609 Sandy Bridge-EP, 2.4GHz 80w Quad-Core, BX80621E52609
RAM: 8x 4GB - Kingston 240-pin DDR3 SDRAM ECC Registered DDR 1333 1.35V VLP KVR13LR9D8L/4HC
NIC: 2x onboard, 1x Intel Gigabit CT Desktop Adapter
HDD: 5x HGST Travelstar H2IK5001672SP (0S02858) 500GB 7200 RPM 32MB Cache SATA 6.0Gb/s 2.5" Internal 
PSU: CORSAIR Professional Series Gold AX1200 (CMPSU-1200AX) 1200W ATX12V v2.31 / EPS12V v2.92
Graphics: EVGA e-GeForce 8400 GS (Nvidia 8400) PCI-E 2.0 x16

Drive Caddy: Thermaltake RC1600101A MAX-1562 5.25" (x1) Bay to 2.5" (x6) Bay Mobile Rack HDD Canister

O/S: Windows Server 2008 R2, SP1 (Build 7601)
Roles: Hyper-V


The BIOS version initially on the test machine was 3206, but I updated it to 5109.  The system stopped hanging, but now keeps crashing.  I also enabled WHEA in the BIOS, so now, rather than only being told by "WhoCrashed" that the failing module was "hal.dll", the crash report actually shows that there is a fatal memory issue.

I ran MemTest86 and it kept locking up at around 15% or so.  Some research showed that changing the memory voltage from automatic to the RAM manufacturer's recommended setting could resolve this, and it did.  MemTest86 passed with flying colors...but the system will still BSOD with the same error.

I contacted ASUS, and the support tech said to flash the BIOS to version 3302 (currently running 5109), as that is the highest version I need for my processor.

The issue I'm having, and which they seem unwilling to assist with, is that none of the tools provided will flash the older BIOS.  The EZ Flash utility refuses to use the file because it is older than the currently installed version, as does the Windows utility, and BUPDATER.exe won't work because the file is a CAP file and not a ROM file.
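The refusal described above is consistent with a simple "newer only" version check. The sketch below illustrates that idea and nothing more: the function names are hypothetical, and it is only an assumption that the ASUS utilities compare version numbers this way, since their actual logic is not documented in this thread.

```python
def parse_bios_version(version):
    """Convert a numeric BIOS version string such as "3302" to an integer."""
    return int(version.strip())


def flash_allowed(installed, candidate):
    """Return True only if the candidate image is newer than the installed one.

    Assumption: the flash utility enforces a simple "newer only" rule.
    """
    return parse_bios_version(candidate) > parse_bios_version(installed)


print(flash_allowed("3206", "3302"))  # upgrade from 3206 -> True
print(flash_allowed("5109", "3302"))  # rollback from 5109 -> False
```

Under that assumption, going from 3206 to 3302 is accepted as an upgrade, while going from 5109 back to 3302 is rejected as a downgrade.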

In the meantime I still have to bounce the production server at least once a day when it hangs, with no resolution in sight.  Again, when the server hangs it gives no error message at all.  I'm at a loss as to what else could possibly be wrong, as the odds that I have two sets of bad hardware are slim, though not impossible.


Upload the last three minidumps for analysis.

Pramod Ubhe Commented:
Could be a memory leak issue.


Do you see any kind of errors/warnings in the event logs?

If troubleshooting is taking too long, you could replace the production box with its identical twin. Or, since the boxes are identical, you could just swap the disks to see whether it is a hardware issue or an OS issue.
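One way to pull recent errors and warnings out of the event log is the built-in `wevtutil` command. The sketch below only builds and runs that query; the helper name `build_event_query` and the default of 50 events are illustrative choices, not something from this thread.

```python
import subprocess
import sys


def build_event_query(log="System", count=50):
    """Build a wevtutil command listing the newest Critical/Error/Warning events."""
    # Level 1 = Critical, 2 = Error, 3 = Warning in the Windows event schema.
    xpath = "*[System[(Level=1 or Level=2 or Level=3)]]"
    return [
        "wevtutil", "qe", log,
        f"/q:{xpath}",
        f"/c:{count}",   # limit the number of events returned
        "/rd:true",      # read newest events first
        "/f:text",       # human-readable text output
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Only attempt the query on Windows, where wevtutil is available.
    result = subprocess.run(build_event_query(), capture_output=True, text=True)
    print(result.stdout)
```

Running it right after a forced reboot shows whatever was logged just before the hang; with a hard hang there may be nothing at all, which is itself a useful data point.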
Brian B, Independent Technology Professional, Commented:
Based on what the previous expert said, if everything works, there are a couple of items that spring to mind.

Swap the memory.
Check the board on the "bad" unit to see if any of the capacitors have "popped".
Check to make sure fans are moving freely and that you don't have any other overheating issues.

cjake2299 (Author) Commented:
@ded9 The MiniDumps are added.

The production and test unit are identical.  The test unit is the one I updated the BIOS on to see if it would correct the issue with the system hanging, and this unit is now the one that crashes within 5-10 minutes after startup.

The production unit still hangs, but only once or twice a day.

Looking on the ASUS website, BIOS 3109 was supposed to fix the issue with the NVRAM causing the system to lock up; both units had BIOS version 3206.

@Pramod_ubhe I'll take a look at the link and get back to you.

@TBone2k No popped caps, and cooling is fine (climate controls in the server room keep the temperature at 68°F).  CPU temp barely gets over 34°C.

After setting the voltage statically to the manufacturer's specs, MemTest86 passed two separate tests with no issues.
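Because the room temperature is quoted in Fahrenheit and the CPU temperature in Celsius, a quick conversion makes the two comparable. This is a minimal sketch using only the two figures quoted above; the variable names are illustrative.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


room_c = fahrenheit_to_celsius(68)  # server room set point from the post
cpu_c = 34                          # reported peak CPU temperature

print(room_c)          # 20.0
print(cpu_c - room_c)  # 14.0
```

So the CPU is peaking roughly 14°C above ambient.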
cjake2299 (Author) Commented:
@pramod_ubhe, after reviewing the article, the issue that is on the production server (and was on the test server before the BIOS update) is definitely a Hard Hang.  System is totally unresponsive...not even the monitor works.

The only error that shows in the event log before I reboot the system is from the SCCM Ops Manager connector looking for the old SCCM Server.

Both of these servers were previously in a cluster together as part of our sand-box.  The cluster was destroyed following MS Guidelines.  There were no apparent issues, but the servers were only on while vetting updates/modifications to the application and testing the security of the application.  Once that was complete for the day, the servers were powered off (rarely on for longer than 4 hours).
The dump points to an overheating issue. Check whether the fan on the processor is seated properly.

If overclocking is enabled in the BIOS, disable it. Check the CPU temperature.

Brian B, Independent Technology Professional, Commented:
Maybe I missed that you have already done it, but I would still suggest swapping the memory. It's quick, and you don't have to change it back if you are wrong.

Just to clarify, you said both units are now crashing/hanging?
cjake2299 (Author) Commented:
Both units were hanging.

Unit1 (test unit) I updated the BIOS to 5109 and it started crashing.

Unit2 I have left alone thus far.  The manufacturer finally responded and said I need to update the BIOS on Unit2 to 3302 to properly support my CPU, but has thus far failed to help roll back the BIOS on Unit1.

I'd prefer to set the BIOS to version 3302 on Unit1 to verify it corrects the issue before I update the BIOS on Unit2 to the same version.
Brian B, Independent Technology Professional, Commented:
I am surprised in all of this that there would be these kinds of problems related to BIOS, but it sounds like ASUS has confirmed it. Have you tried searching on Google with your specific model of server to see if others are having the same problem?

It really does sound BIOS related at this point.... or memory, or both. I know I keep saying that, but just because it passed memtest doesn't mean there isn't some random problem. That's the last I'll bring it up.
cjake2299 (Author) Commented:
UPDATE, ASUS says I need the BIOS update to correctly support my CPU.  Updated Machine2 to the BIOS level that supports my CPU and it has been running fine.

Had many other weird issues with Machine1 while trying to get the BIOS rolled back, but ASUS says it can't be done (after talking to the fifth tech support rep in two weeks).

I ordered a BIOS chip from ASUS with the correct BIOS version installed on it for Machine1; it should be here by Friday.  During my internet searches I've found a large variety of issues reported with this particular ASUS board (Z9PE-D8 WS), so I'll avoid it in the future.  My Sabertooth boards have been running fine for years; sad to see that the only ASUS board that supports 2 CPU sockets is having so many issues.

Once I get the BIOS chip I'll test the memory as well.  This has been a very odd experience.  I'd avoid the Z9PE-D8 WS board if you can.  I'm also going to tighten up the interior to see if I can improve airflow, and maybe replace a few of the standard CoolerMaster fans that came with the chassis with something that pushes a much higher volume of air.

I'm going to close this for now and issue points evenly to each of you for your assistance.

Thanks again!
Question has a verified solution.