Server Freezes or Reboots on its own

Hi Guys
I have a brand new R610 rack server that keeps giving problems. It was fine for a week, but for the last three weeks it has behaved erratically. It either reboots on its own or just freezes, and then I cannot access it over the network or locally and have to hold the power button in to shut it down. Please see the attached screenshots for the errors.

Here is the HW config of the server:
Intel Xeon E5620 Processor (2.40GHz, 4C, 12M Cache, 5.86 GT/s QPI, 80W TDP, Turbo, HT) 1066MHz Max Memory
8GB Memory for 1CPU (4x2GB Dual Rank RDIMMs) 1066MHz
2 x 146GB SAS 6Gbps 15k 2.5" HD Hot Plug (RAID1)
3 x 146GB SAS 6Gbps 15k 2.5" Additional HD Hot Plug (RAID5)
PERC 6/i RAID Controller Card 256MB PCIe, 2x4 Connectors
16X DVD-ROM Drive SATA
High Output Redundant Power Supply (2 PSU) 717W, Performance BIOS Setting
Embedded Broadcom GbE LOM with TOE and iSCSI Offload HW Key
C8 MSS R1/R5 for PERC 6i/H700, Exactly 2 Primary and 3-4 Additional Drives

Currently I have teaming enabled for two of the NICs on a 1Gbps network.

This server is running Exchange 2010 SP1 and Server 2008 R2. It also has the latest version of ESET Mail Security for Exchange installed.

Is there any way I can fix this?

Thanks!!

ScreenShot-1.jpg
ScreenShot-2.jpg
GenasysTechnologiesIT ManagerAsked:

faizbaigCommented:
The CPU temperature needs monitoring.
Maybe download and install a program called SpeedFan to see the CPU core temperatures. None of them should be above about 52 degrees C.

Download link
http://www.almico.com/sfdownload.php
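
Once you have the readings, checking them against that limit is simple to script. Below is a minimal Python sketch, assuming you have copied the per-core temperatures out of the monitoring tool by hand. The function name and the sample readings are hypothetical; only the 52 degrees C figure comes from the comment above.

```python
# Rough upper limit suggested above, in degrees Celsius.
MAX_CORE_TEMP_C = 52.0


def cores_over_limit(core_temps_c, limit_c=MAX_CORE_TEMP_C):
    """Return only the cores whose temperature is above the limit."""
    return {name: temp for name, temp in core_temps_c.items() if temp > limit_c}


if __name__ == "__main__":
    # Hypothetical readings typed in from the monitoring tool.
    readings = {"Core 0": 41.0, "Core 1": 55.5, "Core 2": 39.0, "Core 3": 52.0}
    hot = cores_over_limit(readings)
    if hot:
        print("Cores above the limit:", hot)
    else:
        print("All cores are within the limit.")
```

A core sitting exactly at the limit is not flagged; change the comparison if you want to be stricter.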
faizbaigCommented:
....and

You might want to check the memory for errors, memory problems can often cause crashes and random restarts.

A good program for checking your memory is made by Microsoft. It's free and runs from a boot disk.

Go to this page:
http://oca.microsoft.com/en/windiag.asp

Click on the second link from the top "Download Windows Memory Diagnostic".

Download the little program and get a blank floppy disk. When you run the program it will create a bootable floppy disk with a memory testing program on it. Switch off your computer and remove the second stick of memory. Start it up and boot from the floppy disk (you may have to change the boot order in the BIOS).

When the program loads it will immediately start checking your memory in 'standard mode'. Any errors encountered will be displayed at the bottom.
If your memory passes the test without any errors then it's probably okay, but just to be safe you can press 'T' which will make it go into a more thorough mode (this takes a bit longer - I recommend leaving it overnight to do this one).

Yes, freezing or restarting could be caused by one of the memory sticks being bad.
dax_badCommented:
If it's hardware causing the reboots, you should be able to find clues to the cause in the integrated log of the Dell server management system (OpenManage).
GenasysTechnologiesIT ManagerAuthor Commented:
Thanks Faiz!

The temperature should not be the problem, as it never goes over 20 degrees C. I will check the memory now. I just updated the controller firmware to the latest version and am rebooting.
Viral RathodConsultantCommented:
Has a memory dump been generated on the server? If not, generate a manual memory dump and analyze it to find the root cause.

http://support.microsoft.com/kb/244139
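
As far as I recall, that article enables a keyboard-triggered dump by setting a CrashOnCtrlScroll registry value under the keyboard driver's Parameters key. The Python sketch below only builds the text of a .reg file for that setting and changes nothing on the machine. The key paths and the value name are my recollection of the article, not something stated in this thread, so check them against the KB before importing anything on a production server. The function name is hypothetical.

```python
# Registry keys as I recall them from KB 244139 (an assumption; verify first).
KEYBOARD_DRIVER_KEYS = {
    "ps2": r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters",
    "usb": r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters",
}


def build_reg_file(keyboard_types=("ps2", "usb")):
    """Return .reg file text that sets CrashOnCtrlScroll=1 for each keyboard type."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for kind in keyboard_types:
        lines.append("[" + KEYBOARD_DRIVER_KEYS[kind] + "]")
        lines.append('"CrashOnCtrlScroll"=dword:00000001')
        lines.append("")
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Print the text so it can be reviewed before saving it as a .reg file.
    print(build_reg_file())
```

Printing the text first, rather than writing to the registry directly, leaves room to review the change and schedule the reboot that the setting needs.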

GenasysTechnologiesIT ManagerAuthor Commented:
Hi Dax

I checked the hardware logs in OpenManage already and there are no errors.
BasheerptCommented:
Just want to check: did you install this server using the bootable CD that came with the Dell server, or did you load the drivers manually? If it was not the first method, do it that way; the drivers may have been applied wrongly. Also, download the latest firmware update utility CD (the SUU CD) from the Dell website and update all firmware to the latest versions.

Keep the server on in character mode (don't let it boot into Windows) for a few hours and see whether it restarts. If it does not restart, the cause is probably not hardware and may be a driver issue in Windows. If it is a hardware problem, it may restart both in Windows and outside Windows.
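
The reasoning behind that test can be written down as a small decision function. This is only a Python sketch of the rule of thumb described above, with hypothetical names, and its output is a hint about where to look next rather than a diagnosis.

```python
def likely_fault_domain(restarts_outside_windows, restarts_in_windows):
    """Apply the isolation rule: a restart outside Windows points at hardware."""
    if restarts_outside_windows:
        # The fault shows up without Windows or its drivers loaded.
        return "hardware"
    if restarts_in_windows:
        # Stable outside Windows, unstable inside it.
        return "windows driver or software"
    return "not reproduced"


if __name__ == "__main__":
    # Example: the server stayed up in character mode but restarts in Windows.
    print(likely_fault_domain(restarts_outside_windows=False, restarts_in_windows=True))
```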

Good luck

GenasysTechnologiesIT ManagerAuthor Commented:
Hey Basheer

Yes, I installed it with the bootable CD. I will download the SUU and do the needed updates.

Thanks!
BasheerptCommented:
Thanks for the update.
Try the second test as well: keep the server on outside Windows and see whether it restarts. That way I guess you can isolate whether it's a Windows problem or a hardware problem.

Regards
GenasysTechnologiesIT ManagerAuthor Commented:
Great, will do. I'll only be able to do it on the weekend though, as this is our production Exchange server.

Will update once I'm done.
BasheerptCommented:
Okay, good luck :) and don't forget to update us.
ryan680Commented:
I have approximately 45 R610s in my inventory and I've had a problem with a number of them where the memory gets slightly loose in transit. I eject all the modules and reseat them as standard practice now. I'd give that a try.
andyalderCommented:
"Fatal firmware error" on the RAID controller and you hide it in a screenshot?

Just log a hardware fault with Dell assuming said firmware is up to date.
GenasysTechnologiesIT ManagerAuthor Commented:
@ryan

Thanks, will try that this weekend as well and send an update.

@andy

What I hid in the screenshot was the name of my server, all the other info is there. I'll log a call as soon as I cannot fix it myself, thanks.
andyalderCommented:
I meant that you hadn't put the message "fatal firmware error" in the text or the title but only in the screenshot. A lot of people just skim the threads and don't read through screenshots.

Bad RAM can't cause that, it can only be the RAID controller, whether onboard or in a PCI slot, it can't even be the PCI slot connection. A fatal firmware error on anything is integral to that specific device. At least the controller shuts itself down because it knows it's had a brain fart rather than writing gibberish to the disks, and with the I/O subsystem stopped the server hangs although the mouse & video might still work.
GenasysTechnologiesIT ManagerAuthor Commented:
Ahh, ok, thanks for clearing that up, I will update the title now.

I did update the firmware of the controller yesterday to the latest version from Dell's site so I'm keeping an eye on it. For now all seems ok, but I don't want to speak too soon.
abafadelCommented:
Ok, wait and see
GenasysTechnologiesIT ManagerAuthor Commented:
Ok, the first thing I tried on Saturday was to install the latest firmware from the latest Update Utility image I downloaded on Friday. After doing this, the fatal firmware errors disappeared and the server has been fine till now. Before the update it usually froze once or twice per day.

So for me the solution was to update all the firmware to the latest versions.

Thanks Basheer and everyone else!