Solved

Server DELL PowerEdge 1800 reboots when idle after second CPU added

Posted on 2009-12-17
Last Modified: 2013-12-10
After installing a second CPU in a Dell PowerEdge 1800 server, it reboots when the workload is low. It is a Terminal Server in application mode. When many users are connected and the workload is high, it never reboots. The problem happens mostly at night, from 8:00PM to 6:00AM, outside working hours, but it sometimes also reboots when only a few users are connected.

After installing the second CPU we installed Windows Server 2003 SP2, but that did not solve the problem. The reboot does not seem to come from a hardware failure; it appears to be triggered by the OS, because it writes something to the event log.

I have attached the errors from the event log, along with an Everest report of the hardware.
eventlogerrors.txt
everest.txt
Question by:enriquecadalso
18 Comments
 
LVL 19

Expert Comment

by:Delphineous Silverwing
ID: 26074306
Have you reviewed your power management settings on the server?
 
LVL 11

Author Comment

by:enriquecadalso
ID: 26074505
There is not much to review under Power Options in Control Panel. The power scheme is set to "Always On", the hard disks are set to never turn off, and the UPS service is not in use (we have a very heavy DC/AC power inverter).

In the BIOS I have loaded the default values.

I am sure the settings are the same as when there was only one CPU. Is there any setting I have to change when installing a new CPU?
 
LVL 7

Expert Comment

by:jgpd
ID: 26074727
Did you run the System Diagnostics?
The system diagnostics can be run either from the utility partition on your hard drive or from a set of diskettes that you create using the Dell OpenManage Server Assistant CD.
Also check that the server has all its firmware updated.
Regards,
Jose
 
LVL 19

Expert Comment

by:Delphineous Silverwing
ID: 26074755
Go into Device Manager, right-click the Processors folder, and then click "Scan for hardware changes". Do the same for Computer.

Have you updated the chipset/motherboard drivers and firmware?
 
LVL 11

Author Comment

by:enriquecadalso
ID: 26074776
Yes. I ran all the checks and no problems were found. I also checked the RAM and HDDs with other tools.
 
LVL 3

Expert Comment

by:JoltinJoe
ID: 26074786
We might need to check your HAL-type (hardware abstraction layer-type).  If your system is still set to one of the uniprocessor HALs, the HAL driver might need to be updated to a multiprocessor type.  The HAL type can be seen in Device Manager under the Computer device.

References
http://technet.microsoft.com/en-us/library/bb727149.aspx
http://support.microsoft.com/kb/309283
 
LVL 11

Author Comment

by:enriquecadalso
ID: 26074819
Delphineous, Windows detects both CPUs as working. In fact they work very well. The only problem is that the OS reboots when they are idle.

I also tried to change the Enhanced Halt State (C1E) parameter but could not find how to do it. There is nothing related to it in the BIOS. It is currently enabled.
 
LVL 11

Author Comment

by:enriquecadalso
ID: 26074864
JoltinJoe, the HAL reads "ACPI Multiprocessor PC".

If the cause is that the OS needs to readapt to the new environment, we hope that a full reinstall will solve the problem. Do you think it will work?
hal.JPG
 
LVL 69

Expert Comment

by:Callandor
ID: 26077365
That HAL is the correct one. This could be caused by an external event, such as a power surge or drop as equipment comes online or goes offline. Do the periods with few users correspond to times when the power source is sporadic or inconsistent?
 
LVL 11

Author Comment

by:enriquecadalso
ID: 26079894
The power source is OK all the time. We have an online UPS, a PowerWare Prestige 3000. It takes power from a DC/AC power inverter able to supply power for more than 10 hours. We have consulted electrical specialists; they measured the loads and power supplies and everything is OK. Besides, there are other servers on the same power supply without incident.
 
LVL 26

Expert Comment

by:PCBONEZ
ID: 26085831
Are the two CPUs the same sSpec?
 
LVL 3

Accepted Solution

by:JoltinJoe (earned 2000 total points)
ID: 26101221
0x7f is a kernel mode exception, and the first parameter of the BSOD - 0x0d - is a general protection fault. This combination is usually caused by bad drivers, bad memory, or a bad CPU. I wonder if there are any thorough Intel Xeon tests out there, especially one that can thoroughly verify both the L1 and L2 caches.
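
For anyone wanting to decode these by hand: stop code 0x7F is UNEXPECTED_KERNEL_MODE_TRAP, and its first parameter is the x86 trap (exception vector) number, so a small lookup table is enough to translate it. Below is a rough sketch in Python. The function name `describe_trap` and the table name `X86_TRAPS` are made up for illustration, and only a handful of the vectors from the Intel architecture manuals are listed.

```python
# Subset of x86 exception vectors (from the Intel architecture manuals).
# Assumption: this table is illustrative and deliberately incomplete.
X86_TRAPS = {
    0x00: "Divide error (#DE)",
    0x04: "Overflow (#OF)",
    0x05: "BOUND range exceeded (#BR)",
    0x06: "Invalid opcode (#UD)",
    0x08: "Double fault (#DF)",
    0x0C: "Stack-segment fault (#SS)",
    0x0D: "General protection fault (#GP)",
    0x0E: "Page fault (#PF)",
    0x12: "Machine check (#MC)",
}


def describe_trap(bugcheck: int, param1: int) -> str:
    """Hypothetical helper: explain parameter 1 of a STOP 0x7F."""
    if bugcheck != 0x7F:
        # Other stop codes give their parameters different meanings.
        return f"STOP 0x{bugcheck:02X}: not decoded by this sketch"
    name = X86_TRAPS.get(param1, "unknown trap number")
    return f"STOP 0x7F (UNEXPECTED_KERNEL_MODE_TRAP), trap 0x{param1:02X}: {name}"


if __name__ == "__main__":
    # The values reported in this thread: STOP 0x7F with first parameter 0x0D.
    print(describe_trap(0x7F, 0x0D))
```

The lookup only names the trap. It does not say whether a driver, memory, or the CPU itself caused it, so the hardware tests are still needed.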
 
LVL 11

Author Comment

by:enriquecadalso
ID: 26184305
PCBONEZ, Everest reports that they are exactly the same. I also tried the Intel Processor Identification Utility, and it reports the same.

JoltinJoe, the System Diagnostics include tests of the L1 and L2 caches (http://www.dell.com/downloads/global/power/ps1q05-20040119-Patel-OE.pdf). I ran all of the tests and no errors were found. I will run them again later to be sure (I can't do it right now because the server is in production).

Thank you all for your attention.
 
LVL 69

Expert Comment

by:Callandor
ID: 26186152
You may need a hotfix, as described in this thread: http://forums.techarena.in/windows-server-help/895739.htm
 
LVL 11

Author Comment

by:enriquecadalso
ID: 26189895
Thanks Callandor. I installed the hotfix. Now I have to wait a few days to see the results. The reboots are unpredictable, so we have no idea when they will happen, but I think a week will be enough to be sure.
 
LVL 69

Expert Comment

by:Callandor
ID: 27593215
Unless the author can verify what actually fixed the problem, there isn't any reason to keep this.
 
LVL 11

Author Comment

by:enriquecadalso
ID: 27599597
I think JoltinJoe was right. I tried all the patches and hotfixes, and a full reinstall, with no luck. It seems to be a defective CPU. We are replacing it. Thanks everyone for your help.

Only one question left: JoltinJoe, where can I get the meaning of those codes?

(0x7f is a kernel mode exception, and the first parameter of the bsod - 0x0d - is a general protection fault)
 
LVL 69

Expert Comment

by:Callandor
ID: 27620321
Here's a site with some codes listed: http://www.updatexp.com/stop-messages.html