Server DELL PowerEdge 1800 reboots when idle after second CPU added

enriquecadalso (Colombia) asked:

After we installed a second CPU in a Dell PowerEdge 1800 server, it started rebooting when the workload is low. It is a Terminal Server in application mode. When many users are connected and the workload is high, it never reboots. The problem happens mostly at night, between 8:00 PM and 6:00 AM, rather than during working hours, but it sometimes reboots when only a few users are connected.

After installing the new CPU we installed Windows Server 2003 SP2, but that did not solve the problem. The reboot does not seem to be caused by a hardware failure. It appears to be initiated by the OS, because something is written to the event log each time.

I have attached the event log errors and an EVEREST hardware report:
eventlogerrors.txt
everest.txt
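Because the reboots are said to happen mostly between 8:00 PM and 6:00 AM, one way to check that pattern is to tally reboot timestamps against a time window that wraps past midnight. The sketch below assumes the timestamps have already been copied out of the event log by hand; the function names and the sample times are hypothetical and are not taken from eventlogerrors.txt.

```python
from datetime import datetime, time

NIGHT_START = time(20, 0)  # 8:00 PM
NIGHT_END = time(6, 0)     # 6:00 AM


def in_night_window(ts: datetime, start: time = NIGHT_START, end: time = NIGHT_END) -> bool:
    """Return True if ts falls in [start, end), handling windows that cross midnight."""
    t = ts.time()
    if start <= end:
        # Ordinary same-day window, e.g. 09:00 to 17:00
        return start <= t < end
    # Window wraps past midnight: 20:00..23:59 or 00:00..05:59
    return t >= start or t < end


def count_night(timestamps):
    """Return (night_count, day_count) for a list of datetimes."""
    night = sum(1 for ts in timestamps if in_night_window(ts))
    return night, len(timestamps) - night


if __name__ == "__main__":
    # Hypothetical reboot times for illustration only
    sample = [
        datetime(2000, 1, 1, 22, 15),
        datetime(2000, 1, 2, 3, 40),
        datetime(2000, 1, 2, 14, 5),
    ]
    night, day = count_night(sample)
    print(f"night: {night}, day: {day}")  # night: 2, day: 1
```

The wrap-around branch matters here: a naive `start <= t < end` test would never match, because 20:00 is later than 06:00.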
Delphineous Silverwing (United States):

Have you reviewed your power management settings on the server?
enriquecadalso (asker):

There is not much to review in Power Options in Control Panel. The power scheme is set to "Always On", hard disks are set never to turn off, and there is no UPS service (we have a very heavy-duty DC-to-AC power inverter).

In the BIOS I have loaded the default values.

I am sure the settings are the same as when there was only one CPU. Is there any setting I need to change when installing a second CPU?
Did you run the System Diagnostics?
The System Diagnostics can be run either from the utility partition on your hard drive or from a set of diskettes that you create with the Dell OpenManage Server Assistant CD.
Also check that the server has all firmware updated.
Regards,
Jose
Go into Device Manager, right-click the Processors folder, and click "Scan for hardware changes". Do the same for Computer.

Have you updated the chipset/motherboard drivers and firmware?
Yes. I ran all the checks and no problems were found. I also checked the RAM and hard disks with other tools.
We might need to check your HAL type (hardware abstraction layer type). If your system is still set to one of the uniprocessor HALs, the HAL driver might need to be updated to a multiprocessor type. The HAL type can be seen in Device Manager under the Computer device.

References
http://technet.microsoft.com/en-us/library/bb727149.aspx
http://support.microsoft.com/kb/309283
Delphineous, Windows detects both CPUs, and in fact they work very well. The only problem is that the OS reboots when they are idle.

I also tried to change the Enhanced Halt State (C1E) setting but could not find how to do it; there is no related option in the BIOS. It is currently enabled.
JoltinJoe, HAL reads ACPI Multiprocessor PC.

If the cause is that the OS needs to readapt to the new hardware, we hope a full reinstall will solve the problem. Do you think it will work?
hal.JPG
That HAL is the correct one. This could be caused by an external event, such as a power surge or drop as equipment comes online or goes offline. Do the periods with few users correspond to times when the power source is sporadic or inconsistent?
The source power is fine at all times. We have an online UPS (Powerware Prestige 3000) fed by a DC-to-AC power inverter that can supply power for more than 10 hours. We consulted an electrical specialist, who measured the loads and power supplies, and everything is OK. Besides, other servers on the same power supply run without incident.
Are the two CPUs the same sSpec?
ASKER CERTIFIED SOLUTION: JoltinJoe (United States)

PCBONEZ, according to the EVEREST report they are exactly the same. I also tried the Intel Processor Identification Utility, and it reports the same.

JoltinJoe, the System Diagnostics include tests of the L1 and L2 caches (http://www.dell.com/downloads/global/power/ps1q05-20040119-Patel-OE.pdf). I ran all of the tests and no errors were found. Anyway, I will run them again later to be sure (I can't do it right now because the server is in production).

Thank you all for your attention.
You may need a hotfix, as described in this thread: http://forums.techarena.in/windows-server-help/895739.htm
Thanks, Callandor. I installed the hotfix. Now I have to wait a few days to see the results. The reboots are unpredictable, so we have no idea when they will happen, but I think a week will be enough to be sure.
Unless the author can verify what actually fixed the problem, there isn't any reason to keep this.
I think JoltinJoe was right. We tried all the patches and hotfixes, and a full reinstall, with no luck. It seems to be a defective CPU, so we are replacing it. Thanks, everyone, for your help.

Only one question is left: JoltinJoe, where can I find the meaning of those codes?

(0x7F is a kernel-mode exception, UNEXPECTED_KERNEL_MODE_TRAP, and the first parameter of the BSOD, 0x0D, is a general protection fault.)
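For bug check 0x7F, the first parameter is the x86 trap (exception vector) number, so decoding it is a table lookup. The sketch below assumes the event log line has the form "0x0000007f (0x0000000d, ...)"; that sample line and the helper names are hypothetical, since the attached log is not reproduced here. Only the listed vectors are decoded.

```python
import re

# x86 exception vectors that can appear as parameter 1 of bug check 0x7F
TRAP_NAMES = {
    0x00: "Divide error",
    0x01: "Debug exception",
    0x02: "Non-maskable interrupt (NMI)",
    0x03: "Breakpoint",
    0x04: "Overflow",
    0x05: "BOUND range exceeded",
    0x06: "Invalid opcode",
    0x07: "Coprocessor (NPX) not available",
    0x08: "Double fault",
    0x09: "Coprocessor segment overrun",
    0x0A: "Invalid TSS",
    0x0B: "Segment not present",
    0x0C: "Stack fault",
    0x0D: "General protection fault",
    0x0E: "Page fault",
    0x10: "x87 floating-point (coprocessor) error",
}

# Matches "<code> (<param1>," such as "0x0000007f (0x0000000d,"
BUGCHECK_RE = re.compile(r"0x([0-9A-Fa-f]+)\s*\(\s*0x([0-9A-Fa-f]+)\s*,")


def decode_0x7f(param1: int) -> str:
    """Name the trap reported in parameter 1 of bug check 0x7F."""
    return TRAP_NAMES.get(param1, f"Unknown/other trap 0x{param1:02X}")


def describe_bugcheck(line: str) -> str:
    """Extract the bug check code and first parameter from a log line."""
    m = BUGCHECK_RE.search(line)
    if m is None:
        return "no bug check code found"
    code = int(m.group(1), 16)
    param1 = int(m.group(2), 16)
    if code != 0x7F:
        return f"bug check 0x{code:X}: not decoded by this sketch"
    return f"bug check 0x7F, trap 0x{param1:02X}: {decode_0x7f(param1)}"


if __name__ == "__main__":
    sample = "The bugcheck was: 0x0000007f (0x0000000d, 0x00000000, 0x00000000, 0x00000000)."
    print(describe_bugcheck(sample))
    # bug check 0x7F, trap 0x0D: General protection fault
```

A trap of 0x08 (double fault) or 0x0D (general protection fault) during idle periods is consistent with, though not proof of, a hardware fault such as the defective CPU suspected above.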
Here's a site with some codes listed: http://www.updatexp.com/stop-messages.html