Steven Vona

ACPI on Linux Server

We have several Dell R210 servers running Red Hat Linux. We are seeing an issue that causes the servers to stick when rebooted. By stick I mean they shut down, but stay in a state where the power is never turned off and back on. They just sit there with diagnostic light #3 steady green. If we hold down the power button and then power them back on, they boot fine. If we shut them down completely, they power off without issues. We have no issues booting from a powered-off state either.

Green diagnostic light #3 being lit is supposed to mean a processor failure. We ran every diagnostic we could think of on the processors and everything checks out. Plus, this happens on all 9 of our servers, which pretty much rules out processor issues, since it is virtually impossible for all our processors to go bad at the same time.

One of our older Linux admins thinks ACPI is causing the reboot to fail. Sure enough, I disabled ACPI by adding acpi=off to the kernel line in grub.conf, and also stopped the acpid service and disabled it from starting on boot. After several reboots we have had no issues.

My question is...

What is the downside of disabling ACPI?  Is it even needed for a server?  The server obviously never sleeps, and there is no lid to shut like on a laptop.  Any input would be appreciated.
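When comparing servers, it can help to confirm which ones actually booted with the acpi=off change described above. The following is a minimal sketch, not a definitive tool: it assumes the running kernel's command line is readable at /proc/cmdline, and the function names parse_cmdline and acpi_disabled are made up for illustration. It does not handle quoted values that contain spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so root=UUID=abc keeps "UUID=abc" as value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def acpi_disabled(cmdline: str) -> bool:
    """Return True when the command line carries acpi=off."""
    return parse_cmdline(cmdline).get("acpi") == "off"


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")  # present on Linux only
    if proc_cmdline.exists():
        print("acpi=off active:", acpi_disabled(proc_cmdline.read_text()))
```

Running it on each of the servers shows at a glance which ones were booted with the option and which were not.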
rindi

No, normally you don't need ACPI on servers. But sometimes a BIOS/firmware upgrade can fix issues like that, as can OS updates.
I am pretty sure you don't need to disable ACPI.

Downside: most devices unconfigured, no software-driven poweroff (basically you are here now), fans either always off or always at maximum speed. No CPU frequency scaling.

I'd say it is very likely you will fry the CPU in seconds (or the server will consume 5x more power than normal and generate the maximum possible noise).
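Whether CPU frequency scaling is still available after booting with acpi=off can be checked on the machine itself rather than argued about. This is a rough sketch under the assumption that the kernel exposes the usual cpufreq sysfs files (devices/system/cpu/cpuN/cpufreq/scaling_driver under /sys); the function name cpufreq_status and the sysfs_root parameter are hypothetical and exist only to make the example testable.

```python
from pathlib import Path


def cpufreq_status(sysfs_root: str = "/sys") -> dict:
    """Map each cpuN directory to its scaling driver name, or None if absent."""
    cpu_dir = Path(sysfs_root) / "devices" / "system" / "cpu"
    if not cpu_dir.is_dir():
        return {}
    status = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not sibling dirs such as cpufreq.
    for cpu in sorted(cpu_dir.glob("cpu[0-9]*")):
        driver_file = cpu / "cpufreq" / "scaling_driver"
        if driver_file.is_file():
            status[cpu.name] = driver_file.read_text().strip()
        else:
            status[cpu.name] = None  # no frequency scaling exposed for this CPU
    return status


if __name__ == "__main__":
    for cpu, driver in cpufreq_status().items():
        print(cpu, "->", driver or "no cpufreq driver")
```

Comparing the output from a boot with acpi=off against a normal boot shows whether a scaling driver disappears on this particular hardware.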
Steven Vona

ASKER

I wish more experts would chime in since I am getting conflicting answers.
You can ask for that via the "Request Attention" link near the question.
Which part of "cpufreq being driven by acpi" is so hard to understand?
@gheist, you are very polite. What part of "conflicting answers" is so hard to understand?
Experts cannot call more attention to the question. You need to notify moderators.
I will respectfully disagree with gheist: ACPI is generally a power consumption control mechanism introduced to help REDUCE the power consumed by the components of a PC -- part of the 1990s "green" drives.

You are, IMHO, perfectly OK to disable ACPI in your Linux servers, as you very seldom need or want to reduce your CPU speeds on a server. However, if you DO want to have fine control of your CPU speeds, there is a separate package (surprisingly called cpuspeed) that may or may not depend upon ACPI to work (most do NOT have a dependency, some do -- in my experience).

ACPI is, on MOST hardware, separate from fan speed control. ACPI is designed to save power. Fan speed control is designed to dissipate heat. Thus, disabling ACPI should not affect your fan control -- nor should it affect any of your hardware monitoring tools (lm_sensors) -- many of which will work just fine without ACPI enabled... in the kernel or in BIOS.
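The fan claim can also be checked directly. Below is a small sketch that reads fan speeds from the kernel's hwmon sysfs files (fanN_input under /sys/class/hwmon/hwmonN), which is where the lm_sensors tools get their data. The function name read_fan_speeds and its hwmon_root parameter are made up for this example, and it is an assumption that the servers expose their fans this way at all: on some server boards the fans are only visible through the BMC, so an empty result proves nothing by itself.

```python
from pathlib import Path


def read_fan_speeds(hwmon_root: str = "/sys/class/hwmon") -> dict:
    """Return {"chip/fanN_input": rpm} for every readable hwmon fan sensor."""
    root = Path(hwmon_root)
    if not root.is_dir():
        return {}
    readings = {}
    for chip in sorted(root.glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name when the chip has no "name" file.
        chip_name = name_file.read_text().strip() if name_file.is_file() else chip.name
        for fan in sorted(chip.glob("fan*_input")):
            try:
                rpm = int(fan.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor unreadable or not numeric; skip it
            readings[f"{chip_name}/{fan.name}"] = rpm
    return readings


if __name__ == "__main__":
    speeds = read_fan_speeds()
    if not speeds:
        print("no hwmon fan sensors found")
    for sensor, rpm in speeds.items():
        print(f"{sensor}: {rpm} RPM")
```

If the readings look the same with and without acpi=off, fan control on that machine is evidently not depending on the kernel's ACPI support.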

All of that being said, I believe you may find that there is a new BIOS available for your Dell servers that may fix the ACPI incompatibility with Linux. In the olden days, Dell couldn't care less about Linux -- if it worked for Windows, it worked. But these are not your Father's Dell servers -- and they know a sizeable number of them are running Linux... so Dell pays attention, and an ACPI incompatibility isn't likely something Dell actively decided NOT to fix. I would check out your system BIOS.

For what it's worth, the ACPI daemon (acpid) is one of several "stock" Linux services that I turn OFF on servers -- along with bluetooth, iSCSI (unless I'm actually USING iSCSI), certmonger, xinetd, & ALL port mapping, RPC & NFS services... just to name a few.

I do hope this helps.

Dan
IT4SOHO
You are ignoring the letter C in ACPI. It is also where the system configuration is stored.
noci

ACPI (Advanced Configuration and Power Interface) is the low-level software that initialises & manages your motherboard. It has all kinds of tables about the video modes your system supports.
It manages the CPU frequency WITH the voltages that belong to those frequencies.
ACPI also manages the power to your system, or to the portions of your system that need power, depending on WOL etc.
ACPI code also monitors the temperature of the system and shuts it down in case of overheating.

In short it is very unwise to disable ACPI.
ASKER CERTIFIED SOLUTION
Daniel McAllister

This solution is only available to members.
1) Disable ACPI in the BIOS - the computer then follows the very old MP specification, e.g. Linux will not route IRQs to other processors.
2) Disable the acpid service in Linux - it handles just the power button, i.e. no problem.
3) Disable various ACPI options in the kernel - read their documentation; basically, in case of problems they exist to trade some performance for stability.
To my knowledge, the original question never posited disabling acpi in BIOS -- only in the kernel (and the daemon) with the boot option "acpi=off".

That being said, thanks to gheist for disassembling the acpi functionality. That should make things more clear to savone -- and especially help to explain why different experts are saying different things about acpi -- in part because they're talking about different aspects (the different parts) of acpi.

Dan
IT4SOHO
There are many problems associated with a full acpi=off.
Some cards will not work, and it will route all interrupts to the lowest core (maybe hyperthreading works, maybe not, etc.).
It is sort of the heaviest case of "trading performance for stability".
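The interrupt-routing point is easy to test on the affected servers. The sketch below sums the per-CPU counters in /proc/interrupts, so you can see whether everything lands on CPU0 after booting with acpi=off. It assumes the usual layout of that file (a header row of CPU names, then one row per interrupt source), and the function name per_cpu_interrupt_totals is invented for this example.

```python
from pathlib import Path


def per_cpu_interrupt_totals(text: str) -> dict:
    """Sum interrupt counts per CPU from the contents of /proc/interrupts."""
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()  # header row, e.g. ["CPU0", "CPU1"]
    totals = {cpu: 0 for cpu in cpus}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counts = fields[1:1 + len(cpus)]
        # Rows like "ERR:" carry one global counter rather than one per CPU.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals


if __name__ == "__main__":
    proc_interrupts = Path("/proc/interrupts")  # present on Linux only
    if proc_interrupts.exists():
        for cpu, total in per_cpu_interrupt_totals(proc_interrupts.read_text()).items():
            print(f"{cpu}: {total}")
```

A total that keeps growing only on CPU0 while the other CPUs stay near zero would support the claim; totals spread across the CPUs would suggest interrupts are still being distributed.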

I'd suggest installing the latest BIOS from Dell (failure to power off is attributable to the power supply, which is programmed by either the BIOS or the iDRAC/BMC firmware) and contacting their support if it still does not work (BTW, they are fairly friendly folks).
Also worth trying: mcelog on Linux, and switching irqbalance and microcode_ctl on and off for diagnostics.