I have been having issues with my mail server: the drives suddenly and randomly stop responding.
The relevant error messages look like this, for example:
Dec 31 03:08:24 mail kernel: ata1: command 0x35 timeout, stat 0x50
Dec 31 05:47:25 mail kernel: ata2: command 0x35 timeout, stat 0x50
I believe this means that Linux cannot access the drives anymore. The system is still running, but I cannot log in because the /etc/passwd file is not available to authenticate me. Which is lovely and awesome, and I can't believe I am so lucky.
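In case it helps anyone looking at similar logs, here is a minimal sketch in Python that counts these timeout messages per ATA port, so you can see whether one port or both are affected and how often. It assumes the log lines follow exactly the format of the two samples above. The function name count_ata_timeouts is hypothetical; I made it up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines in the format shown above, for example:
#   Dec 31 03:08:24 mail kernel: ata1: command 0x35 timeout, stat 0x50
# Assumption: other kernels or drivers may word this message differently.
ATA_TIMEOUT = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+): command (?P<cmd>0x[0-9a-fA-F]+) timeout, "
    r"stat (?P<stat>0x[0-9a-fA-F]+)"
)


def count_ata_timeouts(lines):
    """Return a Counter mapping each ATA port name to its number of timeout lines."""
    counts = Counter()
    for line in lines:
        match = ATA_TIMEOUT.match(line)
        if match:
            counts[match.group("port")] += 1
    return counts


# The two sample lines from this post.
sample = [
    "Dec 31 03:08:24 mail kernel: ata1: command 0x35 timeout, stat 0x50",
    "Dec 31 05:47:25 mail kernel: ata2: command 0x35 timeout, stat 0x50",
]

for port, n in sorted(count_ata_timeouts(sample).items()):
    print(f"{port}: {n} timeout(s)")
```

The function accepts any iterable of lines, so an open log file can be passed in place of the sample list. Which file the kernel messages end up in depends on the distribution.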
Anyway, I have talked with many of my friends who know a lot more than I do about this, and they gave me the possible causes: faulty or overheated hard drives, faults in the hard disk or RAID controller, or a similar fault on the mainboard that affects the hard disks. They said that if my server is old, it is likely a hardware issue and I would need to replace the hardware; if the server is new, it could be an incompatibility issue and I should try using the NOAPIC switch. However, they warned that booting with the NOAPIC switch could make the system inaccessible, and I would be in a world of hurt.
Okay, now you're wondering what's wrong: I seem to have been guided well and should be able to move on and apply the switch. Well, I want to make this as quick and painless as possible for me. :) I would like to know the best way to have something standing by in case this fails, so that I can get the system back up and working. I don't have a LiveCD, since one was not shipped with the system, so I guess I would have to create one or do something else. The system has Webmin 1.300 installed, which acts as my administration setup, but I am also comfortable on the command line.
Please help. Also, if there is a better way than the NOAPIC switch, or some tests I can run the system through, I would greatly appreciate hearing about it.