Random system binaries in /bin inexplicably seg faulting, even after restore
Posted on 2004-09-01
I first ran into this problem two days ago on this same server, during a reboot. The operating system is Red Hat 7.3 with the 2.4.20-28.7 kernel and the cPanel control panel. During the init process I noticed an unusually large number of "FAILS" being displayed, along with shell errors scrolling past too fast to read. I was presented with a login prompt and was able to log in. Next, I tried to bring up some of the failed services, which included the network interfaces (eth0 and eth1) and MySQL, using the scripts in /etc/init.d/. When I started them by hand I saw that 'sed' was generating a shell script error. Running 'sed' by itself printed what seemed like normal usage information followed by machine code. After resetting my terminal I continued and discovered that 'ls' seg faulted without displaying any information, as did 'umount'. I'm sure there were others, but I can't remember them now and they're ultimately not important: all of the problem binaries were located in /bin/ and exited with similar errors.
I tried several things to fix this. I used 'scp' to copy known-good versions of the broken binaries onto the system, but each time the newly copied binaries produced the same results. It is also worth noting that the file sizes reported by 'stat' for the broken binaries differed from those of the known-good copies.
Thinking that the machine had been compromised, I performed a fresh install of Red Hat, followed by cPanel, on completely new hardware with an empty hard drive. Afterwards I migrated all of the configuration settings and user information to the new drive and placed the server back online. Yesterday I spent the entire day examining the old drive for any sign of a backdoor (SUID files, world-writable files, crontabs, system init scripts) and came up completely empty.
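For anyone wanting to repeat the SUID and world-writable part of that check on a mounted drive, here is a minimal sketch. It assumes a Python 3 interpreter, which a system of this vintage would not ship, so it would have to run from another machine with the old drive mounted. The function name `audit_tree` and the mount point in the usage note are hypothetical, and this is only an illustration of the kind of scan described, not the exact commands that were run.

```python
import os
import stat


def audit_tree(root):
    """Walk `root` and return (setid_files, world_writable_files).

    Only regular files are considered; symlinks are not followed.
    """
    setid_files = []
    world_writable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable or vanished entry, skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            # Set-uid or set-gid bit present
            if st.st_mode & (stat.S_ISUID | stat.S_ISGID):
                setid_files.append(path)
            # Writable by "other"
            if st.st_mode & stat.S_IWOTH:
                world_writable.append(path)
    return sorted(setid_files), sorted(world_writable)
```

Calling `audit_tree("/mnt/olddrive")` (a made-up mount point) returns two sorted lists of paths that can then be compared against what a clean install is expected to contain.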
Thinking I was out of the woods, I double-checked that the new server was fully patched (it was) and removed the old drive for good. Late yesterday evening I checked on the server one more time, only to discover that the binaries in /bin/ were doing it again! Some of the affected binaries are different this time: sed and ls work fine, while grep, umount, and awk are all broken. The symptoms are the same, though: seg faults and broken system scripts. This time, known-good copies of the system binaries that I 'scp' over do work for a short period, but after some random interval they break again. I'm nervous because I feel like I have exhausted every possible option with no success, including a last-resort reinstall. Any suggestions or help would be greatly appreciated.
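Since freshly copied binaries go bad again after an unpredictable interval, one way to narrow down when that happens would be to record checksums right after copying the known-good files and compare them periodically. What follows is a minimal sketch of that idea, assuming a Python 3 interpreter is available to read the filesystem; the helper names `sha256_of`, `snapshot`, and `changed_files` are hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large binaries are not loaded whole
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(directory):
    """Map file name -> digest for regular files directly in `directory`."""
    result = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip symlinks and subdirectories; hash only real files
        if os.path.isfile(path) and not os.path.islink(path):
            result[name] = sha256_of(path)
    return result


def changed_files(baseline, current):
    """Names from `baseline` whose digest differs or is missing in `current`."""
    return sorted(
        name for name, digest in baseline.items()
        if current.get(name) != digest
    )
```

Taking `baseline = snapshot("/bin")` immediately after restoring the binaries, then calling `changed_files(baseline, snapshot("/bin"))` at intervals, would list which files have changed on disk since the restore and roughly when.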