Solved

random system binaries in /bin inexplicably seg faulting even after restore

Posted on 2004-09-01
Last Modified: 2011-09-20
I first ran into this problem two days ago, on the same system.  The first time I noticed it was during a reboot.  The operating system is RedHat 7.3 with the 2.4.20-28.7 kernel and the CPanel control panel.  During the init process, I noticed an unusually large number of "FAILS" being displayed - along with shell errors scrolling too fast to read.  I was presented with a login prompt and was able to log in successfully.  Next, I attempted to use /etc/init.d/* to bring up some of the failed services, which included the network interfaces (eth0 and eth1) and MySQL.  When I tried to load these by hand I saw that 'sed' was generating a shell script error.  Next I ran 'sed' by hand and was presented with what seemed like normal usage information followed by machine code.  After resetting my terminal I continued on and discovered that 'ls' seg faulted without displaying any information, as did 'umount'.  I'm sure there were others, but I cannot remember them now and they're ultimately not important - all of the problem binaries were located in /bin/ and exited with similar errors.

I tried several things to fix this problem, including using 'scp' to copy known-good versions of the broken binaries onto the system.  Each time, however, the newly copied binaries produced the same results.  Something else interesting to note is that the listed file sizes (via 'stat') of the broken binaries were different from those of the known-good copies.  Thinking that the machine had been compromised, I performed a fresh install of RedHat, followed by CPanel, on completely new hardware and an empty hard drive.  Afterwards I migrated all of the configuration settings and user information to the new drive, and placed the server back online.  Yesterday I spent the entire day examining the old drive, looking for any signs of a backdoor (finding SUID files and world-writable files, checking crontabs and system init scripts) - and came up completely empty.

Thinking I was out of the woods, I double checked that the new server was all patched up (it was), and removed the old drive for good.  Late yesterday evening I checked on the server one more time - only to discover that the binaries in the /bin/ directory are doing it again!  Some of the binaries are different this time - sed and ls work fine, while grep, umount, and awk are all broken - but the symptoms are the same: seg faults and broken system scripts.  This time, I have been able to 'scp' known-good copies of the system binaries that work for short periods of time, but after some random interval they eventually break again.  I'm nervous because I feel like I have exhausted every possible option with no success, including a last resort reinstall.  Any suggestions or help would be greatly appreciated.

Thanks!
Question by:astanley218
 
LVL 17

Expert Comment

by:owensleftfoot
I could be wrong and often am, but I would guess whatever patching you are doing isn't up to date and you are getting script kiddies hitting your box as soon as it accesses the internet. Either that or your hard drive is on the way out. My first concern would be why you are using RH 7.3 - it's ancient. I would recommend an upgrade/reinstall of Fedora. Jim will recommend paying for RHE :)
 
LVL 40

Expert Comment

by:jlevie
I won't discount the possibility that the system has been cracked, but most of those binaries aren't generally targets of a cracker. It's the security related objects they usually change (kernel, login, etc).

What I'd try at this point is to boot up in rescue mode from a 7.3 or later CD and see what an 'rpm --root /mnt/sysimage --verify -a' shows. By running the OS and rpm binary from the CD any corrupted parts of the system are out of the picture (including the effects of a root kit) and the results of the verify should be trustworthy. If in fact the verify shows that files have inexplicably changed and there's no evidence of a break-in (check with one of the tools from http://www.cerias.purdue.edu/homes/carrier/forensics/ and/or http://www.porcupine.org/forensics/tct.html) a hardware problem becomes a possibility.
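
A verify of every package can produce a long list, so it may help to narrow it to files whose contents actually changed. The sketch below is only an illustration: the function name `changed_digests` is made up for this example, and the sample lines are invented in the shape rpm prints rather than taken from the poster's machine. It relies on rpm's verify format, where a "5" in the first column means the MD5 digest differs and a lone "c" before the path marks a config file (which is expected to change).

```shell
#!/bin/sh
# changed_digests (hypothetical helper): read `rpm --verify` output on stdin
# and print the paths of non-config files whose digest changed.
# Assumes paths contain no whitespace.
changed_digests() {
    awk '
        $1 == "missing" { next }   # file is absent, not a digest mismatch
        $1 ~ /5/ {
            # A one-letter second field (c, d, g, l, r) is a file-type marker.
            if (NF >= 3 && length($2) == 1) {
                if ($2 == "c") next    # skip config files
                print $3
            } else {
                print $2
            }
        }
    '
}

# Illustrative sample only; these lines are not real output from this thread.
sample='S.5....T   /bin/gzip
S.5....T c /etc/hosts
.......T   /usr/bin/foo
missing    /bin/awk
..5.....   /bin/grep'

printf '%s\n' "$sample" | changed_digests
```

From the rescue environment the same filter could be fed directly, e.g. `rpm --root /mnt/sysimage --verify -a | changed_digests`. A list dominated by unrelated binaries from many packages would fit random corruption better than a targeted root kit.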
 

Author Comment

by:astanley218
owensleftfoot - We've stuck with RH 7.3 because CPanel does not recommend upgrading a production server.  In the past it has relied on Fedora Legacy updates to remain updated, but they are no longer active.  Perhaps I am being compromised very quickly, but the facts don't add up for me.  Script kiddies very often leave large trails and that just isn't the case here.  There are no odd ports open and as jlevie mentioned, the popular trojans are not present either - how would a script kiddie be able to get back in without script kiddie trojans (other than the original hole)?  Maybe there is someone more experienced lurking, but if he/she can properly hide all of the other work - why leave gzip, sed, etc. broken?

jlevie - I will try your suggestion tonight.  I tried to think of a way to do just that the night it first happened, but I'm not very good with rpm.  My personal preference is a slackware system with lots of hand compiling, but the control panels just don't want to support that.

As a side note, here is the output of 'stat' on /bin/gzip before and after whatever change is being made:

Working (known good) gzip:
  File: `/bin/gzip'
  Size: 53716           Blocks: 120        IO Block: 4096   Regular File
Device: 301h/769d       Inode: 2632740     Links: 1    
Access: (0555/-r-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2004-09-01 17:47:35.000000000 -0400
Modify: 2004-09-01 17:46:49.000000000 -0400
Change: 2004-09-01 17:47:26.000000000 -0400


Failing gzip:
root@orf-lx-4 [/bin]# gzip
Q‹D$‹\$‹L$Í€Y[ÉÐU‰åSQR‹D$‹\$‹L$‹T$ Í€ZY[ÉÍvƒ½(ÿÿÿroot@orf-lx-4 [/bin]#

  File: `/bin/gzip'
  Size: 73034           Blocks: 152        IO Block: 4096   Regular File
Device: 301h/769d       Inode: 2632740     Links: 1    
Access: (0555/-r-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2004-09-01 19:12:05.000000000 -0400
Modify: 2004-09-01 17:46:49.000000000 -0400
Change: 2004-09-01 17:54:21.000000000 -0400
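
The two stat listings show the size and Change time moving while the Modify time stays put, so a periodic snapshot of checksum, size, and mtime is one way to pin down which files change and roughly when. This is a rough sketch, not a forensic tool: the names `snapshot` and `changed_files` are invented here, it assumes GNU `stat` and `md5sum`, and it assumes paths without whitespace. If the running system's own tools are suspect, the results are only as trustworthy as the binaries doing the checking.

```shell
#!/bin/sh
# snapshot DIR: print "md5 size mtime path" for each regular file in DIR.
snapshot() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        sum=$(md5sum < "$f" | cut -d' ' -f1)
        # GNU stat: %s = size in bytes, %Y = mtime in seconds since the epoch
        meta=$(stat -c '%s %Y' "$f")
        printf '%s %s %s\n' "$sum" "$meta" "$f"
    done
}

# changed_files OLD NEW: print paths from snapshot NEW whose line
# (checksum, size, or mtime) does not appear in snapshot OLD.
changed_files() {
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next }
         !($0 in seen)       { print $4 }' "$1" "$2"
}
```

A possible use: run `snapshot /bin > /root/bin.before` right after restoring known-good binaries, run `snapshot /bin > /root/bin.after` some time later, then `changed_files /root/bin.before /root/bin.after`. The file names under /root are placeholders.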
 
LVL 40

Expert Comment

by:jlevie
> Script kiddies very often leave large trails and that just isn't the case here.  There are no odd ports open and as jlevie
> mentioned, the popular trojans are not present either.

The lack of obvious evidence doesn't necessarily mean that the server hasn't been cracked. It just means that you might not be able to see the evidence from the running system. Checking from a boot from other media could reveal the evidence.

Given the stat output it does look like something is modifying those binaries and I'll bet that you'll get all kinds of MD5 sum errors from the rpm verify. Taking a conservative approach I'd isolate the system once I've verified that the changes are real and run the forensics tests to see if it is an attack. If that comes up positive you are going to have to re-think the wisdom of running 7.3 since it would have been attacked twice. If the forensics don't show an attack we need to consider a hardware issue or some misbehaving application as the cause.
 

Author Comment

by:astanley218
I apologize for the late inclusion; I made a huge mistake in my original post.  The current server is, in fact, RedHat 9.0 (it was a clean install, so I took the opportunity to upgrade).  The problem originally presented itself on RedHat 7.3.  I'm going to go ahead with the rpm --verify late tonight and will post my results.
 
LVL 40

Expert Comment

by:jlevie
That makes a difference. I could see there being an un-patched security hole in 7.3 since it hasn't been supported since last December, but 9 dropped off RedHat support in May and has been maintained by the Fedora Legacy project since. It would be unlikely that both had a vulnerability if the 9 box was up to date (it was, wasn't it?).

That mostly leaves a config problem as a security issue, but even there I'd expect a botched root kit not to behave the same way on both OSes. So, using Occam's Razor, the real problem probably lies elsewhere and my suspicions now would be some fault in an application that runs with root privs that's been added to the system or some hardware flaw. I've run enough RedHat boxes to believe that this wouldn't be a result of some RedHat component.
 

Author Comment

by:astanley218
Well, I had horrible luck with the rpm --verify.  I'm posting this update from the datacenter (where I've been since 1:00AM EST).  Many, many files were listed as having changed relative to the RPM database, from all sorts of packages (iputils, net-tools, coreutils, glibc-common just to name a few).  I manually went through the list and did rpm -ivh --force installs for all of the packages that were listed.  Then I tried to restart the system - even worse than before!  More binaries were corrupted, even fewer of the boot scripts were being run, and essentially I couldn't get back into the machine (booting into single user mode left the root partition in read-only mode because 'mount' is segfaulting, so I can't make any more changes to the machine).  Currently, I'm installing RH 9 on a brand spanking new machine (new mobo, RAM, HD, CPU).  The motherboard this time is from an entirely different manufacturer (Intel), the CPU is a socket 478 (370 before), and the RAM is DDR (SDRAM before).  I'm hoping and praying that this works out.  What is the best way to update RH 9 out of the box?  Does RedHat's up2date service still work?  I haven't had to do this since support dropped off.  I'm considering using the Fedora Legacy instructions for setting up updates, would that cover my bases?
 
LVL 40

Accepted Solution

by:
jlevie earned 500 total points
> I manually went through the list and did rpm -ivh --force installs for all of the packages that were listed.  Then I tried
> to restart the system - even worse than before!

That really sounds like a hardware problem. Apparently blocks aren't being written to disk correctly. Could be processor, memory, or disk controller.
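
One crude way to probe for this kind of fault is to write random data, copy it, and compare checksums over many rounds. The sketch below is an assumption-laden illustration, not a substitute for memtest86 or a vendor disk diagnostic: `copy_check` is a name invented for this example, and because the kernel's page cache usually serves the re-read from RAM, small runs mostly exercise memory and CPU rather than the disk. Using a total size well beyond installed RAM makes it more likely the data really comes back from the drive.

```shell
#!/bin/sh
# copy_check DIR ROUNDS SIZE_KB (hypothetical helper): for each round, write
# SIZE_KB of random data into DIR, copy it, and compare the MD5 sums of the
# source and the copy. Prints the number of rounds that mismatched.
copy_check() {
    dir=$1
    rounds=$2
    size_kb=$3
    bad=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        src="$dir/src.$$.$i"
        dst="$dir/dst.$$.$i"
        dd if=/dev/urandom of="$src" bs=1024 count="$size_kb" 2>/dev/null
        cp "$src" "$dst"
        sync                      # ask the kernel to flush dirty blocks
        a=$(md5sum < "$src" | cut -d' ' -f1)
        b=$(md5sum < "$dst" | cut -d' ' -f1)
        [ "$a" = "$b" ] || bad=$((bad + 1))
        rm -f "$src" "$dst"
        i=$((i + 1))
    done
    echo "$bad"
}
```

For example, `copy_check /var/tmp 20 65536` would run twenty rounds of 64 MB each; the directory and sizes are arbitrary. Any nonzero count on otherwise idle hardware would point toward memory, CPU, controller, or disk rather than an intruder.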

> Does RedHat's up2date service still work?

I don't think so.

> I'm considering using the Fedora Legacy instructions for setting up updates, would that cover my bases?

That should be fine.
 

Author Comment

by:astanley218
I suspected hardware on the first night as well.  I could have sworn that I switched out the motherboard + processor, and I know I definitely ran memtest86.  I even tried a new IDE cable!  I must have missed something though, because things have been back online for over 2 hours without so much as a hitch as of now.  This is significant considering that for the last 3 days I couldn't even complete an Apache compile because the binaries were being messed up so fast.  I spent significantly more time patching up the system this time (fresh install, up2date, Fedora Legacy apt-get upgrade, and CPanel's /scripts/upcp).  I also took care to set up iptables to drop all incoming packets during the patch phases.  In all, I ended up replacing everything - HD, CPU, RAM, Power Supply, and Motherboard.  Something sure worked.  I'm confident this is over with; if not, I'll make it a new question since I have so much more information.  Thanks for all of your suggestions.