Solved

random system binaries in /bin inexplicably seg faulting even after restore

Posted on 2004-09-01
Last Modified: 2011-09-20
I first ran into this problem two days ago, on the same system.  I first noticed it during a reboot.  The operating system is RedHat 7.3 with the 2.4.20-28.7 kernel and the CPanel control panel.  During the init process, I noticed an unusually large number of "FAILS" being displayed, along with shell errors scrolling too fast to read.  I was presented with a login prompt and was able to log in successfully.  Next, I attempted to use /etc/init.d/* to bring up some of the failed services, which included the network interfaces (eth0 and eth1) and MySQL.  When I tried to load these by hand I saw that 'sed' was generating a shell script error.  I then ran 'sed' by hand and was presented with what seemed like normal usage information followed by machine code.  After resetting my terminal I continued on and discovered that 'ls' seg faulted without displaying any information, as did 'umount'.  I'm sure there were others, but I cannot remember them now and they're ultimately not important: all of the problem binaries were located in /bin/ and exited with similar errors.

I tried several things to fix this problem.  I used 'scp' to copy known-good versions of the broken binaries onto the system.  Each time, however, the newly copied binaries produced the same results.  It is also interesting to note that the file sizes reported by 'stat' for the broken binaries differed from those of the known-good copies.

Thinking that the machine had been compromised, I performed a fresh install of RedHat, followed by CPanel, on completely new hardware and an empty hard drive.  Afterwards I migrated all of the configuration settings and user information to the new drive, and placed the server back online.  Yesterday I spent the entire day examining the old drive, looking for any signs of a backdoor (finding SUID files and world-writable files, checking crontabs and system init scripts), and came up completely empty.

Thinking I was out of the woods, I double checked that the new server was all patched up (it was), and removed the old drive for good.  Late yesterday evening I checked on the server one more time, only to discover that the binaries in the /bin/ directory are doing it again!  Some of the binaries are different this time (sed and ls work fine, while grep, umount, and awk are all broken), but the symptoms are the same: seg faults and broken system scripts.  This time, copying known-good versions of the system binaries over with 'scp' fixes them for short periods of time, but after some random interval they eventually break again.  I'm nervous because I feel like I have exhausted every possible option with no success, including a last resort reinstall.  Any suggestions or help would be greatly appreciated.

Thanks!
Question by:astanley218
9 Comments
 
LVL 17

Expert Comment

by:owensleftfoot
ID: 11958487
I could be wrong, and often am, but I would guess whatever patching you are doing isn't up to date and you are getting script kiddies hitting your box as soon as it accesses the internet. Either that or your hard drive is on the way out. My first concern would be why you are using RH 7.3 - it's ancient. I would recommend an upgrade/reinstall of Fedora. Jim will recommend paying for RHE :)
 
LVL 40

Expert Comment

by:jlevie
ID: 11958597
I won't discount the possibility that the system has been cracked, but most of those binaries aren't generally targets of a cracker. It's the security related objects they usually change (kernel, login, etc).

What I'd try at this point is to boot up in rescue mode from a 7.3 or later CD and see what an 'rpm --root /mnt/sysimage --verify -a' shows. By running the OS and rpm binary from the CD any corrupted parts of the system are out of the picture (including the effects of a root kit) and the results of the verify should be trustworthy. If in fact the verify shows that files have inexplicably changed and there's no evidence of a break-in (check with one of the tools from http://www.cerias.purdue.edu/homes/carrier/forensics/ and/or http://www.porcupine.org/forensics/tct.html) a hardware problem becomes a possibility.
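
As a rough illustration of that rescue-mode check, the sketch below narrows the verify output down to files whose contents changed. It assumes the usual 'rpm --verify' line layout, in which the third character of the flags column is '5' when a file's MD5 digest differs from the package database. The function name changed_digests is made up for this example, and it assumes file names without spaces.

```shell
#!/bin/sh
# Hypothetical helper: read 'rpm --verify' output on stdin and print only
# the paths whose flags column has '5' in the third position (digest mismatch).
# Lines such as "missing   /bin/foo" are skipped because their third
# character is not '5'.
changed_digests() {
    awk 'substr($1, 3, 1) == "5" { print $NF }'
}
```

From the rescue environment it could be used as 'rpm --root /mnt/sysimage --verify -a | changed_digests', provided the awk that runs is the one from the CD and not the one on the suspect disk.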
 

Author Comment

by:astanley218
ID: 11958985
owensleftfoot - We've stuck with RH 7.3 because CPanel does not recommend upgrading a production server.  In the past it has relied on Fedora Legacy updates to remain updated, but they are no longer active.  Perhaps I am being compromised very quickly, but the facts don't add up for me.  Script kiddies very often leave large trails and that just isn't the case here.  There are no odd ports open and as jlevie mentioned, the popular trojans are not present either - how would a script kiddie be able to get back in without script kiddie trojans (other than the original hole)?  Maybe there is someone more experienced lurking, but if he/she can properly hide all of the other work - why leave gzip,sed,etc. broken?

jlevie - I will try your suggestion tonight, I tried to think of a way to do just that the night it first happened, but I'm not very good with rpm.  My personal preference is a slackware system with lots of hand compiling, but the control panels just don't want to support that.

As a side note, here is the output of 'stat' on /bin/gzip before and after whatever change is being made:

Working (known good) gzip:
File: `/bin/gzip'
  Size: 53716           Blocks: 120        IO Block: 4096   Regular File
Device: 301h/769d       Inode: 2632740     Links: 1    
Access: (0555/-r-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2004-09-01 17:47:35.000000000 -0400
Modify: 2004-09-01 17:46:49.000000000 -0400
Change: 2004-09-01 17:47:26.000000000 -0400


failing gzip:
root@orf-lx-4 [/bin]# gzip
Q‹D$‹\$‹L$Í€Y[ÉÐU‰åSQR‹D$‹\$‹L$‹T$ Í€ZY[ÉÍvƒ½(ÿÿÿroot@orf-lx-4 [/bin]#

  File: `/bin/gzip'
  Size: 73034           Blocks: 152        IO Block: 4096   Regular File
Device: 301h/769d       Inode: 2632740     Links: 1    
Access: (0555/-r-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2004-09-01 19:12:05.000000000 -0400
Modify: 2004-09-01 17:46:49.000000000 -0400
Change: 2004-09-01 17:54:21.000000000 -0400
 
LVL 40

Expert Comment

by:jlevie
ID: 11959080
> Script kiddies very often leave large trails and that just isn't the case here.  There are no odd ports open and as jlevie
> mentioned, the popular trojans are not present either.

The lack of obvious evidence doesn't necessarily mean that the server hasn't been cracked. It just means that you might not be able to see the evidence from the running system. Checking from a boot from other media could reveal the evidence.

Given the stat output it does look like something is modifying those binaries and I'll bet that you'll get all kinds of MD5 sum errors from the rpm verify. Taking a conservative approach I'd isolate the system once I've verified that the changes are real and run the forensics tests to see if it is an attack. If that comes up positive you are going to have to re-think the wisdom of running 7.3 since it would have been attacked twice. If the forensics don't show an attack we need to consider a hardware issue or some misbehaving application as the cause.
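
Independent of rpm, one way to confirm that the changes are real, and to see roughly when they happen, is to record checksums right after copying in known-good binaries and re-check them later. The following is only a minimal sketch, assuming GNU find and md5sum are available; the names snapshot and changed_since are hypothetical. On a box that may be cracked or failing, results from the running system deserve the same caution as above.

```shell
#!/bin/sh
# Hypothetical helper: print an MD5 baseline for every regular file
# directly under the given directory (for example /bin).
snapshot() {
    find "$1" -maxdepth 1 -type f -exec md5sum {} +
}

# Hypothetical helper: re-check a saved baseline and print only the
# entries that no longer verify (md5sum reports them as "path: FAILED").
changed_since() {
    md5sum -c "$1" 2>/dev/null | grep -v ': OK$'
}
```

For example, 'snapshot /bin > /root/bin.md5' right after restoring the binaries, then 'changed_since /root/bin.md5' at intervals (the baseline path here is just an example).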
 

Author Comment

by:astanley218
ID: 11959441
I apologize for the late inclusion, I made a huge mistake in my original post.  The current server is, in fact, RedHat 9.0 (it was a clean install, so I took the opportunity to upgrade).  The problem originally presented itself on RedHat 7.3.  I'm going to go ahead with the rpm --verify late tonight, I'll post my results.
 
LVL 40

Expert Comment

by:jlevie
ID: 11960492
That makes a difference. I could see there being an un-patched security hole in 7.3 since it hasn't been supported since last Dec, but 9 dropped off Redhat support in May and has been maintained by the Fedora Legacy project since. It would be unlikely that both had a vulnerability if the 9 box was up to date (it was, wasn't it?).

That mostly leaves a config problem as a security issue, but even there I'd expect a botched root kit not to behave the same way on both OS's. So, using Occam's Razor, the real problem probably lies elsewhere and my suspicions now would be some fault in an application that runs with root privs that's been added to the system or some hardware flaw. I've run enough RedHat boxes to believe that this wouldn't be a result of some RedHat component.
 

Author Comment

by:astanley218
ID: 11962119
Well, I had horrible luck with the rpm --verify.  I'm posting this update from the datacenter (where I've been since 1:00AM EST).  Many, many files were listed as having been changed relative to the RPM database, from all sorts of packages (iputils, net-tools, coreutils, glibc-common just to name a few).  I manually went through the list and did rpm -ivh --force installs for all of the packages that were listed.  Then I tried to restart the system - even worse than before!  More binaries were corrupted, even fewer of the boot scripts were being run, and essentially I couldn't get back into the machine (booting into single user mode left the root partition in read-only mode because 'mount' is segfaulting, so I can't make any more changes to the machine).  Currently, I'm installing RH 9 on a brand spanking new machine (new mobo, RAM, HD, CPU) - the motherboard this time is from an entirely different manufacturer (Intel), the CPU is a socket 478 (370 before), and the RAM is DDR (SDRAM before).  I'm hoping and praying that this works out.  What is the best way to update RH 9 out of the box?  Does RedHat's up2date service still work?  I haven't had to do this since support dropped off.  I'm considering using the Fedora Legacy instructions for setting up updates; would that cover my bases?
 
LVL 40

Accepted Solution

by:
jlevie earned 2000 total points
ID: 11962915
> I manually went through the list and did rpm -ivh --force installs for all of the packages that were listed.  Then I tried
> to restart the system - even worse than before!

That really sounds like a hardware problem. Apparently blocks aren't being written to disk correctly. Could be processor, memory, or disk controller.
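
If blocks really are being damaged on their way to disk, repeated copies of the same file should occasionally disagree with the source. A crude sketch of such a check follows, assuming a POSIX sh with cp, sync and cmp; copy_verify is a hypothetical helper. A clean result proves little, since reads may be served from the page cache and faults can be intermittent, so it does not replace a proper memory or disk test.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC into DIR N times and print how many of
# the copies differ byte-for-byte from the source.
copy_verify() {
    src=$1
    dir=$2
    n=$3
    bad=0
    i=1
    while [ "$i" -le "$n" ]; do
        cp "$src" "$dir/copy.$i"
        sync                                   # ask the kernel to flush dirty buffers
        cmp -s "$src" "$dir/copy.$i" || bad=$((bad + 1))
        i=$((i + 1))
    done
    echo "$bad"
}
```

Run against a large file, for example 'copy_verify /path/to/big.file /tmp/cv 50' (both paths are placeholders and the target directory must already exist), anything other than 0 points at hardware and away from an intruder.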

> Does RedHat's up2date service still work?

I don't think so.

> I'm considering using the Fedora Legacy instructions for setting up updates, would that cover my bases?

That should be fine.
 

Author Comment

by:astanley218
ID: 11966969
I suspected hardware on the first night as well.  I could have sworn that I switched out the motherboard + processor, and I know I definitely ran memtest86.  I even tried a new IDE cable!  I must have missed something though, because things have been back online for over 2 hours without so much as a hitch as of now.  This is significant considering that for the last 3 days I couldn't even complete an Apache compile because the binaries were being messed up so fast.  I spent significantly more time patching up the system this time (fresh install, up2date, Fedora Legacy apt-get upgrade, and CPanel's /scripts/upcp).  I also took care to set up iptables to drop all incoming packets during the patch phases.  In all, I ended up replacing everything - HD, CPU, RAM, Power Supply, and Motherboard.  Something sure worked.  I'm confident this is over with; if not, I'll make it a new question since I have so much more information.  Thanks for all of your suggestions.
Question has a verified solution.
