  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 400

Errors on an SGI machine, can't login for long...

Hey everyone,

I'm not a UNIX guru (just helping out) and our administrator is having a serious issue with one of our SGI computers. When you start the computer, various error messages show up as soon as it has booted and you log on as root. The messages shown are:

"The customization panels cannot communicate with the FileManager. Please restart the system."

(this shows up in 3 different windows)

and

"The File Alteration Monitor has stopped responding. This may cause the background and File Manager to be inaccurate.

To fix this problem, it is recommended that you save all your work, log out, and log back in."

After these messages are displayed you can work as root for about 3 minutes (typing "id" shows root). After a couple of minutes, however, the system can no longer find root ("id" shows uid=0 gid=0). If you log out and then try to log on as root, it will not let you log on anymore. You have to reboot the computer before you can log on again (and then the process repeats).

It's running IRIX 6.5.23m

any idea what is going on? I'll try to provide any more information you need.

thanks
guid
Asked by: guidway
2 Solutions
 
Nisus091197 Commented:
Hi there,

I'm not an IRIX/SGI bod but would offer these suggestions:

 - look for a startup script or scheduled job that may be malfunctioning
 - perform a file system check (like fsck on Solaris)
 - if all attempts fail restore to a known good backup

Hope this helps,

Regards, Nisus
http://www.omnimodo.com
 
jonkreisler Commented:
A couple of things to check on:

There must be an id "nobody" in /etc/passwd
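One quick way to confirm this is to look for a line in the password file whose first field is the account name. The following is only a minimal sketch: `has_account` is a made-up helper name, and it reads the local file only (it does not consult NIS or any other name service).

```shell
# has_account NAME [FILE]: succeed if FILE (default /etc/passwd) contains
# an entry whose first colon-separated field is exactly NAME.
# Hypothetical helper for illustration, not an IRIX utility.
has_account() {
    name=$1
    file=${2:-/etc/passwd}
    grep "^${name}:" "$file" >/dev/null 2>&1
}

if has_account nobody; then
    echo "nobody: present"
else
    echo "nobody: MISSING"
fi
```

The same helper can be pointed at any other account name or at a copy of the file, for example `has_account sys /tmp/passwd.copy`.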

The file alteration monitor may have been commented out in /etc/inetd.conf
If so, you will need to do the following (as root):

       Edit /etc/inetd.conf
       Find the string "sgi_fam" and uncomment the line (remove the "#" character from the beginning of the line).
       Save changes.

       Execute the following command:

           /etc/killall -HUP inetd

       (The above command is case sensitive.)

       Log out, then log back in.
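Before editing, it can help to see how the sgi_fam entry currently appears in the file. This is a rough sketch rather than an official procedure: `fam_entry_status` is a hypothetical helper, and it assumes an active entry begins with the string "sgi_fam" while a disabled one has a "#" somewhere before it on the same line.

```shell
# fam_entry_status [FILE]: report how the sgi_fam service appears in an
# inetd.conf-style file (default /etc/inetd.conf).
# Hypothetical helper for illustration only.
fam_entry_status() {
    file=${1:-/etc/inetd.conf}
    if grep '^[[:space:]]*sgi_fam' "$file" >/dev/null 2>&1; then
        echo "active"
    elif grep '^[[:space:]]*#.*sgi_fam' "$file" >/dev/null 2>&1; then
        echo "commented out"
    else
        echo "not found"
    fi
}

fam_entry_status
```

If it prints "commented out", the uncomment-and-HUP steps above apply; "not found" would suggest the entry is missing altogether (or the file could not be read).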

If sgi_fam is NOT commented out, it may be that the fam.conf file has been improperly edited.
You can edit /etc/fam.conf to look as follows (between the dashed lines):

----------------------------------------------------------------------------------------

# fam.conf
#
# For more information on the configuration options below, see the
# fam(1M) man page.

#
# insecure_compatibility disables authentication. This causes
# untrusted_user to be ignored, because the UID presented by every client
# connection will be believed.
#
# The -C command-line argument overrides this option.
#
insecure_compatibility = true

#
# untrusted_user is the user which will be used for unauthenticated
# clients. If a file can't be stat'ed by this user, those clients won't be
# able to fam it. The value can be a user name or a numeric UID.
#
untrusted_user = nobody

#
# local_only makes fam ignore requests from remote clients & remote fams.
# Note that this is ignored if fam is started by inetd.
#
# The -L command-line argument overrides this option.
#
local_only = false

#
# xtab_verification makes fam check the list of exported filesystems to
# verify that requests from remote hosts fall on filesystems which are
# exported to the hosts.
#
xtab_verification = false


----------------------------------------------------------------------------------------

Once you have fam.conf set up correctly, issue this command as root:

/etc/killall -HUP inetd

Then log out and log back in

(You can always change the "true"s to "false" and vice-versa, later, as desired at your site for security purposes.)
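To compare what is actually in effect against the listing above, the comment lines can be filtered out so only the settings remain. A small sketch, assuming every setting sits on its own "key = value" line as in the sample file; `show_settings` is a made-up name.

```shell
# show_settings [FILE]: print the active "key = value" lines of a
# fam.conf-style file (default /etc/fam.conf), skipping comment lines.
# Hypothetical helper for illustration only.
show_settings() {
    file=${1:-/etc/fam.conf}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Drop lines whose first non-blank character is "#", keep assignments.
    grep -v '^[[:space:]]*#' "$file" | grep '='
}
```

Run against the sample file above, `show_settings /etc/fam.conf` should list just the four settings: insecure_compatibility, untrusted_user, local_only and xtab_verification.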


 
guidway (Author) Commented:
hey everyone,

my apologies for not getting back sooner, I've been busy with another project.

We tried everything mentioned, to no avail (except restoring backups, since we may just end up wiping the system anyway). I'm trying to learn how to fix it before we go that route though.

However, I have discovered that all the problems seem to start when the network daemon is started. While it is running, no exports happen (exportfs fails). If we leave it off, everything appears to work fine (though of course we have no network at that point).

I haven't been able to test the daemon in detail yet, so that may not be correct; it might be a lower-level process that only appears to fail in the network daemon. I noticed the mail daemon also reports errors (something about "user id 'sys' not found"). I'll let you know the results.

if you think of anything else, please let me know.

thanks
guid
 
Nisus091197 Commented:
Hi,

sys is a system account and should be in the password file.  On other UNIX systems this is /etc/passwd, not sure about IRIX.
 
guidway (Author) Commented:
hey Nisus,

We have checked the shadow and passwd files extensively thinking there was an error in them. If so, we can't find it. sys is in the passwd file though.

p.s. sorry for not getting back here. I still haven't had time to get back to that machine. We're in a network changeover right now so we're kinda swamped.
 
Nisus091197 Commented:
thanks for letting me know.
 
guidway (Author) Commented:
Interesting turn of events...

My coworker and I sat down for a couple of hours on Friday and walked through the code of the network daemon that was crashing. I began checking every single service (assuming that is the correct term) that is started through this daemon and found some that should have been loading but were not (two of them were autofs and nfs). I made sure they were activated (as they should be, compared to another system we have) and restarted the computer. Root now works fine and the network is completely back up. I'm not sure what exactly fixed the problem, but at least the computer is stable again. We are still getting all the error messages stated in my original problem (FAM not coming up, etc.), but they are more annoying than harmful now. We will still sit down again and try to figure out what is causing them, but for now at least the machine is working (almost) like all the others on the network. I'll let you know if we fix the FAM error messages next.
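For anyone repeating the comparison against a known-good machine, the idea can be sketched as a diff of two saved service listings. This is an illustration only, not a record of what we typed: it assumes each machine's on/off listing has been saved to a file (on IRIX, chkconfig with no arguments should be able to produce such a listing), and the helper name `flags_only_in_good` and the file names are made up.

```shell
# flags_only_in_good GOOD BAD: print "name state" lines that appear in the
# GOOD listing but not in the BAD one. Both arguments are files holding
# saved "name state" listings (file names are hypothetical).
flags_only_in_good() {
    good_sorted=${TMPDIR:-/tmp}/flags_good.$$
    bad_sorted=${TMPDIR:-/tmp}/flags_bad.$$
    # Keep only two-column on/off lines (this skips any header rows),
    # squeeze the spacing, then sort so comm can compare the files.
    awk 'NF == 2 && ($2 == "on" || $2 == "off") { print $1, $2 }' "$1" | sort > "$good_sorted"
    awk 'NF == 2 && ($2 == "on" || $2 == "off") { print $1, $2 }' "$2" | sort > "$bad_sorted"
    # -23 suppresses lines unique to BAD and lines common to both.
    comm -23 "$good_sorted" "$bad_sorted"
    rm -f "$good_sorted" "$bad_sorted"
}
```

Called as `flags_only_in_good good_host.flags bad_host.flags`, it prints each service whose state on the good machine has no matching line on the broken one, which is the kind of difference that pointed to autofs and nfs here.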
 
guidway (Author) Commented:
Problem solved... the FAM issue turned out to be a problem in the /etc/config/inetd.options file (somehow a value was put in there that should not have been). Once I removed the value and rebooted the computer everything worked fine. The network is stable, FAM is running stable, no error messages and our admin is leaping for joy. :)

Thanks everyone for trying to help fix the problem. Splitting points for your effort.
 
Nisus091197 Commented:
glad to be of assistance,

Regards, Nisus.