guidway

Errors on an SGI machine, can't login for long...

Hey everyone,

I'm not a UNIX guru (just helping out), and our administrator is having a serious issue with one of our SGI computers. When the computer is started and you log on as root, several error messages show up right away. The messages are:

"The customization panels cannot communicate with the FileManager. Please restart the system."

(this shows up in 3 different windows)

and

"The File Alteration Monitor has stopped responding. This may cause the background and File Manager to be inaccurate.

To fix this problem, it is recommended that you save all your work, log out, and log back in."

After these messages are displayed you can work as root for about 3 minutes (if you type "id" it shows as root). After a couple of minutes, however, it can't find root anymore ("id" shows uid=0 gid=0). If you log out and then try to log on as root, it will not let you log on. You have to reboot the computer before you can log on again (and then the process repeats).
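For anyone chasing a similar symptom, the "id" check above can be sketched in a few lines of Python. This is only an illustration: it assumes a Python 3 interpreter is available on the machine (which may well not be true on an IRIX box), and `describe_uid` is a made-up helper name, not part of any tool mentioned here. It reports whether a numeric uid still resolves to a login name, which is roughly the difference between "id" showing root and "id" showing a bare uid=0.

```python
import pwd


def describe_uid(uid, lookup=pwd.getpwuid):
    """Return 'uid=N(name)' if the uid resolves to a name, else bare 'uid=N'.

    `lookup` defaults to the system password database; it is a parameter
    only so the function can be exercised without touching the real system.
    """
    try:
        name = lookup(uid).pw_name
    except KeyError:
        # pwd.getpwuid raises KeyError when no entry is found for the uid.
        return f"uid={uid}"
    return f"uid={uid}({name})"


if __name__ == "__main__":
    # Run this right after boot and again a few minutes later to see
    # whether the name lookup for uid 0 stops working.
    print(describe_uid(0))
```

If the second run prints only the number, the account lookup itself is failing, which points away from the desktop error messages and toward whatever serves the password database.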

It's running IRIX 6.5.23m.

Any idea what is going on? I'll try to provide any further information you need.

thanks
guid
ASKER CERTIFIED SOLUTION
Nisus091197

guidway

ASKER

hey everyone,

my apologies for not getting back sooner, I've been busy with another project.

We tried everything mentioned, to no avail (except restoring backups, since we may just end up wiping the system anyway). I'm trying to learn how to fix it before we go that route, though.

However, I have discovered that all the problems seem to start when the network daemon is started. While it is running, no exports happen (exportfs fails). If we leave it off, everything appears to work fine (of course, we don't have a network at that point).

I haven't been able to do detailed testing on the daemon yet, so that may not be correct; it might be a lower-level process that just appears to be failing in the network daemon. I also noticed that the mail daemon reports errors (something about "user id 'sys' not found"). I'll let you know the results.

if you think of anything else, please let me know.

thanks
guid
Nisus091197

Hi,

sys is a system account and should be in the password file. On other UNIX systems this is /etc/passwd; I'm not sure about IRIX.
guidway

ASKER

hey Nisus,

We have checked the shadow and passwd files extensively, thinking there was an error in them. If there is one, we can't find it. sys is in the passwd file, though.
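A mechanical pass over the file can catch things the eye skips. The sketch below is an assumption-laden example rather than anything from this thread: it assumes the conventional seven colon-separated fields per passwd line (name, password, uid, gid, comment, home, shell), `check_passwd` is a hypothetical helper, and the sample entries in the usage note are invented.

```python
def check_passwd(text):
    """Return a list of (line_number, problem) tuples for passwd-style text."""
    problems = []
    seen_names = {}  # login name -> first line number it appeared on
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((lineno, "blank line"))
            continue
        fields = line.split(":")
        if len(fields) != 7:
            problems.append((lineno, f"expected 7 fields, found {len(fields)}"))
            continue
        name, _password, uid, gid = fields[:4]
        if not name:
            problems.append((lineno, "empty login name"))
        if not uid.isdigit() or not gid.isdigit():
            problems.append((lineno, "non-numeric uid or gid"))
        if name in seen_names:
            first = seen_names[name]
            problems.append(
                (lineno, f"duplicate login name {name!r} (first on line {first})")
            )
        else:
            seen_names[name] = lineno
    return problems


if __name__ == "__main__":
    # Invented sample: the second line is missing its shell field.
    sample = "root:x:0:0:Super-User:/:/bin/sh\nsys:x:4:0:System Owner:/var/adm\n"
    for lineno, problem in check_passwd(sample):
        print(f"line {lineno}: {problem}")
```

An empty result only means the file is well-formed by these simple rules; it does not rule out a problem elsewhere in how accounts are looked up.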

p.s. sorry for not getting back here. I still haven't had time to get back to that machine. We're in a network changeover right now so we're kinda swamped.
thanks for letting me know.
guidway

ASKER

Interesting turn of events...

My coworker and I sat down for a couple of hours on Friday and walked through the code of the network daemon that was crashing. I began checking every single service (assuming that is the correct term) that is started through this daemon and found some that should have been loading but were not (two of them were autofs and nfs). I made sure they were activated (as they should be, compared to another system we have) and restarted the computer. Root now works fine and the network is completely back up.

I'm not sure what exactly fixed the problem, but at least the computer is stable again. We are still getting all the error messages from my original post (FAM not coming up, etc.), but they are more annoying than harmful now. We will still sit down again and try to figure out what is causing them, but for now, at least the machine is working (almost) like all the others on the network. I'll let you know if we fix the FAM error messages next.
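The comparison against a working machine can also be done mechanically. The sketch below rests on assumptions not stated in this thread: that the services are switched with chkconfig-style flags, and that each machine's flag listing has been saved as text with one "flag state" pair per line (state being on or off). `parse_chkconfig` and `diff_flags` are hypothetical helper names.

```python
def parse_chkconfig(text):
    """Map flag name -> 'on'/'off' from lines of the form 'flag state'.

    Header or separator lines are skipped because their second column
    is neither 'on' nor 'off'.
    """
    states = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] in ("on", "off"):
            states[parts[0]] = parts[1]
    return states


def diff_flags(broken, reference):
    """Return {flag: (broken_state, reference_state)} where the two differ."""
    diffs = {}
    for flag in sorted(set(broken) | set(reference)):
        b = broken.get(flag, "missing")
        r = reference.get(flag, "missing")
        if b != r:
            diffs[flag] = (b, r)
    return diffs


if __name__ == "__main__":
    # Invented listings for illustration only.
    broken = parse_chkconfig("autofs off\nnetwork on\nnfs off\n")
    reference = parse_chkconfig("autofs on\nnetwork on\nnfs on\n")
    for flag, (b, r) in diff_flags(broken, reference).items():
        print(f"{flag}: this machine={b}, reference={r}")
```

Any flag that differs is only a candidate, since two machines can legitimately run different services.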
guidway

ASKER

Problem solved... the FAM issue turned out to be a problem in the /etc/config/inetd.options file (somehow a value was put in there that should not have been). Once I removed the value and rebooted the computer everything worked fine. The network is stable, FAM is running stable, no error messages and our admin is leaping for joy. :)

Thanks everyone for trying to help fix the problem. Splitting points for your effort.

Nisus091197

Glad to be of assistance,

Regards, Nisus.