FBMECS

Unable to log in to Solaris 10

Dear experts,

We have Solaris 10 installed on a SPARC machine, and every 2-3 months no one can log in.

The message we get is "Login failure: Error in Underlying Service module"

We can fix this by booting into single-user mode and removing the password from the shadow file.

Does anyone know why this is happening?

Is there any other way to fix this without booting into single-user mode?
dfke

You can make the password entry for the root user empty, log in to the machine as root, and check /etc/pam.conf to verify that the PAM modules it references are present and valid (i.e. /usr/lib/security/pam*).
Does this problem happen on every connection (including the console) or only on certain connections (e.g. ssh only, or restricted to some subnet)?

The system logs (/var/adm/messages*) may be quite useful for the analysis, but make sure you omit or mask any sensitive information.
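
The module check suggested above could be scripted roughly as follows. This is only a sketch: the function name check_pam_modules and the PAM_CONF and PAM_DIR variables are made up for illustration, it looks only in the one directory given (not in any $ISA subdirectory), and on Solaris 10 it may need nawk or /usr/xpg4/bin/awk in place of awk.

```shell
#!/bin/sh
# Hypothetical helper: list the PAM modules named in a pam.conf-style file
# and report any that are missing from the module directory.
# PAM_CONF and PAM_DIR can be overridden to try the check on a copy.
PAM_CONF=${PAM_CONF:-/etc/pam.conf}
PAM_DIR=${PAM_DIR:-/usr/lib/security}

check_pam_modules() {
    # Field 4 of each non-comment line is the module path.
    awk '!/^[ \t]*#/ && NF >= 4 { print $4 }' "$PAM_CONF" | sort -u |
    while read mod; do
        case $mod in
            /*) path=$mod ;;            # absolute path, used as-is
            *)  path=$PAM_DIR/$mod ;;   # relative to the module directory
        esac
        if [ -f "$path" ]; then
            echo "ok      $path"
        else
            echo "MISSING $path"
        fi
    done
}
```

Calling check_pam_modules with the defaults reads /etc/pam.conf; any line starting with MISSING points at a module worth a closer look.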
FBMECS (ASKER)

When this occurs, all ssh connections fail without any message.
Console login fails with the message described in the question.

It looks like the issue occurs when passwords expire.

I have tried disabling password expiration, but it still happens.
As for ssh: can you enable debug output on your ssh client? Depending on your ssh version, either of the following forms may work:

ssh -vvvv
ssh -v -v -v -v

(In my experience, a maximum of four v's is sufficient.)

Also, as dfke recommended, could you please post your /etc/pam.conf so that we can take a look?
FBMECS (ASKER)

This is my pam.conf file

#
#ident  "@(#)pam.conf-winbind   1.1     07/05/15 SMI"
#
# Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
# PAM configuration
#
# Unless explicitly defined, all services use the modules
# defined in the "other" section.
#
# Modules are defined with relative pathnames, i.e., they are
# relative to /usr/lib/security/$ISA. Absolute path names, as
# present in this file in previous releases are still acceptable.
#
# Authentication management
#
# login service (explicit because of pam_dial_auth)
#
login   auth requisite          pam_authtok_get.so.1
login   auth required           pam_dhkeys.so.1
login   auth required           pam_unix_cred.so.1
login   auth required           pam_unix_auth.so.1
login   auth required           pam_dial_auth.so.1
#
# rlogin service (explicit because of pam_rhost_auth)
#
###rlogin       auth sufficient         pam_rhosts_auth.so.1
###rlogin       auth requisite          pam_authtok_get.so.1
###rlogin       auth required           pam_dhkeys.so.1
###rlogin       auth required           pam_unix_cred.so.1
###rlogin       auth required           pam_unix_auth.so.1
#
# Kerberized rlogin service
#
krlogin auth required           pam_unix_cred.so.1
krlogin auth binding            pam_krb5.so.1
krlogin auth required           pam_unix_auth.so.1
#
# rsh service (explicit because of pam_rhost_auth,
# and pam_unix_auth for meaningful pam_setcred)
#
###rsh  auth sufficient         pam_rhosts_auth.so.1
rsh     auth required           pam_unix_cred.so.1
#
# Kerberized rsh service
#
krsh    auth required           pam_unix_cred.so.1
krsh    auth binding            pam_krb5.so.1
krsh    auth required           pam_unix_auth.so.1
#
# Kerberized telnet service
#
ktelnet auth required           pam_unix_cred.so.1
ktelnet auth binding            pam_krb5.so.1
ktelnet auth required           pam_unix_auth.so.1
#
# PPP service (explicit because of pam_dial_auth)
#
ppp     auth requisite          pam_authtok_get.so.1
ppp     auth required           pam_dhkeys.so.1
ppp     auth required           pam_unix_cred.so.1
ppp     auth required           pam_unix_auth.so.1
ppp     auth required           pam_dial_auth.so.1
#
# Default definitions for Authentication management
# Used when service name is not explicitly mentioned for authentication
#
other   auth requisite          pam_authtok_get.so.1
other   auth required           pam_dhkeys.so.1
other   auth required           pam_unix_cred.so.1
other   auth required           pam_unix_auth.so.1
#
# passwd command (explicit because of a different authentication module)
#
passwd  auth required           pam_passwd_auth.so.1
#
# cron service (explicit because of non-usage of pam_roles.so.1)
#
cron    account required        pam_unix_account.so.1
#
# Default definition for Account management
# Used when service name is not explicitly mentioned for account management
#
other   account requisite       pam_roles.so.1
other   account sufficient      pam_unix_account.so.1
other   account required        pam_winbind.so
#
# Default definition for Session management
# Used when service name is not explicitly mentioned for session management
#
other   session required        pam_unix_session.so.1
#
# Default definition for  Password management
# Used when service name is not explicitly mentioned for password management
#
other   password required       pam_dhkeys.so.1
other   password requisite      pam_authtok_get.so.1
other   password requisite      pam_authtok_check.so.1
other   password sufficient     pam_winbind.so
other   password required       pam_authtok_store.so.1
#
# Support for Kerberos V5 authentication and example configurations can
# be found in the pam_krb5(5) man page under the "EXAMPLES" section.
#
###krlogin      auth required           pam_krb5.so.1
###krsh auth required           pam_krb5.so.1
###ktelnet      auth required           pam_krb5.so.1



And this is the message we get in /var/adm/messages:

Jan  3 09:54:49 uni-test login: [ID 468494 auth.crit] login account failure: Error in underlying service module
Jan  3 09:54:49 uni-test svc.startd[11]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 1
I can't see any obvious errors in your pam.conf, and the syslog shows the symptom, not the cause. Are there any entries above this one that may be related to the start of the problem, i.e. something related to the PAM service rather than the login service?
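
To pull out the entries just above each failure line, something like the following might help. It is only a sketch: context_before is a made-up name, the pattern and line count are whatever you pass in, and on Solaris 10 the awk call may need to be nawk, since the -v option may not be accepted by /usr/bin/awk.

```shell
#!/bin/sh
# Hypothetical helper: print the N lines that precede each matching line in
# a log, since the failure line itself only shows the symptom.
# Usage: context_before PATTERN N FILE
context_before() {
    awk -v pat="$1" -v n="$2" '
        $0 ~ pat {
            # Replay the buffered lines (oldest first), then the match.
            for (i = NR - n; i < NR; i++)
                if (i > 0 && (i in buf)) print buf[i]
            print
            print "--"
        }
        # Keep only the last n lines in the buffer.
        { buf[NR] = $0; delete buf[NR - n] }
    ' "$3"
}
```

For example, context_before 'login account failure' 20 /var/adm/messages prints the 20 lines before each occurrence, followed by a "--" separator.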

Also, please clarify "every 2-3 months no-one can log in":
- the problem appears every 2-3 months (does the exact interval vary?) and affects everyone at the same time, or
- the problem appears for every user every 2-3 months, but not necessarily at the same time?

I assume the former, and that doesn't sound like password expiration. What if you set password expiration for a user longexp to 6 months and for a user shortexp to 7 days (or even 1 day)? I'm curious whether the expiration of shortexp's password would trigger the problem.
One more thing: try adding "debug" to the module lines in pam.conf, and verify that syslog messages at level DEBUG are written somewhere (this may be the usual /var/adm/messages, but that will probably add a lot of noise).

e.g. (snippet)
login   auth requisite          pam_authtok_get.so.1 debug
login   auth required           pam_dhkeys.so.1 debug
login   auth required           pam_unix_cred.so.1 debug
login   auth required           pam_unix_auth.so.1 
login   auth required           pam_dial_auth.so.1 debug

Note: check the man pages (e.g. "man pam_authtok_get") for details. The man page for pam_unix_auth doesn't mention the "debug" option, but you may give it a try.
More info on PAM debug here:
http://blog.simplex-one.com/?p=515
FBMECS (ASKER)

I have password expiration set to 9999 in the /etc/default/passwd file.

But this does not help.

Also, this happens on the same day for all users every 2-3 months (I have not counted exactly).

When we reset the root password, the issue is solved.
Not sure, but I think /etc/default/passwd doesn't affect already existing users.
But more interestingly, are you saying that if you reset _only_ the root password, the issue is solved for _all_ users?
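
For existing accounts, the aging values that count are the per-user fields in /etc/shadow: per shadow(4), field 3 is the day of the last password change (in days since 1 Jan 1970) and field 5 is the maximum number of days the password stays valid. A rough sketch of the arithmetic follows; shadow_expiry is a made-up name, today's day number is passed in as an argument rather than read from the clock, and the special meaning of a last-change value of 0 is ignored.

```shell
#!/bin/sh
# Hypothetical helper: read shadow(4)-style lines on stdin and report when
# each password expires. TODAY is given in days since 1970-01-01.
# Usage: shadow_expiry TODAY < lines
shadow_expiry() {
    awk -F: -v today="$1" '
        {
            user = $1; lastchg = $3; max = $5
            if (max == "" || max + 0 < 0) {
                # Empty or negative max field: no aging for this user.
                print user ": aging off"
            } else if (lastchg == "") {
                print user ": no last-change date recorded"
            } else {
                left = lastchg + max - today
                if (left < 0)
                    printf "%s: expired %d days ago\n", user, -left
                else
                    printf "%s: %d days left\n", user, left
            }
        }'
}
```

For example, grep '^root:' /etc/shadow | shadow_expiry 16100 reports on root as of day 16100; today's day number can be obtained with perl -e 'print int(time/86400)'. According to passwd(1) on Solaris, passwd -x -1 followed by a user name turns aging off for that one account.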
FBMECS (ASKER)

Yes, that is correct.

Once I boot into single-user mode, remove the root password, and then reset it to something else, the issue is resolved for all users.
So my last idea is to enable the debug traces, but that is hard to justify if you need to wait 2-3 months for the next occurrence. I assume it's a production server, but if it's not: just to rule out the influence of the system date, could you change the system time to 4 Apr 2014 and see if the problem occurs "immediately"?

One more question: does the problem recur every 2-3 months even if there are reboots in between? To be more exact:
- when was the last time the problem occurred before 3 Jan 2014?
- how many times and when was the system rebooted between these two dates?

Explanation of this question:

Another thing popped into my mind when thinking about timers: in the past, some commands (e.g. ps) printed a cryptic message that turned out to indicate an inconsistency in the system timer: "Unknown HZ value! (XX) Assume YY." where XX and YY were two integers. It typically happened a number of weeks or maybe months after a reboot, because the server was apparently far too fast compared to what the kernel expected. This counter can't be fooled by setting the system time; the condition only shows up after running without a reboot for several weeks or months (depending on the hardware).
FBMECS (ASKER)

This is a production server, so I cannot change the date/time.

The server is never rebooted unless a major patch requires a reboot.

The last time we rebooted the server was in mid-October, after installing the latest Solaris 10 patches.

We had no issue until the root password expired. I am trying to change the root password every month to check and confirm that this is what is causing the issue.

Is there a way to make the root password never expire?
ASKER CERTIFIED SOLUTION
Surrano