xberry asked:

Mandrake 10.0 gui functions so much slower than on other 'nix systems on same box

Graphical login and X-Windows on my Mandrake 10.0 load much more slowly than on any other system on the same hardware. The other systems are
SUSE 9.0, an 'old' Redhat 7.3, and FreeBSD 5.2 (seemingly the fastest).
The only significant difference I recognize is that Mandrake runs a 2.6 series kernel, while
SUSE and Redhat run on 2.4. Still, that doesn't sound logical as a cause, since newer kernels are developed to be an improvement, no?
So where is the crucial point that will allow me to get Mandrake up to optimum
performance?
gheist:

Must be a problem with name resolution.
Not sure about the Mandrake-specific solution,
but adding 127.0.0.1 as the last name server,
or resolving against /etc/hosts first, usually solves bad slowness with X-Windows, terminal logins, ftp, etc.
jlevie:

A lack of memory can cause this. How much memory is installed in the system?
.... And are you starting the same type of session on all?

Note that Mdk10 will do things a bit like windoze... While it is still busily starting up services in the background, it also starts the display manager service... So you could be seeing the effect of starting your session while it's doing ... quite a lot else... too. If you wait a bit it might be better... Or is it comparably slow regardless of when the session is started (during a system bootup, or after quitting and logging back in)?

Also check the number of applets and other stuff you start...

If you've installed something light (non-kde/gnome) that should be fairly fast... IceWM, *box .... etc.

-- Glenn
For example ratpoison, which starts immediately even on an overloaded system.
xberry (ASKER):

Hmm . . . I compared some of the functions in isolation & found this:

Some of the weightier programs, such as the OpenOffice suite or Mozilla, take a noticeable time to load, at about the same speed in Mandrake as in SUSE. I am aware that this is down to memory & processor speed. I am quite easy with it though, & if it becomes an issue the remedy is clear & does not need to be discussed further here.

Other things are definitely off though, compared against SUSE:

1.) graphical login as local user: after I have typed the password, the login gui just sits there for some seconds before it disappears
& gnome starts doing its initialisation. As 'root' though, the login screen disappears instantly, giving way to gnome, after the last character
has been typed in (this works on all of my other installed systems, for any additional user account there as well)

2.) starting a kppp dial-up procedure with SUSE instantly starts to build the connection, while Mandrake, also using kppp,
again makes me wait some seconds . . .

. . .

I did add nameserver 127.0.0.1 to /etc/resolv.conf. In /etc/hosts it was already listed, as expected.
No noticeable changes due to that . . . some smaller applications seem to open faster, but that is more likely the
famous placebo effect giving the impression than any technical change : )
Nevertheless, to me it feels that gheist's first idea is aiming at the key area . . ., but Mandrake expertise is required.

Hm well, I'm a Mdk guru I guess:-).
I've never seen the type of problem you're describing on any Mdk though...
Again, does it always behave the same (cf. above)? Is it just one regular user, or is it all? Which display manager do you use? mdkkdm perhaps?

Taking 1&2 together, gheist's hunch does look to be right... Is there any difference in how things are set in /etc/host.conf, /etc/resolv.conf and /etc/hosts between Mdk10 and SuSE?

I've also seen some... interesting... behaviour where I've done some no-no or other lately (mixing RAM speeds between banks on a Compaq/HP that ... didn't support it. Showed as intermittent... slowdowns. Or for that matter the system where one install turned on a bit too optimistic DMA mode, so one operated OK (with seemingly slower hdparm settings) and the other had bursts of Drive Seek errors making it really choppy going. The latter was very visible in the logs though, or through dmesg).

-- Glenn
My solution applies to delays of 15..45s or so:
the hostname command returns the machine's name
host machines.name must return an accessible IP address
host that.ip.address must return the same machine name

Smaller delays are out of my reach since I prefer the seemingly fastest system ;-)
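
The three-step check above (name, then IP, then back to the same name) can also be scripted so that each step is timed. What follows is only a rough sketch in Python, not anything Mandrake ships: the function name check_name_roundtrip and the report keys are made up for illustration, and the two lookup functions are passed in as parameters so the logic can be tried without touching the real resolver.

```python
import socket
import time


def check_name_roundtrip(hostname, forward=socket.gethostbyname,
                         reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> hostname and time each step.

    `forward` and `reverse` default to the system resolver but can be
    replaced by stand-ins (hypothetical helpers) for offline testing.
    """
    report = {"hostname": hostname}

    # Step 1: forward lookup (what `host machines.name` would do).
    start = time.monotonic()
    try:
        ip = forward(hostname)
    except OSError as exc:
        report["forward_seconds"] = time.monotonic() - start
        report["error"] = f"forward lookup failed: {exc}"
        return report
    report["forward_seconds"] = time.monotonic() - start
    report["ip"] = ip

    # Step 2: reverse lookup (what `host that.ip.address` would do).
    start = time.monotonic()
    try:
        name_back = reverse(ip)[0]
    except OSError as exc:
        report["reverse_seconds"] = time.monotonic() - start
        report["error"] = f"reverse lookup failed: {exc}"
        return report
    report["reverse_seconds"] = time.monotonic() - start
    report["name_back"] = name_back

    # Step 3: the name that comes back should match the one we started with.
    report["consistent"] = name_back.lower() == hostname.lower()
    return report
```

On a real box one would call check_name_roundtrip(socket.gethostname()) and look at the two *_seconds values; a step that takes many seconds points at the resolver rather than at X or the desktop.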
xberry (ASKER):

With Mandrake I use 'gdm'.

If I compare Suse with Mandrake then I find that
/etc/host.conf has exactly the same settings in both installations.
/etc/hosts though has additional IPv6 entries in Suse.
/etc/resolv.conf doesn't exist in SUSE at all; it only has /etc/ppp/resolv.conf.

So far only this one from /var/log/messages caught my attention:
  Sep 2 12:34:25 localhost pppd[2300]: Couldn't set pass-filter in kernel: Invalid argument.

When trying the suggested
  host <machines.names>
it returns:
;connection timed out: no servers could be reached

???
This sounds very much like a hostname resolution problem. The reference to PPP suggests to me that this machine isn't on a network with a full time DNS server. That means that /etc/resolv.conf should be empty or not exist and that the hostname of the machine must be listed in /etc/hosts. For a standalone machine (no ethernet connections) /etc/sysconfig/network might contain:

HOSTNAME=this-machine.no-domain.net

and /etc/hosts would then look like:

127.0.0.1        this-machine.no-domain.net this-machine localhost.localdomain localhost

If there is a local LAN the hosts file would look like:

127.0.0.1        localhost.localdomain localhost
192.168.0.1    this-machine.no-domain.net this-machine

It's important that the hostname of the system be a Fully Qualified Domain Name (FQDN) which for practical purposes means that it contain at least one ".".
then adjust /etc/hosts

127.0.0.1 localhost.domain.dom localhost hostname hostname. hostname.domain.dom

hostname.domain.dom is what you get from hostname command.
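
To double-check an edited /etc/hosts against the hostname, a few lines of Python will do. This is only a sketch; parse_hosts and lookup_hostname are names invented here. It relies on the hosts(5) layout of one IP address followed by the canonical name and then any aliases.

```python
def parse_hosts(text):
    """Return a list of (ip, [names]) entries from hosts-file text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:  # need an address plus at least one name
            entries.append((fields[0], fields[1:]))
    return entries


def lookup_hostname(text, hostname):
    """Return (ip, canonical_name) of the first entry listing hostname, else None."""
    wanted = hostname.lower().rstrip(".")
    for ip, names in parse_hosts(text):
        if wanted in (n.lower().rstrip(".") for n in names):
            return ip, names[0]  # first name on the line is the canonical one
    return None
```

Feeding it the contents of /etc/hosts and the output of the hostname command shows whether the name is listed at all and which name the line treats as canonical.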
xberry (ASKER):

I tried both adjustments of /etc/hosts, but neither changed anything
(the hostname is a FQDN)

and I'm still getting the 'connection timed out: no servers could be reached'...

Seems to be really difficult . . . : (
You are getting "no servers could be reached" from a 'host' command, right? That will happen if /etc/resolv.conf mentions a name server that doesn't exist or can't be reached. I think you said earlier that there's a "nameserver 127.0.0.1" entry in resolv.conf, and that should not be there unless you are running a properly configured named on the local system.

Even when resolv.conf contains the IP of a valid name server you'll get that error from the host command if said name server is inaccessible, like when a dial-up machine isn't on-line and the name server IP's are those of your ISP.

Could we see the contents of /etc/hosts, /etc/resolv.conf, and /etc/sysconfig/network?
Dear jlevie - please read manual page of your resolver library before opening mouth.
gheist,

What part of what I said do you take exception to?
....unless you are running a properly configured named on the local .....
I don't see a problem with that in this context. The OP has mentioned that his Internet access is via dial-up and thus working DNS servers would only be available when the PPP link was up. If he had "nameserver 127.0.0.1" in resolv.conf and executed a host command when not on-line he'd then get that error if there wasn't a running named on the local system. And if that was in resolv.conf first it would cause things to be slow whether on-line or not since the query to the non-existent name server would have to time out before the resolver would try the next name server.
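
To put rough numbers on the "has to time out before the next name server is tried" argument, here is a deliberately simplified model in Python. The function names are invented, and the timeout=5.0 and attempts=2 values are assumptions standing in for whatever the local resolver really uses; real resolvers may scale or split their timeouts differently, so treat the results as ballpark figures only.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order resolv.conf lists them."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":  # blank or commented-out line
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def worst_case_wait(servers, timeout=5.0, attempts=2):
    """Simplified upper bound in seconds if no listed server ever answers.

    Assumes every server is tried once per attempt and each try waits the
    full timeout (assumed values, not read from the system).
    """
    return len(servers) * timeout * attempts


def wait_before_reaching(servers, target, timeout=5.0):
    """Seconds lost on unresponsive servers listed ahead of target (first pass)."""
    return servers.index(target) * timeout
```

With a dead 127.0.0.1 listed first and one working server after it, the model gives a 5 second penalty per lookup; with three unreachable servers and the link down it gives 30 seconds before the resolver gives up.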
Since localhost returns icmp unreachable immediately .....
Tempers eh? Cool it guys, we're trying to help Claus (or was it Klaus.... Hm, we've had that conversation before right xberry:-), not being "absolutely right", right?

Now it seems gheist was right on the money concerning this being a DNS problem... Since you've gone with gdm as dm, one might assume you start Gnome too? That could explain the ... lag (gnome runs its own name resolution service thingy that might well be affected). If you just "remove" /etc/resolv.conf (mv /etc/resolv.conf /etc/resolv.conf.org), does it still behave the same?

If you switch to mdkkdm or kdm instead of gdm, does that make a difference (this on the off chance it's _not_ DNS related)? You could just edit /etc/sysconfig/desktop and change DISPLAYMANAGER ... KDE == mdkkdm and KDM == kdm ...

.... And that "pass-filter ... Invalid argument" thing bugs me...

-- Glenn
xberry (ASKER):

> Since you've gone with gdm as dm, one might assume you start Gnome too?

of course I do, mandrake desktop is built on Gnome & mainly Gnome specific applications are used, except kppp & some kde games for the kids : ). As I mentioned above.

I had no trouble with Redhat & Gnome,
SUSE & KDE seem to be married anyway, but doing ok, so
what's wrong with Mandrake & GNOME ?
 


> mandrake desktop is built on Gnome & mainly Gnome specific applications are used
No.
Or yes, depending on how you look at it. The drakx toolset is mainly gnome-based, yes.
Galaxy and the menu system etc etc etc are not gnome-only... They work beautifully in kde and ... well, actually most of the available sessions. So using KDE is as natural as using Gnome.
When you ran through drakfirsttime (the drakfw command... You can run it again by deleting $HOME/.drakfw and logout/login;-) you might have noticed that KDE was the topmost choice... If you had it installed:-)

> what's wrong with Mandrake & GNOME ?
Probably the DNS thing...

-- Glenn
I agree with Glenn & gheist. Your problem is most likely a result of a problem with reverse lookups being done against a DNS that isn't there or isn't working.

Could we see the contents of /etc/hosts, /etc/resolv.conf, & /etc/sysconfig/network, please?

gheist,

>  Since localhost returns icmp unreachable immediatly .....

That would only happen if the localhost interface was disabled. On a normal system the lo interface will be up and the resolver will attempt to open a connection to named, which will have to time out under the conditions I specified. This is easy enough to prove. List the localhost first in resolv.conf on a system with no running named and try a query.
%nslookup
...

> server 127.0.0.1
(waits a bit)
Default Server:  [127.0.0.1]
Address:  127.0.0.1

> www.experts-exchange.com
(fails immediately)
Server:  [127.0.0.1]
Address:  127.0.0.1

*** [127.0.0.1] can't find www.experts-exchange.com: No response from server
>

wilowisp> cat /etc/redhat-release
Red Hat Enterprise Linux WS release 3 (Taroon Update 3)
wilowisp> ps -ef | grep named | grep -v grep
wilowisp> cat /etc/resolv.conf
; generated by /sbin/dhclient-script
search entrophy-free.net
nameserver 127.0.0.1
;nameserver 10.1.0.254
wilowisp> host www.experts-exchange.com
;; connection timed out; no servers could be reached
wilowisp>

And doing the equivalent of what you did:

wilowisp> host www.experts-exchange.com 127.0.0.1
;; connection timed out; no servers could be reached

The response depends on what tool is used, nslookup (which is deprecated) or host. But in all of the cases (yours and mine) the ultimate response is "connection timed out; no servers could be reached".
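
Since nslookup and host word their failures differently, a tool-independent way to see what really happens is to send one raw UDP query at the address and time it. The sketch below is an illustration only: build_query and time_query are names made up here, the packet follows the RFC 1035 layout for a plain A-record question, and the 5 second timeout is an assumed value. On some systems an unanswered query to a closed local port comes back as an immediate error rather than a timeout, which is exactly the distinction being argued about.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def time_query(server, name, timeout=5.0, port=53):
    """Send one UDP query and return (outcome, seconds_elapsed)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    start = time.monotonic()
    try:
        sock.connect((server, port))
        sock.send(build_query(name))
        sock.recv(512)
        outcome = "answered"
    except socket.timeout:
        outcome = "timed out"
    except OSError as exc:  # for example a refused connection
        outcome = f"error: {exc}"
    finally:
        sock.close()
    return outcome, time.monotonic() - start
```

Running time_query("127.0.0.1", "www.experts-exchange.com") on a box without a local named shows whether the failure is instant or takes the full timeout there.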
No gheist, testing your case I see that Jim is actually right... Might it be that you actually have a "caching only" named running, so the speed is due to you _getting_ an answer? Or some other "feature" set? 'Cause this is what I get with your test (no named running local):
# nslookup
> server 127.0.0.1
Default server: 127.0.0.1
Address: 127.0.0.1#53
> www.experts-exchange.com
;; connection timed out; no servers could be reached     #<----- Long pause here
>
# cat /etc/mandrake-release
Mandrake Linux release 10.0 (Official) for i586
# ps -ef|grep named|grep -v grep
#

Anyway, we could get a real fast test of if this is the problem... Claus, could you remove bind from the order line in /etc/host.conf? Still "slow"?

-- Glenn
Hm, note Jim that he gets "No response from server", which would mean _some_ form of communication, right? ... While we get straight "Connection timed out"... Perhaps differences in resolver versions/responses, perhaps a more ... telling difference?

-- Glenn
Due to the fact that the interface IP changes often, it is necessary to assign the hostname to localhost.
xberry (ASKER):

Lots to catch up on with you guys . . . tremendous, your input & knowledge.

some minutes ago I did print out the thread, seven pages, so I can go through the data
in offscreen study mode soon - facing EE site through monitor too long seems to leave impression of "bluescreens everywhere" lately. : )

Anyway, before I go to analyse myself, the latest report:

First for jlevie (Jim) the contents of the files in question (how they look now after all suggested modifications):

/etc/hosts:
   (following first jlevie's & then gheist's idea)

  127.0.0.1 localhost.nixlab.de localhost Colin Colin. Colin.nixlab.de


/etc/resolv.conf:
   (following gheist's suggestion)

   nameserver 145.253.2.11
   nameserver 145.253.2.75
   nameserver 127.0.0.1


/etc/sysconfig/network
(following jlevie's input)

   HOSTNAME=Colin.nixlab.de
                                                                               
   #NETWORKING=yes ;(this line was active before; neither setting seems to affect the problem
                                  ;  though)

only to make things complete:

/etc/host.conf
   (after commenting out the 'bind' per Glenn's idea)

   order hosts,#bind
   multi on

Glenn, no change after #bind in that line.

NOTE THIS PLEASE !!!:

Today when I booted up Mandrake 10.0 I attentively followed the boot process onscreen & caught this message, which I haven't found in any of my log files . . . strange:

  WARNING: Could not add loopback device to routing table
                     - CUPS may not be configured properly

CUPS is working perfectly & seems not affected, but of course this failing 'loopback
device' seems to be something . . .

Your comment . . . ?

Thanks
                                                                             
> facing EE site through monitor too long seems to leave impression of "bluescreens everywhere" lately. : )
Hehe... try http://oldlook.experts-exchange.com/questions/21113513/Mandrake-10-0-gui-functions-so-much-slower-than-on-other-'nix-systems-on-same-box.html#12004903
Also note that in "expert mode" the "bluish text" of www.expert-exchange.com is black... Couldn't stand it otherwise.

> Glenn, no change after #bind in that line.
Ok.


>   WARNING: Could not add loopback device to routing table
>                      - CUPS may not be configured properly
Ok, how does your routing table look then?
netstat -rn
would perhaps be interesting.

-- Glenn
xberry (ASKER):

> Also note that in "expert mode" the "bluish text" of www.expert-exchange.com is black

Thanks for that, sure as hell I knew it looked different some time, but selecting the link
directly from my message box only brought me to the "blue" version, so wasn't aware of
the expert switch . . .

here what I get from "netstat -rn"
 
  Kernel IP routing table
  Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
  127.0.0.0         0.0.0.0            255.0.0.0         U         0 0                 0    lo


Normal. Too bad:-).

-- Glenn
I've got some issues with the content of the files you posted. From what's been posted I believe that this machine doesn't have a full time Internet connection and uses PPP for on-demand Internet access. Is that correct?

That being the case you should not have a resolv.conf file at all. When PPP is up it should supply the name server info like it is doing on your SuSE machine. With data in resolv.conf and the link down the resolver will have to try and time out on each of the nameservers, which in turn is going to slow things down.

Next you specify the hostname in /etc/sysconfig/network (Colin.nixlab.de), but the hosts file isn't quite correct for that hostname and should read:

127.0.0.1     Colin.nixlab.de Colin localhost.localdomain localhost

Also remove the comment from host.conf or you'll have problems when the machine dials up.
I'm inclined to agree... Also note (I use pppoe for this machine's adsl connection):

[root@localhost root]# ls -l /etc/resolv.conf
lrwxrwxrwx  1 root root 20 sep  8 18:08 /etc/resolv.conf -> /etc/ppp/resolv.conf
[root@localhost root]# cat /etc/mandrake-release
Mandrake Linux release 10.0 (Official) for i586
[root@localhost root]#
In my case the ppp link is (of course) always active, and I have the nameserver lines provided by ppp in resolv.conf (meaning I add _nothing_ here). Was the same when I used kppp and a modem....

... And you could well just dispense with the hostname bit altogether... No one knows it, nor uses it (unless you have a LAN too?).

-- Glenn
xberry (ASKER):

> doesn't have a full time Internet connection and uses PPP for on-demand Internet
> access. Is that correct?

Yes.

I did as jlevie laid out:

1. mv /etc/resolv.conf /etc/obsolete.resolv.conf
2. /etc/sysconfig/network no need to change (was already HOSTNAME=Colin.nixlab.de)
3. modified /etc/hosts as described.
4. removed comment # in front of bind in /etc/host.conf

result: (all get a firm seat & tissues to wipe your tears : )

gui login as well as coming up of modem took about three times longer than before. : |

(Sorry that I don't have other news yet)


xberry (ASKER):

Again I'd like to point your attention to this fact:

Same system, the difference between login & dialin as user 'root' (Sysadmin) & user 'klaus'
is:

'root':  gui login: instantly after last character of password has been typed
           kppp connect: modem commands executed instantly.  

'klaus': gui login: delay about 10 seconds
            kppp: delay in modem connect also about 10 secs.

Ok, so this must be related to non-root user accounts only (because as root everything works as expected). My feeling here (sorry for not being more professional)
is that it's something like a nasty superfluous security filter that Mandrake shoves in for
everyone except root.

I can see that it might seem to be the case, but I can't agree with your deduction there Klaus (only going on the 5 or so Mdk10 OE DL&PP boxes I've got:-). Or perhaps... What "security level" did you choose?

-- Glenn
With the network config now sane we don't have to worry about any DNS lookup issues. And the time delays (~10 sec) are too short for a DNS timeout. So it has to be something else.

I wonder if it might be something to do with authentication... What does /etc/sysconfig/authconfig contain?

And what does 'chkconfig --list' return?
xberry (ASKER):

/etc/sysconfig/authconfig does not exist on my mdk 10.0 system

# chkconfig --list
alsa            0:off   1:off   2:on    3:on    4:on    5:on    6:off
dm              0:off   1:off   2:off   3:off   4:off   5:on    6:off
kheader         0:off   1:off   2:on    3:on    4:off   5:on    6:off
netfs           0:off   1:off   2:off   3:on    4:on    5:on    6:off
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
partmon         0:off   1:off   2:off   3:on    4:on    5:on    6:off
random          0:off   1:off   2:on    3:on    4:on    5:on    6:off
rawdevices      0:off   1:off   2:off   3:on    4:on    5:on    6:off
sound           0:off   1:off   2:on    3:on    4:on    5:on    6:off
keytable        0:off   1:off   2:on    3:on    4:on    5:on    6:off
syslog          0:off   1:off   2:on    3:on    4:on    5:on    6:off
oki4daemon      0:off   1:off   2:off   3:off   4:off   5:off   6:off
crond           0:off   1:off   2:on    3:on    4:on    5:on    6:off
xinetd          0:off   1:off   2:off   3:on    4:on    5:on    6:off
portmap         0:off   1:off   2:off   3:on    4:on    5:on    6:off
xfs             0:off   1:off   2:on    3:on    4:on    5:on    6:off
hotplug         0:off   1:off   2:on    3:on    4:on    5:on    6:off
nfslock         0:off   1:off   2:off   3:on    4:on    5:on    6:off
devfsd          0:off   1:off   2:on    3:on    4:on    5:on    6:off
atd             0:off   1:off   2:off   3:on    4:on    5:on    6:off
spamassassin    0:off   1:off   2:on    3:on    4:on    5:on    6:off
harddrake       0:off   1:off   2:off   3:on    4:on    5:on    6:off
numlock         0:off   1:off   2:off   3:on    4:on    5:on    6:off
mtink           0:off   1:off   2:off   3:off   4:off   5:off   6:off
cups            0:off   1:off   2:on    3:on    4:on    5:on    6:off
xinetd based services:
        rsync:  off
        fam:    on
        cups-lpd:       off
xberry (ASKER):

Eh, security level is . . . 'Standard'
Offhand I don't see anything there that looks to be a problem.

If you create another user does it have the same slow start of the desktop (after the first login)?
If it's auth, then textmode logins would be affected too for the "klaus" user... Are they?

-- Glenn
xberry (ASKER):

Creating another user (I tried two) does have the same slow effect.
To illustrate the effect better: it is the login screen itself that stays there for
some seconds after the correct password has been typed, while with root the 'curtain'
drops instantly . . .

textmode logins are not affected (if calling a terminal with ctrl-alt F2 for instance),

but hold on !  with jlevie's last input & his idea about authentication still sitting in my mind, I started to examine THIS:

/etc/pam.d/*

Not only because my headache got worse then, I got more & more hints that the issue
is related to PAM somehow.

1. observation:  the content of Mandrake's /etc/pam.d is COMPLETELY different compared
                         against SUSE's files: Mandrake's pam holds 15 more service files than
                         SUSE needs, and

2.:                     even the syntax of some identical files, such as "login, passwd or ppp",
                         is laid out differently in Mandrake than in SUSE. For example
                         in Mandrake's "login", password type:
                         password   required     pam_stack.so service=system-auth
                         while in SUSE:
                         password required       pam_pwcheck.so          nullok
                         password required       pam_unix2.so            nullok use_first_pass  use_authtok

3. then:              Mandrake's list of available modules in /lib/security/* against SUSE's
                          also is completely different. SUSE does not employ a module
                          pam_stack.so at all (amongst others), while Mandrake doesn't have
                          the module pam_unix2.so available, for instance.

Sorting this out now seems really a task, but one thing for a start would interest me:

If one of you (most likely Glenn) could please post the content of /lib/security/ on his Mandrake 10.0, only to rule out the not unlikely possibility that something in my installation of Mandrake isn't
as it should be under proper conditions.

 
Observation #1) So? Exactly which files would be "unnecessary" in your view?

Observation #2) Mandrake shows itself to be a bit more modern, as do most (if not all) RH offspring:-). Using the stack pam module to "lump similar settings together" in _one central config file_ has nothing to do with this. If you look at the file referenced in "pam_stack.so service=system-auth" ... that is the file /etc/pam.d/system-auth ... you'll see that the settings match/are very close to what SuSE uses.
Here's the complete contents of my /etc/pam.d/system-auth:
#%PAM-1.0

auth        required      /lib/security/pam_env.so
auth        sufficient    /lib/security/pam_unix.so likeauth nullok
auth        required      /lib/security/pam_deny.so

account     required      /lib/security/pam_unix.so

password    required      /lib/security/pam_cracklib.so retry=3 minlen=2  dcredit=0  ucredit=0
password    sufficient    /lib/security/pam_unix.so nullok use_authtok md5 shadow
password    required      /lib/security/pam_deny.so

session     required      /lib/security/pam_limits.so
session     required      /lib/security/pam_unix.so
....................................................................... end .............................................
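
For comparing two PAM setups line by line, it can help to break each file into its four columns first. The following Python sketch does that for the simple "type control module args" form shown above; parse_pam_config and stacked_services are names invented for this example, and the bracketed control syntax that some PAM versions accept is deliberately not handled.

```python
def parse_pam_config(text):
    """Parse /etc/pam.d/<service>-style text into a list of rule dicts."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank, comment or #%PAM header
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a complete "type control module" rule
        rules.append({
            "type": fields[0],
            "control": fields[1],
            "module": fields[2],
            "args": fields[3:],
        })
    return rules


def stacked_services(rules):
    """Names of services pulled in through pam_stack.so service=<name>."""
    found = []
    for rule in rules:
        if rule["module"].endswith("pam_stack.so"):
            for arg in rule["args"]:
                if arg.startswith("service="):
                    found.append(arg.split("=", 1)[1])
    return found
```

Running both functions over /etc/pam.d/login and then over the file it stacks gives a flat list of rules that can be compared with the SuSE side by eye or with a plain diff.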


Observation #3) Probably. Have you compared pam versions?

About the content of the /lib/security ....
$ ls
pam_access.so                pam_listfile.so     pam_shells.so
pam_chroot.so                pam_localuser.so    pam_smbpass.so
pam_console.so               pam_mail.so         pam_stack.so
pam_console_apply_devfsd.so  pam_mkhomedir.so    pam_stress.so
pam_cracklib.so              pam_motd.so         pam_succeed_if.so
pam_debug.so                 pam_nologin.so      pam_tally.so
pam_deny.so                  pam_permit.so       pam_time.so
pam_env.so                   pam_pwdb.so         pam_timestamp.so
pam_filter.so                pam_radius.so       pam_unix.so
pam_ftp.so                   pam_rhosts_auth.so  pam_userdb.so
pam_group.so                 pam_rootok.so       pam_warn.so
pam_issue.so                 pam_rps.so          pam_wheel.so
pam_lastlog.so               pam_securetty.so    pam_winbind.so
pam_limits.so                pam_securid.so      pam_xauth.so

... I think this is "grasping at straws" more or less:-).

-- Glenn
Do you have the "mysterious behaviour" regardless of which session you start as "klaus"? Even "failsafe"?

-- Glenn
Maybe it happens even for console login: press Ctrl-Alt-F1 and try to log in as klaus; if it does, assume this is a security feature.
(we covered that a couple of comments away gheist... No problem with textmode login)

-- Glenn
xberry (ASKER):

Glenn, no intentions to "gnawing at straws" or let others do so. Well, at many points I thought to myself: Why use up the time of those helpful guys ? Practically I can wait my 10, sometimes up to 15 seconds at the login; it is nothing against the time I spend online then, for instance.
This however would open up totally different questions, such as "Why should we spend time
breaking our heads over system phenomena at all ?" or "what kind of question is important enough to be reported at EE ?" : ))

So, luckily we all have our own choice to it & I'm really happy that you guys haven't
given up on this one.

Back to it:

Logging in as 'failsafe' ?  You mean 'booting up' failsafe ?  This doesn't bring me up to
runlevel 5 though (X11 & the gnome gui login ) where the "mystery" appears.
So I think we can limit the issue to "gdm login".

Thanks for allowing to compare to PAM modules. Leaving it with "looking at it"
rather than "gnawing at it" : )  I also admit we should not consider it, since nothing
obviously "out, unexpected or confusing" there.

Where shall I examine then ?
Bringing it back to the simple, I thought, what's actually going on is that
somewhere some part of the system is spending an enormous time searching, resolving
or checking some file or files against the password input of anyone at the gdm login, except root. So after authentication of root instant access is given, while with any other
user's authentication happens . . .  what ? I think if we can answer this question, then
the solution won't be too far away, agreed ?

Thank you : )

 
 

We might be able to get a handle on whether it is an authentication issue by comparing the Gui start up time for root and the klaus user with the system operating in run level 3 (text mode login). Executing 'telinit 3' and rebooting should start the system up in text mode. Then you can compare the login time for root and klaus. Next execute 'startx' and see how the Gui start up times compare.

It might also be interesting to see if a slow startup of the Gui is influenced by being on line or not, now that the networking configuration is correct.
> Logging in as 'failsafe' ?  You mean 'booting up' failsafe ?  This doesn't bring me up to
> runlevel 5 though (X11 & the gnome gui login ) where the "mystery" appears.
> So I think we can limit the issue to "gdm login".
Nope, I meant the failsafe X session.

-- Glenn
xberry (ASKER):

Some new results:

I did execute 'telinit 3' in a Gnome Terminal & it switched the system back to runlevel 3
instantly (no reboot necessary).
At the text login then I tried both, 'root' & 'klaus' & it gave the 'magic' difference of five
seconds delay between them:

runlevel 3 login 'root': instantly shell prompt for  'root' was available.
                     
runlevel 3 login 'klaus': five seconds until I got a shell prompt for user 'klaus'  

I also tried a 'startx' while logged in as 'root' & logged in as 'klaus' (all runlevel 3)
but gui startup as expected with no notable difference.

So, likely it's authentication, isn't it ?  


Regarding failsafe: The option 'failsafe' I don't have available at my mandrake
gdm login (to start the GNOME XSession). I know what you mean, since I've
seen it at the SUSE gdm login, but not with mandrake (I'd have to go into
gnome config files for changing this)

   
 


> So, likely it's authentication, isn't it ?
Yep, probably... or differences in what gets executed for root<>regular user.
So the delay is coupled to login... And you're not trying to do something silly like ldap for passwds... Hmmm. I presume you've looked through all the /etc/security/* files for anything...semi-obvious:-).
A "not-that-related" possibility, do you perchance use supermount for "removables"? If you umount all resources mounted with type supermount, does it behave the same at runlevel 3?
You could set -x or echo "something" early in /etc/profile etc, just to get a feel for _where_ the delay is.

> seen it at the SUSE gdm login, but not with mandrake
?
Strange, to say the least.

-- Glenn
I agree, that may well indicate an authentication problem. Unfortunately I don't have an MD 10 box around to look at, so I don't know where the system authentication configuration is stored. On a RedHat box it would be in /etc/sysconfig/authconfig.  Do you have an /etc/nsswitch.conf, and if so what does it contain for "passwd"?
xberry (ASKER):

Some report between my own troublescannings:

re. /etc/security/* :  At first glance nothing, but I'll have to compare them against a reliable
                                reference to be absolutely sure . . .

the only supermount thingy was the floppy but login with that one umounted behaves the same as before.

Interesting your idea to put an 'echo "something" at the beginning of /etc/profile.
Result:  Login with user 'klaus' at runlevel 3:
Login: klaus
password: **********
. . . 5 seconds it stays there . . .  then:
Last login: (some minutes ago 2004) on vc/1
something
[klaus@Colin klaus]$

So /etc/profile is not connected to the delay here.

- - -

from /etc/nsswitch.conf:

     passwd:     files nisplus nis


Since the delay occurs between entering the password and the "Last login" message it seems to me that there are two possibilities:

1) The delay results from authentication

2) The delay is related to access to the klaus home dir

It would seem that we should be able to eliminate (2) as a possibility by creating a user ktest and specifying /ktest (rather than /home/ktest) as the home dir. If there's something funny happening with /home/username that should tell us.
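
A complementary way to separate the two possibilities is to time the pieces directly from a root shell. The Python sketch below is only a rough probe, and timed and probe_user are names invented here: it measures how long the account lookup through the name service takes and how long a first touch of the home directory takes. It does not exercise PAM itself, so a fast result here would point back at the authentication stack rather than clear it.

```python
import os
import pwd
import time


def timed(func, *args):
    """Call func(*args); return (result_or_exception, seconds_elapsed)."""
    start = time.monotonic()
    try:
        result = func(*args)
    except (KeyError, OSError) as exc:  # unknown user or inaccessible path
        result = exc
    return result, time.monotonic() - start


def probe_user(username):
    """Time the passwd lookup and a stat of the home dir for one user."""
    entry, lookup_seconds = timed(pwd.getpwnam, username)
    report = {"lookup_seconds": lookup_seconds}
    if isinstance(entry, Exception):
        report["error"] = repr(entry)
        return report
    _, home_seconds = timed(os.stat, entry.pw_dir)
    report["home"] = entry.pw_dir
    report["home_seconds"] = home_seconds
    return report
```

Comparing probe_user("root") with probe_user("klaus") would show whether either step accounts for a multi-second gap between the two accounts.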
> If there's something funny happening with /home/username that should tell us.
Yeah. This reminds me a bit of cruddy automounters and "remote homedirs", but... since this is a standalone....Sigh,

Same goes for the "nisplus nis" thing... Sure, it could take a (short) while to get the passwd map... and perhaps not-so-short while if network was spooking somehow. But this is a standalone machine... And the nsswitch setting is a safe one, since it'll rely on the files primarily.

Since it affects both X session login and terminal login we can pretty safely say that it's something to do with authentication (not acquiring the tty)... Does your /etc/pam.d/system-auth differ from mine?

Hmmmm. Is it up2date in regards to errata/bugfixes?

-- Glenn
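To double-check which sources a lookup will try, and in what order, something like the following could be used. It is a minimal sketch: nss_sources is a hypothetical helper name, and it only reads the file, it changes nothing.

```shell
# Hypothetical helper: print the lookup sources configured for one
# database (e.g. passwd) in an nsswitch.conf-style file.
nss_sources() {
    # $1 = database name, $2 = file to read (defaults to /etc/nsswitch.conf)
    awk -v db="$1:" '$1 == db { $1 = ""; sub(/^ +/, ""); print }' "${2:-/etc/nsswitch.conf}"
}
```

For the line quoted earlier, "nss_sources passwd" should print "files nisplus nis". Timing an actual lookup with 'time getent passwd klaus' would then show whether the name service step itself is slow.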
Avatar of xberry

ASKER

No, I tried the test, but seemingly it doesn't matter whether a user is created under /home or under /ktest: simply the same login delay. A good idea anyway to limit our area
of search.

I'm also bent for checking with Pam again, but my /etc/pam.d/system-auth is exactly the same as the one on Glenn's machine & other files under /etc/pam.d are not in question,
since the crucial ones refer to system-auth service anyway.

Bugfixes & errata regarding pam ?  pam on my mandrake is from the pam-0.77-12mdk.rpm
which is one of the latest, if not 'the' latest. I did check for bugs & errata on Mandrakesoft
& generally all over the web, related to pam, login, authentication but couldn't find anything that came even close to "the" bug we hunt down here at the moment.

One thing again: I roughly did compare files in /etc/security/ to content in SUSE & old Redhat 7.3. What I didn't find in either of those other systems were the file
"fileshare.conf" & the directory /etc/security/msec. msec = mandrake security ?
or mail security? So I put into category "semiobvious" : ) Content of msec is files: security.conf, server.4, server.5.
How's that at Glenn's ?  
 
Eh, one thing which might give some hints . . . or not:

Another point where authentication gives some noticeable delay (not as much as user login, but still exactly 3 secs.)
is when doing wrong password authentication for 'su'. Why ?  What is taking so much computing time there connected with authentication ? Any operation on text files, even on my 'middleslow' machine here is normally done in a nothing of a sec. Only when doing
grep functions on complete directory trees under / I start to count in secs.
 
> I tried the test, but seemingly it doesn't matter whether a user is created under /home or under /ktest

I think that pretty conclusively indicates that it is an authentication delay. The question is why...

For grins could I see what is shown for:

o - 'ifconfig -a'
o - 'hostname'
o - 'netstat -nr'
o - 'cat /etc/hosts'
o - 'cat /etc/resolv.conf'

please? I keep wondering if this is somehow related to the system's network config.
> msec = mandrake security ?
Yup, "man msec" might give you some more info.
msec-related problems generally only affect "higher" security level settings (that's why I asked about the setting initially). Could of course be related, but... I dunno...

> How's that at Glenn's ?
Can't check until monday at the earliest... Am away (getting my brother married:-).

> when doing wrong password authentication for 'su'. Why ?
I think that's the normal "tarpitting"... To make brute force passwd cracking that way ... less feasible;-).
So probably not related.

I've spent a couple of minutes with the login manpage earlier today... There's quite a bit of things that are supposed to happen before execing the shell... But nothing that should take that amount of time... unless the machine is really decrepit:-):-).

-- Glenn
Avatar of xberry

ASKER

> For grins could I see what is shown for:

> o - 'ifconfig -a'  

eth0      Link encap:Ethernet  HWaddr 00:60:97:62:C7:89
          inet6 addr: fe80::260:97ff:fe62:c789/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:9 Base address:0x7000
 
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:610 errors:0 dropped:0 overruns:0 frame:0
          TX packets:610 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:42180 (41.1 Kb)  TX bytes:42180 (41.1 Kb)
 
sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

some explanation to eth0: there is a NIC sitting in my box (from earlier LAN connection, long before I installed the mandrake 10.0) but physically there is no
connection to any host, nor did I configure any network during installation.
 
> o - 'hostname'
> o - 'netstat -nr'
> o - 'cat /etc/hosts'

why repeating those ?  We have discussed those & network issues earlier in this thread & I haven't changed them since then. See hereinabove please : )

> o - 'cat /etc/resolv.conf'

Same for this, if you remember correctly Gns (Glenn) asked me to remove
/etc/resolv.conf to see if that would make any change. Since then I haven't had my
fingers at it.

Ahh! There is an active ethernet interface in the machine. So it is possible for the network configuration to be a cause of the delay. And yes, I'm aware that various parts of the network configuration have been discussed and supposedly corrected, but I'd still like to see exactly how the machine is currently configured. It is always possible that there's been some mis-communication and what you have configured isn't exactly what we think you should have.

The first thing I'd want to do is to disable IPV6, and I'm sorry to say that I don't know how to do that on Mandrake 10. But maybe Glenn knows. The next thing I'd do is to disable eth0, which you should be able to do with drake. I've got a suspicion that the login process is trying to do something using the "UP & RUNNING" IPV6 connection and has to time that out.
Avatar of xberry

ASKER

First, between  . . . thanks for keeping at it . . . never thought this turning out such a giant thing . . . like pulling an elephant through a keyhole . . . : )

> It is always possible that there's been some mis-communication and what you have
> configured isn't exactly what we think you should have.

Agreed, so

[klaus@Colin klaus]$ hostname
Colin.nixlab.de

[klaus@Colin klaus]$ netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
127.0.0.0         0.0.0.0            255.0.0.0          U        0 0                  0    lo

[klaus@Colin klaus]$ cat /etc/hosts
127.0.0.1       Colin.nixlab.de Colin localhost.localdomain localhost

[klaus@Colin etc]$ cat obsolete.resolv.conf   # the "deactivated" resolv.conf file
nameserver 145.253.2.11
nameserver 145.253.2.75
nameserver 127.0.0.1

BTW, the gdmlogin "delay" time isn't stable at 10 sec, today, after hastily typing the
wrong password first, I thereafter had a delay of 25 sec !  







The extra delay on a wrong password is normal. That's an artificial delay introduced to discourage password guessing, commonly referred to as a tarpit.

All of that data looks correct. Have you had a chance to try disabling the NIC?
Avatar of xberry

ASKER

Sorry, misunderstanding (my English isn't perfect :))
What I meant was this:  Wrong passwd, then correct passwd: took 25 sec to a correct login.
Also today, I did type correct passwd instantly, but delay was 25 secs.

I did scan some more basic, passwd related files, such as
/etc/passwd
/etc/shadow
/etc/group
/etc/gshadow
/etc/default/useradd
/etc/login.defs

to look for something "semiobvious" as Glenn would say : )

Made a discovery (which doesn't explain level 3 & other users delay though):

in /etc/gshadow

the user 'klaus' is completely missing !  (all other users, including the 'ktest' do exist)

Going to disable the NIC now, giving report later.

> The first thing I'd want to do is to disable IPV6, and I'm sorry to say that I don't know how to do that on Mandrake 10. But
> maybe Glenn knows.
Set
NETWORKING_IPV6="no"
in /etc/sysconfig/network, and it should go away;).

>> How's that at Glenn's ?
> Can't check until monday at the earliest... Am away (getting my brother married:-).
Obviously I'm back, brother (finally!) passed off to new wife with traditional pomp and circumstance (and a _lot_ of good food... and obligatory alcoholic beverages:-), so let's see... Same files ... fileshare.conf should contain one line "RESTRICT=yes", msec/security.conf should be empty, and server.[45] should detail acceptable things for msec level=[45] (you should be at level 2... 4 is "Higher" and 5 is "Paranoid" in the draksec tool). You can check the msec setting in the file /etc/sysconfig/msec ... Mine looks like
---- start
SECURE_LEVEL=2
UMASK_ROOT=022
UMASK_USER=022
TMOUT=0
---- End

> the user 'klaus' is completely missing !  (all other users, including the 'ktest' do exist)
Hm, ok and you have it set so that every new user will be in its own group? I'd recommend that you run grpck and perhaps pwck to check the integrity of relevant files there.

-- Glenn
Avatar of xberry

ASKER

Partial success:

Since I found there was no option with harddrake to disable the NIC (only configuration
available, but it wasn't configured for any network) I simply removed it physically & did
reboot the machine. At bootup it was recognized as "being removed" & thus unregistered by harddrake. Then, I had to set up my host & kppp settings fresh (don't ask me why ; ))
& after restarting my dialup connection I did notice that the connection delay had disappeared. So now modem is coming up instantly after I choose "connect". : )
Thanks for that.

The strange login delay trouble is still there though, at gdm & at level 3 : (

One thing: despite no NIC present I still get this message at boot:
localhost network: Bringing up interface eth0:  failed

ifconfig though correctly only reporting 'lo' now
Avatar of xberry

ASKER

Welcome back Glenn & thanks for posting
My previous message crossed with yours, so I haven't tried your ideas yet.
Check that you don't have an alias for eth0 in /etc/modprobe.conf (I think harddrake took care of that) and that there is no /etc/sysconfig/network-scripts/ifcfg-eth0 file (alternatively change it so that it says ONBOOT="no")... /etc/init.d/network loops over those ifcfg files to "find" network IFs to bring up.

-- Glenn
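A quick way to check the ifcfg file mentioned above without opening it by hand might look like this. It is a sketch only: onboot_setting is a hypothetical helper name, and it assumes the usual KEY=value layout of the ifcfg files.

```shell
# Hypothetical helper: report whether an ifcfg-style file asks for the
# interface to be brought up at boot.
onboot_setting() {
    # $1 = path to an ifcfg file,
    #      e.g. /etc/sysconfig/network-scripts/ifcfg-eth0
    if [ ! -f "$1" ]; then
        echo "absent"
        return 0
    fi
    # Take the last ONBOOT= line and strip any quotes around the value.
    value=$(grep '^ONBOOT=' "$1" | tail -n 1 | cut -d= -f2 | tr -d "\"'")
    echo "${value:-unset}"
}
```

It prints "absent" if the file does not exist, "unset" if there is no ONBOOT line, and otherwise the configured value. Only "absent" or "no" should keep /etc/init.d/network from trying to bring the interface up.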
Avatar of xberry

ASKER

grpck /etc/group              
 - negative -

grpck /etc/gshadow        #results:
   invalid group file entry
   delete line `root:::'?
 - and so on for all entries in /etc/gshadow if I answer 'no' - What to do ?

pwck /etc/passwd
 user adm: directory /var/adm does not exist
 user news: directory /var/spool/news does not exist
 user uucp: directory /var/spool/uucp does not exist
 pwck: no changes

pwck /etc/shadow
 user root: directory 7 does not exist
 user bin: directory 7 does not exist
 user daemon: directory 7 does not exist
- and so on for all entries -

msec is set to level '2' but not so sure about the acceptable things in server.(45)

All other changes made as instructed. Going to halt machine now & tell later about
changes, if any, after fresh reboot.
 
And a plain "grpck" followed by a plain "pwck" (They should "know" default places for the files)?

-- Glenn
Avatar of xberry

ASKER

Hi, back again, login delay still existing but thankfully instant ppp dial up constant.

"Plain" grpck & "pwck":

[root@Colin klaus]# grpck
[root@Colin klaus]# pwck
user adm: directory /var/adm does not exist
user news: directory /var/spool/news does not exist
user uucp: directory /var/spool/uucp does not exist
pwck: no changes
Looks normal.
What's the pw entry for "klaus"?

-- Glenn
Avatar of xberry

ASKER

Hi, when I logged in today it let me wait 30 secs at the gui login,
then I thought I could check /var/log/messages again & found this:


Sep 23 11:11:54 Colin kernel: atkbd.c: Unknown key released (translated set 2, code 0x7a on isa0060/serio0).
Sep 23 11:11:54 Colin kernel: atkbd.c: This is an XFree86 bug. It shouldn't access hardware directly.
Sep 23 11:11:54 Colin kernel: atkbd.c: Unknown key released (translated set 2, code 0x7a on isa0060/serio0).
Sep 23 11:11:54 Colin kernel: atkbd.c: This is an XFree86 bug. It shouldn't access hardware directly.
Sep 23 11:11:58 Colin spamassassin: spamd startup succeeded
Sep 23 11:11:59 Colin numlock: Starting numlock:
Sep 23 11:11:59 Colin numlock: ^[[65G[^[[1;32m
Sep 23 11:11:59 Colin numlock:   OK
Sep 23 11:11:59 Colin numlock:
Sep 23 11:11:59 Colin rc: Starting numlock:  succeeded
Sep 23 11:11:59 Colin crond[1562]: (CRON) STARTUP (fork ok)
Sep 23 11:12:00 Colin crond: crond startup succeeded
Sep 23 11:12:01 Colin rc: Starting kheader:  succeeded
Sep 23 11:12:01 Colin devfsd[206]: Caught SIGHUP
Sep 23 11:12:01 Colin devfsd[206]: read config file: "/etc/devfs/conf.d//mouse.conf"
Sep 23 11:12:01 Colin devfsd[206]: read config file: "/etc/devfs/conf.d//dynamic.conf"
Sep 23 11:12:01 Colin devfsd[206]: read config file: "/etc/devfs/conf.d//modem.conf"
Sep 23 11:12:01 Colin devfsd[206]: read config file: "/etc/devfsd.conf"
Sep 23 11:12:02 Colin devfsd: Running devfsd actions:  succeeded
Sep 23 11:12:06 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part1
Sep 23 11:12:09 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part2
Sep 23 11:12:11 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part3
Sep 23 11:12:12 Colin gdm(pam_unix)[1353]: session opened for user klaus by (uid=0)
Sep 23 11:12:13 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part4
Sep 23 11:12:15 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part5
Sep 23 11:12:17 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part6
Sep 23 11:12:19 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part7
Sep 23 11:12:21 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part8
Sep 23 11:12:23 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part9
Sep 23 11:12:25 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part10
Sep 23 11:12:27 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus0/target0/lun0/part11
Sep 23 11:12:29 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus1/target0/lun0/part1
Sep 23 11:12:31 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus1/target0/lun0/part2
Sep 23 11:12:33 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus1/target0/lun0/part3
Sep 23 11:12:35 Colin perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus1/target0/lun0/part5
Sep 23 11:12:41 Colin kernel: Attached scsi generic sg0 at scsi0, channel 0, id 1, lun 0,  type 3
Sep 23 11:12:41 Colin kernel: Attached scsi generic sg1 at scsi0, channel 0, id 2, lun 0,  type 5
Sep 23 11:12:45 Colin gconfd (klaus-2007): starting (version 2.4.0.1), pid 2007 user 'klaus'
Sep 23 11:12:46 Colin gconfd (klaus-2007): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config source at position 0
Sep 23 11:12:46 Colin gconfd (klaus-2007): Resolved address "xml:readwrite:/home/klaus/.gconf" to a writable config source at position 1
Sep 23 11:12:46 Colin gconfd (klaus-2007): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only config source at position 2
Sep 23 11:12:46 Colin kernel: NET: Registered protocol family 10
Sep 23 11:12:46 Colin kernel: Disabled Privacy Extensions on device c035fa20(lo)
Sep 23 11:12:46 Colin kernel: IPv6 over IPv4 tunneling driver
Sep 23 11:12:46 Colin net.agent[2013]: add event not handled

Not sure if that XFree86 bug listed at the beginning of the log above is relevant.
It has been turning up every day for a long time. Same for that drakupdate_fstab thing around
the login time. However, sometimes it is finished before the "session opened for user klaus" line, but never before the "Attached scsi generic . . ." lines. See this one:

Jul 11 14:17:22 localhost perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus1/target0/lun0/part1
Jul 11 14:17:24 localhost perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus1/target0/lun0/part2
Jul 11 14:17:26 localhost perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus1/target0/lun0/part3
Jul 11 14:17:28 localhost perl: drakupdate_fstab called with --auto --add /dev/ide/host0/bus1/target0/lun0/part5
Jul 11 14:17:49 localhost gdm(pam_unix)[1507]: session opened for user klaus by (uid=0)
Jul 11 14:17:53 localhost kernel: Attached scsi generic sg0 at scsi0, channel 0, id 1, lun 0,  type 3
Jul 11 14:17:53 localhost kernel: Attached scsi generic sg1 at scsi0, channel 0, id 2, lun 0,  type 5
Jul 11 14:17:53 localhost kernel: Attached scsi generic sg2 at scsi0, channel 0, id 3, lun 0,  type 5
Jul 11 14:17:57 localhost gconfd (klaus-2177): starting (version 2.4.0.1), pid 2177 user 'klaus'

Could it be related to the scanner device ?  I often had trouble with having the scanner recognized for 'normal user' with xsane installed - I posted on this item at least two times
here on EE & from 'old' Redhat 7.x times I remember the panicking xsane warnings
when trying to run as root (which came up instantly) while there were administrative twists & workarounds for getting it up for a normal user.
With this Mandrake 10.0 I had no trouble with the xsane being available & scanner
being recognized for normal user (it had been handled by harddrake at installation
time already) but never spent any thought that it could be 'a (?) scanner related issue
in the background delaying the login.

So first, your opinion to my findings &
second please, how to go at it without spoiling anything or making things worse?

 
 
I suspect that part of those syslog messages might be system startup stuff, but we should be able to figure that out. Boot up the system, then log in as root on an alternate console and run a 'tail -f /var/log/messages'. When output stops note the timestamp of the last line and switch back to the Gui login screen and log in. Then go back to the root login and see what's been logged since the "saved timestamp".
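If watching the timestamps by eye gets tedious, the same test can be done by remembering how long the log was before the login and printing only what came after. A rough sketch, with log_mark and log_since being made-up helper names; it assumes the log is not rotated between the two steps.

```shell
# Hypothetical helpers for the "note where the log ends, log in, look again" test.

# Print the current number of lines in a log file.
log_mark() {
    # $1 = log file, e.g. /var/log/messages
    wc -l < "$1" | tr -d ' '
}

# Print everything appended to the log after an earlier log_mark.
log_since() {
    # $1 = log file, $2 = line count returned earlier by log_mark
    tail -n +"$(( $2 + 1 ))" "$1"
}
```

As root on the alternate console one would run 'mark=$(log_mark /var/log/messages)' once the boot messages have stopped, do the Gui login, and then run 'log_since /var/log/messages "$mark"' to see only the lines written during that login.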
> I suspect that part of those syslog messages might be system startup stuff.
Yep, definitely.
Could of course be one reason that login "stalls" differing a bit for the GUI case (depending on how fast one is to start logging in:-). We have a decided difference there, between RH7.X and Mdk10.0 (and older Mdks), in that the display manager used to be spawned by init (prefdm from the inittab) but is now spawned as a regular system service (the "dm" initscript, controlled through chkconfig/service). The difference (as I think I've mentioned) is that the system is rather busy ... starting up.... when the login screen appears (kinda' like windoze:-), making login sluggish on slow machines.
But that perhaps could explain a few of the seconds lag you see, not 30... And it doesn't explain why you see a lag with the plain textmode login (that isn't generally available until startup is finished).

I can tell you (although I'll not try to explain exactly WHY:-) that your message log(s) is perfectly normal, I have the exact same entries (+ a truckload of shorewall and other crap "interfoliated", so I won't bore you with a lengthy copy:-).
FWIW, that's not "it".
One note: I don't have a scanner nor any "true" SCSI card (but do have other things like sata drive needing scsi, and burner needing sg), so that (a really slow scanner init, triggered by something during session setup? Nah) could remotely affect some things. Not that likely, eh?-)

-- Glenn
Avatar of xberry

ASKER

here's what I've logged on an alternative terminal
(between the lines for the root[1637] login, of course):

Sep 24 11:12:28 Colin  -- root[1637]: ROOT LOGIN ON vc/1
Sep 24 11:17:05 Colin gdm(pam_unix)[1361]: session opened for user klaus by (uid=0)
Sep 24 11:17:09 Colin kernel: Attached scsi generic sg0 at scsi0, channel 0, id 1, lun 0,  type 3
Sep 24 11:17:09 Colin kernel: Attached scsi generic sg1 at scsi0, channel 0, id 2, lun 0,  type 5
Sep 24 11:17:13 Colin gconfd (klaus-2047): starting (version 2.4.0.1), pid 2047 user 'klaus'
Sep 24 11:17:13 Colin gconfd (klaus-2047): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config source at position 0
Sep 24 11:17:13 Colin gconfd (klaus-2047): Resolved address "xml:readwrite:/home/klaus/.gconf" to a writable config source at position 1
Sep 24 11:17:13 Colin gconfd (klaus-2047): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only config source at position 2
Sep 24 11:17:14 Colin kernel: NET: Registered protocol family 10
Sep 24 11:17:14 Colin kernel: Disabled Privacy Extensions on device c035fa20(lo)
Sep 24 11:17:14 Colin kernel: IPv6 over IPv4 tunneling driver
Sep 24 11:17:14 Colin net.agent[2053]: add event not handled
Sep 24 11:24:49 Colin login(pam_unix)[1637]: session closed for user root

BAD NEWS: The ppp dialup did delay again today, noticeably long (>10secs)


Glenn wanted to see the pw entry for klaus:
 
   klaus:x:501:501::/home/klaus:/bin/bash

And didn't you say that there wasn't a line in /etc/shadow for klaus? The line from passwd for klaus would imply that there'd have to be a line in /etc/shadow containing the encrypted password for you to be able to authenticate.
gshadow Jim... Easy enough to add manually:-).

-- Glenn
Hmm, not so far as I know. Yes, one can set a password in group or gshadow, but as far as I can tell that's only used by the newgrp command and can't be used by a login process for authentication. And a quick test on RHEL shows that it doesn't work for login authentication.
Avatar of xberry

ASKER

Regarding passwords & login I just did notice something, that may help to limit
the area of our search:

When I lock screen & want to log in again, I also have to give username & password.
So with that login there is absolutely no delay.
If typing a wrong password it is giving that 'tarpit' delay, introduced
by "Checking . . . " & finished exactly after those ominous five seconds with ". . . Sorry !"
 
So couldn't it be that at the gdm or level 3 login exactly the same "tarpit motor" is turned on, regardless of whether the password is wrong or right ?  Anyway the path for checking the password
must be a different one than the one used for the Xscreensaver lock login.

Unlocking the screen saver isn't a "login process" and doesn't have to do all of the things that a login does. So it would be faster. That does suggest that the delay isn't associated with locating the user credentials and is something else that happens during the login process.

While logged in to the GUI do you get a delay for a text login on an alternate console?
Avatar of xberry

ASKER

> While logged in to the GUI do you get a delay for a text login on an alternate console?

no !  Sure I found out about this already at beginning of this QT. However, since some things have changed while this question going on, it is a good idea to check again. Ok, looks as if the delay is limited to logins at runlevel 3 & gdm. Sum up:

login at runlevel 3:                                  delay
login at gdm gui:                                     delay
text login on an alternate console
when logged in to the gui:                       no delay
I hate to have to keep picking at you with things to try, but some of them don't immediately occur to me, sorry...

I'd like for you to log in to the Gui, then log back out and try logging in to an alternate text console. A second test is while still logged in to the alternate console to switch back to the login console and log in to the Gui.

Right now I'm guessing that the first test will show the delay, but second won't.
'In the series "listening to your computer", we have Klaus listening to his troublesome Mandrake system today':
When you have the "delay cases", can you hear any hdd activity, or perhaps heatregulated fan ... "strain"? This might sound slightly voodoo, but... One can actually glean info from listening to noise:-)

If Jim's suggestions pan out as he sees it, and there is ... activity... we could perhaps suspect something like anacron/slocate to be the "culprit".

-- Glenn
login at runlevel 3:                                  delay
---------------
then this is general pam problem, maybe one gns described - i.e. no problem as such in your system.
But if it was a general pam (i.e., authentication) issue I'd expect to see the delay for every log in. The fact that it doesn't happen when there's a second login by the same user makes me suspect that it has something to do with setting up the environment for the user (ownership of special devices, etc.). On a second simultaneous log in that would have already been done.
Avatar of xberry

ASKER

Ok, before I try the next "test series" again, some thanks for bearing with me . . .

> I hate to have to keep picking at you . . .
   one moment, i ask machine if it hurts . . . . . . no, no complaint yet : )

Maybe I should really listen more carefully as Glenn did suggest . . . thanks for giving us a smile . . . but you're right, as far as hardware goes, listening carefully can tell you lots
 
 
Maybe it takes some time for pam libraries to load....
I thought of that sometime earlier, but dismissed it as a possibility because root login doesn't experience a delay. So far as I know a root login would require the same pam libs to be loaded.
Avatar of xberry

ASKER

didn't time new tests yet since I had a print job & work in openoffice going on under /home/klaus. SOMETHING NOTICEABLE HAPPENED THOUGH:
while printer was busy throwing out sheets of paper, suddenly my nephew came in
& asked if the machine was free for some game playing. Me having attention to an other person for some seconds, he then unnoticed touched something that caused the whole machine to crash & reboot. Maybe my luck since I noticed following:
Before I did the gdm login, the printer queue in background turned on again, throwing out
papers. I typed my password & with the printing still going on, the delay this time seemed to be endless, was at least one minute.

Also, during bootup I still get this "Couldn't add loopback to routing table . . ." & maybe something wrong with CUPS or so.

> makes me suspect that it has something to do with setting up the environment for the
> user (ownership of special devices, etc.)

Yes, it 'feels' like that. : )



Avatar of xberry

ASKER

Exact guessing by Jim !  New results:

log out from gui, logging in to alternate text console:         some 6 seconds delay
while still logged in to alternate console,
back to gdm login console & logging in to gui:                   instant access (no delay)

Seems you can now point your finger to the cause, no ? : )
> Seems you can now point your finger to the cause, no ? : )

Maybe... From the syslog messages posted earlier it looks like the bulk of the delay (5-6 seconds) is associated with setting up things so that you can use the scanner. It would be an interesting experiment to see what would happen if there were to be nothing connected to the USB port (I'm assuming it's a USB scanner).
Note, I don't like USB (technological neanderthal that I am... Colleagues have been known to mutter about "just prior to the last ice age, huh" when I go off tangent explaining some anachronistic legacy quirk or other:-):-)... So I very rarely have anything connected to the USB port(s) at bootup&subsequent login. Perhaps later, but never then. If it'd turn out to be something like the scanner playing you for tricks, that might explain why I've never seen it.
Would be very nice if Jim turns out to be the best guesser here:-).

-- Glenn
I'm basing that theory on the last set of syslog messages that xberry posted. If you look at the time of authentication and when it looks like the desktop starts:

Sep 24 11:17:05 Colin gdm(pam_unix)[1361]: session opened for user klaus by (uid=0)
Sep 24 11:17:09 Colin kernel: Attached scsi generic sg0 at scsi0, channel 0, id 1, lun 0,  type 3
Sep 24 11:17:09 Colin kernel: Attached scsi generic sg1 at scsi0, channel 0, id 2, lun 0,  type 5
Sep 24 11:17:13 Colin gconfd (klaus-2047): starting (version 2.4.0.1), pid 2047 user 'klaus'


you see that about 8 seconds elapse (11:17:05 - 11:17:13).
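The subtraction is easy enough to do in your head here, but for longer stretches of log a small helper saves mistakes. A minimal sketch, with to_seconds and elapsed being hypothetical names; it assumes both timestamps are from the same day.

```shell
# Hypothetical helpers: seconds elapsed between two HH:MM:SS syslog
# timestamps from the same day (no handling of midnight rollover).

# Convert HH:MM:SS to seconds since midnight. awk treats "05" or "09"
# as plain decimal numbers, so leading zeros are harmless.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print end minus start in seconds: elapsed START END
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}
```

With the two lines quoted above, 'elapsed 11:17:05 11:17:13' prints 8.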
But on the second login libraries are loaded and it becomes much quicker...
True, but remember that it never happens for root's login.
But I see approximately the same entries, and I see no perceptible lag nor diff between root/regular users. Go figure.
Still, I'm inclined to agree with your hunch about the scanner/USB,

-- Glenn
Avatar of xberry

ASKER

hehe, mentally linking my scanner to a USB port is reminding me that I should get a new one soon (the old one has a hard to repair mechanical problem, that affects scanning of images bigger in size than a postcard) . . . I remember that any time I come up with one of my scanner problems, Jim guesses it is a USB one while it actually still is on the SCSI bus (the scanner is from times when USB was in heads rather than on the machines.)
You see, not only Glenn is closer to prehistoric than modern times as far as some hardware is concerned, since, not unlike linuxy spirit, I prefer to add own intelligibility & go to source in case when repair needed rather than 'blindly' throwing money for updated things
: ) Not always though.

Anyway, I'm not so sure, disconnecting only physically will help here, since internal soft'weary' manifestations may still yearn for the thing missing. Anyway, let's try first.
Avatar of xberry

ASKER

No gain. First I did remove sg1 (scanner), then because still giving delay, all scsi devices
including the sg0 (Cdrom). Now the crucial part from syslog looks like this:

Sep 30 12:17:15 Colin gdm(pam_unix)[1340]: session opened for user klaus by (uid=0)
Sep 30 12:17:25 Colin gconfd (klaus-1989): starting (version 2.4.0.1), pid 1989 user 'klaus'

but still giving same delay at login.

So Scsi devices can't be blamed for it. I'll put them back.
Avatar of xberry

ASKER

Still wondering what's causing the 10 seconds elapse. (As noticed in syslog above)
Avatar of xberry

ASKER

Simply to complete feedback on Glenn's earlier 'pick' : )

NO extra noise coming from anywhere during login : )
Thanks for returning the smile Klaus. Have been (and will continue to be) very busy right now. Will try squeeze in some testing RSN though.

-- Glenn
Right, it isn't the scanner.

Just to make sure that "we're all singing the same verse" I'd like to verify that, when in run level 3, the delay occurs between the time you enter your password and when you have a shell prompt.

On the assumption that you are using a bash shell I'd like to see the contents of ~/.bash_profile & ~/.bashrc.
Avatar of xberry

ASKER

> that when in run level 3 that the delay occurs between the time you enter your password
> and when you have a shell prompt.

If I understand you right here, then that we did test already:

Interesting idea of yours, to put an 'echo "something"' at the beginning of /etc/profile.
Result:  Login with user 'klaus' at runlevel 3:
Login: klaus
password: **********
. . . 5 seconds it stays there . . .  then:
Last login: (some minutes ago 2004) on vc/1
something
After that I get the shell prompt.

content of ~/.bash_profile:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH
unset USERNAME


content of ~/.bashrc:

# .bashrc
                                                                               
# User specific aliases and functions
                                                                               
# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi


Since we've already got this far, why not add the content of
/etc/bashrc as well : ) Voilà:

# /etc/bashrc
                                                                                                                                             
# System wide functions and aliases
# Environment stuff goes in /etc/profile
                                                                                                                                             
# by default, we want this to get set.
# Even for non-interactive, non-login shells.
if [ "`id -gn`" = "`id -un`" -a `id -u` -gt 99 ]; then
        umask 002
else
        umask 022
fi
                                                                                                                                             
# are we an interactive shell?
if [ "$PS1" ]; then
    case $TERM in
        xterm*)
            PROMPT_COMMAND='echo -ne "\033]0;${USER}@${HOSTNAME}: ${PWD}\007"'
            ;;
        *)
            ;;
    esac
    [ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "
                                                                                                                                             
    if [ -z "$loginsh" ]; then # We're not a login shell
        for i in /etc/profile.d/*.sh; do
            if [ -x $i ]; then
                . $i
            fi
        done
    fi
fi
                                                                                                                                             
unset loginsh

- - - end - - -

It occurs to me that I should learn some more shell programming, guess why ; )

I think we've established that it doesn't have anything to do with where the home dir is, and from the delay between entering the password and "Last login: (some minutes ago 2004) on vc/1" it would seem that the problem is in something that login is doing. Short of running login under a debugger or inserting some diagnostic printf()s in login, I'm at a loss as to how to figure out what it is doing.

What you might try is to login as root and then do an "ssh klaus@localhost" and see if it produces the delay. Since ssh authenticates directly and doesn't use login that might help isolate the problem.

Hmm, does the passwd file for root & klaus have the same value for the shell?

What is the line number of the klaus entry in passwd and in shadow?
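
For anyone following along, the two checks asked for above (the login shell and the position of an account's entry) can be done from the command line rather than in an editor. This is only a minimal sketch, assuming ordinary colon-separated passwd-style files; the helper names entry_line and login_shell are made up for illustration.

```shell
#!/bin/sh
# entry_line FILE USER
# Print "N/TOTAL": the line number of USER's entry in a passwd-style FILE
# and the total number of lines. Exits non-zero if there is no entry.
# (entry_line and login_shell are hypothetical helper names.)
entry_line() {
    awk -F: -v u="$2" '
        $1 == u { n = NR }
        END {
            if (n) print n "/" NR
            else exit 1
        }
    ' "$1"
}

# login_shell FILE USER
# Print the 7th field (the login shell) of USER's entry in FILE.
login_shell() {
    awk -F: -v u="$2" '$1 == u { print $7 }' "$1"
}
```

Typical use would be `entry_line /etc/passwd klaus`, `entry_line /etc/shadow klaus` (as root) and `login_shell /etc/passwd klaus`, then the same for root to compare.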
Avatar of xberry

ASKER

root & klaus have the same value, /bin/bash, for the shell

'klaus' entry in /etc/passwd:
[ line 21/25 (84%), col 1/39 (2%), char 857/1014 (84%) ]

'klaus' entry in /etc/shadow:
[ line 21/25 (84%), col 1/60 (1%), char 567/810 (70%) ]

I'll log in as root & do the ssh klaus@localhost test there now.
Avatar of xberry

ASKER

as 'root'

 ssh klaus@localhost

does produce:

 ssh: connect to host localhost port 22: Connection refused

instantly.

I guess I should have looked back up through the comments... From the chkconfig list you don't appear to have ssh installed. It would still be a good test to do, but you'd have to install the openssh packages.

Avatar of xberry

ASKER

The openssh packages are installed (openssh-3.6, +client +server +askpass)
If openssh-server is installed I don't understand why sshd isn't listed in the output of 'chkconfig --list'. The sshd daemon needs to be started at boot and that normally implies an init script in /etc/rc.d/init.d. Which in turn implies that 'chkconfig --list' should show sshd as a configurable service.

Is there an sshd script in /etc/rc.d/init.d? And if so can you start the daemon (/etc/rc.d/init.d/sshd start)? And if not what does 'rpm -q --list openssh-server | grep sshd' show?
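
The first of those questions can be answered with a small check like the one below. It is only a sketch: the directory is the one named above, and initscript_state is a made-up helper name, not a Mandrake tool.

```shell
#!/bin/sh
# initscript_state NAME [DIR]
# Report whether an init script called NAME exists in DIR
# (default /etc/rc.d/init.d) and whether it is executable.
# initscript_state is a hypothetical helper name used for illustration.
initscript_state() {
    script="${2:-/etc/rc.d/init.d}/$1"
    if [ -x "$script" ]; then
        echo "present"
    elif [ -e "$script" ]; then
        echo "present but not executable"
    else
        echo "missing"
    fi
}
```

Run it as `initscript_state sshd`. If it reports "present", the next steps are the ones already quoted: `/etc/rc.d/init.d/sshd start` and a look at `chkconfig --list`; if it reports "missing", `rpm -q --list openssh-server | grep sshd` shows what the package actually installed.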
Avatar of xberry

ASKER

Sorry, I have to apologize, I've become a bit sloppy in testing lately:

What I meant was: openssh has been in the system since the beginning, but I only added openssh client, server and askpass two days ago. I forgot to start the ssh daemon while still logged in as root (which would be done automatically at bootup) & of course with sshd not active it couldn't do anything.
Anyway, today I could successfully run ssh as suggested, with this result:

  ssh klaus@localhost

--- 4 secs delay --- then

  klaus@localhost's password:   ; ******** I type password, then

--- no delay ---

  something  ; (yes this thing from /etc/profile is still in : ) then

--- no delay ---

  [klaus@Colin Klaus]$

So there is no delay after typing the password, only after typing ssh klaus@localhost, which would correspond to what you called 'authenticating directly', no ?
Since authentication at login & via ssh goes different ways, which file
exactly tells the system which way to perform it, so that we can start
inspecting from that point ?

ssh doesn't use "login", which at least getty does... So ... look at "man login" etc.

-- Glenn (still busy)
Is the ~4 second delay between executing 'ssh klaus@localhost' and the password prompt repeatable? A delay the first time a given user account makes an ssh connection to a given host is understandable (tho 4 seconds seems too long to me with the target being localhost). Subsequent connections from that user account to localhost should have zero delay between the command and the password prompt.

A delay at that point would either be swapping activity or, more likely, a reverse lookup problem on the localhost IP. How much memory does this box have?
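
One way to look into the reverse-lookup theory is to see which sources the resolver consults for host names and then time a lookup of the loopback address. This is a sketch under assumptions: it expects a glibc-style /etc/nsswitch.conf, and hosts_order is a made-up helper name.

```shell
#!/bin/sh
# hosts_order [FILE]
# Print the lookup sources listed on the "hosts:" line of an
# nsswitch.conf-style FILE (default /etc/nsswitch.conf), in order,
# for example "files dns".
# hosts_order is a hypothetical helper name used for illustration.
hosts_order() {
    sed -n 's/^hosts:[[:space:]]*//p' "${1:-/etc/nsswitch.conf}"
}
```

Typical use would be `hosts_order`, then `time getent hosts 127.0.0.1` from a bash prompt, which should answer at once. If dns is listed ahead of files, or 127.0.0.1 has no entry in /etc/hosts, the lookup may have to wait for a name server listed in /etc/resolv.conf to time out.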
Avatar of xberry

ASKER

> Is the ~4 second delay between executing 'ssh klaus@localhost' and the password prompt
> repeatable?

Yes, every time I make a connection from any given user account to localhost I have that delay.

Memory:

RAM:     128 MB
SWAP:    128 MB

Admittedly a bit low for heavy things on this machine, but hardly an explanation for
a delay in simple password authentication at login time, . . . ?
Avatar of xberry

ASKER

I don't mind approaching the problem in a different way:

As I mentioned, with SUSE & Redhat there were no problems (no delay) under the same hardware conditions. So is there any chance I could 'dump' things in Mandrake, or modify it towards
a 'short & simple' login without too much fuss about security ?

Strange, but I still can't shake the thought that PAM
does something that gives 'root' & 'normal user' logins kinda different priority.
No, but there is something in the PAM configuration which makes some ?net? query or so for users but not for root.
That's along the lines of what I think is going on, namely that something is trying to do some sort of network lookup for everyone except root. I just don't know what it is or why it is taking so long. The current network config seems like it should be okay.
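
To see what PAM is actually asked to do at login, it can help to list the modules named in the service files. This is a sketch only: it assumes the usual /etc/pam.d layout and a grep that supports -o (GNU grep does), and pam_modules is a made-up helper name.

```shell
#!/bin/sh
# pam_modules FILE
# List the distinct pam_*.so modules named on non-comment lines of a
# PAM service FILE, one per line, sorted.
# pam_modules is a hypothetical helper name used for illustration.
pam_modules() {
    grep -v '^[[:space:]]*#' "$1" |
        grep -o 'pam_[A-Za-z0-9_]*\.so' |
        sort -u
}
```

Typical use would be `pam_modules /etc/pam.d/login` and `pam_modules /etc/pam.d/system-auth`. Any module in the output that talks to a directory or network service (pam_ldap.so, pam_krb5.so and pam_winbind.so are examples of such modules) would be the first thing to look at; whether any of them is configured on this box is not known from the thread.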
Avatar of xberry

ASKER

Ok, putting everything 'unusual' up for examination, I found this:

under /home/klaus I get a whole series of core files:
. . .
. . .
core.2574
core.2625
core.2644
core.2664
core.2709
core.2725
core.2755
core.2795
core.2806
core.2900
core.2915
core.2977
core.3023
core.3064
core.3306
core.3339
core.3456
core.3458
core.3751
core.3810
core.7186

When examining the file core.7186 by intuition, searching for anything relevant to passwd,
I found only two lines referring to it. The one which seems to show anything interesting
lists these files along with '/etc/passwd':

/etc/gtk-2.0/gtkrc
libnss_files_so.2
/home/klaus/.gtkrc-2.0
/etc/rpc
/etc/ethers
/etc/shadow
/etc/netgroup
/etc/publickey
/etc/aliases

The only existing file of all those mentioned is /etc/rpc

content of /etc/rpc is:

#ident  "@(#)rpc        1.11    95/07/14 SMI"   /* SVr4.0 1.2   */
#
#       rpc
#
portmapper      100000  portmap sunrpc rpcbind
rstatd          100001  rstat rup perfmeter rstat_svc
rusersd         100002  rusers
nfs             100003  nfsprog
ypserv          100004  ypprog
mountd          100005  mount showmount
ypbind          100007
walld           100008  rwall shutdown
yppasswdd       100009  yppasswd
etherstatd      100010  etherstat
rquotad         100011  rquotaprog quota rquota
sprayd          100012  spray
3270_mapper     100013
rje_mapper      100014
selection_svc   100015  selnsvc
database_svc    100016
rexd            100017  rex
alis            100018
sched           100019
llockmgr        100020
nlockmgr        100021
x25.inr         100022
statmon         100023
status          100024
bootparam       100026
ypupdated       100028  ypupdate
keyserv         100029  keyserver
sunlink_mapper  100033
tfsd            100037
nsed            100038
nsemntd         100039
showfhd         100043  showfh
ioadmd          100055  rpc.ioadmd
NETlicense      100062
sunisamd        100065
debug_svc       100066  dbsrv
ypxfrd          100069  rpc.ypxfrd
bugtraqd        100071
kerbd           100078
event           100101  na.event        # SunNet Manager
logger          100102  na.logger       # SunNet Manager
sync            100104  na.sync
hostperf        100107  na.hostperf
activity        100109  na.activity     # SunNet Manager
hostmem         100112  na.hostmem
sample          100113  na.sample
x25             100114  na.x25
ping            100115  na.ping
rpcnfs          100116  na.rpcnfs
hostif          100117  na.hostif
etherif         100118  na.etherif
iproutes        100120  na.iproutes
layers          100121  na.layers
snmp            100122  na.snmp snmp-cmc snmp-synoptics snmp-unisys snmp-utk
traffic         100123  na.traffic
nfs_acl         100227
sadmind         100232
nisd            100300  rpc.nisd
nispasswd       100303  rpc.nispasswdd
ufsd            100233  ufsd
pcnfsd          150001  pcnfs
amd             300019  amq
sgi_fam         391002  fam
bwnfsd          545580417
fypxfrd         600100069 freebsd-ypxfrd
                                                                                                                                                                                                                                                       
Maybe this is interesting in relation to your idea about the 'network lookup' for
everyone except root.
Um, if something is generating a core every time klaus logs in.... What program is generating them? "file core.*"...

-- Glenn
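
With that many core files it may be easier to summarise the output of file than to read it line by line. A rough sketch, assuming file reports the originating program in the form from 'name', which it normally does for ELF core files; summarize_cores is a made-up helper name.

```shell
#!/bin/sh
# summarize_cores
# Read the output of "file core.*" on standard input and print one line
# per originating program as "COUNT NAME", most frequent first.
# Assumes each line contains the text: from 'program ...'
# summarize_cores is a hypothetical helper name used for illustration.
summarize_cores() {
    sed -n "s/.*from '\([^' ]*\).*/\1/p" |
        sort |
        uniq -c |
        sort -rn |
        awk '{ print $1, $2 }'
}
```

Typical use would be `file /home/klaus/core.* | summarize_cores`. Comparing the core files' timestamps (`ls -l`) with the login times in syslog would then show whether any of them are written at login at all.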
Avatar of xberry

ASKER

Different programs have been creating those core files: mostly alsaplayer, but also lprm, gimp-1.2, soffice.bin, games, . . . but today, for instance, no new core file has been added, so I think we can forget about that as a cause of the delay too.

Another thing, and I wonder if it could be related at all: at every bootup I get something like this (from syslog)

Oct 10 14:22:53 Colin fsck: Filesystem is NOT cleanly umounted

A relation to the delay doesn't seem logical at first glance, so I haven't mentioned it up to now, but
the more 'likely' candidates didn't turn out to be the source, so I'm now throwing in everything that looks even remotely 'semi obvious' - you have the experience to say: Forget it ! or "Uhhuh !" : ))

Avatar of xberry

ASKER

What to do now ?

It seems that jlevie did a really great job of narrowing down the possible source of the problem, he made so many attempts . . . what do you recommend that I concentrate on myself in order to bring this one to an end in cooperation with you guys ?

Please be open if you think something like "this could be too much crap & the best solution may be to install some other distro" . . . or so . . . : ))
Avatar of xberry

ASKER

Hi aji,

I did a clean install of Mandrake 10.0, which is where the problem occurred. Literally everything that we tried here failed to change the described symptom.

Right now my intention is to do a new install, this time using a fresh, original copy of Mandrake 10.1 Official edition, & see if the problem occurs again.
If yes, then I'll know that the problem must be distro specific; if not, then most likely
not everything was clean with my 10.0.

I'll let you all know about the result then, of course.
Avatar of xberry

ASKER

Three months after I started this question I have finally managed to close it. You're surely keen to hear the result of the new installation attempt:

Today I installed a fresh, original copy of Mandrake 10.1 "Official" & the problem with the delayed login after typing the password no longer exists !!!

I accepted jlevie's answer & gave it the most points, because he contributed a lot, if not the most, towards ruling out possible mistakes & finally, when there was nothing else left to try, confirmed my suggestion with the straightforward idea
"The most efficient solution is to try a re-install of a standard Mandrake workstation and see if the problems persist".
Indeed, with the word "standard" he hit on the fact that I did my first Mandrake install from a set of magazine CDs, issued by a software firm which had permission from Mandrake (to use the distro's name, label & so on) but which were not issued directly by Mandrake.
Ahaji2002 obviously ran into a similar, if not the same, problem, which further confirmed the idea that my set of CDs wasn't as clean & correct as I had first thought.
Glenn (Gns) did a lot of comparing my installation to his, already expressed a feeling that something was strange about it in the middle of the thread, & indirectly triggered my first doubts about the integrity of that version.
Well, even assuming honesty & good faith behind the Mandrake 10.0 copies which I installed first, having compared them with the installation help & Mandrake-specific requirements supplied with the standard 10.1 Official release, I can only warn everybody (myself included !!!) against trying unofficial, copied distribution installs from third-party vendors, even if the magazine looks all professional with a glossy cover.
The new Mandrake 10.1 ships with a really useful handbook covering important steps for installation preparation & so on. I have already been punished for not obeying one essential tip from the book, but that will be the subject of another question soon.
The installation itself is fine & . . . many thanks for your patience & help.