Server reboot when opening console

Hi all,

I had this problem on a server running Solaris 8.
The "who -r" command shows the server is in runlevel 6 (instead of 3, as I would normally expect).
Trying to find out why it is in runlevel 6, I ran ps and grepped for rc. I see that an application's kill script is hanging there for 1 week. So I suppose someone tried a reboot a week ago, but for whatever reason it did not complete. (Someone actually logged in at that time.)
This was all checked via an ssh connection.
Next, I wanted to open a console connection to the server (via an Avocent console). When I did this, at the moment the console screen popped up (just a telnet window), the server simply continued the reboot!
Unfortunately it's a production server, but that's not your problem :-).

Now I'm wondering:
could someone have started a reboot and cancelled at some point or just left it hanging at the (malfunctioning?) kill script and then,
how is it possible this reboot continues on opening a console window?

Any suggestions are appreciated (I have some explaining to do but don't really have a clue...).

Thanks.
Asked by: eszet
 
wesly_chen commented:
> could someone have started a reboot and cancelled at some point or just left it hanging at the (malfunctioning?)
> kill script and then, how is it possible this reboot continues on opening a console window?
Or the shutdown process hung while trying to kill some process, so the Solaris box stayed in run level 6.
When you opened the console, it requested login authentication, which triggered a chain of processes that finally
timed out the hung "kill" process, so the system continued rebooting.

Regards,

Wesly
 
PsiCop commented:
Run Level 6 = Whatever the system's default Run Level is. If the system's normal Run Level is 3, then booting to Run Level 6 will take it to Run Level 3. If it's normally Run Level 2, then booting to Run Level 6 will take it to Run Level 2.
 
wesly_chen commented:
> Run Level 6 = Whatever the system's default Run Level is
As I understand it, run level 6 in Solaris means "reboot" the system to the default run level.
So "init 6" first goes to run level 0 (or kills all the processes, as run level 0 does),
then restarts the system and goes to run level 2 or 3.

In this case, the system is hanging while killing a certain process, before it gets to the restart.

Wesly
 
yuzh commented:
init 6 = reboot

To learn more about Solaris run levels:
man init

Also have a look at "Booting process in Solaris":
http://www.adminschoice.com/docs/booting_process_in_solaris.htm
 
eszet (author) commented:
init 6 will go to runlevel 0 and then restart (to runlevel 3 in our case). That's correct and not the problem.

What I don't understand is how opening a console continues the reboot that was hanging at the kill script.
Wesly, I don't think your answer explains this behaviour. Can you clarify this a bit more?
>>>
"When you opened the console, it requested login authentication, which triggered a chain of processes that finally
timed out the hung "kill" process, so the system continued rebooting."
<<<

Thanks.
 
Nukfror commented:
If a kill script is just sitting there, the problem isn't the reboot; the problem is the kill script. Is the kill script a standard system kill script, or is it something you guys added for some application of yours?

You need to do some research into what this kill script is trying to do. Maybe it's hanging on a stale NFS mount ...

Maybe it's stuck in an infinite loop. Until the kill script returns, the system isn't going to proceed to the next script.
 
Nukfror commented:
Oh wait ... sorry ... I see the spot you are in now. It sounds like you're stuck in a reboot with no command line access.

Maybe someone wrote a wrapper around the /etc/rc# scripts? This is really weird.
 
wesly_chen commented:
> I see that an application's kill script is hanging there for 1 week
Could you tell us which application?

When the system is going down for a reboot (run level 0 or run level 6), the shutdown process runs the K* scripts in
/etc/rc0.d. Each /etc/rc0.d/K<number><daemon> script (a symbolic link to a script in /etc/init.d) kills the processes it is
responsible for. The scripts with lower numbers run first, and they are executed sequentially, so one script that doesn't
finish will leave the whole shutdown process hanging there.
If you have customized kill scripts, or modified the scripts in /etc/init.d, then you may want to check them first.
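
The sequential behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Solaris rc script: the script names (K05first, K10app, K40cron) and the stand-in commands are hypothetical, and the timeout exists only so the demo terminates. The real shutdown sequence has no such timeout, which is why one stuck script stalls everything after it.

```python
import subprocess
import sys

def run_kill_scripts(scripts, demo_timeout=None):
    """Run scripts one at a time in lexical name order, as an rc script does with K* files.

    `scripts` maps a script name to the argv used to run it.
    Returns (names that finished, name of the script that blocked or None).
    """
    finished = []
    for name in sorted(scripts):
        try:
            # The next script is not started until this one returns.
            subprocess.run(scripts[name], timeout=demo_timeout, check=False)
        except subprocess.TimeoutExpired:
            # Demo-only escape hatch; without it the loop would wait forever.
            return finished, name
        finished.append(name)
    return finished, None

# Hypothetical script names; the commands are stand-ins, not real kill scripts.
quick = [sys.executable, "-c", "pass"]
hangs = [sys.executable, "-c", "import time; time.sleep(30)"]
demo_scripts = {"K40cron": quick, "K10app": hangs, "K05first": quick}

if __name__ == "__main__":
    print(run_kill_scripts(demo_scripts, demo_timeout=1))
```

In this sketch K05first completes, K10app blocks, and K40cron is never reached, which matches a system that sits in run level 6 with one kill script still in the process list.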

In my experience, most of the time it is the NFS mountd daemon that hangs, because of a flaw in the NFS design.

However, without knowing which application you mentioned, I can't tell you more details.

Wesly
 
Gugro commented:
Hi,

when the system shut down, some processes tried to log messages to the console, /dev/console.
But your console port was not connected, so the console buffer for the serial port filled up and the writing process blocked
-> the whole shutdown procedure stopped.

When you reconnected to the console, the console server accepted more data from the serial port, and the processes were able
to continue...
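
The blocking-writer effect in this explanation can be reproduced in miniature with an ordinary pipe. This is an analogy only, sketched in Python: a pipe is not a serial console, and whether the server's console port actually applied flow control this way is an assumption. The function names are made up for the illustration.

```python
import os

def fill_until_blocked(write_fd, chunk=b"x" * 1024):
    """Write until the kernel buffer is full; return the number of bytes accepted."""
    os.set_blocking(write_fd, False)
    total = 0
    while True:
        try:
            total += os.write(write_fd, chunk)
        except BlockingIOError:
            # A normal (blocking) writer would sleep here until someone reads.
            return total

def demo():
    read_fd, write_fd = os.pipe()
    try:
        buffered = fill_until_blocked(write_fd)
        # Nobody is reading, so no further write can succeed at this point.
        drained = len(os.read(read_fd, 8192))  # a reader "connects" and drains some data
        resumed = os.write(write_fd, b"y" * 1024)  # now the writer can make progress again
        return {"buffered": buffered, "drained": drained, "resumed": resumed}
    finally:
        os.close(read_fd)
        os.close(write_fd)

if __name__ == "__main__":
    print(demo())
```

The exact buffer size depends on the operating system, so the sketch reports it rather than assuming a value. The point is only that the writer stalls while nothing reads and resumes as soon as a reader drains the buffer.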
 
eszet (author) commented:
Wesly,
I'm aware of the Solaris reboot process. I know that if a kill script doesn't finish, the whole shutdown process will hang. But that's not really the problem. I don't even think there's a problem with the script: it worked fine before, and also when I verified it during the last reboot on Monday. I rather think someone may have cancelled the shutdown process somehow, though I'm not sure how. I see some apps were started manually after that moment, which is why we thought the server was running fine, until we noticed it was in runlevel 6, with the kill script hanging. (BTW, the application is Pega from Pegasystems Inc.)

What I'm looking for is how it's possible that opening a console connection continued the reboot. I have no console logging, so unfortunately I can't see at what point it continued.

Gugro,
that's more like what I'm looking for. But I don't think it explains this, because we never have a console connected to the servers unless needed. So according to your explanation that would cause problems all the time filling the buffer, which is not the case?

Thanks.
 
wesly_chen commented:
> would cause problems all the time filling the buffer
It depends. Not all messages go to the console (check /etc/syslog.conf for /dev/console).
Besides, the application can also write messages to /dev/console itself.
Most of the time, critical errors and authentication errors show up on the console.
Therefore, what Gugro said is quite possible.
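
As a rough aid for that check, the sketch below lists which syslog selectors are routed to /dev/console. It is written in Python for illustration and makes several assumptions: the sample text is hypothetical (not the poster's file), only simple "selector, whitespace, action" lines are handled, and any m4 macro lines in a real Solaris syslog.conf are not interpreted.

```python
def console_selectors(conf_text, target="/dev/console"):
    """Return the selectors whose action is exactly `target`."""
    selectors = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) != 2:
            continue  # not a plain "selector action" line (e.g. an m4 macro)
        selector, action = fields
        if action == target:
            selectors.append(selector)
    return selectors

# Hypothetical sample content, for illustration only.
SAMPLE_CONF = (
    "# sample\n"
    "*.err;kern.notice;auth.notice\t/dev/console\n"
    "mail.debug\t/var/log/syslog\n"
    "*.alert\troot\n"
)

if __name__ == "__main__":
    print(console_selectors(SAMPLE_CONF))
```

On the real server one would pass in the contents of /etc/syslog.conf; an empty result would suggest that any console output during shutdown came from the scripts or the application writing to the console directly.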
You are familiar with the shutdown process, so you know that changing the run level is done by the init process.
Even if someone cancelled the shutdown command (Ctrl-C),
the init process is still running and the reboot still keeps going,
unless some process hangs along the way.
I don't use Pega (Business Process Manager), so I don't know it.
At this point, I can't tell you any more.

Wesly
