Amazon EC2 inittab not spawning process

atlbiker asked:
I am unable to get my process spawned by init through inittab, so that it is respawned whenever it dies.

I found this...
http://bob.mcwhirter.org/blog/2008/10/17/run-level-run/

It's a little dated, but it brings up interesting points. My runlevel is 3. Is there anything else that EC2 has changed? I am running the Amazon Linux AMI Beta.

My inittab line looks like:
h1:345:respawn:/home/jbarber/daemon-example.py start

[jbarber@ip-10-84-197-230 ~]$ ls -lart /home/jbarber/
total 76
-rwx--x--x 1 jbarber ec2-user 1109 Jun 24 18:37 daemon-example.py

thanks.

noci, Software Engineer
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
There are several possibilities:

a) I am not 100% sure, but some init programs limit the line length to about 40 characters, so shortening the line might help.
(The remainder of the line is treated as a comment.)
b) Also, is /home/jbarber mounted when init interprets this line? init steps through the file sequentially.
For example, if the command lives on an NFS mount, the network configuration has to be done first, then the RPC services (including NFS) have to be started, before init can access the command.

Besides that, running software unconditionally as root from a user directory is more or less asking for problems...
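One way to act on that last warning is to check whether anyone besides the owner can modify the script before letting init run it as root. A minimal sketch, assuming a POSIX shell and find; the helper name writable_by_others is made up for illustration, and it does not check who the owner is:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the file is writable by its
# group or by others, i.e. by someone who is not the owner.
writable_by_others() {
    # -prune keeps find from descending if a directory is passed.
    # 020 = group write bit, 002 = other write bit.
    [ -n "$(find "$1" -prune \( -perm -020 -o -perm -002 \) -print 2>/dev/null)" ]
}

# Demonstration on a scratch file rather than on a real daemon script.
f=$(mktemp)
chmod 600 "$f"
if writable_by_others "$f"; then
    echo "loose permissions: $f"
else
    echo "only the owner can write: $f"
fi
rm -f "$f"
```

Run against the real script, a file owned by an ordinary user would still be editable by that user, so ownership deserves a look as well.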
Duncan Roe, Software Developer
CERTIFIED EXPERT

Commented:
An init-spawned job starts in a very sparse environment. PATH is very short, and there aren't many other variables. You can find out what the environment is by inserting a line like
e99:345:once:env > /tmp/init_env

In an interactive root session, reduce your environment down to that reported in /tmp/init_env. Does your python script still run?
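Trimming the environment by hand is tedious, so here is a rough sketch of the same test using env -i, which starts a command with an empty environment. The variable values are placeholders and an assumption on my part; substitute whatever /tmp/init_env actually reports. The function name run_like_init is made up.

```shell
#!/bin/sh
# Hypothetical helper: run a command with only a handful of variables set,
# approximating what an init-spawned job would see.
# The values below are assumed; replace them with the ones from /tmp/init_env.
run_like_init() {
    env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin HOME=/ TERM=linux "$@"
}

# Show the environment the command would receive.
run_like_init env
```

For the script in question that would be something like: run_like_init /home/jbarber/daemon-example.py start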
gheist
Top Expert 2015

Commented:
You need to specify the full command line. init is not a shell that interprets #!
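Either way, it is quick to confirm what the script's first line asks for and to build the explicit command line from it. A small sketch, assuming a POSIX shell; shebang_of is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the interpreter named on a script's #! line,
# or fail if the first line is not a #! line.
shebang_of() {
    first=$(head -n 1 "$1") || return 1
    case $first in
        '#!'*) printf '%s\n' "${first#'#!'}" ;;
        *)     return 1 ;;
    esac
}

# Demonstration on a scratch script.
s=$(mktemp)
printf '#!/usr/bin/python\nprint("hello")\n' > "$s"
echo "interpreter: $(shebang_of "$s")"
rm -f "$s"
```

The inittab entry can then name that interpreter explicitly, followed by the script path and its arguments.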

Author

Commented:
@duncan_roe Nothing ever happened. I ran kill -HUP 1 to try to force execution; even after bouncing the instance, init_env was never created.

@gheist
Updated inittab to:
h1:345:respawn:python /home/jbarber/daemon-example.py start

Sent kill -HUP 1 and bounced the instance... no success.

@noci
Moved the file to /tmp and updated inittab accordingly. No success.
Moved to a new directory under /opt/ and updated inittab... no change.

Thanks.

Author

Commented:
Where would the logs be written for something executed from inittab?
noci, Software Engineer

Commented:
The mounting issue affects all mounted filesystems, not just /home (is /tmp mounted or not?).

Also, no PATH is available, so /bin/python or /usr/bin/python should be used.
On my system python is on the /usr filesystem, so it is only available after /usr has been mounted.
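To find the absolute path to put in inittab, something along these lines can be run in an interactive shell. A sketch only; abs_cmd is a made-up name, and it relies on the interactive shell's PATH, which init will not have:

```shell
#!/bin/sh
# Hypothetical helper: resolve a command name to an absolute path using the
# current shell's PATH, so the result can be written into inittab.
abs_cmd() {
    p=$(command -v "$1") || return 1
    case $p in
        /*) printf '%s\n' "$p" ;;
        *)  return 1 ;;   # builtin, alias or function: no file for init to run
    esac
}

# Example with a command that exists on practically every system.
abs_cmd sh
```

Calling abs_cmd python would print the path to use in place of the bare name.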
noci, Software Engineer

Commented:
W.r.t. mounted volumes, please check your /etc/fstab to see which ones get mounted (or check the output of the mount program).
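A quick way to see which filesystem a given path lives on is df -P, which prints the mount point in the last column. A sketch, assuming a POSIX df and awk; mount_point_of is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: print the mount point of the filesystem holding a path.
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Check the directories that matter for the inittab entry.
for p in / /usr /tmp; do
    if [ -e "$p" ]; then
        printf '%s is on %s\n' "$p" "$(mount_point_of "$p")"
    fi
done
```

If every path reports /, everything is on the root filesystem and nothing needs to be mounted before init can run the command.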

Author

Commented:
@noci

[root@ip-10-84-197-230 etc]# which python
/usr/bin/python

[root@ip-10-84-197-230 etc]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/xvda1             8256952    865876   7307192  11% /
tmpfs                   859032        44    858988   1% /dev/shm

h1:345:respawn:/usr/bin/python /opt/portal/daemon-example.py start

still no luck...
noci, Software Engineer

Commented:
OK, so there are no mounting issues...   ;-)
And there is no logging either? See /var/log/
(ls -latr /var/log) and inspect the most recently updated log files for something w.r.t. init.
dmesg might give a clue.
Is there any means to see what is written to the console?

the whole line is 67 characters, the command alone = 52 characters.

Now make a shell script:  /sbin/run-my-d  
Containing the command line:
/usr/bin/python /opt/portal/daemon-example.py start


Put in init:
h1:345:respawn:/bin/bash /sbin/run-my-d

BTW, the use of respawn presumes that the daemon does NOT exit and keeps running in that same process
(it should not daemonize itself...).
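To illustrate why that matters, here is a toy model of respawn as a shell loop. This is not how init is implemented, just a sketch of the behaviour; respawn_n is a made-up name and the loop is bounded so the demo terminates:

```shell
#!/bin/sh
# Toy model of "respawn": start the command, wait for it to return, start it
# again. Stops after a fixed number of starts and prints that number.
respawn_n() {
    limit=$1
    shift
    starts=0
    while [ "$starts" -lt "$limit" ]; do
        "$@" || true              # returns when the process exits or forks away
        starts=$((starts + 1))
    done
    echo "$starts"
}

# "true" returns immediately, much like a script that daemonizes itself and
# exits in the foreground, so it is restarted as fast as the loop can go.
respawn_n 5 true
```

A process that stays in the foreground would block the loop at the first start, which is what respawn expects.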

Author

Commented:
Fail. I don't think Amazon treats inittab like normal. Wouldn't @duncan_roe's test have proven this by its failure to execute?
noci, Software Engineer

Commented:
This command line is shorter than 40 characters; the previous ones weren't.
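The lengths can be checked in the shell with the ${#var} expansion. A small sketch; entry_length is a made-up name, and the counts exclude the trailing newline, so they come out one lower than wc -c would report for the same line:

```shell
#!/bin/sh
# Hypothetical helper: print the length in characters of an inittab entry,
# not counting the newline at the end of the line.
entry_length() {
    printf '%s\n' "${#1}"
}

# The direct entry and the wrapper-script entry from this thread.
entry_length 'h1:345:respawn:/usr/bin/python /opt/portal/daemon-example.py start'
entry_length 'h1:345:respawn:/bin/bash /sbin/run-my-d'
```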

what does

grep '\[init]' /var/log/* /var/log/*/*

yield?

Author

Commented:
nada...


[jbarber@ip-10-84-197-230 ~]$ grep '\[init]' /var/log/* /var/log/*/*
grep: /var/log/boot.log: Permission denied
grep: /var/log/boot.log-20110626: Permission denied
grep: /var/log/btmp: Permission denied
grep: /var/log/cron: Permission denied
grep: /var/log/cron-20110626: Permission denied
grep: /var/log/maillog: Permission denied
grep: /var/log/maillog-20110626: Permission denied
grep: /var/log/messages: Permission denied
grep: /var/log/messages-20110626: Permission denied
grep: /var/log/secure: Permission denied
grep: /var/log/secure-20110626: Permission denied
grep: /var/log/spooler: Permission denied
grep: /var/log/spooler-20110626: Permission denied
grep: /var/log/tallylog: Permission denied
grep: /var/log/yum.log: Permission denied
grep: /var/log/mail/statistics: Permission denied
[jbarber@ip-10-84-197-230 ~]$ sudo su
[root@ip-10-84-197-230 jbarber]# grep '\[init]' /var/log/* /var/log/*/*
[root@ip-10-84-197-230 jbarber]#

Author

Commented:
[root@ip-10-84-197-230 log]# pwd
/var/log
[root@ip-10-84-197-230 log]# ls -lart
total 544
drwxr-xr-x  2 ntp  ntp    4096 Jan 16 22:40 ntpstats
drwxr-xr-x  2 root root   4096 Jan 17 22:36 conman.old
drwxr-xr-x  2 root root   4096 Jan 17 22:36 conman
-rw-------  1 root root      0 Feb 24 16:33 tallylog
-rw-------  1 root root      0 Feb 24 16:33 spooler-20110626
drwxr-xr-x  2 root root   4096 Feb 24 16:34 mail
drwxr-xr-x 19 root root   4096 Feb 24 16:34 ..
-rw-------  1 root root      0 Jun 22 18:23 yum.log
-rw-r--r--  1 root root   1332 Jun 22 18:23 cloud-init.log
-rw-------  1 root root    466 Jun 24 19:44 boot.log-20110626
-rw-r--r--  1 root root  12952 Jun 24 19:45 dmesg.old
-rw-------  1 root root  68453 Jun 25 18:26 secure-20110626
-rw-------  1 root root   5865 Jun 26 03:17 maillog-20110626
-rw-------  1 root root  40674 Jun 26 03:17 cron-20110626
-rw-------  1 root root      0 Jun 26 03:17 spooler
-rw-------  1 root root 142966 Jun 26 03:17 messages-20110626
-rw-------  1 root utmp  38016 Jun 27 11:29 btmp
-rw-------  1 root root     93 Jun 27 13:29 boot.log
-rw-r--r--  1 root root  12952 Jun 27 13:29 dmesg
-rw-------  1 root root   1133 Jun 27 13:30 maillog
-rw-------  1 root root  25720 Jun 27 13:37 messages
-rw-r--r--  1 root root     52 Jun 27 15:23 run-my-d
drwxr-xr-x  6 root root   4096 Jun 27 15:23 .
-rw-------  1 root root  17140 Jun 27 16:01 cron
-rw-rw-r--  1 root utmp  80640 Jun 27 16:18 wtmp
-rw-r--r--  1 root root 146584 Jun 27 16:18 lastlog
-rw-------  1 root root   7618 Jun 27 16:19 secure
noci, Software Engineer

Commented:
Excuse me, I was looking at a Gentoo system with metalog...

please try:
grep init: /var/log/messages

also what is in /var/log/run-my-d ?

After telinit q there should be a line like "date time nodename init: Re-reading inittab", most probably in /var/log/messages, though it might be in a different file.
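That check can be wrapped up as below. A sketch under assumptions: the default log path is the one used in this thread and differs between distributions, init_messages is a made-up name, and the second sample line in the demo is invented filler.

```shell
#!/bin/sh
# Hypothetical helper: show init's own messages from a syslog-style file.
# Defaults to /var/log/messages, which usually needs root to read.
init_messages() {
    grep ' init: ' "${1:-/var/log/messages}"
}

# Demonstration on a scratch log instead of the real one.
log=$(mktemp)
cat > "$log" <<'EOF'
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty2) main process (908) killed by TERM signal
Jun 27 13:30:02 ip-10-84-197-230 crond[1234]: made-up line from another program
EOF

init_messages "$log"
if init_messages "$log" | grep -q 'Re-reading inittab'; then
    echo "init logged a re-read of inittab"
else
    echo "no re-read of inittab logged"
fi
rm -f "$log"
```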

Author

Commented:
I was in the wrong directory when I initially made run-my-d. I didn't know where I was when I made it the first time, so I re-made it in /sbin and deleted the stray copy.

[root@ip-10-84-197-230 log]# telinit q
[root@ip-10-84-197-230 log]# grep init: /var/log/messages
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty2) main process (908) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty3) main process (910) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: serial (hvc0) main process (912) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty4) main process (913) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty5) main process (915) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty6) main process (917) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty1) main process (971) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: plymouth-shutdown main process (4249) terminated with status 1
Jun 27 13:29:19 ip-10-84-197-230 init: splash-manager main process (4244) terminated with status 1
[root@ip-10-84-197-230 log]# grep init: /var/log/messages
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty2) main process (908) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty3) main process (910) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: serial (hvc0) main process (912) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty4) main process (913) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty5) main process (915) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty6) main process (917) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: tty (/dev/tty1) main process (971) killed by TERM signal
Jun 27 13:29:19 ip-10-84-197-230 init: plymouth-shutdown main process (4249) terminated with status 1
Jun 27 13:29:19 ip-10-84-197-230 init: splash-manager main process (4244) terminated with status 1
noci, Software Engineer

Commented:
(After a change to inittab you need a telinit Q to reload it; I assumed you already knew this.)

With execution failures, something along the lines of "date time nodename init: Id "h1" respawning too fast: disabled for 5 minutes" should have been logged.

Author

Commented:
I've always been in the habit of using kill -HUP 1. However, after telinit Q and/or reboots of the instance, I still don't see any logs with 'h1'.

Author

Commented:
I'll change my habit to telinit Q :)

Author

Commented:
just got a response on AWS forum...

https://forums.aws.amazon.com/thread.jspa?threadID=70502

I'll test and then update when done.

Author

Commented:
uhmm... still not working FTL.
noci, Software Engineer

Commented:
Upstart works quite differently.
Please don't delete the question; ask for it to be closed with a refund, or with the points assigned to you,
as this is valuable information.
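Since Upstart and sysvinit are configured in different places, it helps to confirm which one is in charge before editing anything. The sketch below is only a heuristic based on files on disk, not an authoritative test; guess_init is a made-up name, and the optional argument (a root directory to inspect) exists just to make it easy to try out.

```shell
#!/bin/sh
# Hypothetical helper: guess the init system from configuration files under
# a root directory (defaults to the real root).
guess_init() {
    root=${1:-}
    if ls "$root"/etc/init/*.conf >/dev/null 2>&1; then
        echo upstart      # Upstart reads its job definitions from /etc/init/*.conf
    elif grep -q '^[^#:]*:[^:]*:respawn:' "$root/etc/inittab" 2>/dev/null; then
        echo sysvinit     # inittab contains active respawn entries
    else
        echo unknown
    fi
}

guess_init
```

A result of upstart would suggest that a custom respawn line in /etc/inittab is likely being ignored.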
noci, Software Engineer

Commented:
I presume you created a job file named
/etc/init/my-python-deamon.conf

with content like (at least):
start on (runlevel RUNLEVEL=3 or ( runlevel RUNLEVEL=4 or runlevel RUNLEVEL=5))
task
exec /usr/bin/python /opt/portal/daemon-example.py start


Whether you need the task keyword depends on how your service starts; most probably you don't need it here.
noci, Software Engineer

Commented:
Instead of task, the keyword should be respawn.
(BTW, I have no Upstart-capable machines, so I have to guess a little.)
---8<---
start on (runlevel RUNLEVEL=3 or ( runlevel RUNLEVEL=4 or runlevel RUNLEVEL=5))
respawn
exec /usr/bin/python /opt/portal/daemon-example.py start
Duncan Roe, Software Developer

Commented:
I tried my suggested line in my inittab, and it worked fine(!)
> a   
    77 e1:3:once:env > /tmp/t5env
> s    
inittab
> q
07:57:51# telinit q
07:57:55# k /tmp/t5env 
      1 CONSOLE=/dev/console
      2 TERM=linux
      3 INIT_VERSION=sysvinit-2.86
      4 PATH=/bin:/usr/bin:/sbin:/usr/sbin
      5 RUNLEVEL=3
      6 PWD=/
      7 PREVLEVEL=N
      8 SHLVL=0
      9 HOME=/
     10 BOOT_IMAGE=v2.6.32.8
07:58:04# 

Commented:
The final solution seems to be using expect daemon rather than expect fork, so my final conf file for Upstart looks like:


# Runlevel 6 (reboot) belongs only in the stop stanza
start on runlevel [345]
stop on runlevel [016]
expect daemon
respawn
exec /opt/portal/daemon-example.py start

pre-stop script
    pidfile=/tmp/daemon.pid
    # No exec here: exec would replace this shell and the wait loop would never run
    /opt/portal/daemon-example.py stop

    # Wait for daemon to end
    loop=6000
    while [ "$loop" -gt 0 ]; do
        # If the pidfile is found, continue waiting
        if [ -e "$pidfile" ]; then
            loop=$((loop-1))
            sleep 1
            continue
        fi
        break
    done
end script
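The wait loop in the pre-stop stanza can be tried outside Upstart as a plain shell function. A sketch, assuming the daemon removes its pidfile when it exits; wait_for_pidfile_gone is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: wait up to $2 seconds for a pidfile to disappear.
# Returns 0 if the file went away, 1 if the timeout was reached.
wait_for_pidfile_gone() {
    pidfile=$1
    remaining=$2
    while [ -e "$pidfile" ]; do
        [ "$remaining" -gt 0 ] || return 1
        remaining=$((remaining - 1))
        sleep 1
    done
    return 0
}

# Demonstration: a scratch file stands in for the pidfile and is removed
# in the background after one second.
p=$(mktemp)
( sleep 1; rm -f "$p" ) &
if wait_for_pidfile_gone "$p" 5; then
    echo "daemon stopped"
else
    echo "timed out waiting for $p"
fi
wait
```

Returning a status lets the caller tell a clean stop from a timeout, which the inline loop does not distinguish.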

I found a report about Upstart not handling expect fork: http://stackoverflow.com/questions/6026107/ubuntu-upstart-not-respawning-the-daemon-despite-respawn-in-the-config-file. I haven't confirmed the version of Upstart that Amazon is using, but this seems to be the issue.

Author

Commented:
Received feedback on the Amazon forums.

Author

Commented:
done