Jenkins slave terminate in few seconds

enthuguy
enthuguy used Ask the Experts™
on
Hi Jenkins and linux experts,

When I start jenkins slave node (java/linux) from jenkins master....I can see jenkins  start up script runs for a 2 seconds and terminates for some reason. Could you help how to trouble shoot by looking at the essential logs please.

Below is the process I see for 2 seconds then it terminates
jenkins   4976  4946  0 00:35 ?        00:00:00 bash -c source ~/.bash_profile cd "/opt/build/jenkins" && /usr/java/jdk1.8.0_111/bin/java -XX:PermSize=1024m -XX:MaxPermSize=2048m -Xmx2048m -jar remoting.jar -workDir /opt/build/jenkins -jar-cache /opt/build/jenkins/remoting/jarCache
jenkins   4991  4976 71 00:35 ?        00:00:04 /usr/java/jdk1.8.0_111/bin/java -XX:PermSize=1024m -XX:MaxPermSize=2048m -Xmx2048m remoting.jar -workDir /opt/build/jenkins -jar-cache /opt/build/jenkins/remoting/jarCache

Open in new window


From master jenkins > manage jenkins > node > agent log
Slave JVM has not reported exit code. Is it still running?
[11/19/19 00:35:18] Launch failed - cleaning up connection
[11/19/19 00:35:18] [SSH] Connection closed.
ERROR: Connection terminated
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)

Open in new window



please help
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®

Author

Commented:
Saw some activities from the log during process launch and termination


==> /var/log/messages <==
Nov 19 00:56:01 ip-10-157-60-213 systemd: Removed slice User Slice of root.

Open in new window


==> /var/log/awslogs.log <==
2019-11-19 00:56:02,029 - cwlogs.push.reader - WARNING - 2271 - Thread-12 - Fall back to current time: {'timestamp': 1574085362029, 'start_position': 534944L, 'end_position': 535181L}, reason: timestamp could not be parsed from message.
2019-11-19 00:56:02,029 - cwlogs.push.reader - WARNING - 2271 - Thread-12 - Fall back to current time: {'timestamp': 1574085362029, 'start_position': 535181L, 'end_position': 535397L}, reason: timestamp could not be parsed from message.
2019-11-19 00:56:02,029 - cwlogs.push.reader - WARNING - 2271 - Thread-12 - Fall back to current time: {'timestamp': 1574085362029, 'start_position': 535397L, 'end_position': 535524L}, reason: timestamp could not be parsed from message.
2019-11-19 00:56:02,029 - cwlogs.push.reader - WARNING - 2271 - Thread-12 - Fall back to current time: {'timestamp': 1574085362029, 'start_position': 535524L, 'end_position': 535761L}, reason: timestamp could not be parsed from message.
2019-11-19 00:56:02,029 - cwlogs.push.reader - WARNING - 2271 - Thread-12 - Fall back to current time: {'timestamp': 1574085362029, 'start_position': 535761L, 'end_position': 535961L}, reason: timestamp could not be parsed from message.
2019-11-19 00:56:02,030 - cwlogs.push.reader - WARNING - 2271 - Thread-12 - Fall back to current time: {'timestamp': 1574085362030, 'start_position': 535961L, 'end_position': 536161L}, reason: timestamp could not be parsed from message.
2019-11-19 00:56:02,030 - cwlogs.push.reader - WARNING - 2271 - Thread-12 - Fall back to current time: {'timestamp': 1574085362030, 'start_position': 536161L, 'end_position': 536397L}, reason: timestamp could not be parsed from message.
2019-11-19 00:56:05,292 - cwlogs.push.publisher - INFO - 2271 - Thread-11 - Log group: audit, log stream: i-0c105ecef5ab17312, queue size: 0, Publish batch: {'skipped_events_count': 0, 'first_event': {'timestamp': 1574085360019, 'start_position': 525508L, 'end_position': 525833L}, 'fallback_events_count': 42, 'last_event': {'timestamp': 1574085362030, 'start_position': 536161L, 'end_position': 536397L}, 'source_id': '5886209762257dd5de4fa6fbd5e7c021', 'num_of_events': 42, 'batch_size_in_bytes': 11939}
2019-11-19 00:56:06,136 - cwlogs.push.publisher - INFO - 2271 - Thread-3 - Log group: messages, log stream: i-0c105ecef5ab17312, queue size: 0, Publish batch: {'skipped_events_count': 0, 'first_event': {'timestamp': 1574085359000, 'start_position': 890767L, 'end_position': 890845L}, 'fallback_events_count': 0, 'last_event': {'timestamp': 1574085361000, 'start_position': 891077L, 'end_position': 891153L}, 'source_id': '7e74b58114541c821496b27dd3af2d09', 'num_of_events': 5, 'batch_size_in_bytes': 511}
2019-11-19 00:56:06,515 - cwlogs.push.publisher - INFO - 2271 - Thread-5 - Log group: security, log stream: i-0c105ecef5ab17312, queue size: 0, Publish batch: {'skipped_events_count': 0, 'first_event': {'timestamp': 1574085359000, 'start_position': 762360L, 'end_position': 762527L}, 'fallback_events_count': 0, 'last_event': {'timestamp': 1574085359000, 'start_position': 762527L, 'end_position': 762639L}, 'source_id': 'f9bcb5e041683ff2efc767ef236a1e86', 'num_of_events': 2, 'batch_size_in_bytes': 329}

Open in new window


==> /var/log/audit/audit.log <==
type=CRYPTO_KEY_USER msg=audit(1574085366.999:853): pid=8274 uid=0 auid=5001 ses=41 msg='op=destroy kind=server fp=SHA256:77:99:e9:d2:e4:fc:3b:a9:ef:58:7f:61:1d:1a:6c:88:cf:8d:2b:85:10:8e:b4:f7:ae:ae:b6:88:49:b7:f4:55 direction=? spid=8278 suid=5001  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'
type=CRYPTO_KEY_USER msg=audit(1574085366.999:854): pid=8274 uid=0 auid=5001 ses=41 msg='op=destroy kind=session fp=? direction=both spid=8278 suid=5001 rport=38174 laddr=<slave ip here> lport=22  exe="/usr/sbin/sshd" hostname=? addr=<master ip here> terminal=? res=success'

Open in new window


==> /var/log/secure <==
Nov 19 00:56:07 ip-10-157-60-213 sshd[8278]: Received disconnect from <master ip here> port 38174:11: Closed due to user request.
Nov 19 00:56:07 ip-10-157-60-213 sshd[8278]: Disconnected from <master ip here> port 38174
Nov 19 00:56:07 ip-10-157-60-213 sshd[8274]: pam_unix(sshd:session): session closed for user jenkins

Open in new window


==> /var/log/audit/audit.log <==
type=USER_END msg=audit(1574085367.004:855): pid=8274 uid=0 auid=5001 ses=41 msg='op=PAM:session_close grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_lastlog acct="jenkins" exe="/usr/sbin/sshd" hostname=ip-<masterip here> .ap-southeast-2.compute.internal addr=<master ip here> terminal=ssh res=success'
type=CRED_DISP msg=audit(1574085367.008:856): pid=8274 uid=0 auid=5001 ses=41 msg='op=PAM:setcred grantors=pam_tally2,pam_env,pam_listfile,pam_unix acct="jenkins" exe="/usr/sbin/sshd" hostname=ip-<masterip here> .ap-southeast-2.compute.internal addr=<master ip here> terminal=ssh res=success'
type=USER_END msg=audit(1574085367.012:857): pid=8274 uid=0 auid=5001 ses=41 msg='op=login id=5001 exe="/usr/sbin/sshd" hostname=ip-<masterip here> .ap-southeast-2.compute.internal addr=<master ip here> terminal=ssh res=success'
type=USER_LOGOUT msg=audit(1574085367.015:858): pid=8274 uid=0 auid=5001 ses=41 msg='op=login id=5001 exe="/usr/sbin/sshd" hostname=ip-<masterip here> .ap-southeast-2.compute.internal addr=<master ip here> terminal=ssh res=success'
type=CRYPTO_KEY_USER msg=audit(1574085367.015:859): pid=8274 uid=0 auid=5001 ses=41 msg='op=destroy kind=server fp=SHA256:0b:ea:35:b2:1a:43:2b:4f:4d:d9:44:34:d8:42:61:0b:29:a7:de:bc:0a:f5:7e:70:2a:ec:92:9e:27:f3:27:01 direction=? spid=8274 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'
type=CRYPTO_KEY_USER msg=audit(1574085367.015:860): pid=8274 uid=0 auid=5001 ses=41 msg='op=destroy kind=server fp=SHA256:49:db:ad:ef:6f:22:3a:33:36:55:75:8c:22:62:16:f5:01:8b:75:4d:9c:aa:c0:f8:21:dd:b8:bb:4b:a8:ab:be direction=? spid=8274 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'
type=CRYPTO_KEY_USER msg=audit(1574085367.015:861): pid=8274 uid=0 auid=5001 ses=41 msg='op=destroy kind=server fp=SHA256:77:99:e9:d2:e4:fc:3b:a9:ef:58:7f:61:1d:1a:6c:88:cf:8d:2b:85:10:8e:b4:f7:ae:ae:b6:88:49:b7:f4:55 direction=? spid=8274 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'

Open in new window

David FavorFractional CTO
Distinguished Expert 2018

Commented:
Tip: Roll up all your code into a single bash script, with this in the top of your script.

set -x

Open in new window


This will echo all script activity.

Then run the script in foreground to see if oddities stand out.

If not, then you'll have to start digging into all code running in your script, increasing verbosity of logging till something stands out to fix.
Hi,

it could well be caused by a bad java version on the slave that you need to update first so verify that too.

Cheers
Commented:
Managed with this approach

# Mounted volume to another instance as secondary volume

# Get facl from working another instance
sudo getfacl -R / > /home/ec2-user/permissions.facl

# Set facl on the mounted volume
cd <secondary mounted volume>
sudo setfacl --restore=/home/ec2-user/permissions.facl

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial