amigan_99
asked on
Linux /var/tmp/sos directories are filling up disk on a regular basis. Help please.
I have a Linux system whose disk is regularly being filled by these /var/tmp/sos.* directories.
They seem to be created every week and take tens of gigabytes of space. Is this a normal part of
Linux operation? It's causing the disk to become critically full too often. Is there a way to automatically
delete the old sos directories?
Also - do you know if the directories systemd-private-* need to be kept and what generates them?
observium tmp]$ ls -l
total 24
-rw-r--r--. 1 root root 0 Dec 27 20:15 need_home_synced
-rw-r--r--. 1 nagios nagios 6 Dec 28 09:16 retrans_state.txt
drwx------. 2 root root 4096 Dec 28 04:13 sos.t92dy0
drwx------. 2 root root 4096 Dec 21 03:35 sos.y5KVwk
drwx------. 3 root root 4096 May 17 2018 systemd-private-04196687cdcc41d2be170a72a0dcb5fc-httpd.service-b7fs9F
drwx------. 3 root root 4096 May 17 2018 systemd-private-04196687cdcc41d2be170a72a0dcb5fc-mariadb.service-HPqubD
drwx------. 3 root root 4096 May 17 2018 systemd-private-04196687cdcc41d2be170a72a0dcb5fc-ntpd.service-A5CYzA
Linux observium.internal.acme.com 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Strange....
sosreport is meant to pass information to CentOS or Red Hat support when you have trouble with your system
and they request a report of the system from you.
It gives those organisations the logging they need to investigate problems.
Those reports are (AFAIK) not meant to be made on a regular basis.
Here is more info: https://access.redhat.com/solutions/3592
So I guess you need to check whether some cron job creates them automatically.
OTOH they may be triggered because some error occurs. In that case you need to resolve the error.
You can check the report contents yourself if needed:
Here is a description how/what/where: https://www.ostechnix.com/sosreport-a-tool-to-collect-system-logs-and-diagnostic-information/
I'm with noci. This is very strange.
Try providing the following.
/bin/ls -d /tmp
/bin/ls -d /var/tmp
mount
df
find /var/tmp -type f -ls
lsof 2>/dev/null | grep "(deleted)"
These will show oddball linkages + binds + ghost files.
ASKER
I ran sudo crontab -l | grep sos and sudo ps -ef | grep sos, but neither shows sosreport. Any other thoughts on where I might spot how these are getting spawned?
ASKER
/bin/ls -d /tmp
/tmp
observium tmp]$ /bin/ls -d /var/tmp
/var/tmp
observium tmp]$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=16255652k,nr_inodes=4063913,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/sda4 on / type ext4 (rw,relatime,seclabel,data=ordered)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=32,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
/dev/sda2 on /boot type ext4 (rw,relatime,seclabel,data=ordered)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
observium tmp]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda4 864G 642G 179G 79% /
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 914M 15G 6% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/sda2 190M 77M 99M 44% /boot
observium tmp]$ find /var/type -type f -ls
find: ‘/var/type’: No such file or directory
lsof 2>/dev/null | grep "(deleted)"
{no output}
ASKER
Perhaps I should run a script that deletes any sos* directories every week? Can I do that with a non-root user account, given that I need sudo to delete those directories manually?
You named it observium, so I assume that is what it is running. Is it the community version or a paid version?
Did you try their support channels? They might have more experience with what is happening.
ASKER
That's a good thought. I didn't set it up - a former team member did and obviously my sysadmin skills are marginal.
cron also uses the /etc/cron* directories.
A job can run under another account, so crontab -l may not show it.
ls -lR /var/spool/cron may show other users' crontabs.
You can get rid of the files with: rm -rf /var/tmp/sos*
The files should not get created unless there are problems that need to be reported to RedHat or CentOS support.
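A rough sketch of how that search could be widened beyond one user's crontab. The function name find_sos_jobs is made up for illustration, and the directory list in the comment is the usual RHEL/CentOS 7 layout, assumed rather than confirmed for this host. The pattern sos is deliberately broad, so some unrelated hits are possible.

```shell
#!/bin/sh
# find_sos_jobs PATH...
# Print the name of every file under the given files/directories whose
# contents mention "sos". Paths that do not exist are skipped silently.
# (Hypothetical helper, not part of any package.)
find_sos_jobs() {
    for path in "$@"; do
        [ -e "$path" ] || continue
        # -r: recurse into directories, -l: print matching file names only
        grep -rl -- 'sos' "$path" 2>/dev/null || true
    done
}

# Possible invocation on the real system (as root, paths are assumptions):
#   find_sos_jobs /etc/crontab /etc/anacrontab /etc/cron.d /etc/cron.hourly \
#                 /etc/cron.daily /etc/cron.weekly /etc/cron.monthly /var/spool/cron
```

If nothing turns up there, systemctl list-timers --all is another place worth a look, since a systemd timer can start jobs that never appear in any crontab.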
it can run under another account so crontab -l may not show it.
ls -lR /var/spool/cron may show other
You can get rid of the files with: rm -rf /var/tmp/sos*
The files should not get created unless there are problems that need to be reported to RedHat or CentOS support.
Rerun this command, as I made a typo, should be /var/tmp rather than /var/type which doesn't exist.
find /var/tmp -type f -ls
Other commands show nothing of use.
Note: Be sure to run these commands when the problem occurs.
The problem must be occurring to determine the cause, so just running the commands when everything's working won't help.
In particular, find /var/tmp -type f -ls and the lsof command that shows deleted/ghost files are the critical ones to run when the problem occurs.
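To go with those commands, here is a small sketch that reports how much space each sos.* directory is taking at the moment the disk is filling. The helper name sos_usage is invented for this example; it only relies on du and sort. It needs to run as root on this system, since the directories in the listing are owned by root with mode drwx------.

```shell
#!/bin/sh
# sos_usage BASE_DIR
# Print disk usage in KiB for each sos.* directory directly under BASE_DIR,
# largest first. (Hypothetical helper for illustration.)
sos_usage() {
    base=$1
    for d in "$base"/sos.*; do
        # Skip the unexpanded pattern when nothing matches, and non-directories
        [ -d "$d" ] || continue
        du -sk -- "$d"
    done | sort -rn
}

# Possible invocation (as root):
#   sos_usage /var/tmp
```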
ASKER
observium ~]$ sudo find /var/tmp -type f -ls
[sudo] password for mememe:
33824258 676 -rw------- 1 root root 688394 Dec 28 04:01 /var/tmp/sos.t92dy0/tmpsxP6UK
33824263 101922356 -rw------- 1 root root 104368414720 Dec 28 04:08 /var/tmp/sos.t92dy0/sosreport-observium.internal.acme.com-20181228035405.tar
33816634 4 -rw------- 1 root root 955 Dec 28 04:01 /var/tmp/sos.t92dy0/tmp3G_gs8
33816633 176 -rw------- 1 root root 178443 Dec 28 04:13 /var/tmp/sos.t92dy0/tmpKLsv8K
33824256 1568 -rw------- 1 root root 1605040 Dec 28 04:01 /var/tmp/sos.t92dy0/tmpar3hIj
32637056 0 -rw-r--r-- 1 root root 0 Dec 28 20:45 /var/tmp/need_home_synced
32637128 4 -rw-r--r-- 1 nagios nagios 6 Dec 29 10:46 /var/tmp/retrans_state.txt
ASKER
observium ~]$ sudo ls -lR /var/spool/cron
/var/spool/cron:
total 0
ASKER
Thank you very much.
You're welcome!
The first one, /var/tmp/sos* filling up, comes from what I believe is a program called sosreport. It shouldn't be running automatically, since it is a tool for collecting reports to forward to a support organisation. You should be able to delete older runs.
The second comes from systemd itself. Services whose units set PrivateTmp=true get their own private /tmp and /var/tmp, and the systemd-private-* directories are where that private space lives. In your listing they belong to httpd, mariadb and ntpd.
This might give you a clue on the second... https://support.plesk.com/hc/en-us/articles/115000063849-Directories-like-tmp-systemd-private-overflow-cause-server-crash-due-to-lack-of-disk-space
For the first, you could create a cron job that clears sos directories under /var/tmp that are older than, say, a week.
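A minimal sketch of such a cleanup, assuming GNU find. The function name purge_old_sos, the script path and the schedule in the comments are placeholders, not something taken from this system. Because the sos.* directories are owned by root with mode drwx------, the job would have to run as root (root's crontab or a file in /etc/cron.d) rather than from an unprivileged account.

```shell
#!/bin/sh
# purge_old_sos BASE_DIR DAYS
# Remove sos.* directories directly under BASE_DIR whose modification time
# is more than DAYS days in the past. Nothing else in BASE_DIR is touched,
# so the systemd-private-* directories are left alone.
# (Hypothetical helper for illustration.)
purge_old_sos() {
    base=$1
    days=$2
    find "$base" -mindepth 1 -maxdepth 1 -type d -name 'sos.*' \
        -mtime +"$days" -exec rm -rf -- {} +
}

# Possible invocation (as root):
#   purge_old_sos /var/tmp 7
#
# Possible /etc/cron.d entry, with a made-up script path and time:
#   0 6 * * 0 root /usr/local/sbin/purge-old-sos.sh
```

With reports arriving weekly, -mtime +7 only matches a directory once it is at least eight days old, so two reports can sit on disk at the same time. A smaller DAYS value keeps fewer of them around. This only treats the symptom; finding what launches sosreport is still the real fix.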
A caution on deleting the systemd-private directories and not just the files contained in them... https://access.redhat.com/discussions/3027351