directory /var/tmp in filesystem /var keeps increasing to full level in AIX

Posted on 2010-11-20
Medium Priority
Last Modified: 2012-05-10

Our /var filesystem is growing continuously, every minute. On further investigation I found that it is the tmp directory within /var that keeps growing.

Can anyone tell me the significance of /var/tmp, why it could be constantly growing, and how to go about dealing with this? Is /var corrupted or something?
The owner of /var/tmp is bin,
and the owner of the files within /var/tmp is root.
In the past 4 hours I have added 2.5GB of space to /var, because /var/tmp kept pushing it close to 100% full.
Question by:assistunix
LVL 68

Expert Comment

ID: 34181064
Is it actually only /var/tmp?

By default the only files which could grow significantly in /var/tmp are the snmp-related logs snmpdv3.log, snmpmibd.log and aixmibd.log.

Which other files do you find in /var/tmp? If in doubt, please post an ls -l sample!

Much more growth can happen in /var/adm and /var/spool! /var/adm holds the wtmp file, which can grow very big over time because it records logins and logoffs, and sometimes remote machines try to log in at short intervals in an automated way, maybe even with malicious intent. /var/spool holds the sendmail logs and all the print queues with their logs, which grow along with printing activity and print job size.


LVL 68

Expert Comment

ID: 34181088
find all processes accessing /var with:

for pid in $(fuser /dev/hd9var 2>/dev/null); do ps -o pid=,ppid=,user=,args= -p $pid; done

Maybe you deleted a file whose handle is still held open by some process writing lots of data to it?
In this case you won't see any growing file, but freespace will vanish nonetheless.
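
One rough way to test for this, as a sketch only: compare the used space that df reports for the filesystem with what du can reach by walking the directory tree. A large gap hints at space held by deleted but still open files. The function name space_gap is made up for illustration, and the sketch assumes that df accepts -k and -P and that du accepts -x, -s and -k on your system.

```shell
# Hypothetical helper: report how much used space du cannot account for.
# Example call: space_gap /var
space_gap() {
    fs=${1:-/var}
    # Used KB according to the filesystem (POSIX output format, second line).
    used_kb=$(df -kP "$fs" | awk 'NR==2 {print $3}')
    # KB reachable through the directory tree, staying on this filesystem.
    seen_kb=$(du -xsk "$fs" 2>/dev/null | awk '{print $1}')
    echo "df used: ${used_kb} KB, du sees: ${seen_kb} KB, gap: $((used_kb - seen_kb)) KB"
}
```

The numbers are only comparable when the argument is the mount point of its own filesystem, as /var is here (/dev/hd9var). A small gap is normal; a gap of gigabytes is worth chasing with fuser.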


Author Comment

ID: 34181103
Yes, it is /var/tmp only.
Files of the type listed below are what is making /var/tmp grow.
/var/tmp is filled with them and currently uses 5GB.

-rw-------    1 root     system      3848810 Nov 20 07:32 stm782474aaaad
-rw-------    1 root     system      3838813 Nov 20 07:28 stm1130600aaaae
-rw-------    1 root     system      3862524 Nov 20 07:27 stm335976aaaaa
-rw-------    1 root     system      3838832 Nov 20 07:19 stm782474aaaac
-rw-------    1 root     system      3848810 Nov 20 07:15 stm1130600aaaad
-rw-------    1 root     system      3838836 Nov 20 07:08 stm782474aaaab
-rw-------    1 root     system      3838832 Nov 20 07:05 stm1130600aaaac
-rw-------    1 root     system      3862523 Nov 20 06:56 stm782474aaaaa


Author Comment

ID: 34181116
From that command it seems as if no process is really accessing /var; correct me if I am wrong.

/var/tmp # for pid in $(fuser /dev/hd9var 2>/dev/null); do ps -o pid=,ppid=,user=,args= -p $pid; done
  90330       1     root /usr/lib/errdemon
 213234  282826     root /usr/sbin/rsct/bin/rmcd -a IBM.LPCommands -r
 221330  282826     root /usr/sbin/syslogd
 303278  282826     root /usr/sbin/muxatmd
 307212       1     root /usr/sbin/cron
 311460  282826     root /usr/sbin/aixmibd
 352286  475194 pconsole /usr/java5/bin/java -Xmx512m -Xms20m -Xscmx10m -Xshareclasses -Dfile.encoding=UTF-8 -Xbootclasspath/a:/pconsole/lwi/runtime/core/
 372934  282826     root /usr/sbin/nimsh -s
 463090  282826     root /usr/sbin/rsct/bin/vac8/IBM.CSMAgentRMd
 475194  417854 pconsole /bin/ksh /pconsole/lwi/bin/
 487446  282826     root /usr/sbin/rsct/bin/IBM.ServiceRMd
 585800  282826     root /usr/sbin/rsct/bin/IBM.DRMd
 913506  860330     root /usr/lpp/OV/lbin/eaagt/opcle -std
1687552 1679516     root -ksh
LVL 68

Accepted Solution

woolmilkporc earned 2000 total points
ID: 34181137
All those processes are accessing /var!

All of them are pretty standard except for the ...OV.. thing. Are you using NetView?

Anyway, stm... files are temporary work files of "sort"!
Please check with

fuser -f /var/tmp/stm782474aaaad

(choose the newest, still growing file). Which PID do you see? What gives "ps -ef | grep (PID from fuser)"?


Author Comment

ID: 34181164
this is what i get from running the following command.

/var/tmp # fuser -f /var/tmp/stm782474aaaad

Author Comment

ID: 34181181
I do not get any PID with that command. And OV, I believe, is HP OpenView, the monitoring tool.

Author Comment

ID: 34181182
# fuser -f /var/tmp/stm782474aaaad

Author Comment

ID: 34181204
These are just some of the sort processes that are running.
Could these sort processes be causing the increase in space used in /var/tmp?

ps -ef | grep sort
    root  286816       1   0 18:30:05      -  0:03 sort -rn
    root  295130       1   0 12:15:04      -  0:08 sort -rn
    root  335976       1   0 07:15:04      -  0:13 sort -rn
    root  348410       1   0 11:45:06      -  0:08 sort -rn
    root  360476       1   0 08:00:05      -  0:13 sort -rn
    root  409648       1   0 08:30:05      -  0:12 sort -rn
    root  450710       1   0 22:45:04      -  0:00 sort -rn
    root  458798       1   0 13:45:06      -  0:06 sort -rn
    root  483558       1   0 10:30:05      -  0:09 sort -rn
    root  520394       1   0 18:45:05      -  0:03 sort -rn
    root  573536       1   0 10:15:04      -  0:10 sort -rn
    root  679994       1   0 14:30:03      -  0:06 sort -rn
    root  692332       1   0 17:45:04      -  0:03 sort -rn
    root  704668       1   0 11:30:06      -  0:08 sort -rn
    root  749628       1   0 18:15:05      -  0:03 sort -rn
    root  762054       1   0 11:15:06      -  0:08 sort -rn
    root  778430       1   0 20:45:05      -  0:01 sort -rn
LVL 68

Expert Comment

ID: 34181205
So this file is no longer in use. Was it actually the newest one or did you simply copy my example? Remember, I told you to use a still growing file?

Anyway, try this

while [[ -z $A ]]
 do A=$(fuser -f $(ls -rt /var/tmp/stm* | tail -1) 2>/dev/null)
   sleep 2
done
ps -ef | grep $A | grep -v grep

As soon as an open stm* file is found its associated process will be displayed.
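
An alternative to polling with fuser, as a sketch: it assumes the digits in an stm file name are the PID of the sort that created it. The listings in this thread are consistent with that (there is a file stm335976aaaaa and a sort with PID 335976), but treat it as an assumption, not documented behaviour. The helper name stm_pid is made up for illustration.

```shell
# Hypothetical helper: pull the numeric part out of an stm<digits><suffix> name.
stm_pid() {
    basename "$1" | sed -n 's/^stm\([0-9][0-9]*\).*/\1/p'
}

# For each distinct number found in /var/tmp, show the matching process, if any.
for f in /var/tmp/stm*; do
    [ -e "$f" ] || continue      # no stm files at all: the glob stays unexpanded
    stm_pid "$f"
done | sort -u | while read -r pid; do
    ps -o pid=,ppid=,user=,args= -p "$pid" 2>/dev/null || echo "$pid: no process with this PID"
done
```

Files whose number no longer maps to a running process would be leftovers that can probably be removed by hand.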
LVL 68

Expert Comment

ID: 34181216
Your comments are arriving too fast, it seems!

These sort processes are responsible for the many stm files, I'm sure.
Where do they come from? Can you kill them?

By the way, NetView is the IBM version of HP OpenView, so I was nearly right with my guess.

Author Comment

ID: 34181238
Not sure where they are coming from.
Is there a way to find that out?

ps -ef shows the owner of the sort -rn processes to be root and their PPID to be 1.

I will try killing the sort -rn processes by PID.
LVL 68

Expert Comment

ID: 34181258
Since we have no valid PPID (1 is init, probably only the "adoptive father") it will be very hard to find the true origin of these processes.

Isn't there any sort process having a PPID other than 1? If so, what's this PPID's process?

Killing the sorts with PPID 1 is the right measure and will probably do no harm.
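
To look for sort processes whose parent is still alive and follow the chain of parents upward, something like the following could be used. It is a sketch: ancestry is a made-up helper name, and it assumes ps accepts -o with the pid, ppid, user, args and comm fields, as the POSIX-style ps used earlier in this thread does.

```shell
# Hypothetical helper: print a process and each of its ancestors up to init (PID 1).
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # Stop if the process has gone away in the meantime.
        ps -o pid=,ppid=,user=,args= -p "$pid" || break
        [ "$pid" -eq 1 ] && break
        # Move one step up: the parent becomes the next process to print.
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

# Walk the ancestry of every sort process whose parent is not init.
ps -e -o pid=,ppid=,comm= | awk '$3 == "sort" && $2 != 1 {print $1}' |
while read -r p; do
    ancestry "$p"
    echo "----"
done
```

A sort that has already been adopted by init (PPID 1) cannot be traced this way; the chain only helps while the original parent is still running.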

Author Comment

ID: 34181327
I killed all the sort -rn process and they all had 1 as the ppid.
after killing those processes, all the stmxxxxxx files in /var/tmp got removed automatically.

Thank you for your help :)
LVL 68

Expert Comment

ID: 34181341

but it's somewhat unsatisfying not to have found out where those processes
might have come from, don't you think?

Is there perhaps a faulty cronjob (running every 15 minutes or so) containing a sort?


Author Comment

ID: 34181352
yes i agree, it would be good to find the source of it.
i figured that some app or someone ran those processes of "sort -rn" that started to go in loop or hung or something like that.

and it seems as if the issue is back.

sort -rn processes are running again with 1 as ppid and stm806999aaaaa files are being generated again in /var/tmp.

How can we find out if there is a faulty cronjob containing a sort?
LVL 68

Expert Comment

ID: 34181365
crontab -l
as root, then check the commands listed there and the scripts/programs they call
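
A small sketch for narrowing that down: filter the crontab listing for entries that mention sort, ignoring comment lines. The helper name cron_grep is made up for illustration. A sort hidden inside a called script will not show up this way, so the scripts still need a look.

```shell
# Hypothetical helper: print non-comment crontab lines from stdin
# that contain the given word.
cron_grep() {
    grep -v '^[[:space:]]*#' | grep -w -- "$1"
}

# Run as root: check root's own crontab for a direct call to sort.
crontab -l 2>/dev/null | cron_grep sort || echo "no direct sort call in this crontab"
```

Entries that run every 15 minutes would be the first candidates, given the start times of the sort processes listed earlier.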

LVL 68

Expert Comment

ID: 34181377
I think we should continue tomorrow.

It's late at night here in my part of the world and my day should have been over a couple of hours ago.

 À bientôt!


Author Comment

ID: 34181421
ok sure thing. have a good night, thank you for your help.

Author Comment

ID: 34183845
i was able to find the parent of one of the sort -rn processes, the other sort -rn processes have 1 as ppid.
sort -rn processes keep starting after i kill them.
each time sort -rn process gets restarted , it gets a new pid, and new ppid.

# ps -ef | grep 405536
    root  405536 1196152   0 18:30:05      -  0:00 sort -rn    <<<
    root 1319156 1646674   0 18:31:40  pts/0  0:00 grep 405536
 # ps -ef | grep 1196152
    root  405536 1196152   0 18:30:05      -  0:00 sort -rn
    root  843894 1196152 120 18:30:05      -  0:16 du -xak /mnt/sapmnt
    root 1196152 1171574   0 18:30:05      -  0:00 head -20    <<<
    root 1319158 1646674   0 18:31:45  pts/0  0:00 grep 1196152
 # ps -ef | grep 1171574
    root 1024030 1646674   0 18:31:58  pts/0  0:00 grep 1171574
    root 1196152 1171574   0 18:30:05      -  0:00 head -20
    root 1171574 1675510   0 05:53:57      -  0:04 /usr/lpp/OV/lbin/eaagt/opcacta   << this is the constant ppid for everytime a new sort -rn process is created.
PID 1171574 spawns a new head -20 process, which in turn spawns a new sort -rn process.

 # ps -ef | grep 1675510
    root  753718 1646674   0 18:32:21  pts/0  0:00 grep 1675510
    root  778348 1675510   0 05:54:07      -  0:00 /usr/lpp/OV/lbin/eaagt/opcle -std
    root  786576 1675510   0 05:53:59      -  0:00 /usr/lpp/OV/lbin/conf/ovconfd
    root  847950 1675510   0 05:53:56      -  0:00 /usr/lpp/OV/bin/ovbbccb -nodaemon
    root  884870 1675510   0 05:53:57      -  0:08 /usr/lpp/OV/lbin/perf/coda
    root  921716 1675510   0 05:54:07      -  0:00 /usr/lpp/OV/lbin/eaagt/opcmsgi
    root 1028286 1675510   0 05:54:07      -  0:05 /usr/lpp/OV/lbin/eaagt/opcmona
    root 1171574 1675510   0 05:53:57      -  0:04 /usr/lpp/OV/lbin/eaagt/opcacta
    root 1392852 1675510   0 05:53:59      -  0:02 /usr/lpp/OV/lbin/eaagt/opcmsga
    root 1675510       1   0 05:53:56      -  0:08 /usr/lpp/OV/bin/ovcd   <<<

Author Comment

ID: 34183866
to clarify on what i meant, by ( root 1171574 1675510   0 05:53:57      -  0:04 /usr/lpp/OV/lbin/eaagt/opcacta ) being the constant ppid, see below example.
sort -rn and head -20 have diff pid's , but they all lead back to the same constant ppid 1171574.

# ps -ef | grep 1273900
        root 1273900 1347826   0 18:15:04      -  0:00 sort -rn
# ps -ef | grep 1347826
    root 1347826 1171574   0 18:15:04      -  0:00 head -20
# ps -ef | grep 1171574
       root 1171574 1675510   0 05:53:57      -  0:04 /usr/lpp/OV/lbin/eaagt/opcacta
# ps -ef | grep 1675510
    root 1675510       1   0 05:53:56      -  0:08 /usr/lpp/OV/bin/ovcd

Please advise on next step on how to manage this issue?

Author Comment

ID: 34183908
Another thing worth mentioning is that I only killed the sort -rn processes which had 1 as PPID.
The sort -rn processes which have an actual PPID (other than 1) start and stop automatically and get assigned a new PID every time they start again.
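
The process tree above suggests (this is an inference, not something confirmed in the thread) that the agent runs a pipeline along the lines of du -xak /mnt/sapmnt | sort -rn | head -20, so sort has to take in the entire du output before head can print anything. If the action definition can be edited, one stopgap could be to give sort a work directory outside /var with -T. A sketch, where largest and the /bigfs/sorttmp path are hypothetical:

```shell
# Hypothetical wrapper: list the 20 largest entries under a directory,
# with sort's temporary work files sent to a chosen directory.
# Example call: largest /mnt/sapmnt /bigfs/sorttmp
largest() {
    dir=$1
    tmp=${2:-/tmp}
    # -T tells sort where to create its temporary files.
    du -xak "$dir" 2>/dev/null | sort -rn -T "$tmp" | head -20
}
```

This only moves the work files elsewhere; it does not explain why the sort processes pile up, which still has to be answered on the OpenView side.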
LVL 68

Expert Comment

ID: 34184037

all this comes from OpenView.

Let's see:

EaAgt is the Event Action Agent Application, and opcacta is the Action Agent itself.
The parent of all this, ovcd, is the OpenView Control Daemon.

Now that you've identified opcacta as the culprit, you can try to restart it.

Use the following with caution, because I'm only familiar with NetView, and OpenView seems rather different!

Issue ovc -stop opcacta and ovc -start opcacta
Check the new status with opcagt -status

Is opcacta running? Are new "sort" hooligans coming up?

You could as well stop and start the whole Agent Subsystem and clean up its temp files inbetween.

1. opcagt -kill
2. Kill all remaining "opc..." processes, if any.
3. Remove all files under "/var/opt/OV/tmp/OpC"  
Note: I'm not sure this directory exists with HPOV. If it doesn't, search for something like "/usr/lpp/OV/tmp/OpC" or "/usr/opt/OV/tmp/OpC"
4. opcagt -start

Hope this helps. If any of the above commands does not exist or would complain about bad syntax - sorry for that, but it's not NetView!
In such a case you will have to consult your HPOV docs - or try to restart the whole OpenView application, this should be something like
ovc -stop ovcd
ovc -start ovcd
Attention! All HPOV application windows will close!

If the latter doesn't exist or work either - sorry again, please check the docs or see your HPOV admin.


LVL 68

Expert Comment

ID: 34184051
What I forgot: It could well be that HPOV is manageable via smit!

Open smit (or smitty) and search for HPOV, either under "Communications Applications and Services" or "Applications".

If it's there, see what you can do. At least restarting the whole application should be possible!

Good luck!

Author Comment

ID: 34233348
I stopped the ovcd process and restarted it, but the sort -rn issue is still there. Will work further with the HPOV team.

Author Closing Comment

ID: 34253858
HPOV team made changes to their application template from their end. Thanks.


Question has a verified solution.
