cansib

AIX 5.3 - Something is clearing my error logs....

Something is clearing my error logs.  I know it's something that my vendor did, but I would like to see for myself where this can be found and how I can undo it.  When I run:  errpt |pg   I get nothing.  When I run:  /usr/lib/errdemon   it says it's already running.  Can someone help me find where it could be scripted to clear the error logs?  I am still learning AIX so I haven't done a lot of advanced things with it.  Thanks!
SOLUTION
omarfarid
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Hi,
the only supported way to clear the error log is the 'errclear' program.
Basically, you tell errclear how many days of entries to keep in the log and which types of records to delete. For example,
errclear 0
will delete everything. Use man errclear to see more.
Seems your log gets cleared very frequently, so I would have a look at root's crontab -
crontab -l | grep errclear  (as user root, because only root is allowed to use errclear)
You should find the standard AIX entries, which normally read
0 11 * * * /usr/bin/errclear -d S,O 30
0 12 * * * /usr/bin/errclear -d H 90
which means 'clear Software and errlOgger-generated (Operator) errors older than 30 days, and clear Hardware errors older than 90 days'.
If you find other values, especially for the retention days settings, or additional errclear entries, you have found it.
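The crontab check described above can be scripted. What follows is only a minimal sketch, assuming the two stock entries quoted above are the baseline on your system; `nonstandard_errclear` is a hypothetical helper name, not an AIX command. It reads crontab text on standard input and prints every active errclear line that differs from those two entries.

```shell
#!/bin/sh
# Print every errclear crontab line that differs from the two stock AIX
# entries quoted in this thread. Reads crontab text on stdin.
# nonstandard_errclear is a hypothetical helper name, not an AIX command.
nonstandard_errclear() {
    grep 'errclear' |
    grep -v '^[[:space:]]*#' |                                 # skip commented-out entries
    sed 's/[[:space:]][[:space:]]*/ /g; s/^ //; s/ $//' |      # normalize whitespace
    grep -v -x -e '0 11 \* \* \* /usr/bin/errclear -d S,O 30' \
               -e '0 12 \* \* \* /usr/bin/errclear -d H 90' || :
}
```

Run it as root with `crontab -l | nonstandard_errclear`. Empty output means only the stock entries were found, which is the situation the asker reports further down. Note that it only inspects root's crontab; errclear calls hidden inside other scripts would not show up.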
-----------
To test whether your logging is working at all, use
errlogger "This is a test"
then use
errpt
to see if it's there.
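The same test can be wrapped in a small function so it is easy to repeat after each change. This is a sketch only: `errlog_selftest` and the `ERRLOG_WAIT` variable are hypothetical names. It uses `errpt -a` (the detailed report) because the summary listing shows only the entry's description, not the text passed to errlogger.

```shell
#!/bin/sh
# Write a uniquely tagged operator message and check that it comes back.
# errlog_selftest and ERRLOG_WAIT are hypothetical names used for this
# sketch; errlogger and "errpt -a" are the standard AIX commands.
errlog_selftest() {
    marker="errlog-selftest-$$-$(date +%Y%m%d%H%M%S)"
    errlogger "$marker" || return 2
    sleep "${ERRLOG_WAIT:-1}"      # give errdemon a moment to write the entry
    if errpt -a | grep -F "$marker" >/dev/null; then
        echo "OK: error logging works"
    else
        echo "FAIL: entry not found in errpt output"
        return 1
    fi
}
```

Call `errlog_selftest` as root. A FAIL result matches the symptom in this thread: errlogger returns without complaint, but nothing reaches the log.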
 
wmp
 
 

 
 
cansib

ASKER

This is crazy.  In the crontab, I only found the standard entries:

0 11 * * * /usr/bin/errclear -d S,O 30
0 12 * * * /usr/bin/errclear -d H 90

Then, when I ran errlogger "This is a test" and then used errpt, there was still nothing.  Is my error logging corrupted?  Can I rebuild it?  Thanks!

Mark
First, check whether the daemon is running:
ps -ef | grep errdemon
Do you see a running /usr/lib/errdemon process?
Then issue
/usr/lib/errstop
then
/usr/lib/errdemon
and test again.
I have some meetings now and will be back in about 2 hours.
wmp
 
cansib

ASKER

Here's the output from the first command:

idxhost:root:/ =>ps -ef | grep errdemon
    root  6994     1   0   Feb 14      -  0:00 /usr/lib/errdemon
    root 30738 26460   0 08:49:48 pts/23  0:00 grep errdemon

and here's what happened with the next 2 commands:

idxhost:root:/ =>/usr/lib/errstop
idxhost:root:/ =>/usr/lib/errdemon
idxhost:root:/ =>errlogger "This is a test"
idxhost:root:/ =>errpt
idxhost:root:/ =>

Strange, huh?
So, please repeat the errstop, then check with ps whether errdemon is still running.
If it is, terminate it with kill -9 [pid] and confirm that it is gone.
Then run /usr/lib/errdemon again and test.

Look in /var/adm/ras for the files errlog and errtmplt.
errlog must be writable by user root and group system.
errtmplt must be writable by root and be at least 250-300 K in size.

If errlog is not there, do
touch /var/adm/ras/errlog
chown root:system /var/adm/ras/errlog
chmod 664 /var/adm/ras/errlog
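The ownership, permission, and size rules above can be checked mechanically. This is a rough sketch, assuming the usual `ls -l` column layout (mode, links, owner, group, size, date, name); `check_ras_files` is a hypothetical helper name, and the 250000-byte default is just the rule of thumb from this post, not a documented limit.

```shell
#!/bin/sh
# Check "ls -l" lines for errlog and errtmplt against the rules of thumb
# given in this thread. Reads "ls -l /var/adm/ras" output on stdin.
# check_ras_files is a hypothetical helper name; the optional argument is
# the minimum errtmplt size in bytes (default 250000, a guideline only).
check_ras_files() {
    awk -v min="${1:-250000}" '
        $NF == "errlog" {
            seen_log = 1
            # characters 3 and 6 of the mode string are the owner and group write bits
            if (substr($1, 3, 1) != "w" || substr($1, 6, 1) != "w")
                print "errlog: must be writable by both owner and group"
            if ($3 != "root" || $4 != "system")
                print "errlog: owner is " $3 ":" $4 ", expected root:system"
        }
        $NF == "errtmplt" {
            seen_tmpl = 1
            if (substr($1, 3, 1) != "w")
                print "errtmplt: must be writable by owner"
            if ($5 + 0 < min + 0)
                print "errtmplt: " $5 " bytes, below the rough guideline of " min
        }
        END {
            if (!seen_log)  print "errlog: missing"
            if (!seen_tmpl) print "errtmplt: missing"
        }
    '
}
```

Use it as `ls -l /var/adm/ras | check_ras_files`. No output means nothing obvious is wrong with the two files; a size note on errtmplt is only a hint to look closer, not proof of corruption.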

I'll do some research in the meantime.

wmp





cansib

ASKER

Thank you!  I have to make a run offsite real quick, but I will post back my results.  Thank you so much for helping!  I really appreciate it.

Mark
cansib

ASKER

I ran the errstop, then the ps command, here is the output from that:

idxhost:root:/ =>/usr/lib/errstop
idxhost:root:/ =>ps -ef | grep errdemon
    root 19650 34522   0 15:32:57  pts/9  0:00 grep errdemon

Does that mean it's still running?  Thanks!
OK, omarfarid is right.
errdemon is not running anymore. Now start it using /usr/lib/errdemon and test.
If it still doesn't work, please examine /var/adm/ras as I suggested above.
wmp
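The confusion above comes from `ps -ef | grep errdemon` always listing the grep process itself. A minimal sketch that filters it out follows; `errdemon_pids` is a hypothetical helper name, and it assumes the CPU TIME column has the minutes:seconds form seen in the listings in this thread (for example 0:00), with the command in the next column.

```shell
#!/bin/sh
# Print the PIDs of real errdemon processes found in "ps -ef" output on
# stdin, ignoring processes (such as grep) that only mention the name.
# errdemon_pids is a hypothetical helper name.
errdemon_pids() {
    awk '{
        for (i = 5; i < NF; i++)
            if ($i ~ /^[0-9]+:[0-9][0-9]$/) {       # the TIME column
                n = split($(i + 1), path, "/")      # the command is the next field
                if (path[n] == "errdemon") print $2 # $2 is the PID
                break
            }
    }'
}
```

Use it as `ps -ef | errdemon_pids`. With the first listing in this thread it prints 6994; with the listing taken after errstop it prints nothing, meaning the daemon is stopped.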
 
 
cansib

ASKER

I tested and it's still not logging anything.

Here is what I found on the 2 log files:

-rw-rw-r--   1 root     system       104218 Feb 18 07:10 errlog
-rw-r--r--   1 root     system       241805 Mar 07 2007  errtmplt

Thanks!
Here is a more drastic, 'hard' method:
1) /usr/lib/errstop
2) rm /var/adm/ras/errlog
3) /usr/lib/errdemon
4) errpt
You should see
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
9DBCFDEE   0218172709 T O errdemon       ERROR LOGGING TURNED ON
If not, I fear I will be out of ideas in a while ...
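For anyone repeating these steps, a slightly more cautious variant moves the old log aside instead of deleting it, so it can still be inspected later. This is a sketch under assumptions, not the method used in this thread: `reset_errlog` is a hypothetical helper name, and the `ERRSTOP`, `ERRDEMON`, `ERRPT`, and `ERRLOG` variables exist only to make the paths overridable; they default to the locations quoted above.

```shell
#!/bin/sh
# Recreate the error log as in the steps above, but keep the old log as a
# timestamped backup instead of removing it.
# reset_errlog and the ERRSTOP/ERRDEMON/ERRPT/ERRLOG variables are
# hypothetical names for this sketch; defaults are the paths in this thread.
reset_errlog() {
    errlog="${ERRLOG:-/var/adm/ras/errlog}"
    "${ERRSTOP:-/usr/lib/errstop}" ||
        echo "errstop reported an error (the daemon may not have been running)" >&2
    if [ -f "$errlog" ]; then
        mv "$errlog" "$errlog.$(date +%Y%m%d%H%M%S).bak" || return 1
    fi
    "${ERRDEMON:-/usr/lib/errdemon}" || return 1
    "${ERRPT:-errpt}"      # expect an "ERROR LOGGING TURNED ON" entry here
}
```

Run `reset_errlog` as root. If the final errpt still prints nothing, the old log was probably not the cause, and the backup can be removed or moved back.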
 
cansib

ASKER

I tried that and still no luck.  Is it possible that the error log entries are somehow being redirected?
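One way to check for redirection is to ask the daemon which log file it is configured to use: `/usr/lib/errdemon -l` displays the error log attributes, and `errpt -i File` reads a log other than the default. The sketch below pulls the path out of that listing. It assumes the path appears on a line that begins with "Log File", so compare against the real output on your system first; `errlog_path` is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the configured error log path from "errdemon -l" output on stdin.
# Assumption: the path is the last field of a line starting with "Log File".
# errlog_path is a hypothetical helper name.
errlog_path() {
    awk '!found && /^Log File/ { print $NF; found = 1 }'
}
```

Use it as `/usr/lib/errdemon -l | errlog_path`. If the result is not /var/adm/ras/errlog, the entries may be going to that other file, and `errpt -i` with that path should show them.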
cansib

ASKER

Someone from our vendor support remoted in and fixed this without my knowledge.  In fact, I don't even know for sure that it was them, but who else could it be?  I pressed "esc + k" and saw a backlog of commands that I hadn't run, all related to errdemon and errpt.  So it's working now; I just don't know who fixed it.  Thanks for the help, though, and sorry for the delay in getting back to this issue.
Hi again,
glad to hear that it works now. But too bad that we don't know why! Is there really no chance to ask someone from your vendor's support people what they did? The answer might help other people, too!
wmp
P.S. What commands did you see with esc-k?