cansib
asked on
AIX 5.3 - Something is clearing my error logs....
Something is clearing my error logs. I know it's something that my vendor did, but I would like to see for myself where this can be found and how I can undo it. When I run: errpt |pg I get nothing. When I run: /usr/lib/errdemon it says it's already running. Can someone help me find where it could be scripted to clear the error logs? I am still learning AIX so I haven't done a lot of advanced things with it. Thanks!
ASKER
This is crazy. In the crontab, I only found the standard entries:
0 11 * * * /usr/bin/errclear -d S,O 30
0 12 * * * /usr/bin/errclear -d H 90
Then, when I ran errlogger "This is a test" and then used errpt, there was still nothing. Is my error logging corrupted? Can I rebuild it? Thanks!
Mark
Run
ps -ef | grep errdemon
Do you see a running /usr/lib/errdemon process?
Then issue
/usr/lib/errstop
followed by
/usr/lib/errdemon
and test again.
Have some meetings now, will be back in ca. 2 hrs.
wmp
ASKER
Here's the output from the first command:
idxhost:root:/ =>ps -ef | grep errdemon
root 6994 1 0 Feb 14 - 0:00 /usr/lib/errdemon
root 30738 26460 0 08:49:48 pts/23 0:00 grep errdemon
and here's what happened with the next 2 commands:
idxhost:root:/ =>/usr/lib/errstop
idxhost:root:/ =>/usr/lib/errdemon
idxhost:root:/ =>errlogger "This is a test"
idxhost:root:/ =>errpt
idxhost:root:/ =>
Strange, huh?
Please repeat the errstop, then check with ps whether errdemon is still running anyway.
If it is, terminate it with kill -9 [pid] and confirm that it is gone.
Once it is gone, run /usr/lib/errdemon again and test.
Look in /var/adm/ras for the files errlog and errtmplt.
errlog must be writable by user root and group system.
errtmplt must be writable by root and should be at least 250-300 KB in size.
If errlog is not there, run
touch /var/adm/ras/errlog
chown root:system /var/adm/ras/errlog
chmod 664 /var/adm/ras/errlog
I'll do some research in the meantime.
wmp
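The file checks above could be scripted roughly like this. It is only a sketch: check_ras_file is a made-up helper name, not an AIX command, and because root can normally write to any file, the ls -l line is what actually shows owner, group, and mode.

```shell
#!/bin/sh
# check_ras_file is a hypothetical helper name, not an AIX command.
check_ras_file() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "MISSING: $f"
        return 1
    fi
    if [ ! -w "$f" ]; then
        echo "NOT WRITABLE: $f"
        return 1
    fi
    # wc -c may pad its output with blanks, so strip them.
    size=$(wc -c < "$f" | tr -d '[:space:]')
    echo "OK: $f ($size bytes)"
    # Owner, group, and mode still need a human look.
    ls -l "$f"
}

# Usage (as root):
#   check_ras_file /var/adm/ras/errlog
#   check_ras_file /var/adm/ras/errtmplt
```

The size is printed rather than enforced, since the 250-300 KB figure is a rule of thumb from this thread.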
ASKER
Thank you! I have to make a run offsite real quick, but I will post back my results. Thank you so much for helping! I really appreciate it.
Mark
ASKER
I ran the errstop, then the ps command, here is the output from that:
idxhost:root:/ =>/usr/lib/errstop
idxhost:root:/ =>ps -ef | grep errdemon
root 19650 34522 0 15:32:57 pts/9 0:00 grep errdemon
Does that mean it's still running? Thanks!
OK, omarfarid is right.
errdemon is not running anymore. Now start it using /usr/lib/errdemon and test.
If it still doesn't work, please examine /var/adm/ras as I suggested above.
wmp
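In the ps output posted above, the only line that matches is the grep command itself, which is why errdemon counts as stopped. A small sketch of a check that ignores the grep line follows. errdemon_running is a made-up helper name, and the pattern assumes the ps -ef layout shown in this thread, where the TIME column (for example 0:00) comes right before the command.

```shell
#!/bin/sh
# errdemon_running is a hypothetical helper name, not an AIX command.
# It reads `ps -ef` output on stdin and succeeds only if some line has
# /usr/lib/errdemon as its command, directly after the TIME column.
errdemon_running() {
    grep -Eq '[0-9]:[0-9][0-9] /usr/lib/errdemon( |$)'
}

# Usage:
#   if ps -ef | errdemon_running; then
#       echo "errdemon is running"
#   else
#       echo "errdemon is not running"
#   fi
```

A line such as "grep errdemon" does not match, because its command column starts with grep rather than with the daemon's path.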
ASKER
I tested and it's still not logging anything.
Here is what I found on the 2 log files:
-rw-rw-r-- 1 root system 104218 Feb 18 07:10 errlog
-rw-r--r-- 1 root system 241805 Mar 07 2007 errtmplt
Thanks!
Here is a somewhat 'hard' method:
1) /usr/lib/errstop
2) rm /var/adm/ras/errlog
3) /usr/lib/errdemon
4) errpt
You should see
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
9DBCFDEE 0218172709 T O errdemon ERROR LOGGING TURNED ON
If not, I fear I will soon be out of ideas ...
ASKER
I tried that and still no luck. Is it possible that the error log entries are somehow being redirected?
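One way to look into the redirection idea: as far as I know, /usr/lib/errdemon -l prints the current error log attributes, including a "Log File" line, and errpt -i File reads an alternate log file. The sketch below assumes that output layout, so treat it as a starting point only. current_errlog_path is a made-up helper name.

```shell
#!/bin/sh
# ASSUMPTION: `/usr/lib/errdemon -l` prints a line shaped like
#   Log File                /var/adm/ras/errlog
# current_errlog_path is a hypothetical helper; it reads that output on
# stdin and prints only the path.
current_errlog_path() {
    awk '$1 == "Log" && $2 == "File" { print $3; exit }'
}

# Usage (as root):
#   path=$(/usr/lib/errdemon -l | current_errlog_path)
#   [ "$path" = /var/adm/ras/errlog ] || echo "log redirected to: $path"
#   errpt -i "$path"
```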
ASKER
Someone from our vendor support remoted in and fixed this without my knowledge. In fact, I don't even know for sure that it was them, but who else could it be? I did an "esc + k" and saw a backlog of commands I hadn't run, all related to errdemon and errpt. So it's working now; I just don't know who fixed it. Thanks for the help, though, and sorry for the delay in getting back to this issue.
Hi again,
glad to hear that it works now. But too bad that we don't know why! Is there really no chance to ask someone from your vendor's support people what they did? The answer might help other people, too!
wmp
P.S. What commands did you see with esc-k?
ASKER CERTIFIED SOLUTION
The only 'allowed' method to clear the error log is the 'errclear' program.
Basically, you tell errclear how many days of data to leave in the log and which types of records to delete, so
errclear 0
will delete everything. Use man errclear to see more.
It seems your log gets cleared very frequently, so I would have a look at root's crontab:
crontab -l | grep errclear (as user root, because only root is allowed to use errclear)
You should find the standard AIX entries, which normally read
0 11 * * * /usr/bin/errclear -d S,O 30
0 12 * * * /usr/bin/errclear -d H 90
which means 'clear Software and errlOgger-generated errors older than 30 days, clear Hardware errors older than 90 days'.
If you find other values, especially for the retention days settings, or additional errclear entries, you have found it.
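The crontab comparison could be automated along these lines. This is a rough sketch: check_errclear_cron is a made-up helper name, and it treats only the two stock lines quoted above as normal, so anything else that mentions errclear gets flagged for a closer look.

```shell
#!/bin/sh
# check_errclear_cron is a hypothetical helper name, not an AIX command.
# It reads crontab text on stdin and prints every active errclear entry
# that differs from the two stock lines quoted in this thread.
check_errclear_cron() {
    # Squeeze runs of blanks so spacing differences do not matter.
    sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//' |
    grep 'errclear' |
    grep -v '^#' |
    while IFS= read -r line; do
        case "$line" in
            "0 11 * * * /usr/bin/errclear -d S,O 30") ;;  # stock entry
            "0 12 * * * /usr/bin/errclear -d H 90") ;;    # stock entry
            *) printf 'SUSPICIOUS: %s\n' "$line" ;;
        esac
    done
}

# Usage (as root):
#   crontab -l | check_errclear_cron
```

No output means only the standard entries were found. Keep in mind that errclear could also be called from a script that cron or something else runs, which this check would not see.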
-----------
To test whether your logging is working at all, use
errlogger "This is a test"
then use
errpt
to see if it's there.
wmp
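The errlogger/errpt test can be wrapped in a small function so it is easy to repeat after each change. This is a sketch only: test_errlog is a made-up name, and it relies on errpt -a including the errlogger message text in its detailed output.

```shell
#!/bin/sh
# test_errlog is a hypothetical helper name, not an AIX command.
# It writes a uniquely tagged operator message and then looks for it
# in the detailed error report.
test_errlog() {
    msg="errlog self-test $$"
    errlogger "$msg" || return 2
    if errpt -a | grep -F "$msg" >/dev/null; then
        echo "logging works"
    else
        echo "entry not found"
        return 1
    fi
}

# Usage (as root):
#   test_errlog
```

The process ID in the message keeps an old test entry from being mistaken for a new one.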