Troubleshooting TSM Scheduled Backup Failure

OK, let me hear your experiences...

After the migration of all clients to a new version, every day I check from the TSM server whether any scheduled backup failed or ended with an error:

tsm: BIBM_TSM>q event * * t=c begind=today-1 begint=now endd=today endt=now ex=yes

Scheduled Start          Actual Start             Schedule Name     Node Name     Status
--------------------     --------------------     -------------     ---------     ---------
09/17/13   14:55:00      09/17/13   14:59:38      TEST2_BACKUP      TEST          Failed 12
09/18/13   06:30:00      09/18/13   06:38:31      MDES_BACKUP       MDES          Failed 12



So, the questions are:

1- Is it better to look for the cause of these failures on the TSM server side or on the client?
2- Is there any tool to see the error/failed schedule logs of all nodes together?

Thanks.
sminfoAsked:

max_the_kingCommented:
Hi,
Once you see on the TSM server that a client schedule has a problem, you should go to the client's log files.
The TSM server is only telling you that something failed in that schedule, with return code 12.
You should then investigate dsmsched.log, or whichever file you have configured for logging on the client; it might just be a file that was skipped because it was in use, or it might be a major issue. The client's log will tell you.
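
A minimal sketch of that first look on the client, assuming the error log path from the dsm.sys quoted in this thread and assuming client message IDs of the form ANSnnnnE (error) or ANSnnnnS (severe). The function name tsm_errors is made up for illustration:

```shell
#!/bin/sh
# tsm_errors FILE: print only the error (E) and severe (S) client messages.
# Assumption: message IDs look like ANS1228E (ANS + 4 digits + severity letter).
# tsm_errors is an illustrative name, not part of the TSM client.
tsm_errors() {
    grep -E 'ANS[0-9]{4}[ES]' "$1"
}

# Example call (the path is an assumption, taken from the dsm.sys in this thread):
# tsm_errors /var/syslog/dsmerror.log
```

Informational (I) and warning (W) messages are dropped on purpose here; widen the character class if you want the warnings too.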

hope this helps
max
woolmilkporcCommented:
Hi,

Do you know the Tivoli Operational Reporter? It's a Windows application that has shipped with TSM for Windows since version 5.2, and it is also available for download:
ftp://ftp.software.ibm.com/storage/tivoli-storage-management/maintenance/server/v5r5/WIN/LATEST/5.5.7.000-TIV-TSMCON-Windows.exe

It will not replace a thorough investigation of the clients' log files, but it will give you at least this (in the form of hourly or daily reports):

- Summary
- Schedules status (admin and client)
- Sessions, tape mounts
- Expiration, migration/reclamation (HSM)
- Database backup
- Storage pool backup
- Node activity
- Activity log
- Missed files (summary/detail)
- Session details
- etc.

Here is an IBM document describing installation and customization:
http://www-01.ibm.com/support/docview.wss?uid=swg27019794

Important: Many reports will not run against a TSM 6.x server as shipped. The above document contains instructions on how to modify the reports (it's easy!) so that they can run against TSM 6.x.

wmp
sminfoAuthor Commented:
OK.. so it's on the client side... in our case we have this dsm.sys on every server:

SErvername  TSM
   COMMmethod           TCPip
   TCPPort              1500
   TCPServeraddress     bibmtsmr37
   NODENAME             SASTEST

   PasswordAccess       GENERATE
   ErrorLogName         /var/syslog/dsmerror.log
   ErrorLogRetention    1 S
   schedlogname         /var/syslog/dsmsched.log
   schedlogretention    1 S
   INCLEXCL             /usr/tivoli/tsm/client/ba/bin64/inclexcl.tsm
   Compression          yes



So, are all those failed/error scheduled backups shown on the TSM server recorded in these two logs, '/var/syslog/dsmerror.log' and '/var/syslog/dsmsched.log'?

Or do I need to add a new log entry to our dsm.sys?

I'm thinking of sending these logs to our syslog collector so I can see all failed/error backups in a single place... I don't know if it's a good idea...

woolmilkporcCommented:
The "dsmc sched" client scheduler process writes to the log defined by "SCHEDLOGNAME" in dsm.sys.

Errors go, as always, to ERRORLOGNAME. It should be sufficient to monitor this file as a first approach. The scheduler log will give way too much info, unless you specified "quiet" in dsm.opt.

If you want to use syslog, please make sure that the client's nodename appears in each log line, particularly if hostname and nodename are different.
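
One way to get the nodename into every line, as a rough sketch: prefix each log line before handing it to logger. The function name prefix_node and the "node=" prefix format are my own illustrative choices, not TSM features; SASTEST and the log path are the values from the dsm.sys quoted above.

```shell
#!/bin/sh
# prefix_node NODENAME: read log lines on stdin and prepend the TSM node name,
# so each forwarded line stays attributable when hostname and nodename differ.
# prefix_node and the "node=" format are illustrative, not part of TSM.
prefix_node() {
    while IFS= read -r line; do
        printf 'node=%s %s\n' "$1" "$line"
    done
}

# Example (node name and path are assumptions taken from this thread):
# prefix_node SASTEST < /var/syslog/dsmerror.log |
#     while IFS= read -r l; do logger -p local0.info -t DSMERRORLOG "$l"; done
```

Keeping the prefixing separate from the logger call makes it easy to check the output on a terminal before anything is sent to syslog.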
sminfoAuthor Commented:
Perfect.. let me see how I can send ONLY the error entries in these logs to syslog... I'll find a way.. but one last question: I actually copied this config file from IBM, and I'm not sure what these entries are for:

ErrorLogRetention    1 S
schedlogretention    1 S

Rotation of one day?
woolmilkporcCommented:
"1 S": Keep one day's worth of messages in the original file, and save ("S") the pruned older messages to a file which will grow endlessly.

Alternatively, you could specify a higher number of days to keep the full text and inhibit pruning, like

schedlogretention    10 D

This will always keep the last 10 days' worth of messages in the original log; older messages are pruned and discarded.

You can also use the parameter "schedlogmax <megabytes>" instead of schedlogretention. This will allow the logfile to grow up to the specified value, then it will wrap around. Not good for syslog, I think.

Have you considered trying the Operational Reporter?
sminfoAuthor Commented:
Yes.. of course.. I have downloaded the tool 10 min ago and I'm going to take a look..

thanks max/wmp
sminfoAuthor Commented:
OK.. as a workaround I changed dsm.sys to:

SErvername  TSM
   COMMmethod           TCPip
   TCPPort              1500
   TCPServeraddress     bibmtsm.bibm.net
   NODENAME             NODE1

   PasswordAccess       GENERATE
   ErrorLogName         /var/syslog/dsmerror.log
   ErrorLogRetention    1 D
   schedlogname         /var/syslog/dsmsched.log
   schedlogretention    1 D
   INCLEXCL             /usr/tivoli/tsm/client/ba/bin64/inclexcl.tsm
   POSTSchedulecmd      /usr/local/hacmp/runtsmreport
   Compression          yes



Note the new line "POSTSchedulecmd      /usr/local/hacmp/runtsmreport"

which just sends the daily log files to syslog:

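#!/bin/ksh
# Forward every line of the TSM scheduler and error logs to syslog.
# read -r and the quoted "$line" keep backslashes, spacing and empty lines intact.
while IFS= read -r line; do logger -p local0.info -t DSMSCHEDLOG "$line"; done < /var/syslog/dsmsched.log
while IFS= read -r line; do logger -p local0.info -t DSMERRORLOG "$line"; done < /var/syslog/dsmerror.log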



And the syslog file shows something like:

[root@nim:/usr/tivoli/tsm/client/ba/bin64] tail -10 /var/syslog/syslog.log
Sep 18 16:59:02 nim local0:info DSMERRORLOG: 09/18/13 12:12:21 ANS1228E Sending of object '/var/adm/nim/7077922' failed
Sep 18 16:59:02 nim local0:info DSMERRORLOG: 09/18/13 12:12:21 ANS4005E Error processing '/var/adm/nim/7077922': file not found
Sep 18 16:59:02 nim local0:info DSMERRORLOG: 09/18/13 12:12:21 ANS1228E Sending of object '/var/adm/nim/7077922/lslpp.info' failed
Sep 18 16:59:02 nim local0:info DSMERRORLOG: 09/18/13 12:12:21 ANS4005E Error processing '/var/adm/nim/7077922/lslpp.info': file not found
Sep 18 16:59:02 nim local0:info DSMERRORLOG: 09/18/13 12:12:21 ANS1228E Sending of object '/var/adm/nim/8781984' failed
Sep 18 16:59:02 nim local0:info DSMERRORLOG: 09/18/13 12:12:21 ANS4005E Error processing '/var/adm/nim/8781984': file not found
Sep 18 16:59:02 nim local0:info DSMERRORLOG: 09/18/13 12:12:21 ANS1802E Incremental backup of '/var' finished with 3 failure



Maybe there're tons of ways to do this...;)
woolmilkporcCommented:
Fine!

But you can't be sure at which point in time log pruning will take place!
You will most probably see duplicate records in syslog.

How about clearing it out manually?

cat /dev/null > /var/syslog/dsmsched.log
...

at the end of the script?

Specify

schedlogretention N

so TSM will keep its hands off the log.
sminfoAuthor Commented:
Well.. I also changed

ErrorLogRetention    1 S

to

ErrorLogRetention    1 D

so I believe it always keeps just the daily log... is that right?
woolmilkporcCommented:
Not really.

The client will prune the log after the schedule completes, and this will be done asynchronously in the background.

So if you run the cat commands immediately after the schedule completes, you will catch a mix of old and new messages.

It's better to do all the log maintenance manually:

- use schedlogretention/errorlogretention "N", so there will be no pruning by TSM which could/will interfere with your activities.
- cat the log to logger
- clean up the log (empty it; or copy it away, then empty it).
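
The three steps above could look roughly like this. It's only a sketch: the function name forward_and_reset, the ".old" archive suffix and the optional third argument (a stand-in for logger, handy for dry runs) are my own inventions, and it assumes retention is set to "N" so the client never prunes the file itself.

```shell
#!/bin/sh
# forward_and_reset LOGFILE TAG [SENDER]
#   1. send every line of LOGFILE to syslog via logger (or via SENDER, for dry runs)
#   2. append the content to LOGFILE.old as an archive copy
#   3. empty LOGFILE so the next run only sees new messages
# Assumption: schedlogretention/errorlogretention are set to N.
# The function name and the ".old" suffix are illustrative choices.
forward_and_reset() {
    log=$1 tag=$2 sender=${3:-logger}
    [ -s "$log" ] || return 0            # nothing to do for a missing or empty log
    while IFS= read -r line; do
        "$sender" -p local0.info -t "$tag" "$line"
    done < "$log"
    cat "$log" >> "$log.old"             # copy it away ...
    : > "$log"                           # ... then empty it
}

# Example calls from a POSTSchedulecmd script (paths as used in this thread):
# forward_and_reset /var/syslog/dsmsched.log DSMSCHEDLOG
# forward_and_reset /var/syslog/dsmerror.log DSMERRORLOG
```

Anything the client writes between the read loop and the truncation would be lost; I'm assuming that window doesn't matter much when this runs right after the schedule, but that's worth checking on a test client first.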
sminfoAuthor Commented:
OK!!.. your way is better... thanks again!!
woolmilkporcCommented:
I hope you didn't misunderstand me - using "postschedulecmd" is really a perfect way to deal with this, and your script will work just fine!

I was just suggesting another ("manual") way of log maintenance, that's all.
sminfoAuthor Commented:
Yes yes!... I understood your recommendation of 'manual log cleanup' well ... ;)

I'm now changing 4 clients (DEV LPARs) to see how it goes for a week.. if it's fine, I'll just change all PROD LPARs...

Finally, I'm falling in love with TSM!!   ;)