virgo0880

TSM client taking too long for backup

Hi All,

I have a node for which I have scheduled an incremental backup, but it takes 2-3 hours to back up only about 126 files on average. How can I find out why the backup takes this long instead of completing in a matter of minutes? Other systems in the schedule complete in 5-10 minutes.

Thanks
virgo
madunix

Did you check the connection? It could be running half duplex instead of full duplex, i.e. mismatched settings between the host NIC and the switch port.
What do the statistics displayed at the end of each backup run say?

Particularly contrast "Network data transfer rate:" with "Aggregate data transfer rate:"!

If there is not a huge difference, there might indeed be a network misconfiguration. But if the network rate seems acceptable compared to those of your other systems, we should suspect a client problem here!

- System overloaded, e.g. short on memory (high paging rates) or CPU?
- Stale NFS volumes involved and the client has to wait for the timeout?

If in doubt please post the statistics from "--- SCHEDULEREC STATUS BEGIN" up to "--- SCHEDULEREC STATUS END"

wmp
virgo0880

ASKER

I checked the NIC speed; it is set to auto-negotiation.

Hi wmp,

Please find the output :

11/15/10   22:32:06 --- SCHEDULEREC STATUS BEGIN
11/15/10   22:32:06 Total number of objects inspected:  788,139
11/15/10   22:32:06 Total number of objects backed up:      144
11/15/10   22:32:06 Total number of objects updated:          0
11/15/10   22:32:06 Total number of objects rebound:          0
11/15/10   22:32:06 Total number of objects deleted:          0
11/15/10   22:32:06 Total number of objects expired:         32
11/15/10   22:32:06 Total number of objects failed:           0
11/15/10   22:32:06 Total number of bytes transferred:   91.23 MB
11/15/10   22:32:06 Data transfer time:                    9.06 sec
11/15/10   22:32:06 Network data transfer rate:        10,305.46 KB/sec
11/15/10   22:32:06 Aggregate data transfer rate:          8.65 KB/sec
11/15/10   22:32:06 Objects compressed by:                    0%
11/15/10   22:32:06 Elapsed processing time:           02:59:55
11/15/10   22:32:06 --- SCHEDULEREC STATUS END
11/15/10   22:32:06 --- SCHEDULEREC OBJECT END MIS_UNIX_XXXX 11/15/10   19:30:00
11/15/10   22:32:06 Scheduled event 'MIS_UNIX_XXXX' completed successfully.
11/15/10   22:32:06 Sending results for scheduled event 'MIS_UNIX_XXXX'.
11/15/10   22:32:06 Results sent to server for scheduled event 'MIS_UNIX_XXXX'.

On the client, did you check dsmsched.log and dsmerror.log? Clear the client logs, take a manual backup and see what happens. Also check "q act"; I usually prefer to watch what happens on the server side with "dsmadmc -console".
If you are using the TSM journal, also verify that the journal service is operational.
OK,

as you can see your network transfer rate is not the cause of your issue.

10 MB/sec is fantastic for a 100 Mb network, and is still tolerable (although not really good) for a GE network.

I'd rather suspect that the number of files to be inspected is the culprit!

Nearly 800,000 files is not that uncommon, but on the other hand, if your client machine is a bit short on memory or CPU, inspecting this many files can be time consuming.

What are the values of your other, well-performing machines? Is there a comparably high number of files? What are the transfer rates?

Does the client show this bad performance consistently? If not, there could also be other effects like media waits or the like.

Anyway, should it turn out that the number of files is actually the cause, you could consider implementing journal based backup!

wmp
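
As a cross-check of the statistics posted above, the two rates can be recomputed from the other fields. This is only a sketch: it assumes the client derives the network rate from the data transfer time and the aggregate rate from the elapsed processing time, with 1 MB = 1024 KB. The variable and function names are made up for illustration.

```python
# Figures copied from the SCHEDULEREC statistics posted in this thread.
bytes_mb = 91.23        # Total number of bytes transferred (MB)
transfer_sec = 9.06     # Data transfer time (seconds)
elapsed = "02:59:55"    # Elapsed processing time (hh:mm:ss)


def hms_to_seconds(hms):
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


elapsed_sec = hms_to_seconds(elapsed)
kilobytes = bytes_mb * 1024  # assumption: 1 MB = 1024 KB

# Assumed definitions: bytes divided by transfer time vs. by total elapsed time.
network_rate = kilobytes / transfer_sec
aggregate_rate = kilobytes / elapsed_sec
transfer_share = transfer_sec / elapsed_sec * 100

print(f"elapsed seconds:         {elapsed_sec}")
print(f"network rate:            {network_rate:,.2f} KB/sec")
print(f"aggregate rate:          {aggregate_rate:,.2f} KB/sec")
print(f"time spent transferring: {transfer_share:.2f}% of the run")
```

The aggregate figure comes out at the 8.65 KB/sec shown in the log, and the network figure lands close to the logged value (the inputs are rounded). The data transfer itself accounts for well under 1% of the three-hour run, which is why the time must be going somewhere other than the network.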
I don't think CPU/memory is the issue, as this system (IBM,7038-6M2) has 4 CPUs and 24 GB of memory. The other, well-performing machines have values such as 1431585, 1325417 etc. Yes, I took an average over the last 10 days and found that the backup takes approx. 3 hours on a daily basis, although the number of files to be backed up is very small. Other machines complete their backups in 7-10 minutes. This is AIX 5.3.

Thanks
virgo
I fear you're a bit unclear - and a bit reticent!

1431585, 1325417 etc

What are these values? Number of files ...?

The number of files actually backed up is by no means a criterion here (as your network seems to work well), it's the number of files inspected which counts!

What are the transfer rates (network/aggregated) of those well-performing machines?
Does the slow machine have to back up NFS/automount volumes?
If so, what is your NFS performance in general, apart from TSM?

Do you have I/O waits, maybe due to disks being slower than the ones of the other machines? Or is it all the same SAN?



Yes, these are the numbers of files examined; the files backed up are fewer, say 150-200 files, for some of the other nodes. Also, there are no NFS mount points configured for the backup; it is just the OS filesystem backup that is scheduled.

virgo
So there are not many options left.

- Bad disk I/O performance (you didn't answer my question about that above).
-- Record an iostat during backup

- Heavy batch jobs in parallel to the backup (what's the task of the slow system anyway?)
-- Record a vmstat during backup

- Other hardware/software issues which might be reflected in the error log.
-- Examine errpt

- TSM client configuration
-- Compare dsm.sys/opt, TSM node definition with the corresponding data of the "good" systems.

wmp

Sorry for the delayed response; I will check on these things and report back.

Thanks
virgo
Hi, I have checked the NIC settings on the switch and on the system; both are set to 1 Gb full duplex. But when I compare dsm.opt with that of one of my working systems, here is the difference:

Working system : dsm.opt
$ cat dsm.opt
************************************************************************
* ADSTAR Distributed Storage Manager                                   *
*                                                                      *
* Sample Client User Options file for AIX and SunOS (dsm.opt.smp)      *
************************************************************************

*  This file contains an option you can use to specify the ADSM
*  server to contact if more than one is defined in your client
*  system options file (dsm.sys).  Copy dsm.opt.smp to dsm.opt.
*  If you enter a server name for the option below, remove the
*  leading asterisk (*).

*  For information about additional options you can set in this file,
*  see the options.doc file in the directory where ADSM was installed.

************************************************************************

* SErvername       A server name defined in the dsm.sys file

While the node on which I have the problem has this dsm.opt:

* SErvername       A server name defined in the dsm.sys file
Servername                      TSM
Tracefile   /tmp/Encrtrace.out
traceflags  service
tracemax 1024

What are these extra options (tracefile, traceflags, tracemax)? Is it because of them that the backups are taking more time?

thanks
virgo
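
Comparing the option files by eye works for two short files; for longer ones a small script can help. The following is a minimal sketch, assuming that lines starting with an asterisk are comments and that option names are case-insensitive. The function names are hypothetical, and the sample text is abridged from the two files posted above.

```python
def active_options(text):
    """Return {lowercased option name: value} for non-comment, non-blank lines."""
    options = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: a leading asterisk marks a comment line.
        if not stripped or stripped.startswith("*"):
            continue
        parts = stripped.split(None, 1)  # split on the first run of whitespace
        name = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        options[name] = value
    return options


def extra_or_changed(reference, candidate):
    """Options that are set in `candidate` but missing or different in `reference`."""
    ref, cand = active_options(reference), active_options(candidate)
    return {k: v for k, v in cand.items() if ref.get(k) != v}


# Abridged from the dsm.opt files posted in this thread.
working = """\
* SErvername       A server name defined in the dsm.sys file
"""
slow = """\
* SErvername       A server name defined in the dsm.sys file
Servername                      TSM
Tracefile   /tmp/Encrtrace.out
traceflags  service
tracemax 1024
"""

difference = extra_or_changed(working, slow)
for name, value in difference.items():
    print(f"{name:12} {value}")
```

For real use, the two strings would be read from the dsm.opt files of the working and the slow node instead of being pasted inline.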
ASKER CERTIFIED SOLUTION
woolmilkporc
I don't see a /tmp/Encrtrace.out file on the system. Also, if I remove this option, do I need to restart the client scheduler for the changes to take effect?

After removing this option, can I verify from the command line that the backups no longer take so long? We have scheduled incremental backups for this system; what command should I give on the command line to check whether it finishes earlier?

thanks
virgo
You must restart the client scheduler.

To check the effect just examine the statistics at the end of the scheduler log;
that's the easiest way.
If you didn't configure SCHEDLOGNAME in dsm.sys
it defaults to /usr/tivoli/tsm/client/ba/bin/dsmsched.log (or .../bin64/... with the 64bit client).
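
To compare runs before and after a configuration change, the elapsed times can be pulled out of the scheduler log. A minimal sketch, assuming the log lines look like the excerpt posted earlier in this thread; `elapsed_times` is a hypothetical helper, not part of TSM.

```python
import re

# Matches e.g. "Elapsed processing time:           02:59:55"
ELAPSED = re.compile(r"Elapsed processing time:\s+(\d+):(\d{2}):(\d{2})")


def elapsed_times(log_text):
    """Return the elapsed processing time in seconds for every run found in the log."""
    return [
        int(h) * 3600 + int(m) * 60 + int(s)
        for h, m, s in ELAPSED.findall(log_text)
    ]


# Lines copied from the statistics posted in this thread.
sample = """\
11/15/10   22:32:06 Objects compressed by:                    0%
11/15/10   22:32:06 Elapsed processing time:           02:59:55
11/15/10   22:32:06 --- SCHEDULEREC STATUS END
"""

runs = elapsed_times(sample)
print(runs)
```

For a real check, read the file named by SCHEDLOGNAME (or the default path mentioned above) and pass its contents to `elapsed_times`; the last entry belongs to the most recent run.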
OK, I have removed the trace options. Now, can I check manually by issuing the backup command for the system, instead of waiting until the evening for the backup to start?

Regards
virgo
Yes, of course.

Issue "dsmc i"

If you want a more verbose log in a file and want to watch the output at the console at the same time, issue

"dsmc i -verbose | tee /tmp/dsmincr.log" (filename is just an example)

I ran the backup again and now it takes less time. In particular, one of the filesystems was taking 2 hours to back up and now completed within 4 minutes. I will monitor the backup today and see how long the whole system takes to back up.

Thanks, wmp, for your great help.

virgo
OK