virgo0880
asked on
TSM client taking too much time for backup
Hi All,
I have a node with a scheduled incremental backup. This node is taking 2-3 hours to back up only about 126 files on average. How can I find out what the problem is, and why the backup takes this long instead of completing in a matter of minutes? Other systems in the same schedule complete in 5-10 minutes.
Thanks
virgo
Did you check the connection? It could be a half-duplex connection instead of full duplex, e.g. mismatched settings between the host NIC and the switch port.
What do the statistics displayed at the end of each backup run say?
Particularly contrast "Network data transfer rate:" with "Aggregate data transfer rate:"!
If there is not a huge difference, there might indeed be a network misconfiguration; but if the network rate seems acceptable compared to those of your other systems, we should suspect a client problem here!
- System overloaded, e.g. short on memory (high paging rates) or CPU?
- Are stale NFS volumes involved, so that the client has to wait for timeouts?
If in doubt please post the statistics from "--- SCHEDULEREC STATUS BEGIN" up to "--- SCHEDULEREC STATUS END"
wmp
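To rule out a duplex mismatch on AIX, the negotiated link settings can be checked with entstat; a minimal sketch, assuming the adapter is en0 (adjust the interface name for your system):

```shell
# Print the negotiated media speed/duplex for adapter en0.
# Compare with the switch port: both sides should show e.g. 1000 Full Duplex.
entstat -d en0 | grep -i "media speed"
```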
ASKER
I checked the NIC speed; it is set to auto negotiation.
Hi wmp,
Please find the output :
11/15/10 22:32:06 --- SCHEDULEREC STATUS BEGIN
11/15/10 22:32:06 Total number of objects inspected: 788,139
11/15/10 22:32:06 Total number of objects backed up: 144
11/15/10 22:32:06 Total number of objects updated: 0
11/15/10 22:32:06 Total number of objects rebound: 0
11/15/10 22:32:06 Total number of objects deleted: 0
11/15/10 22:32:06 Total number of objects expired: 32
11/15/10 22:32:06 Total number of objects failed: 0
11/15/10 22:32:06 Total number of bytes transferred: 91.23 MB
11/15/10 22:32:06 Data transfer time: 9.06 sec
11/15/10 22:32:06 Network data transfer rate: 10,305.46 KB/sec
11/15/10 22:32:06 Aggregate data transfer rate: 8.65 KB/sec
11/15/10 22:32:06 Objects compressed by: 0%
11/15/10 22:32:06 Elapsed processing time: 02:59:55
11/15/10 22:32:06 --- SCHEDULEREC STATUS END
11/15/10 22:32:06 --- SCHEDULEREC OBJECT END MIS_UNIX_XXXX 11/15/10 19:30:00
11/15/10 22:32:06 Scheduled event 'MIS_UNIX_XXXX' completed successfully.
11/15/10 22:32:06 Sending results for scheduled event 'MIS_UNIX_XXXX'.
11/15/10 22:32:06 Results sent to server for scheduled event 'MIS_UNIX_XXXX'.
On the client, check dsmsched.log and dsmerror.log. Clear the client logs, take a manual backup and see what happens; also check "q act". I usually prefer to watch what happens on the server side with "dsmadmc -console".
If you are using TSM journal, also verify that journal service is operational.
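The checks above can be sketched as follows; the log paths are the AIX client defaults and the admin ID/password are placeholders:

```shell
# Client side: recent scheduler and error log entries
tail -n 100 /usr/tivoli/tsm/client/ba/bin/dsmsched.log
tail -n 100 /usr/tivoli/tsm/client/ba/bin/dsmerror.log

# Server side: watch sessions live, or query the activity log
dsmadmc -console
dsmadmc -id=admin -password=secret "query actlog"
```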
OK,
as you can see your network transfer rate is not the cause of your issue.
10 MB/sec is fantastic for a 100Mb network, and is still tolerable (although not really good) for a GE network.
I'd rather suspect that the number of files to be inspected is the culprit!
Nearly 800,000 files is not that uncommon, but on the other hand, if your client machine is a bit short on memory or CPU, inspecting that many files can be time-consuming.
What are the values of your other, well-performing machines? Is there a comparable high amount of files, what are the transfer rates?
Does the client show this bad performance consistently? If not, there could also be other effects like media waits or the like.
Anyway, should it turn out that the number of files is actually the cause, you could consider implementing journal based backup!
wmp
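Journal-based backup on the Unix client is driven by the tsmjbbd daemon and a tsmjbbd.ini file. A minimal sketch follows; the section and option names should be verified against your client level, and the paths and filesystem list are examples only:

```shell
# Write a minimal journal daemon configuration (paths are examples).
cat > /tmp/tsmjbbd.ini <<'EOF'
[JournalSettings]
Errorlog=/var/tsm/jbberror.log

[JournaledFileSystemSettings]
JournaledFileSystems=/home /data
EOF

# Start the journal daemon shipped with the BA client:
# /usr/tivoli/tsm/client/ba/bin/tsmjbbd &
```

With the daemon tracking filesystem changes, subsequent incrementals no longer have to walk and inspect all ~800,000 files.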
ASKER
I don't think CPU/memory is the issue, as this system (IBM 7038-6M2) has a 4-CPU/24 GB configuration. The other, well-performing machines have values of 1431585, 1325417 etc. Yes, I have taken an average of the last 10 days and found that it takes approx. 3 hours on a daily basis, but the number of files to be backed up is very small; the other machines complete their backup in 7-10 minutes. This is AIX 5.3.
Thanks
virgo
I fear you're a bit unclear - and a bit reticent!
1431585, 1325417 etc
What are these values? Number of files ...?
The number of files actually backed up is by no means a criterion here (as your network seems to work well), it's the number of files inspected which counts!
What are the transfer rates (network/aggregated) of those well-performing machines?
Does the slow machine have to back up NFS/automount volumes?
If so, what is your NFS performance in general, apart from TSM?
Do you have I/O waits, maybe due to disks being slower than those of the other machines? Or is it all the same SAN?
ASKER
Yes, these are the numbers of files examined; the files backed up are fewer, say 150-200 files, for some of the other nodes. Also, there are no NFS mount points configured for the backup; it is just the OS filesystem backup which is scheduled.
virgo
So there are not many options left.
- Bad disk I/O performance (you didn't answer my question about that above).
-- Record an iostat during backup
- Heavy batch jobs in parallel to the backup (what's the task of the slow system anyway?)
-- Record a vmstat during backup
- Other hardware/software issues which might be reflected in the error log.
-- Examine errpt
- TSM client configuration
-- Compare dsm.sys/opt, TSM node definition with the corresponding data of the "good" systems.
wmp
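The recordings above could be started before the backup window like this (the output paths and intervals are examples):

```shell
# Sample disk I/O and CPU/memory every 5 seconds in the background;
# stop the recorders with 'kill' after the backup and review the files.
nohup iostat 5 > /tmp/iostat_backup.out 2>&1 &
nohup vmstat 5 > /tmp/vmstat_backup.out 2>&1 &

# Check the AIX error report for recent hardware/software events:
errpt | head -20
```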
ASKER
Sorry for the delayed response, I will check on these things and revert.
Thanks
virgo
ASKER
Hi, I have checked the NIC setting on the switch and on the system; both are set to full duplex, 1 Gb. But when I compare dsm.opt with one of my working systems, here is the difference:
Working system : dsm.opt
$ cat dsm.opt
************************************************************************
* ADSTAR Distributed Storage Manager *
* *
* Sample Client User Options file for AIX and SunOS (dsm.opt.smp) *
************************************************************************
* This file contains an option you can use to specify the ADSM
* server to contact if more than one is defined in your client
* system options file (dsm.sys). Copy dsm.opt.smp to dsm.opt.
* If you enter a server name for the option below, remove the
* leading asterisk (*).
* For information about additional options you can set in this file,
* see the options.doc file in the directory where ADSM was installed.
************************************************************************
* SErvername A server name defined in the dsm.sys file
While the node on which i have the problem, is having dsm.opt as :
* SErvername A server name defined in the dsm.sys file
Servername TSM
Tracefile /tmp/Encrtrace.out
traceflags service
tracemax 1024
What are these extra options, tracefile, traceflags and tracemax? Is it because of these that the backups are taking more time?
thanks
virgo
ASKER CERTIFIED SOLUTION
ASKER
I don't see the /tmp/Encrtrace.out file on the system. Also, if I remove this option, do I need to restart the client scheduler for the changes to take effect?
Once this option is removed, can I check that the backups are no longer taking so much time by issuing some commands on the command line? We have scheduled incremental backups for this system; what command should I give on the command line to check whether it completes earlier?
thanks
virgo
You must restart the client scheduler.
To check the effect, just examine the statistics at the end of the scheduler log; that's the easiest way.
If you didn't configure SCHEDLOGNAME in dsm.sys,
it defaults to /usr/tivoli/tsm/client/ba/bin/dsmsched.log (or .../bin64/... with the 64-bit client).
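A small helper to pull just the closing statistics out of the scheduler log; a sketch, with the default 32-bit AIX client path assumed:

```shell
# show_tsm_stats: print transfer-rate and elapsed-time lines from a
# dsmsched.log (pass a path to override the default location).
show_tsm_stats() {
  grep -E "data transfer rate|Elapsed processing time" \
    "${1:-/usr/tivoli/tsm/client/ba/bin/dsmsched.log}" | tail -n 6
}
```

Run e.g. `show_tsm_stats` after the nightly schedule, or `show_tsm_stats /path/to/dsmsched.log` for a non-default log location.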
ASKER
OK, I have removed the trace options. Now, can I manually check by issuing the backup command for the system, instead of waiting until evening when the backup starts?
Regards
virgo
Yes, of course.
Issue "dsmc i"
If you want a more verbose log in a file and want to watch the protocol simultaneously at the console, issue
"dsmc i -verbose | tee /tmp/dsmincr.log" (filename is just an example)
ASKER
I tried executing the backup again and now it is taking less time; in particular, one filesystem that was taking 2 hours to back up now completed within 4 minutes. I will monitor the backup today and see how much time the whole system takes.
Thanks, wmp, for your great help.
virgo
ASKER
OK