tsm client taking more time for backup

Posted on 2010-11-15
Last Modified: 2013-11-14
Hi All,

I have a node for which I have scheduled an incremental backup. But this node takes 2-3 hours, on average, to back up only 126 files. How can I find out what the problem is and why the backup takes this long instead of completing in a matter of minutes? Other systems in the same schedule complete in 5-10 minutes.

Thanks
virgo
Question by:virgo0880
 
LVL 25

Expert Comment

by:madunix
ID: 34140640
Did you check the connection? It could be running at half duplex instead of full duplex because of mismatched settings between the host NIC and the switch port.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34143697
What do the statistics displayed at the end of each backup run say?

Particularly contrast "Network data transfer rate:" with "Aggregate data transfer rate:"!

If there is not a huge difference, there might indeed be a network misconfiguration. But if the network rate looks acceptable compared with those of your other systems, we should suspect a client problem here!

- System overloaded, e.g. short on memory (high paging rates) or CPU?
- Stale NFS volumes involved, so the client has to wait for the timeout?

If in doubt, please post the statistics from "--- SCHEDULEREC STATUS BEGIN" up to "--- SCHEDULEREC STATUS END".

wmp
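A minimal sketch of that comparison, assuming the dsmsched.log line layout seen in this thread (date, time, then "label: value KB/sec"). The helper names are hypothetical and this is not a TSM tool; it only pulls both rates out of the statistics and divides them, so a ratio far above 1 suggests the time is being spent on the client rather than on the network.

```python
import re
from typing import Dict

# Matches lines such as
#   11/15/10   22:32:06 Network data transfer rate:        10,305.46 KB/sec
# (layout assumed from the scheduler log excerpt in this thread)
RATE_RE = re.compile(
    r"(Network|Aggregate) data transfer rate:\s+([\d,]+(?:\.\d+)?)\s*KB/sec"
)


def parse_rates(log_text: str) -> Dict[str, float]:
    """Return {'Network': KB/sec, 'Aggregate': KB/sec}; later runs overwrite earlier ones."""
    rates: Dict[str, float] = {}
    for kind, value in RATE_RE.findall(log_text):
        rates[kind] = float(value.replace(",", ""))  # drop thousands separators
    return rates


def rate_ratio(log_text: str) -> float:
    """Network rate divided by aggregate rate for the last run found."""
    rates = parse_rates(log_text)
    return rates["Network"] / rates["Aggregate"]


if __name__ == "__main__":
    excerpt = (
        "11/15/10   22:32:06 Network data transfer rate:        10,305.46 KB/sec\n"
        "11/15/10   22:32:06 Aggregate data transfer rate:          8.65 KB/sec\n"
    )
    print(parse_rates(excerpt))
    print(f"network/aggregate: {rate_ratio(excerpt):.0f}x")
```

Because later matches overwrite earlier ones, feeding it a whole dsmsched.log reports the most recent run.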
 

Author Comment

by:virgo0880
ID: 34147534
I checked the NIC speed; it is set to auto-negotiation.

Hi wmp,

Please find the output below:

11/15/10   22:32:06 --- SCHEDULEREC STATUS BEGIN
11/15/10   22:32:06 Total number of objects inspected:  788,139
11/15/10   22:32:06 Total number of objects backed up:      144
11/15/10   22:32:06 Total number of objects updated:          0
11/15/10   22:32:06 Total number of objects rebound:          0
11/15/10   22:32:06 Total number of objects deleted:          0
11/15/10   22:32:06 Total number of objects expired:         32
11/15/10   22:32:06 Total number of objects failed:           0
11/15/10   22:32:06 Total number of bytes transferred:   91.23 MB
11/15/10   22:32:06 Data transfer time:                    9.06 sec
11/15/10   22:32:06 Network data transfer rate:        10,305.46 KB/sec
11/15/10   22:32:06 Aggregate data transfer rate:          8.65 KB/sec
11/15/10   22:32:06 Objects compressed by:                    0%
11/15/10   22:32:06 Elapsed processing time:           02:59:55
11/15/10   22:32:06 --- SCHEDULEREC STATUS END
11/15/10   22:32:06 --- SCHEDULEREC OBJECT END MIS_UNIX_XXXX 11/15/10   19:30:00
11/15/10   22:32:06 Scheduled event 'MIS_UNIX_XXXX' completed successfully.
11/15/10   22:32:06 Sending results for scheduled event 'MIS_UNIX_XXXX'.
11/15/10   22:32:06 Results sent to server for scheduled event 'MIS_UNIX_XXXX'.

 
LVL 25

Expert Comment

by:madunix
ID: 34147961
On the client, check dsmsched.log and dsmerror.log. Clear the client logs, take a manual backup and see what happens. Also check "q act" on the server; I usually prefer to watch what happens on the server side with "dsmadmc -console".
If you are using the TSM journal, also verify that the journal service is operational.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34148120
OK,

as you can see your network transfer rate is not the cause of your issue.

10 MB/sec is fantastic for a 100 Mb network and still tolerable (although not really good) for a Gigabit Ethernet network.

I'd rather suspect that the number of files to be inspected is the culprit!

Nearly 800,000 files is not that uncommon, but if your client machine is a bit short on memory or CPU, inspecting this many files can be time-consuming.

What are the values of your other, well-performing machines? Do they have a comparably high number of files, and what are their transfer rates?

Does the client show this bad performance consistently? If not, there could also be other effects like media waits or the like.

Anyway, should it turn out that the number of files is actually the cause, you could consider implementing journal-based backup!

wmp
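The posted statistics can be turned into a rough time budget. A minimal Python sketch with hypothetical function names, assuming that "Aggregate data transfer rate" is bytes transferred divided by elapsed processing time and that 1 MB = 1024 KB in these statistics:

```python
from typing import Dict


def hms_to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' (as printed for 'Elapsed processing time') to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def time_budget(objects_inspected: int, mb_transferred: float,
                transfer_sec: float, elapsed_hms: str) -> Dict[str, float]:
    """Split a backup run's elapsed time into data transfer vs. everything else."""
    elapsed = hms_to_seconds(elapsed_hms)
    kb_transferred = mb_transferred * 1024  # assumption: 1 MB = 1024 KB here
    return {
        "elapsed_sec": float(elapsed),
        "transfer_share_pct": 100.0 * transfer_sec / elapsed,
        "non_transfer_sec": elapsed - transfer_sec,
        "objects_inspected_per_sec": objects_inspected / elapsed,
        # Should reproduce the logged "Aggregate data transfer rate"
        "aggregate_kb_per_sec": kb_transferred / elapsed,
    }


if __name__ == "__main__":
    # Figures from the SCHEDULEREC STATUS block posted for the slow node
    budget = time_budget(788_139, 91.23, 9.06, "02:59:55")
    for name, value in budget.items():
        print(f"{name:>26}: {value:,.2f}")
```

With this node's figures the recomputed aggregate rate comes out at about 8.65 KB/sec, matching the log, and the 9.06 seconds of actual transfer are well under 1% of the roughly 10,800 seconds elapsed, so nearly all of the time is spent on the client side rather than moving data.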
 

Author Comment

by:virgo0880
ID: 34148544
I don't think CPU/memory is the issue, as this system (IBM,7038-6M2) has 4 CPUs and 24 GB of memory. Other well-performing machines have values such as 1431585, 1325417, etc. Yes, I have taken an average over the last 10 days and found that it takes approx. 3 hours every day, but the number of files to be backed up is very small; other machines complete their backup in 7-10 minutes. This is AIX 5.3.

Thanks
virgo
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34148729
I fear you're a bit unclear - and a bit reticent!

1431585, 1325417 etc

What are these values? Number of files ...?

The number of files actually backed up is by no means the criterion here (as your network seems to work well); it's the number of files inspected that counts!

What are the transfer rates (network/aggregate) of those well-performing machines?
Does the slow machine have to back up NFS/automount volumes?
If so, what is your NFS performance in general, apart from TSM?

Do you have I/O waits, maybe due to disks that are slower than those of the other machines? Or is it all the same SAN?
 

Author Comment

by:virgo0880
ID: 34150073
Yes, these are the numbers of files examined; the number of files actually backed up is smaller, say 150-200 files, for some of the other nodes. Also, no NFS mount points are configured for the backup; only the OS file systems are scheduled.

virgo
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34150866
So there are not many options left.

- Bad disk I/O performance (you didn't answer my question about that above).
-- Record an iostat during backup

- Heavy batch jobs in parallel to the backup (what's the task of the slow system anyway?)
-- Record a vmstat during backup

- Other hardware/software issues which might be reflected in the error log.
-- Examine errpt

- TSM client configuration
-- Compare dsm.sys/opt, TSM node definition with the corresponding data of the "good" systems.

wmp
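Comparing option files can be partly scripted. A rough Python sketch (hypothetical helpers, not a TSM utility) that assumes lines beginning with "*" are comments, that each remaining line is an option name followed by its value, and that option names are case-insensitive. It does not expand abbreviated option names and ignores the per-server stanza structure of dsm.sys, so it is only a starting point for the comparison.

```python
from typing import Dict, Optional, Tuple


def parse_options(text: str) -> Dict[str, str]:
    """Map lower-cased option name -> value; '*' lines are treated as comments."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # blank line or comment
        parts = line.split(maxsplit=1)
        options[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return options


def diff_options(good: str, suspect: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Options that differ between two files: name -> (good value, suspect value)."""
    a, b = parse_options(good), parse_options(suspect)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Shortened versions of the two dsm.opt files posted in this thread
    good_opt = "* SErvername       A server name defined in the dsm.sys file\n"
    suspect_opt = (
        "* SErvername       A server name defined in the dsm.sys file\n"
        "Servername                      TSM\n"
        "Tracefile   /tmp/Encrtrace.out\n"
    )
    for name, (g, s) in diff_options(good_opt, suspect_opt).items():
        print(f"{name}: good={g!r} suspect={s!r}")
```

Options present in only one file show up with None on the other side, which makes leftovers such as trace settings easy to spot.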

 

Author Comment

by:virgo0880
ID: 34174620
Sorry for the delayed response; I will check these things and get back to you.

Thanks
virgo
 

Author Comment

by:virgo0880
ID: 34191829
Hi, I have checked the NIC settings on the switch and on the system; both are set to 1 Gb full duplex. However, when I compare dsm.opt with that of one of my working systems, here is the difference:

Working system: dsm.opt
$ cat dsm.opt
************************************************************************
* ADSTAR Distributed Storage Manager                                   *
*                                                                      *
* Sample Client User Options file for AIX and SunOS (dsm.opt.smp)      *
************************************************************************

*  This file contains an option you can use to specify the ADSM
*  server to contact if more than one is defined in your client
*  system options file (dsm.sys).  Copy dsm.opt.smp to dsm.opt.
*  If you enter a server name for the option below, remove the
*  leading asterisk (*).

*  For information about additional options you can set in this file,
*  see the options.doc file in the directory where ADSM was installed.

************************************************************************

* SErvername       A server name defined in the dsm.sys file

The node with the problem, however, has this dsm.opt:

* SErvername       A server name defined in the dsm.sys file
Servername                      TSM
Tracefile   /tmp/Encrtrace.out
traceflags  service
tracemax 1024

What are these extra options (tracefile, traceflags, tracemax)? Are they the reason the backups are taking more time?

thanks
virgo
 
LVL 68

Accepted Solution

by:woolmilkporc (earned 250 total points)
ID: 34191959
Yes, of course!

You're running a fat service trace with each backup, which costs a lot of time (and disk space).

Check /tmp/Encrtrace.out! This file should be rather big: tracemax 1024 lets it grow to 1 GB (the value is in MB).

Take out these three options if you don't need the trace, and you'll see.

By the way, for completeness you should add the servername to dsm.opt of the "working" system as well. Although this value defaults to the first "Servername" entry in dsm.sys, it's better to have it in dsm.opt too.
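To check other nodes for the same leftover trace settings, a minimal Python sketch may help. It assumes the option names used in this thread (Tracefile, Traceflags, Tracemax), "*" as the comment marker, and TRACEMAX given in MB; the function names are hypothetical, and abbreviated spellings of these options would not be caught.

```python
from typing import Dict, Optional

# Trace-related client options seen in this thread (full spellings only)
TRACE_OPTIONS = {"tracefile", "traceflags", "tracemax"}


def active_trace_options(opt_text: str) -> Dict[str, str]:
    """Return trace options that are set, i.e. not commented out with '*'."""
    found: Dict[str, str] = {}
    for raw in opt_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        parts = line.split(maxsplit=1)
        name = parts[0].lower()  # option names are matched case-insensitively
        if name in TRACE_OPTIONS:
            found[name] = parts[1].strip() if len(parts) > 1 else ""
    return found


def max_trace_gb(found: Dict[str, str]) -> Optional[float]:
    """Upper bound on trace file size in GB, assuming TRACEMAX is given in MB."""
    if "tracemax" not in found:
        return None  # not set in this file; the sketch does not know the client default
    return int(found["tracemax"]) / 1024


if __name__ == "__main__":
    slow_node_opt = (
        "* SErvername       A server name defined in the dsm.sys file\n"
        "Servername                      TSM\n"
        "Tracefile   /tmp/Encrtrace.out\n"
        "traceflags  service\n"
        "tracemax 1024\n"
    )
    found = active_trace_options(slow_node_opt)
    if found:
        print("Tracing is enabled:", found)
        print(f"Trace file may grow to about {max_trace_gb(found):.1f} GB")
```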
 

Author Comment

by:virgo0880
ID: 34192019
I don't see a /tmp/Encrtrace.out file on the system. Also, if I remove these options, do I need to restart the client scheduler for the changes to take effect?

After removing them, can I check from the command line that the backups no longer take so long? We have scheduled incremental backups for this system; which command should I issue on the command line to check whether it now finishes sooner?

thanks
virgo
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34192055
You must restart the client scheduler.

To check the effect, just examine the statistics at the end of the scheduler log; that's the easiest way.
If you didn't configure SCHEDLOGNAME in dsm.sys,
it defaults to /usr/tivoli/tsm/client/ba/bin/dsmsched.log (or .../bin64/... with the 64-bit client).
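To follow the trend over several nights instead of only the latest run, the elapsed times can be collected from the scheduler log. A minimal Python sketch with a hypothetical function name, assuming the "Elapsed processing time: HH:MM:SS" line format shown in the statistics posted earlier in this thread:

```python
import re
from typing import List

# e.g. "11/15/10   22:32:06 Elapsed processing time:           02:59:55"
ELAPSED_RE = re.compile(r"Elapsed processing time:\s+(\d+):(\d{2}):(\d{2})")


def elapsed_minutes(log_text: str) -> List[float]:
    """Elapsed processing time of every run in the log, in minutes, in log order."""
    return [
        int(h) * 60 + int(m) + int(s) / 60
        for h, m, s in ELAPSED_RE.findall(log_text)
    ]


if __name__ == "__main__":
    line = "11/15/10   22:32:06 Elapsed processing time:           02:59:55\n"
    print([round(x, 1) for x in elapsed_minutes(line)])
```

For example, pass it the contents of /usr/tivoli/tsm/client/ba/bin/dsmsched.log (or the file named by SCHEDLOGNAME); a drop from around 180 minutes per run after the trace options were removed would confirm the fix.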
 

Author Comment

by:virgo0880
ID: 34192287
OK, I have removed the trace options. Now, can I check manually by issuing the backup command on the system, instead of waiting until the backup starts in the evening?

Regards
virgo
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34192329
Yes, of course.

Issue "dsmc i" (short for "dsmc incremental").

If you want a more verbose log in a file and want to watch the output on the console at the same time, issue

"dsmc i -verbose | tee /tmp/dsmincr.log" (the filename is just an example)

 

Author Comment

by:virgo0880
ID: 34192565
I ran the backup again and now it takes much less time. In particular, one of the file systems used to take 2 hours to back up and now completed within 4 minutes. I will monitor today's scheduled backup and see how long the whole system takes.

Thanks, wmp, for your great help.

virgo
 

Author Closing Comment

by:virgo0880
ID: 34233026
OK