Backup throughput degrades from remote AIX 5.3 using Backup Exec 12.5 remote agent

I have Backup Exec 12.5 on Windows 2003 Enterprise trying to back up an AIX 5.3 partition to LTO4 drives via a BE AIX remote agent installed on the AIX server. Up until a week ago, a backup of 890 GB took a little over 6 hours with a throughput of 2.3 GB/min. Now the same job runs at about 2 MB/min: it starts out at 8 MB/min but slowly degrades to 2 MB/min.
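For scale, here is a quick worked calculation of how long the 890 GB job would take at each of the quoted rates. It is a minimal sketch that assumes a constant rate, that the figures are Backup Exec's MB/min values, and that 1 GB = 1024 MB; `backup_hours` is a hypothetical helper, not part of Backup Exec.

```python
# Worked backup-window arithmetic for the rates quoted above.
# Assumptions: constant rate, rates in MB/min, 1 GB = 1024 MB.

MB_PER_GB = 1024


def backup_hours(size_gb: float, rate_mb_per_min: float) -> float:
    """Hours needed to move size_gb at a constant rate_mb_per_min."""
    minutes = size_gb * MB_PER_GB / rate_mb_per_min
    return minutes / 60


if __name__ == "__main__":
    size_gb = 890
    rates = [
        ("before the slowdown (2.3 GB/min)", 2.3 * MB_PER_GB),
        ("slow job at start (8 MB/min)", 8),
        ("slow job at end (2 MB/min)", 2),
    ]
    for label, rate in rates:
        hours = backup_hours(size_gb, rate)
        print(f"{label}: {hours:,.1f} h ({hours / 24:,.1f} days)")
```

At 2.3 GB/min this gives about 6.4 hours, matching "a little over 6 hours"; at 2 MB/min the same 890 GB would take roughly 7,600 hours (over 300 days), which shows how severe the drop is.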
I've cleaned the drives and restarted the AIX agent and the BE server services. I'm finding very little online that addresses this issue. Any ideas where to look or how to fix it?
Thanks, Joe
Asked by jdones

woolmilkporc commented:

Hi,

This sounds a lot like a network problem. Have there been any changes in your network environment?
In particular, have a look at the speed/auto-negotiate settings, which should be consistent on both sides (server and switch).
Keep in mind that you'll have to check this on your VIO server if you're using VIO.
It might also be related to MTU sizes.
Another thing to examine is, of course, the overall load on the network, the network interfaces, and the CPU.
 
wmp
SalfordsFinest commented:
Sometimes Backup Exec needs to renegotiate the connection between the drive and the server. A server reboot will sort that out in the interim. Longer term, check that you have the latest Veritas service packs and updates installed and that you're using the Veritas drivers for the tape drive.

One other thing: check the NIC settings, as they may need changing from AutoDetect to Full Duplex.
Rodney Barnhardt (Server Administrator) commented:
We had a similar problem, where the time to back up the server jumped and the rate dropped. After some troubleshooting, it turned out to be a bad switch. All of our servers in that office were segmented from the rest of the network on a single switch, and when we replaced that switch, our performance came back. It was an unmanaged switch in a small office.
jdones (Author) commented:
I checked the network configuration on both servers and the switch. All are set to auto-negotiate, and all are running at 1 Gbps full duplex. I also did a test copy of files with WinSCP, both to the problem partition and to another partition on a different LUN. The throughput was consistent and did not degrade.
However, the problem partition contains the running applications for the server: Oracle, PeopleSoft, Sybase, Siebel, JDE, etc. The directory is also 94% full. I think it may be a combination of disk saturation causing disk thrashing plus all the small files in each application subdirectory that is hurting Backup Exec's ability to manage the backup job. A lot of little files tend to slow down the backup process.
What do you think?
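One way to test the small-file theory is to measure how many files the partition holds and how many of them are small. This is a minimal sketch, assuming Python is available on the AIX host (or that the filesystem is reachable from a machine that has it); the function name `small_file_profile` and the 64 KB cutoff for "small" are arbitrary choices for illustration. On a busy, 94% full filesystem the walk itself adds I/O, so it is best run off-hours.

```python
import os
import sys


def small_file_profile(root, small_limit=64 * 1024):
    """Walk `root` (without following symlinks) and return
    (file_count, total_bytes, small_file_count), where a file counts as
    small if it is under `small_limit` bytes (an arbitrary cutoff)."""
    count = total = small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # vanished or unreadable; skip it
            count += 1
            total += size
            if size < small_limit:
                small += 1
    return count, total, small


if __name__ == "__main__" and len(sys.argv) > 1:
    files, nbytes, small = small_file_profile(sys.argv[1])
    avg = nbytes / files if files else 0
    print(f"{files} files, {nbytes / 2**30:.1f} GiB, "
          f"{small} under 64 KiB, average {avg / 1024:.1f} KiB")
```

Running it against the problem partition and the other LUN gives a like-for-like comparison of file counts and average sizes.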
woolmilkporc commented:
Yes,

>> A lot of little files tend to slow down the backup process << That's more than true, unfortunately.
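A simple way to see why many small files hurt is to model each file as costing a fixed overhead (open, metadata lookup, catalog entry) on top of the time needed to stream its bytes. This is an illustrative model, not documented Backup Exec behavior; the 5 ms per-file cost and the file counts below are made-up numbers, and `effective_rate_mb_per_min` is a hypothetical helper.

```python
def effective_rate_mb_per_min(total_mb, n_files, per_file_s, stream_mb_per_min):
    """Effective throughput when every file adds a fixed per-file cost.

    total time = bytes / streaming rate + files * per-file overhead
    """
    stream_min = total_mb / stream_mb_per_min
    overhead_min = n_files * per_file_s / 60
    return total_mb / (stream_min + overhead_min)


if __name__ == "__main__":
    total_mb = 890 * 1024   # the 890 GB job, assuming 1 GB = 1024 MB
    stream = 2.3 * 1024     # the healthy rate from the question, MB/min
    for n_files in (100_000, 1_000_000, 10_000_000):  # hypothetical counts
        rate = effective_rate_mb_per_min(total_mb, n_files, 0.005, stream)
        print(f"{n_files:>10,} files at 5 ms each: {rate:,.0f} MB/min")
```

The same byte count moves more slowly as the file count grows, because the per-file term eventually dominates. A fixed per-file cost lowers the average rate, but on its own it does not explain a rate that keeps falling during one job unless later directories are denser in small files.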

You wrote that the degradation began a week ago. Does this correlate with an increase in the number of small files, or were they there before?
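To check whether the slowdown lines up with new files, one rough approach is to count files modified within the last week. This is a sketch under the same assumption that Python can reach the filesystem; `count_recent_files` is a hypothetical helper, and modification time is only a proxy for when a file appeared (busy database files are rewritten constantly and will always show up).

```python
import os
import time


def count_recent_files(root, days=7, small_limit=64 * 1024):
    """Return (recent, recent_small): files under `root` modified within
    the last `days` days, and how many of those are under `small_limit` bytes."""
    cutoff = time.time() - days * 86400
    recent = recent_small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable; skip it
            if st.st_mtime >= cutoff:
                recent += 1
                if st.st_size < small_limit:
                    recent_small += 1
    return recent, recent_small
```

Comparing the result per application subdirectory would show where any new small files are accumulating.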

Did you try tuning the block size, buffer size and buffer count in the drive properties?

I mostly use TSM, so I can't say much about BE tuning, but tuning blocking/buffering is always a good idea.

>> the problem partition contains the running applications << Do you see high CPU load during the backup? If so, is there any chance of moving the backup off-shift?

How about BE's network settings? TCP_NODELAY in particular is useful with many small files. Does BE have this setting? Are there tunable buffer settings on the BE client side?

Additionally, you could set AIX's tcp_nodelayack to 1 and see if it helps (no -o tcp_nodelayack=1). This setting is useful for overcoming the weak implementation of Nagle's algorithm in Windows. Caution: in a highly CPU-constrained environment it could cause too much overhead (not very likely, but who knows...).
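For reference, TCP_NODELAY is a per-socket option that disables Nagle's algorithm, so small writes go out immediately instead of being held until earlier data is acknowledged. Whether Backup Exec exposes it is the open question above; the sketch below only shows what the option does on a socket you control, for example in a hypothetical test client used to reproduce the transfer (`make_nodelay_socket` is an illustrative name). AIX's tcp_nodelayack works on the other half of the same interaction: it makes AIX send ACKs immediately instead of delaying them, so a Nagle-enabled sender is not left waiting.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1: send small segments immediately rather than
    # coalescing them while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    print("TCP_NODELAY =", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    s.close()
```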

wmp