Performance Degradation of Veritas 9.1
Posted on 2004-11-01
I currently have an issue with Backup Exec performance slowly degrading during large backups. The current setup is the following:
Source server (SS) is a Dual Opteron 2.4G w/3G Memory, Win2000 SP4. Attached to this is a Storcase which holds the data being backed up. The two are connected through an LSI Logic 1020/1030 Ultra320 SCSI Adapter.
Traffic is transported over Cat6 to a Cisco 2970 Gig switch and then transferred over to the backup server.
The backup server is a Dual AMD Athlon MP 2400+ w/2G Memory, Win2000 SP4, Veritas 9.1 SP1. The backup server (BS) has an Adaptec iSCSI 7211c which is connected to a REO 4000 (CAT6) and an Adaptec SCSI 39160/3960D Ultra 160 connecting to a Dell PowerVault 132T.
When running a large backup job, e.g. a 500Gb backup, the performance on the BS slowly degrades over time. The job starts out backing up for approx 1-2 hours at the full bandwidth of around 1350Mbs. After some time the transfer rate begins to degrade. The processors on the BS are pegged at around 90-100%. The two main processes showing are ‘System’, using between 40-70% of the processors, and ‘bengine.exe’, using anywhere from 20-40% of the processors. The current backup is approx 500Gb and has been running for 44:20:00 w/158.0 Mb/min throughput.
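To put the slowdown in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes both rates are in MB/min (the way Backup Exec reports job rate) and that 1 GB = 1000 MB; the units in my figures above are not exact, so treat the results as approximations only. The function name is just for illustration.

```python
def minutes_to_copy(size_gb: float, rate_mb_per_min: float) -> float:
    """Minutes needed to move size_gb at a steady rate (assumes 1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_min

# Figures from the post; the MB/min unit is an assumption.
job_gb = 500             # approximate size of the backup
full_rate = 1350.0       # rate seen during the first 1-2 hours
observed_rate = 158.0    # average rate the job currently reports
elapsed_min = 44 * 60 + 20   # 44:20:00 elapsed

# How long the whole job would take if the initial rate held.
ideal_hours = minutes_to_copy(job_gb, full_rate) / 60

# How much data the reported average rate implies has moved so far.
moved_gb = observed_rate * elapsed_min / 1000

# How many times slower the reported average is than the initial rate.
slowdown = full_rate / observed_rate

print(f"Time at full rate: {ideal_hours:.1f} h")
print(f"Moved so far at reported average: {moved_gb:.0f} GB")
print(f"Average rate is {slowdown:.1f}x slower than the initial rate")
```

Under those assumptions the job would finish in a little over 6 hours at the initial rate, against the 44+ hours it has taken so far.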
File swapping does not seem to be an issue, since there is almost 1G of available memory at all times during backups (the BS is on 0+1 SCSI drives). The BS originally had 1G of total physical memory and was paging to disk during backups, but since adding another 1G this has not been an issue. I also went through Microsoft Article ID: 304101, which covers unsuccessful backups of large system volumes. The following figures are current approximations of performance:
System Cache 1794540
When looking at the processes I noticed the following, but I am not sure whether this is where the problem lies.
bengine.exe
I/O Reads: 27,148,942    I/O Writes: 14,120,225
I/O Read Bytes: 506,148,210,256    I/O Write Bytes: 462,656,248,558
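As a sanity check on those counters, the sketch below works out the average size of each read and write and the average write rate over the elapsed job time. It assumes the counters cover only this job (they are cumulative since bengine.exe started, so they may include earlier activity) and uses 1 MB = 1,000,000 bytes.

```python
# Counters for bengine.exe, copied from the figures above.
reads, writes = 27_148_942, 14_120_225
read_bytes, write_bytes = 506_148_210_256, 462_656_248_558

# Elapsed job time of 44:20:00, expressed in minutes.
elapsed_min = 44 * 60 + 20

# Average bytes moved per I/O operation.
avg_read = read_bytes / reads
avg_write = write_bytes / writes

# Average write rate, assuming every written byte belongs to this job.
write_mb_per_min = write_bytes / 1_000_000 / elapsed_min

print(f"Average read size:  {avg_read:,.0f} bytes")
print(f"Average write size: {avg_write:,.0f} bytes")
print(f"Average write rate: {write_mb_per_min:.1f} MB/min")
```

This works out to roughly 32 KB per write, roughly 18 KB per read, and about 174 MB/min written on average, which is in the same range as the 158.0 Mb/min the job reports.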
I am not seeing any performance loss across the LAN, and the Cisco 2970 is not reporting any errors or dropped packets. The Tx and Rx counters match on both switch ports when comparing Tx on one device to Rx on the other (BS to SS).
The BS is set to run in background services mode, which I assume is correct?
The iSCSI is on a separate private network of 10.10.1.1 and 10.10.1.2. My only question there is whether the gateway on the iSCSI would be better set to the BS or to the REO, but I do not believe that would make any significant difference. Currently the gateway is set to the BS server.
Once the initial job of backing up the SS to ‘backup to disk/folder’ is completed, a ‘Duplicate to Tape’ job occurs. This job runs from the REO 4000 back to the BS server and then to the Dell PowerVault. This job has no performance issues and gets the full/maximum bandwidth from source to destination. The CPUs operate at ‘normal’ parameters when this job is running.
Most of the data is archived data, since the facility is a DR facility for banks and handles check/item processing for several banks. Only about 30% of the total backup is small ‘check images’. Most of the software data backed up is Bankware and/or Bisys ‘Archive Data’. I am currently running AOFO and the remote agent on the SS. I have turned off AOFO with the same results in backup performance.
The SS does not have the performance issues that the BS is having. Processes are well under maximum values and no stress is being applied during these backups on the SS.
Any suggestions would be greatly appreciated.