Solved

Symantec System Recovery running very slowly

Posted on 2012-03-16
Last Modified: 2012-03-28
Hello,

I run SSR 2011 in a number of virtual machines that create the image file on a physical server that offloads to tape.  We have been having issues where the backups start well and run well for some hours, then slow right down.  I am using 2 GB file sizes and backing up, in one instance, about 300 GB of data.  The base was going OK overnight until about 3am, when the files that were being created every 4 minutes started to be created every 15, then 30, sometimes 40 minutes.  It is still running now.

The destination address is easily accessible, there are no queued packets, and the network utilization shown in Task Manager is very low on both machines.

The whole network was going slow when this was still running (very slowly) during the day and when I cancelled it the whole network returned to normal.

Any suggestions?

I have already reinstalled SSR 2011 with the latest version, but that may not be the issue...

 - Neil
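
The file intervals quoted in the question translate directly into a write rate. The sketch below is only back-of-the-envelope arithmetic: it assumes the finished image is roughly the 300 GB mentioned (compression would make it smaller), that 1 GB is 1024 MB, and that the interval stays constant for the whole job. The function names are made up for illustration.

```python
# Rough throughput implied by how often a fixed-size span file is completed.
SPAN_FILE_GB = 2   # span size chosen in the backup job
TOTAL_GB = 300     # approximate size of the full image (assumption)


def throughput_mb_per_s(span_gb, minutes_per_file):
    """Average write rate in MB/s if one span file finishes every N minutes."""
    return span_gb * 1024 / (minutes_per_file * 60)


def hours_to_finish(total_gb, span_gb, minutes_per_file):
    """Hours needed to write the whole image at a constant file interval."""
    return (total_gb / span_gb) * minutes_per_file / 60


for minutes in (4, 15, 30, 40):
    rate = throughput_mb_per_s(SPAN_FILE_GB, minutes)
    hours = hours_to_finish(TOTAL_GB, SPAN_FILE_GB, minutes)
    print(f"{minutes:>2} min/file: {rate:5.2f} MB/s, {hours:6.1f} h for {TOTAL_GB} GB")
```

Even the "good" rate of one file every 4 minutes is under 10 MB/s, so the job would need around 10 hours at best under these assumptions.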
Question by:neilbuckman
18 Comments
 
LVL 42

Expert Comment

by:paulsolov
ID: 37731935
What is the total time for the job?  If it's backing up a server with a lot of small files vs a few large files this can occur.
 

Author Comment

by:neilbuckman
ID: 37731947
The main thing on the server is an SQL mail archive database.  I don't think that there is an unusual number of small files, and if there is, nothing has changed in that respect.

The job has been running now for 20 hours and thinks it has 5 hours to go...
 
LVL 42

Expert Comment

by:paulsolov
ID: 37731988
Is the mail archive database already compressed data or is it native SQL databases?  The reason I ask is that if the data is already compressed, the backup will still try to compress it but will not be able to shrink it any further.

20 hours is way too long. Make sure it's transferring on a 1 Gb port all the way through, or try imaging to USB to take the network out of the picture.
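
To put "way too long" in numbers, here is a rough estimate of how long 300 GB should take over a 1 Gb link. It is a sketch only: the 70% efficiency figure is an assumption to allow for protocol overhead, decimal units are used (1 Gbit/s = 125 MB/s), and disk speed and compression are ignored. The function name is hypothetical.

```python
def transfer_hours(data_gb, link_gbit_per_s, efficiency=0.7):
    """Hours to move data_gb gigabytes over a link at a given usable fraction."""
    usable_mb_per_s = link_gbit_per_s * 1000 / 8 * efficiency  # Gbit/s -> MB/s
    return data_gb * 1000 / usable_mb_per_s / 3600


print(f"{transfer_hours(300, 1):.2f} h at 70% of 1 Gb/s")
print(f"{transfer_hours(300, 1, efficiency=1.0):.2f} h at full line rate")
```

At full line rate the copy is about 40 minutes, so a 20-hour job is running at a small fraction of what the link could carry.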
 

Author Comment

by:neilbuckman
ID: 37732030
It is native SQL and it is 1 Gb all the way through; actually the virtual NIC is 10 Gb.  Does the program need more working space for bigger files, or does the size of the image files help or hinder?  I chose 2 GB files and it is a full VSS backup.
 
LVL 47

Expert Comment

by:dlethe
ID: 37732044
It *could* be a problem with your HDDs/disk/RAID controller.  Look at the event logs.  Maybe the hardware is doing some error recovery and it has absolutely nothing to do with the backup process itself.
 

Author Comment

by:neilbuckman
ID: 37732141
Thanks for that suggestion.  I checked the physical Dell server where the backup is being written and the Server Manager says all the drives are good and that there are no predicted failures.  I tested writing a 500 MB file to both RAID groups on that server and both were normal, even while the other backup was still crawling along.

Likewise there are no errors coming out of the SAN on which the VMs live.

Other (incremental) images are being written to the same backup destination quite quickly and normally while the big one is running.  But the VM with the big image running is still very slow.
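
A write test like the 500 MB one described above can be timed rather than judged by eye. The following is a minimal sketch, not the method used in the thread: `write_test` is a hypothetical helper, the target path is supplied on the command line, and random data is written so that compression or deduplication at the destination does not flatter the result.

```python
import os
import sys
import time


def write_test(path, size_mb=500, chunk_mb=1):
    """Write about size_mb of random data to path, sync it, and return MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)  # one random chunk, reused
    chunks = size_mb // chunk_mb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # wait until the data has left the OS cache
    elapsed = time.perf_counter() - start
    os.remove(path)  # clean up the test file
    return (chunks * chunk_mb) / elapsed


# Runs only when a target file path is passed as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    rate = write_test(sys.argv[1])
    print(f"{rate:.1f} MB/s")
```

Running it against each RAID group while the slow backup is in progress gives figures that can be compared with a run on an idle system.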
 
LVL 42

Assisted Solution

by:paulsolov
paulsolov earned 250 total points
ID: 37732547
Try this when this server is the only one imaging to the destination. The bottleneck may be on the destination side.
 
LVL 47

Assisted Solution

by:dlethe
dlethe earned 250 total points
ID: 37732587
You beat me to it, paulsolov .... There are two sides, source and destination.   You need to make sure the hardware is OK on both sides.  

Also, just because the logs don't pick up on an error doesn't mean there isn't a problem.  (It just means it will be harder to detect.)

You are in a Windows environment, so run perfmon and see where the bottleneck is: memory, CPU, disk, or network.
 
LVL 42

Expert Comment

by:paulsolov
ID: 37733154
This is why I test with a USB drive, to take the network and destination out of the equation.
 
LVL 47

Expert Comment

by:dlethe
ID: 37733177
Same here. Then also boot the USB stick to Linux, use dd to do raw disk I/O into the bit bucket, and then use one of the many free tools to do some network I/O.  This takes Windows and its drivers out of the picture.

Determine enough "where the problem isn't"s and you end up with where the problem is. It is too hard to test such things with so many unknowns.
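
The "read into the bit bucket" idea can be illustrated without dd. The sketch below is a Python stand-in, not dd itself: `read_test` is a hypothetical helper that reads a file or device sequentially, throws the data away, and reports the rate. Reading a raw device such as a whole disk under Linux would need root, and a second run over the same data may be served from the OS cache and look faster than the disk really is.

```python
import sys
import time


def read_test(path, chunk_mb=1, limit_mb=None):
    """Read path sequentially, discard the data, return (MB read, MB/s)."""
    chunk_size = chunk_mb * 1024 * 1024
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered at the Python level
        while limit_mb is None or total < limit_mb * 1024 * 1024:
            data = f.read(chunk_size)
            if not data:  # end of file or device
                break
            total += len(data)  # data is dropped, like writing to /dev/null
    elapsed = time.perf_counter() - start
    mb = total / (1024 * 1024)
    rate = (mb / elapsed) if elapsed > 0 else float("inf")
    return mb, rate


# Runs only when a path to read is passed as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    mb, rate = read_test(sys.argv[1], limit_mb=1024)
    print(f"read {mb:.0f} MB at {rate:.1f} MB/s")
```

If the raw read rate from the source is healthy, the disk side can be crossed off and attention moves to the network path.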
 

Author Comment

by:neilbuckman
ID: 37733594
Thanks for this input. I will do some more testing later today.
 

Author Comment

by:neilbuckman
ID: 37734548
A Dell technician has checked the logs from the SAN and it is all fine.  The destination server is also OK as I can write to it from other machines quickly.

It seems that the issue is related to the connection between one host and the SAN, or something in the host itself, although nothing is apparent.  The hosts run ESXi 4.1.  Just copying files from VM to VM, I see a pattern where any copy that uses that connection has a problem, i.e., is slow.  There are 2 NICs on each host that connect through 2 redundant switches to the two Service Processors on the SAN (iSCSI).  Nothing in vSphere indicates any problem.  

Any suggestions appreciated ...
 

Accepted Solution

by:neilbuckman
neilbuckman earned 0 total points
ID: 37739909
I am hopeful that we might have found the problem: it relates to a network bottleneck caused by the offsite copy running (or trying to run) at the same time as the images are being created.  I have disabled all the offsite copies and things at this stage look like they are back to normal.  Fingers crossed.

I think we have reached a tipping point as far as volume is concerned and we need to rethink how we back up so that backup traffic is not travelling on the network at all.  I am looking at Veeam as an option, running on a virtual machine and backing up either to the SAN or a USB external drive.

I will let things settle for a day or two before (hopefully) closing this issue.

Thanks for all the contributions thus far...
 
LVL 42

Expert Comment

by:paulsolov
ID: 37740073
OK, the offsite copy could definitely be the culprit. I have had multiple calls to Symantec about this. Offsite copy using the FTP protocol uses Microsoft's embedded FTP service, so you'll get no better or worse performance than using MS FTP to copy a file over the wire.

Keep in mind that unless you're running BESR 8.5 you cannot use VMware Converter to convert natively to a VM directly from an image.

Veeam does a much better job with replication since it uses its own protocol and compresses the VM, as well as keeping it in its native format, versus an SSR image which you have to convert.  If you have another ESXi host you can even replicate locally so you would have a warm backup in place.
 

Author Comment

by:neilbuckman
ID: 37740089
Thanks,

I am encouraged by the positive comments I am hearing about Veeam.  

We are using SSR 2011 and I am aware that recovering the image is a bit of a process.  It is great for getting single files back, though.
 
LVL 42

Expert Comment

by:paulsolov
ID: 37740112
You can also get single files back via Veeam or Quest vRanger, and you can replicate and/or back up with each product as well.
 

Author Comment

by:neilbuckman
ID: 37759519
We had a crisis a few nights back when the main switch suddenly ceased to function. After much resetting and finally updating the firmware, it has been good and backups are quick and painless.

It may have been a combination of the switch and the offsite copies that was causing the pain, or the switch may be coincidental. But it is good now and we are taking steps to improve network capacity and redundancy.

I think case closed.

Thanks for the help.
 

Author Closing Comment

by:neilbuckman
ID: 37775771
The solution was not exactly hit on by the comments, but the investigative advice was good and useful.
