Veeam Replication jobs fail with some sort of timeout

I have a number of Veeam replication jobs that replicate server VMs offsite. Some network changes were made recently (adding a NAT setup between the office LAN and the remote site), and they caused all manner of problems: every one of my replication jobs failed. The changes were undone, but 2 of my 5 jobs still refuse to complete, and they give the same error I was seeing before the NAT changes were backed out.

The job fails with the error "Error: Client error: ChannelError: TimedOut", and Veeam Support seems intent on blaming our WAN link. However, as I said, 3 of my jobs complete OK. I've also copied a VM from the local host to the remote host, and VMware didn't fall over or complain about a WAN problem.

Data size doesn't seem to be a common factor, and neither does OS (one server is Win2003, the other Win2008), antivirus (one server has it, the other doesn't) or anything else that I can see.

I am at a loss as to why these 2 jobs (which I've tried deleting and recreating from scratch as well) get maybe up to 40-50% done and then just give up.

Can anyone help?

Thanks
funasset asked:
 
funasset (Author) commented:
Since creating the new Veeam images with the remote host retrieved and sitting on the office LAN, incremental replication has been fine. The problem appeared to be WAN-related: other processes needed bandwidth on the WAN line and were squeezing Veeam out. In the end, some traffic management values were tweaked in the VPN tunnel through which everything flows. Since then, all the processes appear to be playing together nicely.
 
MaximVeeam commented:
Which version of Veeam Backup do you use?
 
funasset (Author) commented:
It reports 6.5.0.128 (64 bit)
 
funasset (Author) commented:
I think that's the latest they have?
 
MaximVeeam commented:
Yes, you are right. My thought was that the software might be out of date. I am looking for a solution on the Veeam Forums - http://forums.veeam.com/
 
funasset (Author) commented:
I set a job going on Friday and it worked fine for 11 hours, then just gave up with the same error.
 
funasset (Author) commented:
Update:
I've been using a test VM (Windows 2008 Server Standard, clean install) to see if I can find a common denominator among the servers that fail to replicate. My results so far:

Is the problem related to...
OS? It doesn't seem to be, as failures have included various operating systems.
AV software being present? No. Again, jobs have failed regardless of whether AV software is present.
Data size? I don't know whether Veeam just shoves the entire virtual disk down the wire or only the used portion of the disk. In my test I loaded my test server from clean (12GB) up to 150GB and all jobs failed. 12GB is less than some of the server jobs that have succeeded, which suggests to me that the amount of data involved is not a common factor.
Throttling? No. Jobs have failed with throttling both on and off.
Host datastore? No. Job failure does not seem specific to any one datastore.
WAN link? Although Veeam Support seemed fixated on a WAN link problem, this doesn't explain why some jobs succeed. Also, VMware can migrate a VM copy of a failed source server to the remote host without any trouble, which suggests that the link is fine.

If anyone has any other suggestions I'd be grateful, as it's becoming a royal pain!

Thanks
 
MaximVeeam commented:
Could you please provide the ticket number?
 
funasset (Author) commented:
The problem seems to be down to bandwidth. The remote host was retrieved, and on the local LAN all replication jobs completed OK, bar the usual Veeam "features" of moaning about CBT/"Cannot use SOAP" and calculating 'disk digests' for a job that finished only a few minutes earlier.

The acid test will be to see if the incremental replication jobs complete OK when the host is returned to the remote site and throttling is reinstated.
 
Michael Rodríguez (Systems Engineer) commented:
Hi funasset,

Do you have a VEEAM proxy VM server set up on your destination ESXi host? Having a proxy server greatly improves replication times. The proxy server will then "hot add" the VMDK files of the servers you're replicating, assuming you configure the replication job properly.

I currently use VEEAM in-house to replicate 10 VMs offsite. Luckily our rate of change is really small (usually 5-7GB), so replication finishes within 6-8 hours.
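As a rough sanity check on figures like these, the average throughput implied by a changed-data size and a job duration can be worked out directly. The sketch below is only back-of-envelope arithmetic: the function name is hypothetical, it assumes decimal units (1 GB = 8000 Mbit), and it ignores compression, deduplication and any time the job spends on non-transfer work, so the real wire rate may differ.

```python
def implied_throughput_mbps(changed_gb: float, hours: float) -> float:
    """Average rate in Mbit/s needed to move changed_gb (decimal GB) in the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    megabits = changed_gb * 8 * 1000  # assumption: 1 GB = 8000 Mbit (decimal units)
    return megabits / (hours * 3600)


if __name__ == "__main__":
    # Slowest and fastest combinations of the 5-7GB / 6-8 hour figures quoted above.
    for gb, hrs in [(5, 8), (7, 6)]:
        rate = implied_throughput_mbps(gb, hrs)
        print(f"{gb} GB in {hrs} h is roughly {rate:.2f} Mbit/s on average")
```

If the result is far below what the link is rated for, the transfer itself is probably not the only thing taking up the job's time.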

If you need more info on the proxy server config, let me know and I'll send you screenshots or whatever.

Hopefully this helps.
 
funasset (Author) commented:
Hi and thanks for the info.

Yes, I do have a remote proxy available, but I'm not sure whether I have the seeding set up correctly. Certainly, since I've had the physical server back in-house the replication jobs have been working fine, but being on the local LAN might not expose any cockup I might have made in defining where the replication job seeds from.

Some screenshots would be very welcome - many thanks.
 
Michael Rodríguez (Systems Engineer) commented:
Hi funasset,

I've attached several screenshots of my configuration. From what you stated, just make sure that when you configure the replication job you're specifying the proper target proxy.

Let me know if you have any more questions.
veeam-proxy.jpg
veeam-job-successandinfo.jpg
veeam-replica-mapping.jpg
veeam-target-proxy.jpg
 
funasset (Author) commented:
Many thanks.

It seems that my config is the same as yours. I think my problem is down to bandwidth. I don't have exclusive access to our feeble WAN link, so when other processes run they seem to squeeze the Veeam jobs out. I thought that the Throttling feature would somehow create a fixed pipe for Veeam to use, but it seems not: if these other processes need more bandwidth, they just take it.

Now that the host has been retrieved and new full replications have been created locally, I'm hoping that the WAN link will be able to cope with just incremental jobs once the server has been put back. If they still fail then I guess I'll have to push someone to get a better link!
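Whether a shared link can cope with incrementals can be estimated roughly before the server goes back. The sketch below assumes an incremental of a given size, a nominal link speed, and the fraction of that link Veeam actually gets while other traffic is running. The function name and all of the example numbers are hypothetical, not measurements from this setup, and compression and protocol overhead are ignored.

```python
def transfer_hours(changed_gb: float, link_mbps: float, veeam_share: float) -> float:
    """Estimated hours to send changed_gb over a link when Veeam gets veeam_share of it."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < veeam_share <= 1:
        raise ValueError("veeam_share must be in (0, 1]")
    effective_mbps = link_mbps * veeam_share  # what is left after other traffic
    megabits = changed_gb * 8 * 1000          # assumption: 1 GB = 8000 Mbit
    return megabits / effective_mbps / 3600


if __name__ == "__main__":
    # Hypothetical figures: a 6 GB incremental on a 10 Mbit/s link.
    for share in (1.0, 0.5, 0.25):
        hours = transfer_hours(6, 10, share)
        print(f"share {share:.0%}: about {hours:.1f} h")
```

Because throttling only caps what Veeam may use rather than reserving a share for it, the `veeam_share` term is the one that other traffic can erode, which is why traffic management on the link itself matters.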

Thanks again.

To be continued.....................
 
Michael Rodríguez (Systems Engineer) commented:
No problem, funasset, hopefully you'll get the problem resolved soon.

Just curious: your replication target server doesn't have any issues such as a failed RAID card battery and/or a failed disk, correct?

For example, on the DL380 G5 I replicate to, if the Smart Array BBU fails on the RAID card, performance slows waaaay the hell down.

I know you said that you have your server locally again and speed seems to be fine when local, but you never know.
 
funasset (Author) commented:
It had some sort of problem a while ago and the RAID card was replaced. I've run diagnostics on it while it's been here and (if you believe Dell Diagnostics!) it claims that all is well.

I appreciate the thought!
 
funasset (Author) commented:
See previous post
Question has a verified solution.
