
Veeam Replication jobs fail with some sort of timeout

Posted on 2013-01-25
Last Modified: 2013-05-20
I have a number of Veeam Replication jobs that replicate server VMs offsite. Some network changes were recently made (adding a NAT setup between the office LAN and the remote site), and they caused all manner of problems; in particular, all my replication jobs failed. The changes were undone, but 2 of my 5 jobs still refuse to complete, and they give the same error I was seeing before the NAT changes were backed out.

The job fails with the error "Error: Client error: ChannelError: TimedOut", and Veeam Support seem intent on blaming our WAN link. However, as I said, 3 of my jobs complete OK. I've also copied a VM from the local host to the remote host, and VMware didn't fall over and complain about a WAN problem.

Data size doesn't seem to be a common factor, and neither does OS (one server is Win2003, the other Win2008), antivirus (one server has it, the other doesn't) or anything else that I can see.

I am at a loss as to why these 2 jobs (which I've also tried deleting and recreating from scratch) get maybe 40-50% done and then just give up.

Can anyone help?

Thanks
Question by:funasset
 
LVL 5

Expert Comment

by:MaximVeeam
ID: 38818222
Which version of Veeam Backup do you use?
 

Author Comment

by:funasset
ID: 38818244
It reports 6.5.0.128 (64 bit)
 

Author Comment

by:funasset
ID: 38818666
I think that's the latest they have?

 
LVL 5

Expert Comment

by:MaximVeeam
ID: 38818683
Yes, you are right. My idea was that the software might be out of date. I am looking for a solution on the Veeam Forums (http://forums.veeam.com/).
 

Author Comment

by:funasset
ID: 38826375
I set a job going on Friday and it worked fine for 11 hours, then just gave up with the same error.
 

Author Comment

by:funasset
ID: 38839118
Update:
I've been using a test VM (Windows 2008 Server Standard, clean install) to see if I can find a common denominator for the servers that fail to replicate. My results are:

Is the problem related to...
OS? It doesn't seem to be, as failures have included various operating systems.
AV software being present? No. Again, jobs have failed regardless of whether AV software is present.
Data size? I don't know whether Veeam just shoves the entire virtual disk down the wire or only the data actually used on the disk. In my test I filled the test server from a clean install (12GB) up to 150GB, and all jobs failed. Yet 12GB is less data than some of the server jobs that have succeeded, which suggests to me that the amount of data involved is not a common factor.
Throttling? No. Jobs have failed with throttling both on and off.
Host datastore? No. Job failures do not seem specific to any one datastore.
WAN link? Although Veeam Support seemed fixated on a WAN link problem, that doesn't explain why some jobs succeed. Also, VMware can migrate a copy of a failed source server's VM to the remote host without any trouble, which suggests that the link is fine.

If anyone has any other suggestions I'd be grateful, as it's becoming a royal pain!

Thanks
 
LVL 5

Expert Comment

by:MaximVeeam
ID: 38867831
Could you please provide the ticket number?
 

Author Comment

by:funasset
ID: 38875580
The problem seems to be down to bandwidth. The remote host was retrieved, and once it was on the local LAN all replication jobs completed OK, bar the usual Veeam "features" of moaning about CBT/"Cannot use SOAP" and calculating 'disk digests' for a job that had finished only a few minutes earlier.

The acid test will be to see if the incremental replication jobs complete OK when the host is returned to the remote site and throttling is reinstated.
 
LVL 4

Expert Comment

by:Michael Rodríguez
ID: 38907050
Hi funasset,

Do you have a VEEAM proxy VM server setup on your destination ESXi host?  Having a proxy server greatly improves replication times.  The proxy server will then "hot add" the VMDK files of the servers you're replicating, assuming you configure the replication job properly.

I currently use VEEAM in house to replicate 10 VMs offsite.  Luckily our rate of change is really small (5-7GB usually) so replication finishes within 6-8 hours.
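As a rough check on link sizing, the average throughput needed to move a given amount of changed data within a replication window is just data divided by time. The Python sketch below is illustrative only: `required_mbps` is a hypothetical helper, it assumes decimal gigabytes (1 GB = 10^9 bytes), and it ignores compression, deduplication, protocol overhead and any other traffic sharing the link.

```python
def required_mbps(changed_gb: float, window_hours: float) -> float:
    """Average throughput in Mbit/s needed to move changed_gb within window_hours.

    Hypothetical helper. Assumes decimal units (1 GB = 10**9 bytes) and ignores
    compression, deduplication, protocol overhead and competing traffic.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    bits = changed_gb * 1e9 * 8       # gigabytes -> bits
    seconds = window_hours * 3600     # hours -> seconds
    return bits / seconds / 1e6       # bits/s -> Mbit/s


# Worst and best cases from the figures quoted above.
print(f"{required_mbps(7, 6):.2f} Mbit/s")  # prints 2.59 Mbit/s
print(f"{required_mbps(5, 8):.2f} Mbit/s")  # prints 1.39 Mbit/s
```

Because this is an average over the whole window, a link that is shared with other traffic needs headroom above this figure.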

If you need more info on the proxy server config, let me know and I'll send you screenshots or whatever.

Hopefully this helps.
 

Author Comment

by:funasset
ID: 38908833
Hi and thanks for the info.

Yes, I do have a remote proxy available, but I'm not sure if I have the seeding set up OK. Certainly, since I've had the physical server back in-house, the replication jobs have been working fine, but being on the local LAN might not highlight any cockup I might have made in defining where the replication job seeds from.

Some screenshots would be very welcome - many thanks.
 
LVL 4

Expert Comment

by:Michael Rodríguez
ID: 38910713
Hi funasset,

I've attached multiple screens of my configuration.  From what you stated, just make sure that during the config of the replication job, you're specifying the proper target proxy.

Let me know if you have any more questions.
veeam-proxy.jpg
veeam-job-successandinfo.jpg
veeam-replica-mapping.jpg
veeam-target-proxy.jpg
 

Author Comment

by:funasset
ID: 38913283
Many thanks.

It seems that I have my config the same as yours. I think my problem is down to bandwidth. I don't have exclusive access to our feeble WAN link, so when other processes run they seem to squeeze Veeam jobs out. I thought that the Throttling feature would somehow create a fixed pipe for Veeam to use, but it seems not: if these other processes need more bandwidth, they just take it.

Now that the host has been retrieved and new full replications have been created locally, I'm hoping that the WAN link will be able to cope with just incremental jobs once the server has been put back. If they still fail, then I guess I'll have to push someone to get a better link!

Thanks again.

To be continued...
 
LVL 4

Expert Comment

by:Michael Rodríguez
ID: 38915150
No problem, funasset, hopefully you'll get the problem resolved soon.

Just curious: your replication target server doesn't have any issues in terms of a failed RAID card battery and/or failed disk, correct?

For example, on the DL380 G5 I replicate to, if the Smart Array BBU fails on the RAID card, performance slows waaaay the hell down.

I know you said that you have your server locally again and speed seems to be fine when local, but you never know.
 

Author Comment

by:funasset
ID: 38917458
It had some sort of problem a while ago and the RAID card was replaced. I've run diagnostics on it while it's been here and (if you believe Dell Diagnostics!) it claims that all is well.

I appreciate the thought!
 

Accepted Solution

by:funasset
ID: 39167551
Since creating the new Veeam images with the remote host retrieved and sitting on the office LAN, incremental replication has been fine. The problem appeared to be WAN-related: other processes needed bandwidth on the WAN line and were squeezing Veeam out. In the end, some traffic management values were tweaked in the VPN tunnel through which everything flows. Since then, all the processes appear to be playing together nicely.
 

Author Closing Comment

by:funasset
ID: 39180413
See previous post