funasset


Veeam Replication jobs fail with some sort of timeout

I have a number of Veeam Replication jobs that replicate server VMs offsite. Some network changes were recently made (adding a NAT setup between the office LAN and the remote site), and that caused all manner of problems: all my replication jobs failed. The changes were undone, but 2 of my 5 jobs still refuse to complete, and they give the same error I was seeing before the NAT changes were backed out.

The job fails with the error "Error: Client error: ChannelError: TimedOut", and Veeam Support seem intent on blaming our WAN link. However, as I said, 3 of my jobs complete OK. I've also copied a VM from the local host to the remote host, and VMware didn't fall over and complain about a WAN problem.

Data size doesn't seem to be a common factor, neither does OS (one server is Win2003 the other Win2008), antivirus (one server has it the other doesn't) or anything else that I can see.

I am at a loss as to why these 2 jobs (which I've tried deleting and recreating from scratch as well) get maybe up to 40-50% done and then just give up.

Can anyone help?

Thanks
MaximVeeam

Which version of Veeam Backup do you use?
funasset (ASKER)

It reports 6.5.0.128 (64 bit)
I think that's the latest they have?
Yes, you are right. My thought was that the software might be out of date. I am looking for a solution on the Veeam Forums: http://forums.veeam.com/
I set a job going on Friday and it worked fine for 11 hours then just gave up with the same error.
Update:
I've been using a test VM (Windows 2008 Server Standard - clean install) to see if I can find some type of common denominator for the servers that fail to replicate.  My results are

Is the problem related to...
OS? It doesn't seem to be as failures have included various operating systems.
AV software being present? No - again jobs have failed regardless of AV software being present.
Data size? I don't know whether Veeam just shoves the entire virtual disk down the wire or whether it's dynamic and only shoves the used data. In my test I loaded my test server from clean (12 GB) up to 150 GB, and all jobs failed. 12 GB is less than some of the server jobs that have succeeded, which suggests to me that the amount of data involved is not the common factor.
Throttling? No - jobs have failed with Throttling On and Off.
Host datastore? No - job failure does not seem specific to any one datastore.
WAN link? Although Veeam Support seemed fixated on a WAN link problem, this doesn't explain why some jobs succeed. Also, VMware can migrate a VM copy of a failed source server to the remote host without any trouble, which suggests that the link is fine.

If anyone has any other suggestions I'd be grateful, as it's becoming a royal pain!

Thanks
Could you please provide the ticket number?
The problem seems to be down to bandwidth. The remote host was retrieved, and on the local LAN all replication jobs completed OK, bar the usual Veeam "features" of moaning about CBT/"Cannot use SOAP" and calculating 'disk digests' for a job that finished only a few minutes earlier.

The acid test will be to see if the incremental replication jobs complete OK when the host is returned to the remote site and throttling is reinstated.
Hi funasset,

Do you have a Veeam proxy VM set up on your destination ESXi host? Having a proxy server greatly improves replication times. The proxy server will then "hot add" the VMDK files of the servers you're replicating, assuming you configure the replication job properly.

I currently use Veeam in-house to replicate 10 VMs offsite. Luckily our rate of change is really small (usually 5-7 GB), so replication finishes within 6-8 hours.

If you need more info on the proxy server config, let me know and I'll send you screenshots or whatever.

Hopefully this helps.
Hi and thanks for the info.

Yes, I do have a remote proxy available, but I'm not sure whether I have the seeding set up correctly. Certainly, since I've had the physical server back in-house, the replication jobs have been working fine, but being on the local LAN might not expose any cockup I've made in defining where the replication job seeds from.

Some screenshots would be very welcome - many thanks.
Hi funasset,

I've attached multiple screens of my configuration.  From what you stated, just make sure that during the config of the replication job, you're specifying the proper target proxy.

Let me know if you have any more questions.
Attachments: veeam-proxy.jpg, veeam-job-successandinfo.jpg, veeam-replica-mapping.jpg, veeam-target-proxy.jpg
Many thanks.

It seems that I have my config the same as yours. I think my problem is down to bandwidth. I don't have exclusive access to our feeble WAN link so when other processes run they seem to squeeze Veeam jobs out. I thought that the Throttling feature would somehow create a fixed pipe for Veeam to use but it seems not and if these other processes need more bandwidth they just take it.

Now that the host has been retrieved and new full replicas have been created locally, I'm hoping the WAN link will be able to cope with just the incremental jobs once the server has been put back. If they still fail, then I guess I'll have to push someone to get a better link!
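Whether a shared, throttled link can cope with incremental jobs is largely a matter of arithmetic. The sketch below is a rough estimate under stated assumptions only: it uses decimal units, ignores Veeam's compression, deduplication and protocol overhead, and the 7 GB change size, 2 Mbit/s link speed and 50% share are hypothetical figures, not measurements from this thread. `transfer_hours` is a hypothetical helper, not part of any Veeam tool.

```python
def transfer_hours(data_gb: float, link_mbit: float, share: float = 1.0) -> float:
    """Estimate hours to send data_gb gigabytes over a link_mbit Mbit/s link
    when only the fraction `share` of the link is available to the job.
    Assumes decimal units and ignores compression and protocol overhead."""
    if link_mbit <= 0 or not 0 < share <= 1:
        raise ValueError("link speed must be positive and share in (0, 1]")
    bits = data_gb * 8 * 1000**3               # decimal gigabytes -> bits
    effective_bps = link_mbit * 1000**2 * share  # bits per second actually usable
    return bits / effective_bps / 3600         # seconds -> hours


# Hypothetical figures: 7 GB of changed data over a 2 Mbit/s link,
# first with the whole link, then with competing traffic taking half of it.
for s in (1.0, 0.5):
    print(f"{s:.0%} of link: {transfer_hours(7, 2, s):.1f} h")
```

With these assumed numbers it prints 7.8 h for the full link and 15.6 h for half of it, so competing traffic roughly doubles the time window in which a job is exposed to the link.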

Thanks again.

To be continued...
No problem, funasset, hopefully you'll get the problem resolved soon.

Just curious, your replication target server, it doesn't have any issues in terms of a failed RAID card battery and/or failed disk, correct?

For example, on the DL380 G5 I replicate to, if the Smart Array BBU fails on the RAID card, performance slows waaaay the hell down.

I know you said that you have your server locally again and speed seems to be fine when local, but you never know.
It had some sort of problem a while ago and the RAID card was replaced. I've run diagnostics on it while it's been here and (if you believe Dell Diagnostics!) it claims that all is well.

I appreciate the thought!
ASKER CERTIFIED SOLUTION
funasset

See previous post