Solved

Veeam Ver 6 & vCenter 4.1 Job Failure Issues

Posted on 2012-03-16
Last Modified: 2012-03-27
I'm hoping that someone can give me a hand with this very frustrating error that I have been receiving for the past 2 days which has caused all of our backups to fail repeatedly. The exact error message is the following:

Error: Client error: File does not exist or locked. VMFS path: [[DataStoreName] Server1/Server1.vmx]. Please, try to download specified file using connection to the ESX server where the VM registered. Failed to create NFC download stream. NFC path: [nfc://conn:123.456.789.50,nfchost:host-65,stg:datastore-3433@Server1/Server1.vmx].
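For anyone decoding a similar error, the NFC path in the message can be split into its parts. This is only a sketch: the function name parse_nfc_path is made up for illustration, and reading host-65 and datastore-3433 as vCenter managed object IDs is an assumption based on their format, not something confirmed in this thread.

```shell
#!/bin/sh
# Split an NFC path from the error message into key=value lines.
# The field names (conn, nfchost, stg) come from the error text itself;
# parse_nfc_path is a hypothetical helper name.
parse_nfc_path() {
    rest=${1#nfc://}        # drop the scheme
    fields=${rest%%@*}      # "conn:...,nfchost:...,stg:..."
    file=${rest#*@}         # path of the file inside the datastore
    # One field per line, first colon turned into '='.
    printf '%s\n' "$fields" | tr ',' '\n' | sed 's/:/=/'
    printf 'file=%s\n' "$file"
}

parse_nfc_path 'nfc://conn:123.456.789.50,nfchost:host-65,stg:datastore-3433@Server1/Server1.vmx'
```

The conn value shows which address the backup server connected to, and nfchost shows which host the request was handed to, which is useful to compare against the host where the VM is actually registered.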

I have open support requests with Veeam and VMware on this issue. Both vendors are pointing the finger at each other, which is very frustrating.

The error seems to have been caused by our switch dying, which in turn caused vCenter to lose communication with its Service Console connection, the ESX hosts, and all the VMs. Our entire company went down until we set up a new switch later that day. Everything magically started coming back up and working, except that our backups failed that night and every night since then. We are running Veeam version 6 patch 3, vCenter 4.1, and ESX 4.1 Update 2.

Veeam Tech Support is saying that there is an NFC communication issue that VMware should assist in resolving, but VMware is saying that Veeam is using their API incorrectly. Here is VMware's official response: "Unfortunately, VMware will not be addressing this issue until the next major release at this point, as from our perspective, the API in question is not actually reacting poorly, simply being used incorrectly. The API in question is called the CopyDatastoreFile_Task API, and is designed for use on files that are not locked by our VMFS Distributed Locking system. There are alternative APIs that are available for copying/accessing locked files appropriately (by farming the task to the lock-owning ESX host, or by using one of the alternative lock-slots, depending on what type of lock is in place)."

Here are the things that I tried before opening tickets with both Veeam and VMware.

1. Rebooted the vCenter and Veeam servers.
2. Rebooted the ESX servers to try to clear this lock.
3. Deleted and recreated the jobs on the Veeam server.
4. Verified communication from the Veeam server to vCenter and all ESX hosts.
5. Powered off the vCenter server and the Veeam server.
6. vMotioned VMs to different ESX hosts and also to different datastores.
7. Restarted the management agent and the vCenter agent on all ESX servers.

The jobs are still failing with that same issue.

The weird thing is that if I add my ESX hosts individually by IP address in Veeam, I can back up my VMs; it only fails through vCenter. I also cannot download the .vmx file or any other file through vCenter. I can only download it directly through the ESX host. The issue goes away after I reboot the host but comes back once Veeam tries to run a backup job and fails.
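One way to reproduce the "works through the host, fails through vCenter" comparison outside of Veeam is the HTTPS datastore browser that vCenter and ESX expose under /folder. The sketch below only builds the two URLs. The helper name datastore_url, the host names, and the datacenter name MyDatacenter are placeholders, and it is an assumption that this HTTPS path fails in the same way as the NFC path Veeam uses.

```shell
#!/bin/sh
# Build a datastore-browser URL.
# Arguments: server, datacenter name, datastore name, path inside the datastore.
# datastore_url is a hypothetical helper; values containing spaces or other
# special characters would need URL-encoding first.
datastore_url() {
    printf 'https://%s/folder/%s?dcPath=%s&dsName=%s\n' "$1" "$4" "$2" "$3"
}

# Through vCenter ("MyDatacenter" and the host names are placeholders):
datastore_url vcenter.example.com MyDatacenter DataStoreName Server1/Server1.vmx
# Directly against an ESX host, whose built-in datacenter is "ha-datacenter":
datastore_url esx01.example.com ha-datacenter DataStoreName Server1/Server1.vmx
```

Each URL can then be requested with something like curl -k -u USER -o /dev/null -w '%{http_code}\n' "URL" to see whether one path returns the file while the other returns an error.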

I'm open to any suggestions that anyone has.

Thanks in advance.
Question by:Papnou

Author Comment

by:Papnou
ID: 37731951
Another thing that I wanted to mention is that there is no service console lock on the file. I verified that by running lsof | grep against the file. I did find the MAC address of the lock holder, which is an unused NIC, by running vmkfstools -D (path to .vmx file). The result of that command is below.

Lock [type 10c00001 offset 42524672 v 636, hb offset 3895296
gen 45, mode 1, owner 4f634c4d-63c2a440-1e08-001b2187be06 mtime 444]
Addr <4, 42, 12>, gen 13, links 1, type reg, flags 0, uid 0, gid 0, mode 755
len 3488, nb 1 tbz 0, cow 0, zla 2, bs 65536

MAC Address of owner = 001b2187be06 (vmnic10)

It is very strange to me that an unused NIC, not configured or plugged into anything, can hold a lock on a VM.
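The MAC address is simply the last group of the owner field in the lock output, so it can be pulled out mechanically. A minimal sketch follows; the function name owner_mac is made up, and it assumes the output format shown above.

```shell
#!/bin/sh
# Print the MAC address embedded in the "owner" field of vmkfstools -D output.
# Reads the command output on stdin; owner_mac is a hypothetical helper name.
owner_mac() {
    # owner looks like 4f634c4d-63c2a440-1e08-001b2187be06; keep the last
    # dash-separated group (12 hex digits), then insert colons between bytes.
    sed -n 's/.*owner [0-9a-f]*-[0-9a-f]*-[0-9a-f]*-\([0-9a-f]\{12\}\).*/\1/p' |
        sed 's/\(..\)/\1:/g; s/:$//'
}

# Example using the lock line quoted above:
echo 'gen 45, mode 1, owner 4f634c4d-63c2a440-1e08-001b2187be06 mtime 444]' | owner_mac
```

The result can be compared against the NIC list from esxcfg-nics -l on each host to find which host the owner field points at.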

Just wanted to post this in case anyone has run into this.  I have a feeling this is directly related to my post above.  Thanks.
 
LVL 118
ID: 37732172
Check whether the servers' IP addresses and names resolve correctly (e.g. via DNS), or use local hosts files on the backup server.
 

Author Comment

by:Papnou
ID: 37737847
Thanks for the suggestion.  I have double verified that DNS is working correctly.  I can ping each ESX server by IP and by name.  Any other suggestions?

 
LVL 118
ID: 37737935
What about reverse DNS and traceroute? Do they resolve the same?
 

Author Comment

by:Papnou
ID: 37739749
Yes, it does.  They both resolve the same.
 

Accepted Solution

by:Papnou
ID: 37755832
This issue has now been fixed.  A big thanks to Cody from Veeam Tech Support for figuring this out.  

The fix was to uninstall and reinstall the VPXA agent. I guess some corruption occurred in the vCenter database when we lost the Service Console connections. A reboot by itself did not correct the behavior because vCenter did not have the correct NFC path in its DB.

Here is a link to the article on the VMware support site:
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1003714

Here are the steps that I followed to get this working again.  

1.   Once all VMs have been vMotioned off to another host, connect to the ESX host via PuTTY and run the following command.

service mgmt-vmware stop && service vmware-vpxa stop && service vmware-vmkauthd stop && service xinetd restart && rpm -qa | grep -i vpxa | awk '{print $1}' | xargs rpm -ef $1 && userdel vpxuser && rpm -qa | grep -i aam | awk '{print $1}' | xargs rpm -ef $1 && service mgmt-vmware start && service vmware-vmkauthd start

2.  Log in to vCenter > choose the ESX host > right-click > Connect. This initiates a reinstall of the agent and prompts you to re-authenticate to the host.

3.  Reboot the host.
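For readers who want to see what the one-liner in step 1 does, here it is broken into one command per line. This is an annotated sketch, not a replacement for the KB article: the helper names run and vpxa_reset and the DRY_RUN switch are additions for illustration, the comments are my reading of each command, and the trailing $1 after xargs is left out because it expands to nothing in an interactive shell. With DRY_RUN at its default of 1 the script only prints the commands.

```shell
#!/bin/sh
# Print (DRY_RUN=1, the default) or execute (DRY_RUN=0) one command string.
# run, vpxa_reset and DRY_RUN are hypothetical names, not part of the KB article.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1" || exit 1    # stop at the first failure, as && does
    fi
}

vpxa_reset() {
    run 'service mgmt-vmware stop'       # host management agent
    run 'service vmware-vpxa stop'       # vCenter agent
    run 'service vmware-vmkauthd stop'   # authentication daemon
    run 'service xinetd restart'
    # Remove the vpxa package; vCenter reinstalls it when the host reconnects.
    run "rpm -qa | grep -i vpxa | awk '{print \$1}' | xargs rpm -ef"
    run 'userdel vpxuser'                # account vCenter uses on the host
    # Remove the aam (HA agent) package as well.
    run "rpm -qa | grep -i aam | awk '{print \$1}' | xargs rpm -ef"
    run 'service mgmt-vmware start'
    run 'service vmware-vmkauthd start'
}

vpxa_reset
```

Reviewing the printed commands first makes it easier to confirm that the host has been evacuated before anything is removed.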

Hopefully this post will help someone else who might be having the same issue.
 

Author Closing Comment

by:Papnou
ID: 37770468
Along with this posting on EE, I also put a ticket in with Veeam Tech Support.  They helped me fix this problem.  I just thought I should post the fix and close out this request.  Thanks.