Solved

Veeam Ver 6 & vCenter 4.1 Job Failure Issues

Posted on 2012-03-16
Last Modified: 2012-03-27
I'm hoping someone can give me a hand with a very frustrating error that I have been receiving for the past two days, and which has caused all of our backups to fail repeatedly. The exact error message is the following:

Error: Client error: File does not exist or locked. VMFS path: [[DataStoreName] Server1/Server1.vmx]. Please, try to download specified file using connection to the ESX server where the VM registered. Failed to create NFC download stream. NFC path: [nfc://conn:123.456.789.50,nfchost:host-65,stg:datastore-3433@Server1/Server1.vmx].

I have an open support request with Veeam and VMware on this issue. Both vendors are pointing the finger at each other, which is very frustrating.

The error seemed to be caused by our switch dying, which in turn caused vCenter to lose communication with its Service Console connection, the ESX hosts, and all the VMs. Our entire company went down until we set up a new switch later that day. Everything magically started coming back up and working again, except that our backups failed that night and every night since then. We are running Veeam version 6 patch 3, vCenter 4.1, and ESX 4.1 Update 2.

Veeam Tech Support is saying that there is an NFC communication issue that VMware should assist in resolving, but VMware is saying that Veeam is using their API incorrectly. Here is VMware's official response: "Unfortunately, VMware will not be addressing this issue until the next major release at this point, as from our perspective, the API in question is not actually reacting poorly, simply being used incorrectly. The API in question is called the CopyDatastoreFile_Task API, and is designed for use on files that are not locked by our VMFS Distributed Locking system. There are alternative APIs that are available for copying/accessing locked files appropriately (by farming the task to the lock-owning ESX host, or by using one of the alternative lock-slots, depending on what type of lock is in place)."

Here are the things I tried before putting tickets in with both Veeam and VMware.

1. Rebooted the vCenter and Veeam servers.
2. Rebooted the ESX hosts to try to clear the lock.
3. Deleted and re-created the jobs on the Veeam server.
4. Verified communication from the Veeam server to vCenter and all ESX hosts.
5. Powered off the vCenter and Veeam servers.
6. vMotioned VMs to different ESX hosts and also to different datastores.
7. Restarted the management agent and vCenter agent on all ESX hosts.

The jobs are still failing with that same issue.

The weird thing is that if I add my ESX hosts to Veeam individually by IP address, I can back up my VMs, just not through vCenter. I also cannot download the .vmx file, or any other file, through vCenter; I can only download it directly from the ESX host. This issue goes away after I reboot the host, but comes back once Veeam tries to run a backup job and fails.
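For anyone who wants to try the same comparison: ESX exposes datastore files over HTTPS at a /folder path, both through vCenter and directly on a host. A rough sketch of building the direct-from-host URL (the host name, datastore path, and credentials below are placeholders, not our real ones):

```shell
# Build the HTTPS datastore-browser URL for a file on a standalone host.
# "ha-datacenter" is the fixed datacenter name a host uses when you connect
# to it directly rather than through vCenter.
vmx_url() {  # usage: vmx_url <host> <datastore> <path-inside-datastore>
    printf 'https://%s/folder/%s?dcPath=ha-datacenter&dsName=%s\n' "$1" "$3" "$2"
}

url=$(vmx_url esx01.example.com DataStoreName Server1/Server1.vmx)
echo "$url"

# Against a real host (placeholder credentials):
#   curl -k -u root "$url" -o Server1.vmx
```

If the fetch works against the host's address but fails through vCenter, that points at vCenter's view of the file rather than the file itself.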

I'm open to any suggestions that anyone has.

Thanks in advance.
Question by:Papnou
7 Comments
 

Author Comment

by:Papnou
ID: 37731951
Another thing I wanted to mention: there is no Service Console lock on the file. I verified that by running lsof | grep on the host. I did find the MAC address of the lock holder, which is an unused NIC, by running vmkfstools -D (path to the .vmx file). The output of that command is below.

Lock [type 10c00001 offset 42524672 v 636, hb offset 3895296
gen 45, mode 1, owner 4f634c4d-63c2a440-1e08-001b2187be06 mtime 444]
Addr <4, 42, 12>, gen 13, links 1, type reg, flags 0, uid 0, gid 0, mode 755
len 3488, nb 1 tbz 0, cow 0, zla 2, bs 65536

MAC Address of owner = 001b2187be06 (vmnic10)

It is very strange to me how an unused NIC, not configured or plugged into anything, can hold a lock on a VM.
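For anyone repeating this check: the last dash-separated field of the lock's "owner" UUID is the MAC address of the lock-holding host's NIC. A small sketch of pulling it out and reformatting it for grepping esxcfg-nics -l (the sample line is the one from my output above):

```shell
# mac_of_owner: extract the trailing 12-hex-digit MAC from the "owner" UUID
# printed by vmkfstools -D (it is the UUID's last dash-separated field).
mac_of_owner() {
    sed -n 's/.*owner [0-9a-f]*-[0-9a-f]*-[0-9a-f]*-\([0-9a-f]\{12\}\).*/\1/p'
}

# colonize: 001b2187be06 -> 00:1b:21:87:be:06, the format esxcfg-nics -l shows.
colonize() {
    sed 's/../&:/g; s/:$//'
}

mac=$(echo 'gen 45, mode 1, owner 4f634c4d-63c2a440-1e08-001b2187be06 mtime 444' | mac_of_owner)
echo "$mac"                 # 001b2187be06
echo "$mac" | colonize      # 00:1b:21:87:be:06

# On a live host:
#   vmkfstools -D /vmfs/volumes/DataStoreName/Server1/Server1.vmx 2>&1 | mac_of_owner
#   then grep -i for the colonized MAC in esxcfg-nics -l output.
```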

Just wanted to post this in case anyone has run into this.  I have a feeling this is directly related to my post above.  Thanks.
LVL 123
ID: 37732172
Check whether the servers' IP addresses and names resolve correctly, e.g. via DNS,

or use local hosts files on the backup server.
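A local hosts entry of the sort described would look like this on the backup server (the address and names are examples only):

```
192.0.2.50   esx01.example.com   esx01
```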

Author Comment

by:Papnou
ID: 37737847
Thanks for the suggestion.  I have double-checked that DNS is working correctly.  I can ping each ESX server by IP and by name.  Any other suggestions?
LVL 123
ID: 37737935
What about reverse DNS and traceroute? Does it resolve the same?

Author Comment

by:Papnou
ID: 37739749
Yes, it does.  They both resolve the same.

Accepted Solution

by:
Papnou earned 0 total points
ID: 37755832
This issue has now been fixed.  A big thanks to Cody from Veeam Tech Support for figuring this out.  

The fix was to uninstall and reinstall the VPXA agent.  I guess there was some corruption in the vCenter database that occurred when we lost the Service Console connections.  A reboot by itself did not correct the behavior, as the problem was that vCenter did not have the correct NFC path in its database.

Here is a link to the article on the VMware support site:
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1003714

Here are the steps that I followed to get this working again.  

1.   Run the following command via PuTTY on the ESX host, once all VMs have been vMotioned off to another host.

service mgmt-vmware stop && service vmware-vpxa stop && service vmware-vmkauthd stop && service xinetd restart && rpm -qa | grep -i vpxa | awk '{print $1}' | xargs rpm -ef $1 && userdel vpxuser && rpm -qa | grep -i aam | awk '{print $1}' | xargs rpm -ef $1 && service mgmt-vmware start && service vmware-vmkauthd start

2.  Log in to vCenter > choose the ESX host > right-click > Connect. This initiates a reinstall of the agent and prompts you to re-authenticate to the host.

3.  Reboot the Host.
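Between steps 1 and 2 it may be worth confirming the agent packages really are gone before letting vCenter push fresh ones. A hedged sketch (the package names follow the rpm -qa | grep patterns in the command above; actual names vary by build):

```shell
# check_agents: given `rpm -qa` output, report whether the vCenter agent
# (vpxa) or the HA agent (aam) packages are still installed.
check_agents() {
    if echo "$1" | grep -qiE 'vpxa|aam'; then
        echo "agent packages still present"
    else
        echo "agents removed"
    fi
}

# On the ESX host Service Console you would run:
#   check_agents "$(rpm -qa)"
check_agents "vmware-vpxa-4.1.0-123456"   # prints: agent packages still present
check_agents "glibc-2.5-42"               # prints: agents removed
```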

Hopefully this post will help someone else who might be having the same issue.

Author Closing Comment

by:Papnou
ID: 37770468
Along with this posting on EE, I also put a ticket in with Veeam Tech Support.  They helped me fix this problem.  I just thought I should post the fix and close out this request.  Thanks.
