Solved

Veeam Ver 6 & vCenter 4.1 Job Failure Issues

Posted on 2012-03-16
5,125 Views
Last Modified: 2012-03-27
I'm hoping someone can give me a hand with a very frustrating error I have been receiving for the past two days, which has caused all of our backups to fail repeatedly. The exact error message is the following:

Error: Client error: File does not exist or locked. VMFS path: [[DataStoreName] Server1/Server1.vmx]. Please, try to download specified file using connection to the ESX server where the VM registered. Failed to create NFC download stream. NFC path: [nfc://conn:123.456.789.50,nfchost:host-65,stg:datastore-3433@Server1/Server1.vmx].

I have open support requests with both Veeam and VMware on this issue. Both vendors are pointing the finger at each other, which is very frustrating.

The error seemed to be caused by our switch dying, which in turn caused vCenter to lose communication with its Service Console connection, the ESX hosts, and all the VMs. Our entire company went down until we set up a new switch later that day. Everything magically came back up and started working again, except that our backups failed that night and every night since then. We are running Veeam version 6 Patch 3, vCenter 4.1, and ESX 4.1 Update 2.

Veeam Tech Support is saying there is an NFC communication issue that VMware should assist in resolving, but VMware is saying that Veeam is using their API incorrectly. Here is VMware's official response: "Unfortunately, VMware will not be addressing this issue until the next major release at this point, as from our perspective, the API in question is not actually reacting poorly, simply being used incorrectly. The API in question is called the CopyDatastoreFile_Task API, and is designed for use on files that are not locked by our VMFS Distributed Locking system. There are alternative APIs that are available for copying/accessing locked files appropriately (by farming the task to the lock-owning ESX host, or by using one of the alternative lock-slots, depending on what type of lock is in place)."

Here are the things I tried before putting tickets in with both Veeam and VMware:

1. Rebooted the vCenter and Veeam servers.
2. Rebooted the ESX servers to try to clear the lock.
3. Deleted and re-created the jobs on the Veeam server.
4. Verified communication from the Veeam server to vCenter and to all ESX hosts.
5. Powered off the vCenter server and the Veeam server.
6. vMotioned VMs to different ESX hosts and also to different datastores.
7. Restarted the management agent and vCenter agent on all ESX servers (the exact commands are sketched after this list).
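
On classic ESX 4.1, restarting those agents from the Service Console looks roughly like this (a sketch of step 7, run over SSH on each host):

service mgmt-vmware restart     # restarts hostd, the host management agent
service vmware-vpxa restart     # restarts vpxa, the vCenter Server agent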

The jobs are still failing with that same issue.

The weird thing is that if I add my ESX hosts to Veeam individually by IP address, I can back up my VMs, just not through vCenter. I also cannot download the .vmx file, or any other file, through vCenter; I can only download it directly from the ESX host (a sketch of the comparison is below). This issue goes away after I reboot the host, but it comes back once Veeam tries to run a backup job and fails.
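
One way to compare the two paths from a command line is to pull the .vmx over the datastore HTTPS file-access interface (the /folder URL), first via vCenter and then directly from the host where the VM is registered. This is only a sketch; the datacenter name, host names, and accounts below are placeholders:

# request via vCenter
wget --no-check-certificate --user=administrator --ask-password "https://vcenter.example.local/folder/Server1/Server1.vmx?dcPath=Datacenter&dsName=DataStoreName"

# request directly from the ESX host where the VM is registered
wget --no-check-certificate --user=root --ask-password "https://esx01.example.local/folder/Server1/Server1.vmx?dsName=DataStoreName"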

I'm open to any suggestions that anyone has.

Thanks in advance.
Question by:Papnou
7 Comments
 

Author Comment

by:Papnou
Another thing I wanted to mention is that there is no Service Console lock on the file; I verified that by running an lsof | grep command. I did find the MAC address of the lock holder, which is an unused NIC, by running vmkfstools -D against the path to the .vmx file. The result of that command is below.

Lock [type 10c00001 offset 42524672 v 636, hb offset 3895296
gen 45, mode 1, owner 4f634c4d-63c2a440-1e08-001b2187be06 mtime 444]
Addr <4, 42, 12>, gen 13, links 1, type reg, flags 0, uid 0, gid 0, mode 755
len 3488, nb 1 tbz 0, cow 0, zla 2, bs 65536

MAC Address of owner = 001b2187be06 (vmnic10)
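
For reference, those two checks were along these lines from the host's Service Console (the datastore path is a placeholder built from the names in the error message):

lsof | grep Server1.vmx                                        # no Service Console process is holding the file open
vmkfstools -D /vmfs/volumes/DataStoreName/Server1/Server1.vmx  # dumps the lock record shown above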

It is very strange to me that an unused NIC, not configured or plugged into anything, can hold a lock on a VM.

Just wanted to post this in case anyone has run into this.  I have a feeling this is directly related to my post above.  Thanks.
 

Expert Comment

by:Andrew Hancock (VMware vExpert / EE MVE)
Check whether the servers' IP addresses and names can be resolved, e.g. via DNS,

or use local hosts files on the Backup Server.
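
For example, a quick check from the backup server (or from the ESX Service Console) could look like this; the host name and IP below are placeholders:

nslookup esx01.example.local    # forward lookup of the ESX host name
nslookup 192.168.1.50           # reverse lookup should return the same host name
ping esx01.example.local        # confirm the resolved address actually responds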
 

Author Comment

by:Papnou
Thanks for the suggestion. I have verified twice that DNS is working correctly; I can ping each ESX server by IP and by name. Any other suggestions?
 

Expert Comment

by:Andrew Hancock (VMware vExpert / EE MVE)
What about reverse DNS and traceroute? Does it resolve the same?
 

Author Comment

by:Papnou
Yes, it does.  They both resolve the same.
 

Accepted Solution

by: Papnou (earned 0 total points)
This issue has now been fixed.  A big thanks to Cody from Veeam Tech Support for figuring this out.  

The fix was to uninstall and reinstall the VPXA (vCenter) agent. I guess there was some corruption in the vCenter database that occurred when we lost the Service Console connections. A reboot by itself did not correct the behavior, because the problem was that vCenter didn't have the correct NFC path in its database.

Here is a link to the article on the VMware support site:
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1003714

Here are the steps I followed to get this working again:

1. Run the following command via PuTTY on the ESX host once all VMs have been vMotioned off to another host (a commented, line-by-line breakdown of this command follows the steps below).

service mgmt-vmware stop && service vmware-vpxa stop && service vmware-vmkauthd stop && service xinetd restart && rpm -qa | grep -i vpxa | awk '{print $1}' | xargs rpm -ef $1 && userdel vpxuser && rpm -qa | grep -i aam | awk '{print $1}' | xargs rpm -ef $1 && service mgmt-vmware start && service vmware-vmkauthd start

2. Log in to vCenter > choose the ESX host > right-click > Connect. This initiates a reinstall of the agent and prompts you to re-authenticate to the host.

3. Reboot the host.
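
For readability, the chained one-liner in step 1 corresponds to the following sequence (same commands, one per line; the stray $1 after each xargs call expands to nothing in an interactive shell, so it is dropped here):

service mgmt-vmware stop       # stop hostd, the host management service
service vmware-vpxa stop       # stop vpxa, the vCenter Server agent
service vmware-vmkauthd stop   # stop the VMware authentication daemon
service xinetd restart         # restart xinetd
rpm -qa | grep -i vpxa | awk '{print $1}' | xargs rpm -ef   # find and remove the installed vpxa package
userdel vpxuser                # remove the vpxa service account
rpm -qa | grep -i aam | awk '{print $1}' | xargs rpm -ef    # find and remove the HA (AAM) agent package
service mgmt-vmware start      # start hostd again
service vmware-vmkauthd start  # start the authentication daemon again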

Hopefully this post will help someone else who might be having the same issue.
 

Author Closing Comment

by:Papnou
Along with this posting on EE, I also put a ticket in with Veeam Tech Support.  They helped me fix this problem.  I just thought I should post the fix and close out this request.  Thanks.
