VMware, ghettoVCB and Fibre Channel: backups fail to one specific datastore

I have three Fibre Channel SANs that I image my VMs to, and they share storage with another ESXi host.  The benefit is an entire spare host that I can light up if anything happens to my primary hosts.  I simply rename the backup%date% folder to Backups, and all the backup VMs are already pointed at their respective VMDKs in that folder.  ghettoVCB, as opposed to Trilead or Veeam, lets me image a running guest VM to this shared datastore on free ESXi at no cost.  As far as I know, there is no other free way to do this: Veeam wants vCenter, and Trilead only lets you offload the VMs as disk files to your local computer.

With that all being said, please don't grill me about my backup solution or methods, I have my reasons and up until now, they've been tried and true.  

Problem:

Across my three Fibre Channel SANs I have six giant RAID-0 arrays, two per box.  I have two ESXi hosts, Host A and Host B, with six VMs each.  All 12 VMs back up, using a script I wrote, to whichever datastore is next in rotation, which gives me multiple copies of my datacenter if I need to revert.  Three of the VMs on Host A have failed to back up successfully ever since I upgraded all three hosts (Host A, Host B and my spare host) to Dell's latest rev of VMware 6.0 U1 (Oct '15).  But they only fail on one of the six datastores...  I remember that, even though the backup was set for thin disks, it said something about enabling vmkernel for 2gb sparse when it began.  All three disks that ghettoVCB complains about are OS disks: two Windows 2008 and one 2008 R2.  Even weirder, I used the P2V converter to re-image the smallest of the three failed VMs back to the host and it still fails...

My only guess is that something about the block size of that datastore is causing the issue.  I've nuked the software RAID and reformatted the disks, and I still have the issue.  I've changed the name of the backup folder, and when I re-imaged the one VM with P2V, it even renamed the disk file.  I want to say all the disks involved have been "extended" a couple of times, but I'm not 100% sure; at least two of them have.  When the hosts were on 5.5, this datastore worked fine, and the other 9 VMs would agree.

The datastore used to work; then the ESXi version changed and a new ghettoVCB script had to be downloaded from GitHub.  Still, the script works on the same host with different datastores, just not this one in particular.  UGH!

I've rebooted all three hosts.  I've reformatted the datastore using dd.

Any help?
Chris H, Infrastructure Manager, asked:

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
I would drop a note on GitHub or the VMware Community forums for the maintainers of ghettoVCB to have a look, as ghettoVCB is just not popular any more!

Do you have an error message?
Chris H, Infrastructure Manager (Author), commented:
2015-10-22 22:41:14 -- info: Initiate backup for IIS
2015-10-22 22:41:14 -- info: Creating Snapshot "ghettoVCB-snapshot-2015-10-22" for IIS
2015-10-22 22:46:03 -- info: ERROR: error in backing up of "/vmfs/volumes/55872700-83ec85fb-5ce4-0015178fc036/VMDISKS/IIS/IIS-RECOVER.vmdk" for IIS
2015-10-22 22:46:05 -- info: Removing snapshot from IIS ...
2015-10-22 22:46:05 -- info: Backup Duration: 4.85 Minutes
2015-10-22 22:46:05 -- info: ERROR: Unable to backup IIS due to error in VMDK backup!

2015-10-22 22:46:05 -- info: ###### Final status: ERROR: No VMs backed up! ######
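For anyone sifting through longer ghettoVCB logs, here is a minimal Python sketch that pulls the failing VMDK paths out of output shaped like the log above. The function names (`parse_log`, `failed_vmdks`) are made up for illustration, and the line format is assumed only from the lines quoted in this thread, not from the ghettoVCB source.

```python
import re

# Sample taken verbatim from the log posted above.
SAMPLE_LOG = """\
2015-10-22 22:41:14 -- info: Initiate backup for IIS
2015-10-22 22:41:14 -- info: Creating Snapshot "ghettoVCB-snapshot-2015-10-22" for IIS
2015-10-22 22:46:03 -- info: ERROR: error in backing up of "/vmfs/volumes/55872700-83ec85fb-5ce4-0015178fc036/VMDISKS/IIS/IIS-RECOVER.vmdk" for IIS
2015-10-22 22:46:05 -- info: Removing snapshot from IIS ...
2015-10-22 22:46:05 -- info: Backup Duration: 4.85 Minutes
2015-10-22 22:46:05 -- info: ERROR: Unable to backup IIS due to error in VMDK backup!

2015-10-22 22:46:05 -- info: ###### Final status: ERROR: No VMs backed up! ######
"""

# Assumed layout: "<date> <time> -- <level>: <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) -- (\w+): (.*)$")
VMDK_ERROR_RE = re.compile(r'ERROR: error in backing up of "([^"]+)"')


def parse_log(text):
    """Return a list of (timestamp, level, message) tuples; skip non-matching lines."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groups())
    return entries


def failed_vmdks(entries):
    """Return the VMDK paths named in 'error in backing up of' messages."""
    paths = []
    for _timestamp, _level, message in entries:
        match = VMDK_ERROR_RE.search(message)
        if match:
            paths.append(match.group(1))
    return paths


if __name__ == "__main__":
    for path in failed_vmdks(parse_log(SAMPLE_LOG)):
        print("failed:", path)
```

Note that the log marks errors inside `info:` lines, so filtering on the level field alone would miss them; the sketch searches the message text instead.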
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
What is the size of IIS-RECOVER.vmdk?
Chris H, Infrastructure Manager (Author), commented:
It's a 32GB vmdk (33,554,430.00KB) with a 30GB NTFS partition on it.  22GB used out of 30GB.

Interestingly enough, both failed disk files are exactly 4,172,800.00KB, like they failed at the exact same point.  I think you're on to something here...
Chris H, Infrastructure Manager (Author), commented:
The other one did the same thing.  It's a 100GB vmdk with a 100GB NTFS partition, and it only backed up exactly 5,042,176.00KB both times...  So, what am I thinking here...

IIS-RECOVER is the machine I P2V'd, thinking this would correct the guest disk, on the assumption that the guest disk was at fault...
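To put the partial sizes quoted above in more familiar units, here is a small worked conversion in Python. It assumes the reported "KB" figures are units of 1024 bytes, which the thread does not state; the dictionary labels are just for illustration.

```python
KIB_PER_GIB = 1024 * 1024  # assumption: reported "KB" means 1024 bytes


def kib_to_gib(kib):
    """Convert a size in KiB to GiB."""
    return kib / KIB_PER_GIB


# Sizes of the partial backup files, as quoted in the comments above.
partial_kib = {
    "IIS-RECOVER (32GB vmdk)": 4_172_800,
    "second VM (100GB vmdk)": 5_042_176,
}

if __name__ == "__main__":
    for name, kib in partial_kib.items():
        print(f"{name}: stopped after about {kib_to_gib(kib):.2f} GiB")
    # Full size of the IIS-RECOVER disk, for comparison.
    print(f"IIS-RECOVER full size: about {kib_to_gib(33_554_430):.2f} GiB")
```

Under that assumption the two disks stop at roughly 3.98 GiB and 4.81 GiB. Each disk fails at the same offset on every attempt, but the two offsets differ, which points away from a single fixed size limit on the datastore and toward something about particular regions of each disk.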
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Can you post a screenshot of the folder from the console/SSH, so I can just look at the files?
Chris H, Infrastructure Manager (Author), commented:
I manually copied the entire backup set from one SAN's datastore back to this one, and it completed with no problems.  It took the same amount of time (4 hours) it would have taken to back up using the script.  So it's not a size issue, a disk corruption issue or a file logic issue.
Chris H, Infrastructure Manager (Author), commented:
OK, I think I know what's doing it, or at least what the similarities and patterns are.

The troubled datastore has VMFS 5.61.  It is the only one with...

And of course the friggin' disk is locked so I won't be able to test anything for another 8 hours....

So, the three machines that fail are all Windows 2008 (not 2008 R2).  One of them is actually a virtual machine built on 2003 and upgraded internally to 2008 (not R2) without updating the virtual template.  This, to me, means that it's not the controller.  Two of the three failed disks are thick lazy-zeroed and one is thin.

It has to be something with VMFS 5.61 and Windows 2008.  Or just an inherent bug or flaw that I'm completely overlooking in the docs.

Have you seen this?
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Escalate it to the script owner via GitHub.
Chris H, Infrastructure Manager (Author), commented:
The issue isn't with the script.  It has something to do with VMFS 5.61 (ESXi 6.0) and vmkfstools (ESXi 6.0) writing these three dynamic Windows 2008 disk files.  Even without the script, issuing a simple disk clone command produced the same error.  I can't explain it and I don't have enough data to point the finger at anyone.  And since I'm a free ESXi shop, they can have fun with that one...

Solution:
Installed the 5.5 trial on a flash drive, unmounted and deleted the datastore, formatted a new datastore as VMFS 5.60 (ESXi 5.5), copied the scripts back on, and now the problem has gone away.

This problem arose on ESXi 6.0.0 build 3073146, Dell's custom build.  There is a slight possibility that this disk was created on ESXi 6.0.0 build 3029758; again, the data is inconclusive at this point.  I do find it interesting that Dell crapped out a custom release very soon after that release.

The 5.5 I used to format the drive was VMware 5.5.0 build 1331820, Dell's custom build.

Hope that helps someone...

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
VMFS 5.61 is the latest supported version of VMFS, with ESXi 6.0.

So ghettoVCB is not compatible with the newer VMFS 5.61 (ESXi 6.0) datastores.

It would have been helpful to notify the GitHub ghettoVCB fork.
Chris H, Infrastructure Manager (Author), commented:
Still having trouble with one VM.

Check this out...  I can clone the drive successfully onto the same datastore.  When I try to copy that file to the troubled datastore, it hangs at exactly 6GB.  This is a file system issue.  I'm starting to worry that vSphere 6.0 and VMFS 5.61 aren't 100% stable.  I'm thinking my best alternative is to revert to the copy of ESXi 5.5 I was running.

All it takes to replicate this problem is a copy operation.  vmkfstools reports no problems with the VMDK.

I'm hoping someone, anyone, can look at my dmesg log and shed some light on why I can copy every other VM on this host except this specific VMDK.  I can clone and rename the VMDK, but the second I try to copy it over, it hangs and then eventually fails.

It acts like it's APD, but the drive never drops on any of the hosts.  Crazy.
log.txt
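Since the failure reproduces with a plain copy, one way to pin down the exact offset where it stalls is a chunked copy that reports progress after every flushed chunk. The following is a generic Python sketch, not something used in this thread: `copy_with_progress` and its parameters are hypothetical names, and calling `os.fsync` per chunk is a deliberate (slow) choice so that the last reported offset reflects data actually pushed to the destination.

```python
import os

CHUNK = 1024 * 1024  # 1 MiB per write


def copy_with_progress(src, dst, chunk_size=CHUNK, on_progress=print):
    """Copy src to dst chunk by chunk and return the number of bytes copied.

    on_progress is called with the running byte count after each chunk has
    been flushed, so if the copy hangs, the last value reported is the last
    offset known to be written.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            try:
                fout.write(buf)
                fout.flush()
                os.fsync(fout.fileno())  # force the chunk out before counting it
            except OSError as exc:
                raise OSError(f"write failed after {copied} bytes: {exc}") from exc
            copied += len(buf)
            on_progress(copied)
    return copied
```

Usage would look like `copy_with_progress("/path/to/source-flat.vmdk", "/path/to/dest-flat.vmdk")`, with both paths being placeholders. A hang raises nothing, so the interesting value is the last offset printed before output stops.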
Chris H, Infrastructure Manager (Author), commented:
Solution:
Installed the 5.5 trial on a flash drive, unmounted and deleted the datastore, formatted a new datastore as VMFS 5.60 (ESXi 5.5), copied the scripts back on, and now the problem has gone away.