CBT data is invalid in Veeam backups of VMware 6 VMs

Lately, Veeam backup jobs are taking 3 to 4 times longer. Most VMs have this note "CBT data is invalid, failing over to legacy incremental backup." I used their KB1113 article to reset CBT for VMs using PowerShell commands, but "CBT data is invalid" has come back in a few days on some VMs.

What could be causing this? Any permanent solution? BTW, I recently upgraded to VMware 6.0. Thanks.

AK
AkulshAsked:

Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Remove the CBT file.

If CBT tracking is not enabled, the job will fall back to legacy backups and will not perform fast incremental backups.

From Anton Gostev of Veeam, on CBT:

In essence, CBT is all about CTK files, these are the files which contain change tracking information of the corresponding VMDK file.

The concept is pretty simple, and if you are familiar with AD DirSync control, or Exchange ICS (public folders change tracking) – it is essentially the same: global USN (Update Sequence Number) for each object. CTK file describes the state of each block for tracking purposes, and contain USN for each block in the corresponding VMDK. After any block is updated, it is assigned the new global USN (which is previous USN value that was used on previously processed block plus 1). This way, any application can ask VMware API “tell me if this block was changed since THIS moment”, and the API will easily tell that by simply comparing the provided sequence number with the actual USN on each block. If provided USN is smaller than actual for particular block, it means that the block was changed (and needs to be backed up, replicated or otherwise processed). So multiple processes cannot conflict with each other anyhow. Each process just memorizes the USN corresponding to the snapshot that the application created during processing, and next time it will use the memorized USN to query for changed blocks.

There should be one CTK file per VMDK file, and a CTK file cannot grow out of proportion to the number of blocks in the VMDK (as it stores only 1 record per VMDK block). The CTK file is also thousands of times smaller than the actual VMDK, because it stores only a few bytes of information (USN) for each corresponding 256KB VMDK block (I am 90% sure it is 256KB, used to calculate it once using CTK debug/stats data, just don’t remember for sure – unimportant info escapes my head automatically to prevent overload with useless facts ;) . For the same reason, I/O overhead is barely noticeable with CBT: only a few extra bytes to write for each 256000 bytes of data.

The CTK files are permanent, and should not be deleted after backup/replication.
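
A minimal sketch of the sequence-number idea described above, in Python. This is only a toy model for illustration: the class and method names are hypothetical, and it does not reflect the real CTK file format or the VMware API.

```python
class ChangeTracker:
    """Toy model of per-block change tracking with a global sequence number.

    Hypothetical illustration only; not the CTK format or the VMware API.
    """

    def __init__(self, num_blocks):
        self.global_usn = 0                 # last sequence number handed out
        self.block_usn = [0] * num_blocks   # exactly one record per block

    def write(self, block):
        # Every update stamps the block with the next global sequence number.
        self.global_usn += 1
        self.block_usn[block] = self.global_usn

    def checkpoint(self):
        # A backup or replication job memorizes this value at snapshot time.
        return self.global_usn

    def changed_since(self, usn):
        # A block has changed if its stamp is newer than the memorized value.
        return [i for i, stamp in enumerate(self.block_usn) if stamp > usn]


tracker = ChangeTracker(num_blocks=8)
tracker.write(1)
tracker.write(5)
backup_mark = tracker.checkpoint()   # the backup job remembers this point
tracker.write(5)
tracker.write(7)

print(tracker.changed_since(backup_mark))  # [5, 7]
print(tracker.changed_since(0))            # [1, 5, 7] for a job that never ran
```

Because each job only keeps its own memorized number, two jobs querying the same tracker do not interfere with each other, which matches the point made in the quote.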
Paul SolovyovskySenior IT AdvisorCommented:
Have you upgraded to the latest Veeam build? It has many features, and I'm wondering if one of them is the ability to use the vStorage API for vSphere 6.
Bryant SchaperCommented:
Besides upgrading to the latest Veeam build, make sure you apply patch ESXi600-201505001 to fix the CBT bug in ESXi 6.0. Veeam will still back up without it, but you may crash ESXi. We also had some SQL corruption along with the bug, though we never crashed.

AkulshAuthor Commented:
Andrew, Thanks.

PaulSolov, I am running Veeam 8.0.0.2021. I recently applied Update2.

Bryant, this ESXi600-201505001 may be the solution but somehow this patch is not showing up in Update Manager even though I have downloaded latest patches. The last one is from April 2015. Do I have to apply it manually? Thanks.
Bryant SchaperCommented:
I did apply it manually, since we only have 6 hosts.

This patch is from May 14 or so, and you should see it in the patch downloads. Can you import it manually into Update Manager?
Bryant SchaperCommented:
BTW, doing it manually was not too bad. I just placed the patch in a datastore that all the hosts can see and used an SSH session.

My biggest problem was waiting for VMs to migrate from server to server for the reboot.
AkulshAuthor Commented:
Bryant,

In my Patch Repository of Update manager, I see a different patch from the same day May 14, 2015 -- ESXi600-201505401-BG, which refers to KB#2116126, rather than KB#2116125. I wonder if it is an updated version. Will try to install it. Thanks.

AK
AkulshAuthor Commented:
Bryant,

The reason the May-14 patch initially did not show up in Update Manager is because my setting was for "Critical Patches" only and this is considered 'Important', not 'Critical'. Thanks again.
AkulshAuthor Commented:
Bryant,

I applied the patch (the 2 patches seem identical) last evening and rebooted all hosts, but it has not improved anything. (The Update Manager shows that entire cluster is compliant in terms of Critical and Non-Critical patches.)

I noticed in the article about the patch "Virtual Machine Migration or Shutdown Required: Yes". Not sure what it means. I have DRS enabled and VMs migrate all the time on their own. Any suggestion? Thanks.
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
I noticed in the article about the patch "Virtual Machine Migration or Shutdown Required: Yes". Not sure what it means. I have DRS enabled and VMs migrate all the time on their own. Any suggestion? Thanks.

That means you must migrate ALL VMs off the host, put the host in Maintenance Mode, apply the patch, and reboot the server.
Bryant SchaperCommented:
How are you performing Veeam backups? I believe they still have a Direct SAN bug under specific situations.

This was in the weekly forum digest from Veeam. I would recommend contacting both vendors as well.


Sorry for missing the previous digest due to my travels. In the past 2 weeks, I have delivered the B&R v9 training to our Americas and APAC teams, and attended Microsoft Ignite 2015 (the new all-in-one Microsoft's conference). Here are a few impressions about the latter. First, it was extremely crowded: it's not a good idea to merge both Americas and EMEA mega-conferences into one. I missed a few sessions and lunches due to absolutely insane lanes. On the other hand, it was also one of the most interesting conferences that I have attended. Microsoft has so much really cool stuff coming in the immediate future, that a usually boring main key note flew in one breath for me – despite being one of the longest key notes ever! It's no wonder as there are many enhancements coming in Windows 10 and Windows Server 2016 which are breathtaking! Finally, if I had to name ONE feature that struck me the most, this would be the ability to stretch an individual SQL Server table to Azure SQL database. Talk about true hybrid cloud computing! The new Microsoft keeps impressing me.

On the VMware front, past two weeks of battles with the vSphere 6 issues provided for significant advancements of our troops into the new major release bugs territory. First, late last week VMware has released the patch for the major CBT issue I have covered in my previous digest (KB2114076). First customers who have already had a chance to install the patch are reporting that it fully resolved the issue. Kudos to VMware for addressing the issue so promptly and basically, this becomes a must-install patch for all vSphere 6 deployments until VMware releases Update 1. If you are one of our partners, this is a good reason to reach out to your clients and make sure they have the patch installed. And thanks for forwarding me the release notification the moment it was out, Fernando.

While this bug is off the table, there is another newly discovered issue which we are seeing a lot with the customers who have upgraded to ESXi 6.0. Fortunately, this one is environment-specific and has a workaround that should work for the majority of users. The issue causes jobs using Direct SAN access mode to crash on backup proxies with a large number of NIC adapters having IPv6 enabled. The actual fault sits in one of the VDDK 6.0 libraries, in the function that collects DNS addresses from all proxy NICs, and puts them into a string. When the string's length exceeds a certain number of characters, VDDK crashes. As you can probably guess by now, the main issue here is IPv6 address length. For example, in our own lab VDDK does not have any issues on a backup proxy with 10 IPv4-only adapters, however it crashes on a backup proxy with a single IPv4 adapter and 7 IPv6 adapters.

Obviously, aside of removing some of the unused network interfaces, the easiest workaround is to simply go into each NIC's properties, and disable IPv6 – which is enabled by default on every network connection, but rarely actually used - outside perhaps huge telecoms and service providers. We are also testing the code that patches the faulty VDDK function, and if all is well, this will be available momentarily as a hot fix for customers who cannot disable IPv6. Also, due to how wide spread the issue is – over 50 support cases as of end of last week – we are now considering releasing the new Update 2 build (U2a) with the fix embedded. This will also give us a chance to address a few other, less wide-spread U2 support issues. I will keep you posted regarding this newer build.

Note that VDDK 5.5 does not have the same bug, and since we are using VDDK 6.0 for processing vSphere 6 environments ONLY, customers with earlier vSphere versions are safe from this issue even if they have U2 installed.
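
To see why IPv6 makes the collected string so much longer, here is a rough Python sketch. Everything in it is an assumption for illustration: one DNS address per adapter, a single-character separator, fully expanded IPv6 text, and made-up example addresses. The digest does not give the real format or the length at which VDDK crashes, so no limit is modeled here.

```python
import ipaddress


def joined_length(addresses, sep=","):
    """Length of all addresses joined into one string.

    Uses the fully expanded text form, so an IPv6 address always
    counts as 39 characters. The separator is a hypothetical choice.
    """
    expanded = [ipaddress.ip_address(a).exploded for a in addresses]
    return len(sep.join(expanded))


# Hypothetical proxy with 10 IPv4-only adapters.
ipv4_only = ["192.168.100.%d" % i for i in range(10, 20)]

# Hypothetical proxy with 1 IPv4 adapter and 7 IPv6 adapters.
mixed = ["192.168.100.10"] + ["fd00::%x" % i for i in range(1, 8)]

print(len(ipv4_only), "addresses:", joined_length(ipv4_only), "characters")
print(len(mixed), "addresses:", joined_length(mixed), "characters")
```

Under these assumptions the proxy with fewer adapters still produces the longer string, which is consistent with the lab observation in the digest.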

Here is an important piece of information for those using Veeam Endpoint Backup to back up computers that have some folders synced to one of the popular cloud storage platforms. Our QC has just finished a comprehensive interop testing with Dropbox, Google Drive and Microsoft OneDrive. To cut the long story short, with the default backup job settings, restores may cause data loss when using Microsoft OneDrive – while Dropbox and Google Drive are safe. For more information, please refer to this support KB article > KB2032.
AkulshAuthor Commented:
Thanks Andrew. Before rebooting the hosts, obviously all its VMs got migrated to other hosts, so this step automatically happens.

Bryant, things are looking up now. The next backup job no longer has any CBT related errors. The time taken has also come down. So that patch takes time to go into effect.

I think in one more go, time of the backups would become normal. I am planning on closing the ticket tomorrow.
Bryant SchaperCommented:
That may be more of a Veeam thing; it probably had not had a clean backup yet.
AkulshAuthor Commented:
The CBT errors are now gone and backup time is back to normal.
So the key was applying the May 14, 2015 patch on the VMware 6.0 hosts (KB#2116125 or KB#2116126) and waiting about a day for it to take effect. Thanks to you all.