sevenet
asked on
Host is in a state that cannot perform snapshot
Hello,
We have a weird case with backing up VMs on VMware ESX 5.5. It doesn't matter if we use native VMware's VDP, EMC Networker's VBA or VADP solutions, the message is always the same:
2013-12-17 03:00:34 avnwcomm Error <0000>: Plugin returned non-zero exit code (10059). Aborting NetWorker connection.
2013-12-17 03:00:24 avvcbimage Warning <6788>: Found unknown section 'browse-from-catalog' in workorder, ignoring
2013-12-17 03:00:25 avvcbimage Warning <6788>: Found unknown section 'browse-from-catalog' in workorder, ignoring
2013-12-17 03:00:34 avvcbimage Warning <16004>: Soap fault detected, Could not connect to getHostMoref 'esx2.local'., Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
2013-12-17 03:00:34 avvcbimage FATAL <0000>: [IMG0009] The Host 'esx2.local' is in a state that cannot perform snapshot operations.
2013-12-17 03:00:34 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation failed
The google is silent about it. We have restarted services, physical hosts, vcenter, VDP, VBA, Networker, nothing helps. The weirdest thing is that it was working for about a week, and then stopped. No known changes were made to VMware environment. Does anyone have an idea what could be the cause or at least how to diagnose deeply this issue?
Best regards,
Krzysztof
We have a weird case with backing up VMs on VMware ESX 5.5. It doesn't matter if we use native VMware's VDP, EMC Networker's VBA or VADP solutions, the message is always the same:
2013-12-17 03:00:34 avnwcomm Error <0000>: Plugin returned non-zero exit code (10059). Aborting NetWorker connection.
2013-12-17 03:00:24 avvcbimage Warning <6788>: Found unknown section 'browse-from-catalog' in workorder, ignoring
2013-12-17 03:00:25 avvcbimage Warning <6788>: Found unknown section 'browse-from-catalog' in workorder, ignoring
2013-12-17 03:00:34 avvcbimage Warning <16004>: Soap fault detected, Could not connect to getHostMoref 'esx2.local'., Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
2013-12-17 03:00:34 avvcbimage FATAL <0000>: [IMG0009] The Host 'esx2.local' is in a state that cannot perform snapshot operations.
2013-12-17 03:00:34 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation failed
The google is silent about it. We have restarted services, physical hosts, vcenter, VDP, VBA, Networker, nothing helps. The weirdest thing is that it was working for about a week, and then stopped. No known changes were made to VMware environment. Does anyone have an idea what could be the cause or at least how to diagnose deeply this issue?
Best regards,
Krzysztof
Does this happen if you manully snapshot the VM?
try to rebooting the vsphere host first and then see.
1, please see that the Virtual machine startup / shutdown with host feature is disabled before this step
2. Restart the management agents on the host
3. On vCenter Server, restart the vCenter and Management Web servicesand
and then try to take snapshot.
and the very important ...does this happen only with one vm or any vm ??
please share the output
1, please see that the Virtual machine startup / shutdown with host feature is disabled before this step
2. Restart the management agents on the host
3. On vCenter Server, restart the vCenter and Management Web servicesand
and then try to take snapshot.
and the very important ...does this happen only with one vm or any vm ??
please share the output
Did you have the same issue running on ESXi 5.1?
If you connect to the ESXi 5.5 host directly using the VI Client are you able to snapshot the VM?
If you connect to the ESXi 5.5 host directly using the VI Client are you able to snapshot the VM?
ASKER
Sorry for a little delay.
Manual snapshots work OK.
We have restarted management agents on the host, and the host itself too, at the end.
We have restarted vcenter services and the server itself, countless times
It happens to all VMs, regardless of operating system or host on which it runs.
We didn't have a chance to check 5.1 in this environment, as it was already running 5.5, however, we have a lab environment with 5.1 and there are no such issues at all. We cannot downgrade production environment to 5.1 :-(
The interesting thing is that Networker's VBA from EMC was working for about a week, and then stopped with this message (we do not recall any major changes to the environment which could have caused the failure). Then, we tried VADP frmo EMC, and finally to eliminate Networker's problem we installed VDP from VMware. The message is the same. It seems it is originated by either ESX or vCenter (most probable). We plan to reinstall vCenter, but this is rather long-term plan if other ideas fail.
Manual snapshots work OK.
We have restarted management agents on the host, and the host itself too, at the end.
We have restarted vcenter services and the server itself, countless times
It happens to all VMs, regardless of operating system or host on which it runs.
We didn't have a chance to check 5.1 in this environment, as it was already running 5.5, however, we have a lab environment with 5.1 and there are no such issues at all. We cannot downgrade production environment to 5.1 :-(
The interesting thing is that Networker's VBA from EMC was working for about a week, and then stopped with this message (we do not recall any major changes to the environment which could have caused the failure). Then, we tried VADP frmo EMC, and finally to eliminate Networker's problem we installed VDP from VMware. The message is the same. It seems it is originated by either ESX or vCenter (most probable). We plan to reinstall vCenter, but this is rather long-term plan if other ideas fail.
If this is causing long standing service issues, I would now escalate to VMware Support.
ASKER
Yes, we have opened the case, but so far, for about two weeks, noone is able to track the root cause. We have a webex session scheduled for tomorrow. I though maybe someone already found the solution :-) From what I google around, many people have similiar issues. It looks like some kind of ghost problem :-/
What build number of ESXi 5.5 are you using? Also are using an OEM version of ESXi of the generic VMware version from the VMware site?
Are the device drivers updated on your host? BIOS, NIC's, Storage HBA etc....
Any errors for the host in your SAN logs??
What type of permissions is the backup agent running under ?
Are the device drivers updated on your host? BIOS, NIC's, Storage HBA etc....
Any errors for the host in your SAN logs??
What type of permissions is the backup agent running under ?
ASKER
vCenter is 5.5.0 build 1312298
ESXi is generic installation (ISO from VMware): VMware ESXi, 5.5.0, 1331820
We have checked permissions. Just to make sure there are no issues we use administrator account.
There are no errors on SAN. All datastores work OK.
ESXi is generic installation (ISO from VMware): VMware ESXi, 5.5.0, 1331820
We have checked permissions. Just to make sure there are no issues we use administrator account.
There are no errors on SAN. All datastores work OK.
ASKER
Here is the most recent log from VDP:
--------------------------------------------------------------------------------------------------------
----- START avvcbimage log 2013-12-30 23:00:05 CET [7.0.181-86 Linux-x86_64]
--------------------------------------------------------------------------------------------------------
2013-12-30T23:00:05.321+1:00 avvcbimage Info <5008>: Logging to /usr/local/avamarclient/var-proxy-5/systems_daily-systems_daily-1388440800090-9f618cf4cfbe04eb51e7c62438d310bd7a7d9356-1016-vmimagel.log
2013-12-30T23:00:05.321+1:00 avvcbimage Info <5174>: - Reading /root/.avamar
2013-12-30T23:00:05.321+1:00 avvcbimage Info <5174>: - Reading /usr/local/avamar/etc/usersettings.cfg
2013-12-30T23:00:05.321+1:00 avvcbimage Info <5174>: - Reading /usr/local/avamarclient/var-proxy-5/avvcbimage.cmd
2013-12-30T23:00:05.321+1:00 avvcbimage Info <5174>: - Reading /usr/local/avamarclient/var/avvcbimageAll.cmd
2013-12-30T23:00:05.325+1:00 avvcbimage Info <6673>: CTL listening on port 47040
2013-12-30T23:00:05.327+1:00 avvcbimage Info <10684>: Setting ctl message version to 3 (from 1)
2013-12-30T23:00:05.328+1:00 avvcbimage Info <16136>: Setting ctl max message size to 268435456
2013-12-30T23:00:05.332+1:00 avvcbimage Warning <6788>: Found unknown section 'browse-from-catalog' in workorder, ignoring
2013-12-30T23:00:05.335+1:00 avvcbimage Warning <17809>: VAMI Statistics Push Provider DISABLED.
2013-12-30T23:00:05.359+1:00 avvcbimage Info <18664>: Login(https://vcenter.local:443/sdk) Datacenter: 'PRODUCTION'
2013-12-30T23:00:05.360+1:00 avvcbimage Info <0000>: - connected to 'VirtualCenter' - version: 'VMware vCenter Server 5.5.0 build-1312298', apiVersion:'5.5'
2013-12-30T23:00:05.481+1:00 avvcbimage Info <14692>: Login test is OK.
2013-12-30T23:00:05.713+1:00 avvcbimage Info <0000>: VM Info: VMX version 4(F), ctk enabled(T), fault tolerant (F), Guest OS CentOS 4/5/6 (64-bit), Cloud UUID = NoUUID. version=8
2013-12-30T23:00:05.797+1:00 avvcbimage Info <14625>: WorkOrder Disk Info:
Disk '2000': file(base):'[LUN0-ESX] dns/dns.vmdk', backItUp=1
snapshot file:'[LUN0-ESX] dns/dns.vmdk'
prior size(KB):0, current size(KB):10485760, match=0
prior change block ID:''
Datastore:'LUN0-ESX' Directly Accessible=1
2013-12-30T23:00:05.798+1:00 avvcbimage Info <11979>: proxy VM DNS name is: 127.0.0.2,vdp0.local,vdp0,10.1.211.20
2013-12-30T23:00:05.798+1:00 avvcbimage Info <18672>: proxy VM Instance name is: vdp0-proxy-5
2013-12-30T23:00:05.798+1:00 avvcbimage Info <17793>: proxy VM Moref is: vm-726
2013-12-30T23:00:05.798+1:00 avvcbimage Info <16010>: vCenter 'PRODUCTION' is vcenter.local
2013-12-30T23:00:05.798+1:00 avvcbimage Info <11981>: VM's host is esx1.local
2013-12-30T23:00:05.798+1:00 avvcbimage Info <11982>: VM's primary storage location is [LUN0-ESX] dns/dns.vmx
2013-12-30T23:00:05.798+1:00 avvcbimage Info <11983>: VM's vCenter ID is vm-98
2013-12-30T23:00:06.545+1:00 avvcbimage Warning <16004>: Soap fault detected, Could not connect to getHostMoref 'esx1.local'., Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"End of file or no input: Operation interrupted or timed out"
Detail: [no detail]
'
2013-12-30T23:00:06.554+1:00 avvcbimage Warning <16004>: Soap fault detected, Get Host properties problem, Msg:'SOAP 1.1 fault: "":ServerFaultCode [no subcode]
"
Required property obj is missing from data object of type ObjectSpec
while parsing serialized DataObject of type vmodl.query.PropertyCollector.ObjectSpec
at line 2, column 651
while parsing property "objectSet" of static type ArrayOfObjectSpec
while parsing serialized DataObject of type vmodl.query.PropertyCollector.FilterSpec
at line 2, column 445
while parsing call information for method RetrieveProperties
at line 2, column 273
while parsing SOAP body
at line 2, column 258
while parsing SOAP envelope
at line 2, column 0
while parsing HTTP request for method retrieveContents
on object of type vmodl.query.PropertyCollector
at line 1, column 0"
Detail: <InvalidRequestFault xmlns="urn:vim25" xsi:type="InvalidRequest"></InvalidRequestFault>
'
2013-12-30T23:00:07.030+1:00 avvcbimage Warning <16004>: Soap fault detected, Could not connect to getHostMoref 'esx1.local'., Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"End of file or no input: Operation interrupted or timed out"
Detail: [no detail]
'
2013-12-30T23:00:07.039+1:00 avvcbimage Warning <16004>: Soap fault detected, Get Host properties problem, Msg:'SOAP 1.1 fault: "":ServerFaultCode [no subcode]
"
Required property obj is missing from data object of type ObjectSpec
while parsing serialized DataObject of type vmodl.query.PropertyCollector.ObjectSpec
at line 2, column 651
while parsing property "objectSet" of static type ArrayOfObjectSpec
while parsing serialized DataObject of type vmodl.query.PropertyCollector.FilterSpec
at line 2, column 445
while parsing call information for method RetrieveProperties
at line 2, column 273
while parsing SOAP body
at line 2, column 258
while parsing SOAP envelope
at line 2, column 0
while parsing HTTP request for method retrieveContents
on object of type vmodl.query.PropertyCollector
at line 1, column 0"
Detail: <InvalidRequestFault xmlns="urn:vim25" xsi:type="InvalidRequest"></InvalidRequestFault>
'
2013-12-30T23:00:07.039+1:00 avvcbimage Info <0000>: Proxy Host Name=esx1.local() VM Host Name=esx1.local()
2013-12-30T23:00:07.039+1:00 avvcbimage Info <11984>: VM's backup account on vdp0.local is /vcenter.local/VirtualMachines/dns_Unag9GyalhQeq8btPpE68g
2013-12-30T23:00:07.039+1:00 avvcbimage Info <11986>: Changed block tracking is engaged for this VM
2013-12-30T23:00:07.039+1:00 avvcbimage Info <11988>: A reference to a valid prior backup is not available so this will be a full level zero backup.
2013-12-30T23:00:07.050+1:00 avvcbimage Warning <6788>: Found unknown section 'browse-from-catalog' in workorder, ignoring
2013-12-30T23:00:07.053+1:00 avvcbimage Info <5174>: - Reading /usr/local/avamarclient/var-proxy-5/avvcbimage.cmd
2013-12-30T23:00:07.053+1:00 avvcbimage Info <5174>: - Reading /usr/local/avamarclient/var/avvcbimageAll.cmd
2013-12-30T23:00:07.053+1:00 avvcbimage Info <7084>: target[0]=[LUN0-ESX] dns/dns.vmdk
2013-12-30T23:00:07.067+1:00 avvcbimage Info <18664>: Login(https://vcenter.local:443/sdk) Datacenter: 'PRODUCTION'
2013-12-30T23:00:07.067+1:00 avvcbimage Info <0000>: - connected to 'VirtualCenter' - version: 'VMware vCenter Server 5.5.0 build-1312298', apiVersion:'5.5'
2013-12-30T23:00:07.084+1:00 avvcbimage Info <14692>: Login test is OK.
2013-12-30T23:00:07.109+1:00 avvcbimage Info <16041>: VDDK:OBJLIB-LIB: Objlib initialized.
2013-12-30T23:00:07.109+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: Attempting to locate advanced transport module in "/usr/lib/vmware-vix-disklib".
2013-12-30T23:00:08.151+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: Advanced transport plugin was successfully loaded into vixDiskLib.
2013-12-30T23:00:08.152+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: Enabling advanced transport modes.
2013-12-30T23:00:08.152+1:00 avvcbimage Info <16041>: VDDK:OBJLIB-LIB: Objlib initialized.
2013-12-30T23:00:08.153+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: Initialize transport modes.
2013-12-30T23:00:08.154+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: Available transport modes: file:san:hotadd:nbdssl:nbd.
2013-12-30T23:00:08.154+1:00 avvcbimage Info <16041>: VDDK:VMware VixDiskLib (5.5) Release build-1284542
2013-12-30T23:00:08.234+1:00 avvcbimage Info <16041>: VDDK:VixDiskLibVim: VixDiskLibVim_Init: Initialization is completed.
2013-12-30T23:00:08.234+1:00 avvcbimage Info <9666>: Available transport modes are file:san:hotadd:nbdssl:nbd
2013-12-30T23:00:08.234+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_PrepareForAccess: Prepare to access disk.
2013-12-30T23:00:08.234+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Connect: Establish connection.
2013-12-30T23:00:08.234+1:00 avvcbimage Info <16041>: VDDK:VixDiskLibVim: VixDiskLibVim_AllowVMotion: Disable VMotion.
2013-12-30T23:00:10.761+1:00 avvcbimage Info <17818>: VixDiskLib vMotion reservation successfully enabled by 'vSphere Data Protection'
2013-12-30T23:00:10.761+1:00 avvcbimage Info <16011>: Running cleanup of old backups.
2013-12-30T23:00:10.761+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Cleanup: Remove previous mount points and clean up .
2013-12-30T23:00:10.761+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Connect: Establish connection.
2013-12-30T23:00:10.761+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Disconnect: Disconnect.
2013-12-30T23:00:10.761+1:00 avvcbimage Info <16048>: VixDiskLib_Cleanup() found no prior mounts to cleanup.
2013-12-30T23:00:10.781+1:00 avvcbimage Info <18664>: Login(https://vcenter.local:443/sdk) Datacenter: 'PRODUCTION'
2013-12-30T23:00:10.781+1:00 avvcbimage Info <0000>: - connected to 'VirtualCenter' - version: 'VMware vCenter Server 5.5.0 build-1312298', apiVersion:'5.5'
2013-12-30T23:00:10.817+1:00 avvcbimage Info <14692>: Login test is OK.
2013-12-30T23:00:11.912+1:00 avvcbimage Warning <16004>: Soap fault detected, Could not connect to getHostMoref 'esx1.local'., Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"End of file or no input: Operation interrupted or timed out"
Detail: [no detail]
'
2013-12-30T23:00:11.990+1:00 avvcbimage FATAL <0000>: [IMG0009] The Host 'esx1.local' is in a state that cannot perform snapshot operations.
2013-12-30T23:00:11.990+1:00 avvcbimage Info <9772>: Starting graceful (staged) termination, Host is in a state that cannot perform snapshots. (wrap-up stage)
2013-12-30T23:00:11.990+1:00 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation or pre/post snapshot script failed
2013-12-30T23:00:11.991+1:00 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation/pre-script/post-script failed
2013-12-30T23:00:11.991+1:00 avvcbimage Info <9763>: snapshot created:false NOMC:false ChangeBlTrackingAvail:true UsingChBl:true
2013-12-30T23:00:11.991+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_EndAccess: Disk access completed.
2013-12-30T23:00:11.991+1:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Connect: Establish connection.
2013-12-30T23:00:11.991+1:00 avvcbimage Info <16041>: VDDK:VixDiskLibVim: VixDiskLibVim_AllowVMotion: Enable VMotion.
2013-12-30T23:00:11.993+1:00 avvcbimage Info <16038>: Final summary, cancelled/aborted 0, snapview 1, exitcode 209: plugin error 09
2013-12-30T23:00:14.224+1:00 avvcbimage Info <17819>: VixDiskLib vMotion reservation successfully released
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hello,
Actually it was my post :-) Since we had a very serious issue with it I tried posting in many places hoping to find an answer as soon as possible. Indeed, increasing LUN size helped, but the above messages are ridiculous comparing to the real reason. I would have never guessed from that output that the problem was the LUN space. Anyway, still no one is able to answer my question why was this space a case :-)
Best regards,
Krzysztof
Actually it was my post :-) Since we had a very serious issue with it I tried posting in many places hoping to find an answer as soon as possible. Indeed, increasing LUN size helped, but the above messages are ridiculous comparing to the real reason. I would have never guessed from that output that the problem was the LUN space. Anyway, still no one is able to answer my question why was this space a case :-)
Best regards,
Krzysztof