
Windows 2012 VM freezes/hangs

kasamahesh asked
Last Modified: 2015-12-20
We have a Windows Server 2012 Datacenter VM hosted on VMware ESXi 5.1.0 build 1743533. We run GlobalScape FTP server on this guest OS. The problem is that this VM freezes/hangs from time to time, and when that happens we have no option but to reset it. While it is frozen we can ping it but cannot connect via Remote Desktop, and GlobalScape FTP stops responding to FTP clients. We also observed that VMware Tools stops working when this occurs.

The FTP server is a production server, and we have no option but to reset the VM each time. This has been recurring once or twice every week. Further information is below:

This VM uses a vmxnet3 vNIC. The Windows System log shows no significant errors before the VM freezes. We take a snapshot every Sunday for backup.

Muhammad Burhan, Manager I.T.
CERTIFIED EXPERT
Top Expert 2015

Commented:
Please share the event logs for this.
Clear all logs and wait until it hangs again, then start it and check/share the logs.
Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
Okay, can we start with the specification of the VM: memory, CPUs, etc.?

It is also time to update your firmware, and to update your ESXi version from 5.1 to the latest 5.1 U3.
What do you mean when you say you take a snapshot every Sunday to back up?

Snapshots are not meant as a backup solution unless you are using something like Veeam that actually backs up the snapshot and then removes it. Running on a snapshot or having a long chain of snapshots is not recommended, and could cause performance issues.

See http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180 and http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1025279

Author

Commented:
@Andrew Hancock,

Memory is 8 GB and there is a specific requirement for 8 vCPUs. We have the upgrade option in mind, but other Windows servers on the same host run fine; only this one does not.

@Muhammad Burhan, we will share the logs once the VM hangs again.

Author

Commented:
@Mike Duckett - let me clarify how this VM is backed up: we use VMware vSphere Data Protection 5.8 for backup and recovery of this VM. That's where the term snapshot came in.
Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
I would check that there are no snapshots, and by this I mean check the datastore at the console or remotely via SSH, not via Snapshot Manager.

I would also check that you have not oversubscribed the VM.

What is the make, model of server ?

Is it on the HCL ?

What is the capacity of the server ? Memory and CPU

What is the storage ?

How do you reset the server ?

Can you login from the console ?

Author

Commented:
vCPU is oversubscribed on all the ESXi hosts we have. The server in question is an HP ProLiant BL465c Gen8 with 256 GB memory and 32 cores. The storage is an EMC VMAX array.

When the VM freezes I cannot log in through the console, and I have to right-click the VM and reset it.

I don't know how to check for snapshots over an SSH login.
Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
So what does the overall performance of the host look like ?

100% Memory, 100% CPU ?

Is the VM at the login screen when it freezes?

See my EE article here to check for snapshots:

HOW TO: VMware Snapshots :- Be Patient

Author

Commented:
CPU goes as high as 65% when files are being uploaded/downloaded. Memory stays at 55%.

I checked the datastores and I don't see any -00000x.vmdk files. I also checked the events for this VM and can see a snapshot being taken and then removed.
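
For anyone repeating the datastore check above, it amounts to looking for snapshot disks by file name. Here is a minimal Python sketch, assuming the usual ESXi naming in which snapshot disks end in `-00000N.vmdk` (descriptor) or `-00000N-delta.vmdk` (data). The function name `find_snapshot_disks` and the sample file names are hypothetical; the real listing would come from the VM's folder on the datastore.

```python
import re

# Assumption: snapshot disks follow the usual ESXi naming,
#   <vm>-000001.vmdk (descriptor) and <vm>-000001-delta.vmdk (data).
SNAPSHOT_DISK = re.compile(r"-\d{6}(-delta)?\.vmdk$")


def find_snapshot_disks(filenames):
    """Return the file names that look like snapshot disks."""
    return [name for name in filenames if SNAPSHOT_DISK.search(name)]


# Hypothetical listing of a VM folder on the datastore.
listing = [
    "ftp01.vmx",
    "ftp01.vmdk",
    "ftp01-flat.vmdk",
    "ftp01-000001.vmdk",
    "ftp01-000001-delta.vmdk",
    "vmware.log",
]
print(find_snapshot_disks(listing))
```

An empty result for the real folder listing would agree with what Snapshot Manager reports; any match would point to a leftover snapshot that is worth investigating.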

Author

Commented:
The VM hung again and I had to reset it.

Capture.JPG
The Windows log has lots and lots of entries like the ones below:

System log:
Information,12/3/2015 11:53:30 PM,Service Control Manager,7036,None,The Windows Update service entered the stopped state.
Information,12/3/2015 11:43:28 PM,Service Control Manager,7036,None,The Windows Update service entered the running state.
Information,12/3/2015 11:26:00 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 10:58:52 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 10:24:22 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 9:57:42 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 9:23:11 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 9:10:09 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 8:19:08 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 7:51:49 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 7:17:18 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 7:16:01 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 6:16:00 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 5:47:19 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 5:12:48 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 4:47:00 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 4:12:29 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 3:46:52 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 3:19:08 PM,Microsoft-Windows-Winlogon,7002,(1102),User Logoff Notification for Customer Experience Improvement Program
Information,12/3/2015 3:16:31 PM,Microsoft-Windows-Winlogon,7002,(1102),User Logoff Notification for Customer Experience Improvement Program
Information,12/3/2015 3:15:53 PM,Service Control Manager,7036,None,The Windows Update service entered the stopped state.
Information,12/3/2015 3:07:50 PM,Service Control Manager,7036,None,The Portable Device Enumerator Service service entered the stopped state.
Information,12/3/2015 3:05:51 PM,Service Control Manager,7036,None,The Windows Update service entered the running state.
Information,12/3/2015 3:05:50 PM,Service Control Manager,7036,None,The Portable Device Enumerator Service service entered the running state.
Information,12/3/2015 2:01:28 PM,Service Control Manager,7036,None,The Function Discovery Provider Host service entered the stopped state.
Information,12/3/2015 1:59:14 PM,Service Control Manager,7036,None,The Function Discovery Provider Host service entered the running state.
Information,12/3/2015 12:35:57 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the stopped state.
Information,12/3/2015 12:33:42 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the running state.
Information,12/3/2015 12:31:22 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the stopped state.
Information,12/3/2015 12:25:54 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the running state.
Information,12/3/2015 12:20:09 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the stopped state.
Information,12/3/2015 12:17:54 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the running state.
Information,12/3/2015 12:14:50 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the stopped state.

application log:
Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 maybe hanging (CPU:223.3622318, StartTime:12/03/2015 12:21:02)"
Warning,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 running for more than 28800 seconds (CPU:223.3622318, StartTime:12/03/2015 12:21:02)"
Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:16140 maybe hanging (CPU:12.7764819, StartTime:12/03/2015 12:37:47)"
Warning,12/4/2015 12:30:41 AM,AWEMon,3,None,"Process GSAWE PID:16140 running for more than 28800 seconds (CPU:12.7764819, StartTime:12/03/2015 12:37:47)"


Sorry, the log is filtered because this is a production server.

Author

Commented:
Since I am not getting any resolution, I am opening a new ticket.

Author

Commented:
@Andrew, since nobody is adding any comments, I am opening a new ticket.
Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
That's what the request for Assistance button is for. The Moderators will work with you and assist you, in getting fresh eyes, to look at the issue.
Looking at your event logs:

Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 maybe hanging (CPU:223.3622318, StartTime:12/03/2015 12:21:02)"
Warning,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 running for more than 28800 seconds (CPU:223.3622318, StartTime:12/03/2015 12:21:02)"
Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:16140 maybe hanging (CPU:12.7764819, StartTime:12/03/2015 12:37:47)"
Warning,12/4/2015 12:30:41 AM,AWEMon,3,None,"Process GSAWE PID:16140 running for more than 28800 seconds (CPU:12.7764819, StartTime:12/03/2015 12:37:47)"


This seems to show the FTP software is hanging. Have you spoken to Globalscape? I would suggest checking with them.
Also, if possible (although since you stated it is a production server, I understand it may not be), try disabling the software and see whether the fault occurs without it running.

Do you see anything in the console when it hangs? (I know you said you couldn't log in, but does it show the logon screen?)
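
To put numbers on the AWEMon entries quoted above, here is a rough Python sketch that parses the exported log lines and works out how long each GSAWE process had been running when the entry was written. The field layout is taken from the lines in this thread. The function name `parse_awemon` is hypothetical, and the date formats (US-style event time with AM/PM, 24-hour `StartTime`) are assumptions about the export.

```python
import csv
import re
from datetime import datetime

# Message layout taken from the AWEMon entries quoted in this thread.
DETAILS = re.compile(r"PID:(\d+).*?CPU:([\d.]+), StartTime:([\d/]+ [\d:]+)")


def parse_awemon(lines):
    """Yield (pid, cpu_value, wall_seconds) for each AWEMon entry."""
    for row in csv.reader(lines):
        if len(row) != 6:
            continue  # skip blank or malformed lines
        _level, logged, source, _event_id, _category, message = row
        match = DETAILS.search(message)
        if source != "AWEMon" or match is None:
            continue
        # Assumption: event time is US style with AM/PM, StartTime is 24-hour.
        logged_at = datetime.strptime(logged, "%m/%d/%Y %I:%M:%S %p")
        started_at = datetime.strptime(match.group(3), "%m/%d/%Y %H:%M:%S")
        wall_seconds = (logged_at - started_at).total_seconds()
        # cpu_value is reported as-is; the log does not state its units.
        yield int(match.group(1)), float(match.group(2)), wall_seconds


sample = [
    'Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 maybe hanging (CPU:223.3622318, StartTime:12/03/2015 12:21:02)"',
    'Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:16140 maybe hanging (CPU:12.7764819, StartTime:12/03/2015 12:37:47)"',
]
for pid, cpu_value, wall_seconds in parse_awemon(sample):
    print(pid, cpu_value, wall_seconds)
```

Under those assumptions both processes had been running for roughly 12 hours when the entries were logged, well past the 28800-second threshold named in the warnings.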

Author

Commented:
No, we cannot disable it. The logon screen shows up but we can't log in; it sits at the login screen seemingly forever.
That certainly sounds like something at the OS/application level rather than VMware, then.

I would contact Globalscape and see if they have seen the issue.

Maybe also stay logged on to the server on the console so that when the issue occurs you can look at Task Manager (or Process Explorer from Sysinternals) and narrow down what may be causing the hang.

Does the FTP software have any logging built in that may help?

Author

Commented:
One thing I notice is that when it hangs, VMware Tools stops running. How would you explain this?
VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017
Commented:
Above is exactly what I was just typing... :)

My guess would be that it isn't detected as running because whatever is causing the OS to hang is using so much CPU that it also stops VMware Tools from reporting back.

Author

Commented:
We had another freeze. I checked the vmware.log for this particular VM and found the following error:

vmware.log:3312332:2015-12-08T20:29:56.096Z| vmx| I120: Vix: [7773806 guestCommands.c:1926]: Error VIX_E_TOOLS_NOT_RUNNING in VMAutomationTranslateGuestRpcError(): VMware Tools are not running in the guest

I can verify that each VM freeze corresponds to this type of error in vmware.log.

CPU usage around this time is between 17% and 21% as seen from the vSphere console.
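
The correlation described above can be checked mechanically. Here is a minimal Python sketch that pulls the timestamps of `VIX_E_TOOLS_NOT_RUNNING` lines out of vmware.log and tests whether each one falls near a recorded freeze time. The function names, the 30-minute window, and the freeze time in the example are hypothetical. The timestamp pattern is based on the single log line shown above.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout based on the vmware.log line quoted in this thread.
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|")


def tools_not_running_times(log_lines):
    """Return UTC timestamps of lines reporting VIX_E_TOOLS_NOT_RUNNING."""
    times = []
    for line in log_lines:
        if "VIX_E_TOOLS_NOT_RUNNING" not in line:
            continue
        match = STAMP.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f"))
    return times


def near(event, freezes, window=timedelta(minutes=30)):
    """True if the event falls within `window` of any recorded freeze time."""
    return any(abs(event - freeze) <= window for freeze in freezes)


log_lines = [
    "vmware.log:3312332:2015-12-08T20:29:56.096Z| vmx| I120: Vix: [7773806 guestCommands.c:1926]: "
    "Error VIX_E_TOOLS_NOT_RUNNING in VMAutomationTranslateGuestRpcError(): "
    "VMware Tools are not running in the guest",
]
# Hypothetical freeze time (UTC) noted by the administrator.
freezes = [datetime(2015, 12, 8, 20, 15)]
for stamp in tools_not_running_times(log_lines):
    print(stamp.isoformat(), near(stamp, freezes))
```

The `Z` suffix means the vmware.log timestamps are in UTC, so freeze times taken from the guest's event log (local time) need converting before they are compared.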
Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
So, if you stop VMware Tools completely and set it to disabled, does the VM still freeze?

Simple test.

Author

Commented:
I have not tried stopping VMware Tools. Since this is a production server, I am in no position to stop it. What are the possible consequences?
Andrew Hancock (VMware vExpert PRO / EE Fellow), VMware and Virtualization Consultant

Commented:
For test purposes, nothing; it just means that the VM will stop communicating with the host. Low risk.

To establish the fault, is it not better to experiment and try to fix this freezing fault on this VM?

Author

Commented:
We engaged support from Symantec, VMware and Microsoft but could not find the root cause of the problem, so we decided to deploy a new VM.

Thank you everyone for the comments.
