kasamahesh asked on

Windows Server 2012 VM freezes/hangs

We have a Windows Server 2012 Datacenter VM hosted on VMware ESXi 5.1.0 (build 1743533). We run GlobalScape FTP server on this guest OS. The problem with this particular VM is that it freezes/hangs from time to time. When this happens we have no option but to reset it. When it freezes we can ping it but cannot Remote Desktop to it, and GlobalScape FTP stops responding to FTP clients. We also observed that when this occurs VMware Tools stops working.

The FTP server is a production server and we are stuck: we have no option but to reset the VM. It has been recurring once or twice every week. Further information is below:

This VM uses a vmxnet3 vNIC. The Windows System log shows no significant errors before the VM freezes. We take a snapshot every Sunday for backup.
Windows Server 2012, VMware

Last comment: kasamahesh, 8/22/2022 (Mon)

Muhammad Burhan

Please share the event logs for this.
Clear all logs and wait until it hangs again, then restart it and check/share the logs.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Okay, can we start with the specification of the VM: memory, CPUs, etc.?

It is also time to update your firmware, and to update your ESXi version from 5.1 to the latest 5.1 U3.
Mike Duckett

What do you mean when you say you take a snapshot every Sunday to back up?

Snapshots are not meant as a backup solution unless you are using something like Veeam that actually backs up the snapshot and then removes it. Running on a snapshot or having a long chain of snapshots is not recommended, and could cause performance issues.

See http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180 and http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1025279
ASKER
kasamahesh

@Andrew Hancock,

Memory is 8 GB, and there is a specific requirement for 8 vCPUs. We have the upgrade option in mind, but other Windows servers on the same host run fine and only this one does not.

@Muhammad Burhan, we will share the logs once the VM hangs again.
ASKER
kasamahesh

@Mike Duckett - let me clarify how this VM is backed up: we use VMware vSphere Data Protection 5.8 for backup and recovery of this VM. That's where the snapshot term came in.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

I would check that there are no snapshots, and by this I mean check the datastore, at the console or remotely via SSH, not via Snapshot Manager.

I would also check that you have not oversubscribed the VM.

What is the make, model of server ?

Is it on the HCL ?

What is the capacity of the server ? Memory and CPU

What is the storage ?

How do you reset the server ?

Can you login from the console ?
ASKER
kasamahesh

vCPUs are oversubscribed on all the ESXi hosts that we have. The server in question is an HP ProLiant BL465c Gen8 with 256 GB memory and 32 cores. The storage is an EMC VMAX array.

When the VM freezes I cannot log in through the console, and I have to right-click the VM and reset it.

I don't know how to check for snapshots over SSH.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

So what does the overall performance of the host look like ?

100% Memory, 100% CPU ?

Is the VM at the login screen when it freezes?

See my EE article here to check for snapshots:

HOW TO: VMware Snapshots :- Be Patient
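
As a rough illustration of the datastore check described above, the following Python sketch filters a file listing for names that look like snapshot disks. It assumes the usual naming convention for snapshot disks (`<disk>-000001.vmdk`, sometimes with a `-delta` or `-sesparse` suffix). The `ftp01` file names are hypothetical, and the listing would normally come from the VM's folder under `/vmfs/volumes/` on the host. It is a sketch, not a replacement for the article's procedure.

```python
import os
import re

# Snapshot disks are conventionally named <disk>-000001.vmdk, optionally
# with a -delta or -sesparse suffix before the extension.
SNAPSHOT_DISK = re.compile(r"-\d{6}(-delta|-sesparse)?\.vmdk$")


def find_snapshot_disks(filenames):
    """Return the names that look like snapshot (delta) disks, sorted."""
    return sorted(name for name in filenames if SNAPSHOT_DISK.search(name))


def scan_vm_folder(folder):
    """List snapshot-style disks in a single VM folder on a datastore."""
    return find_snapshot_disks(os.listdir(folder))


if __name__ == "__main__":
    # Hypothetical listing of a VM folder, for illustration only.
    listing = [
        "ftp01.vmx",
        "ftp01.vmdk",
        "ftp01-flat.vmdk",
        "ftp01-000001.vmdk",
        "ftp01-000001-delta.vmdk",
    ]
    for name in find_snapshot_disks(listing):
        print(name)
```

An empty result suggests no snapshot disks are left behind in that folder. Base disks such as `ftp01-flat.vmdk` are deliberately not matched.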
ASKER
kasamahesh

CPU goes as high as 65% when files are being uploaded/downloaded. Memory is fixed at 55%.

I checked the datastores and I don't see any -00000x.vmdk files. I also checked the events for this VM, and I can see the snapshot being taken and then removed.
ASKER
kasamahesh

The VM hung again and we had to reset it.

Capture.JPG
The Windows log has lots and lots of entries like the ones below:

System log:
Information,12/3/2015 11:53:30 PM,Service Control Manager,7036,None,The Windows Update service entered the stopped state.
Information,12/3/2015 11:43:28 PM,Service Control Manager,7036,None,The Windows Update service entered the running state.
Information,12/3/2015 11:26:00 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 10:58:52 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 10:24:22 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 9:57:42 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 9:23:11 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 9:10:09 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 8:19:08 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 7:51:49 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 7:17:18 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 7:16:01 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 6:16:00 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 5:47:19 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 5:12:48 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 4:47:00 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 4:12:29 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
Information,12/3/2015 3:46:52 PM,Service Control Manager,7036,None,The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
Information,12/3/2015 3:19:08 PM,Microsoft-Windows-Winlogon,7002,(1102),User Logoff Notification for Customer Experience Improvement Program
Information,12/3/2015 3:16:31 PM,Microsoft-Windows-Winlogon,7002,(1102),User Logoff Notification for Customer Experience Improvement Program
Information,12/3/2015 3:15:53 PM,Service Control Manager,7036,None,The Windows Update service entered the stopped state.
Information,12/3/2015 3:07:50 PM,Service Control Manager,7036,None,The Portable Device Enumerator Service service entered the stopped state.
Information,12/3/2015 3:05:51 PM,Service Control Manager,7036,None,The Windows Update service entered the running state.
Information,12/3/2015 3:05:50 PM,Service Control Manager,7036,None,The Portable Device Enumerator Service service entered the running state.
Information,12/3/2015 2:01:28 PM,Service Control Manager,7036,None,The Function Discovery Provider Host service entered the stopped state.
Information,12/3/2015 1:59:14 PM,Service Control Manager,7036,None,The Function Discovery Provider Host service entered the running state.
Information,12/3/2015 12:35:57 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the stopped state.
Information,12/3/2015 12:33:42 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the running state.
Information,12/3/2015 12:31:22 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the stopped state.
Information,12/3/2015 12:25:54 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the running state.
Information,12/3/2015 12:20:09 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the stopped state.
Information,12/3/2015 12:17:54 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the running state.
Information,12/3/2015 12:14:50 PM,Service Control Manager,7036,None,The Windows Modules Installer service entered the stopped state.

application log:
Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 maybe hanging (CPU:223.3622318, StartTime:12/03/2015 12:21:02)"
Warning,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 running for more than 28800 seconds (CPU:223.3622318, StartTime:12/03/2015 12:21:02)"
Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:16140 maybe hanging (CPU:12.7764819, StartTime:12/03/2015 12:37:47)"
Warning,12/4/2015 12:30:41 AM,AWEMon,3,None,"Process GSAWE PID:16140 running for more than 28800 seconds (CPU:12.7764819, StartTime:12/03/2015 12:37:47)"

Sorry, the log is filtered as this is a production server.
ASKER
kasamahesh

Since I am not getting any resolution, I am opening a new ticket.
ASKER
kasamahesh

@Andrew, since nobody is adding any comments, I opened a new ticket.
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

That's what the request for Assistance button is for. The Moderators will work with you and assist you, in getting fresh eyes, to look at the issue.
Mike Duckett

Looking at your event logs:

Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 maybe hanging (CPU:223.3622318, StartTime:12/03/2015 12:21:02)"
Warning,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 running for more than 28800 seconds (CPU:223.3622318, StartTime:12/03/2015 12:21:02)"
Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:16140 maybe hanging (CPU:12.7764819, StartTime:12/03/2015 12:37:47)"
Warning,12/4/2015 12:30:41 AM,AWEMon,3,None,"Process GSAWE PID:16140 running for more than 28800 seconds (CPU:12.7764819, StartTime:12/03/2015 12:37:47)"


This seems to show the FTP software is hanging. Have you spoken to Globalscape? I would suggest checking with them.
Also if possible (although given you stated it is a production server, I understand it may not be) try disabling the software and see if the fault occurs without it running?

Do you see anything in the console when it hangs?  (I know you said you couldn't login, but does it show the logon screen?)
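
To pull the "maybe hanging" reports out of a larger export, a small parser along these lines could help. It is a minimal sketch that assumes the comma-separated layout shown in the Application log excerpt above (level, timestamp, source, event ID, category, message). The unit of the `CPU:` value is not stated in the log, so it is kept as a plain number.

```python
import csv
import io
import re

# Matches the AWEMon message text quoted in this thread.
HANG = re.compile(
    r"Process (?P<name>\S+) PID:(?P<pid>\d+) maybe hanging "
    r"\(CPU:(?P<cpu>[\d.]+), StartTime:(?P<start>[^)]+)\)"
)

# Two lines copied from the Application log excerpt in this thread.
SAMPLE = (
    'Error,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 '
    'maybe hanging (CPU:223.3622318, StartTime:12/03/2015 12:21:02)"\n'
    'Warning,12/4/2015 12:30:42 AM,AWEMon,3,None,"Process GSAWE PID:29240 '
    'running for more than 28800 seconds (CPU:223.3622318, '
    'StartTime:12/03/2015 12:21:02)"\n'
)


def hanging_processes(log_text):
    """Return (logged_at, name, pid, cpu, start) for each hang report."""
    reports = []
    for row in csv.reader(io.StringIO(log_text)):
        # Skip blank or short rows and events from other sources.
        if len(row) < 6 or row[2] != "AWEMon":
            continue
        match = HANG.search(row[5])
        if match:
            reports.append((row[1], match["name"], int(match["pid"]),
                            float(match["cpu"]), match["start"]))
    return reports


if __name__ == "__main__":
    for report in hanging_processes(SAMPLE):
        print(report)
```

The `csv` module is used rather than a plain `split(",")` because the quoted message field itself contains a comma.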
ASKER
kasamahesh

No, we cannot disable it. The logon screen shows up but we can't log in; it takes forever at the login screen.
Mike Duckett

That certainly sounds like something at an OS/application level rather than VMware then.

I would contact Globalscape and see if they have seen the issue.

Maybe also stay logged on to the server on the console so when the issue occurs you can attempt to look at task manager (or process explorer from sysinternals) etc and narrow down what may be causing the hang.  

Does the FTP software have any logging built in that may help?
ASKER
kasamahesh

One thing I notice is that when it hangs, VMware Tools stops running. How would you explain this?
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Log in or sign up to see answer
Mike Duckett

Above is exactly what I was just typing... :)

My guess would be it doesn't detect it as running because whatever is causing the OS to hang is using so much CPU it also stops VMware tools reporting back.
ASKER
kasamahesh

We had another freeze. I checked the vmware.log for this particular VM. It contained the following error:

vmware.log:3312332:2015-12-08T20:29:56.096Z| vmx| I120: Vix: [7773806 guestCommands.c:1926]: Error VIX_E_TOOLS_NOT_RUNNING in VMAutomationTranslateGuestRpcError(): VMware Tools are not running in the guest

I can verify that each VM freeze corresponds to this type of error in vmware.log.

CPU usage around this time is between 17% and 21% as seen from the vSphere console.
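
One way to check that correlation systematically is to extract the timestamps of every `VIX_E_TOOLS_NOT_RUNNING` line and compare them with the known freeze times. The sketch below assumes the line layout quoted above. The `near` helper and its 30-minute window are illustrative choices, not anything VMware prescribes.

```python
import re
from datetime import datetime, timedelta

# vmware.log timestamps look like 2015-12-08T20:29:56.096Z| (UTC).
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|")
MARKER = "VIX_E_TOOLS_NOT_RUNNING"


def tools_not_running_times(lines):
    """Return the UTC timestamps of lines reporting that Tools is not running."""
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        match = STAMP.search(line)
        if match:
            times.append(
                datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            )
    return times


def near(times, freeze_time, window=timedelta(minutes=30)):
    """Keep only the timestamps within `window` of a known freeze time."""
    return [t for t in times if abs(t - freeze_time) <= window]


if __name__ == "__main__":
    # The line quoted in this thread, including its grep-style prefix.
    sample = [
        "vmware.log:3312332:2015-12-08T20:29:56.096Z| vmx| I120: Vix: "
        "[7773806 guestCommands.c:1926]: Error VIX_E_TOOLS_NOT_RUNNING in "
        "VMAutomationTranslateGuestRpcError(): VMware Tools are not running "
        "in the guest"
    ]
    for stamp in tools_not_running_times(sample):
        print(stamp.isoformat())
```

Note that the `Z` suffix means these timestamps are in UTC, whereas the Windows event log entries above are in the server's local time, so one of the two needs converting before comparing.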
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

So, if you stop VMware Tools, completely, and set to disabled, does the VM still freeze?

simple test.
ASKER
kasamahesh

I have not tried stopping VMware Tools. Since this is a production server, I am in no position to stop it. What are the possible consequences here?
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

For test purposes, nothing. It just means that the VM will stop communicating with the host. Low risk.

To try to establish the fault, is it not better to experiment and try to fix this freezing fault on this VM?
ASKER
kasamahesh

We engaged support from Symantec, VMware and Microsoft but could not find the root cause of the problem, so we decided to deploy a new VM.

Thank you everyone for the comments.