Solved

Server on single ESX node, rebooted and services fail

Posted on 2011-02-13
Last Modified: 2012-05-11
Hi all,

We've had this customer for about 1.5 years.
When they came to us, they required 2 servers on a dedicated server. So we put in a single VMware ESX node (as they don't need HA) and used local storage to install 2 servers:

1 to run Exchange 2007
1 to run SharePoint

Last week, we rebooted 1 of their servers. It appeared to come up, but no one could connect to it. After looking at the server, it looks like VMware Tools hadn't loaded, a lot of the other automatic services hadn't started either, and we were unable to start them.

To fix this, we had to rebuild the server and restore data.

Today, I have rebooted the Exchange server and this has happened again, to a different server on the same ESX host.

After logging into the console for ESX, I can see an alert for disk usage, which is red.

'Datastore usage on disk' - I'm not sure if this is any cause, but may be helpful.

Has anyone seen this issue before?


Question by:MarkMichael
30 Comments
 
LVL 28

Accepted Solution

by:
bgoering earned 500 total points
ID: 34884609
It sounds like you may have snapshots that have filled up the hard disk. How much free space do you have on your local datastore?
0
 
LVL 15

Author Comment

by:MarkMichael
ID: 34884641
Capacity: 1.63TB
Provisioned Space: 1.54TB
Free space: 98.01GB

Is this possibly not enough space to make a snapshot?

Both servers take up a total of 65GB of used space.

Do you think there is a possibility of finding this snapshot?

0
 
LVL 117
ID: 34884650
Check the VM properties and Snapshot Manager. Do you have any snapshots listed there? Do you use snapshots for backup?
0
 
LVL 28

Expert Comment

by:bgoering
ID: 34884692
That is only about 6% free space on your datastore - a little bit tight. I try to keep 15% to 20% Free.
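
As a rough check of that 6% figure, here is a minimal sketch using the capacity and free space quoted above. It assumes 1 TB is counted as 1024 GB, which may not match how the client rounds its numbers.

```python
# Assumption: 1 TB = 1024 GB; the vSphere/VI client may round differently.
GB_PER_TB = 1024


def free_percent(capacity_tb, free_gb):
    """Return free space as a percentage of total datastore capacity."""
    capacity_gb = capacity_tb * GB_PER_TB
    return 100.0 * free_gb / capacity_gb


# Figures posted in the thread: 1.63TB capacity, 98.01GB free.
pct = free_percent(capacity_tb=1.63, free_gb=98.01)
print(f"{pct:.1f}% free")
```

The result lands just under 6%, well short of the 15% to 20% target.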

As Hanccocka says, check for any snapshots in your administration client, either Virtual Infrastructure Client or vSphere Client, depending on your version of ESX. There will be an icon on the toolbar that looks kind of like a clock with a wrench on it; click that to get into Snapshot Manager.

Also let us know how large the hard drives allocated to your virtual machines are.
0
 
LVL 15

Author Comment

by:MarkMichael
ID: 34884705
Nothing showing in the snapshot manager.

Just the simple 'You are here.' meaning I'm at the latest.

Could this have possibly made a snapshot and it didn't show up?

Is this something I can try to resolve, do you think?
0
 
LVL 15

Author Comment

by:MarkMichael
ID: 34884709
Server 1:

System drive: 100GB
Disk 2: Pagefile disk of 8GB
Data drive: 256GB
0
 
LVL 117
ID: 34884713
have a look at the datastore for snapshots, and post screengrab here...

is the datastore used for anything else other than VMs?
0
 
LVL 117
ID: 34884725
confused now, you said

"Both servers, take up a total of 65GB of used space."

but server 1 takes 364GB? (unless they are thin provisioned)

0
 
LVL 15

Author Comment

by:MarkMichael
ID: 34884727
Server 2:

System - 50GB
Pagefile - 8GB
Data Disk that uses these 4 drives:

a. 256GB
b. 256GB
c. 256GB
d. 256GB

(There's also an old VM that we kept, with a 100GB system drive on the same store.)

0
 
LVL 15

Author Comment

by:MarkMichael
ID: 34884737
Sorry, when I said used space, I meant the actual used space showing in Windows, added together.

I think it's all thick provisioned. I'm no ESX expert.
0
 
LVL 117
ID: 34884742
okay, any Snapshots on any VMs?
0
 
LVL 28

Expert Comment

by:bgoering
ID: 34884747
As for the alarm - with 6% free space that would be a normal alarm. But if you have no snapshots (let us verify by posting a directory listing for each server), the 98 GB free space might be enough for now. Can't remember the defaults, but 6% would definitely be a red alarm.

Did you have no alarms before?
0
 
LVL 117
ID: 34884757
Okay, so the total allocated to server 1, server 2, and the old VM is about 1.5TB.

So totalling it all up is 1.6TB (with the free space).

That's very tight, and I've not included the swap space needed for each VM, which is equal to its memory.

So that's certainly why you've got a disk alert.

Whatever you do, DON'T start using snapshots, or any backup product that uses them (Veeam, vRanger, vDR, etc.).
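
The totals above can be reproduced with a short sketch that adds up the disk sizes posted earlier in the thread. The labels are only illustrative, and the conversion assumes 1 TB = 1024 GB; swap files are left out, as noted.

```python
# Disk sizes in GB as posted in the thread; VM swap files are not included.
allocated_gb = {
    "server 1": [100, 8, 256],                # system, pagefile, data
    "server 2": [50, 8, 256, 256, 256, 256],  # system, pagefile, 4 data disks
    "old VM": [100],                          # system drive only
}
FREE_GB = 98.01
GB_PER_TB = 1024  # assumption: binary terabytes


def total_allocated_gb(vms):
    """Sum every virtual disk across all VMs."""
    return sum(sum(disks) for disks in vms.values())


total = total_allocated_gb(allocated_gb)
print(f"allocated: {total} GB ({total / GB_PER_TB:.2f} TB)")
print(f"allocated + free: {(total + FREE_GB) / GB_PER_TB:.2f} TB")
```

With every disk thick provisioned, the allocated disks plus the free space account for essentially the whole datastore.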
0
 
LVL 28

Expert Comment

by:bgoering
ID: 34884762
It sounds like all of the space can pretty much be accounted for by allocated drives. I am thinking this is possibly a virus problem. Do you have virus protection on your servers?

Go to http://malwarebytes.org, then download and run the free scanner there. You may have to download it on another box, burn it to CD, mount the CD on your Windows box, and boot your troublesome VM into safe mode in order to run it.
0
 
LVL 15

Author Comment

by:MarkMichael
ID: 34884848
Hi there,

You suggest downloading this and creating an ISO to connect it to the server? This will run within Windows, I guess?
0
 
LVL 117
ID: 34884867
Personally, I would use Microsoft Security Essentials (yes, they do have a free virus and malware checker):

http://www.microsoft.com/security_essentials/

download and install, direct on server.
0
 
LVL 28

Expert Comment

by:bgoering
ID: 34884870
Yes, it's a malware scanner. It does an install, then tries to download a database. It won't be able to do that in safe mode. Run a full scan in safe mode, reboot, and hopefully it will let you in. Then let Malwarebytes update itself and run another full scan.

Either burn it to CD, or create an ISO and connect the ISO to your VM, in order to get the scanner onto the possibly infected machine.
0
 
LVL 28

Expert Comment

by:bgoering
ID: 34884905
I was just thinking - it might be easier to install it on your other vm - scan it first.

Then power down your troublesome vm and attach the system hard drive to the good vm and give it a drive letter. You can attach it by going into edit settings, add hard disk, browse to the system disk and add it.

At that point you should be able to fully scan the system drive. Finally, remove the hard drive from the helper VM (DO NOT DELETE FROM DISK) and try to bring up your Exchange box again.
0
 
LVL 15

Author Comment

by:MarkMichael
ID: 34885085
I've completed a full scan and nothing found.

Not a single item.

Any other suggestions please?
0
 
LVL 28

Expert Comment

by:bgoering
ID: 34885088
If you can boot to safe mode, try looking at the event logs to see if there are any errors.
0
 
LVL 15

Author Comment

by:MarkMichael
ID: 34885424
Hmm.

Ok, I have found out what process is causing this.

LSASS.exe (local security authority)

When I kill this process off, all the other services start. However, killing it off makes the server display a message that it is going to restart in 1 minute due to a server error. I assume this is to stop users from turning the security authority off.
0
 
LVL 16

Expert Comment

by:danm66
ID: 34885626
Yes, that's typical behavior for lsass... used to see it a lot back when the Sasser virus was running rampant.
0
 
LVL 16

Expert Comment

by:danm66
ID: 34885657
don't forget, if you can get in with safe mode, you can use msconfig.exe to select which services to start and you can even filter out MS services.
0
 
LVL 15

Author Comment

by:MarkMichael
ID: 34885793
I've been through all that, unfortunately.

Looks like the only fix is one of:

a) Patch lsass somehow; I can't find a way of doing that.
b) Do a fresh install of Exchange using /recover mode, and use the vmdk that contains the Exchange database and logs on the new VM to recover the mail.

Can you think of anything I'll need to do, in case I shoot myself in the foot halfway through b?
0
 
LVL 16

Expert Comment

by:danm66
ID: 34885856
if you've got enough space (but I take it from the thread that you don't) you could clone the disk before making changes to it.  You can clone it with the datastore browser or in the console with the command 'vmkfstools -i oldname.vmdk newname.vmdk'.

 The alternative to that is to take a snapshot, but that might be dangerous depending upon how much the process rewrites and how much free space you have.
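
To make the space concern concrete, here is a minimal sketch that checks whether a full clone would fit. It assumes the clone takes the disk's full allocated size (as a thick copy would) and uses the free space reported earlier; the function name and the optional reserve are hypothetical.

```python
def fits_on_datastore(disk_gb, free_gb, reserve_gb=0.0):
    """True if a full copy of a disk fits in the free space, leaving reserve_gb untouched."""
    return disk_gb + reserve_gb <= free_gb


FREE_GB = 98.01  # free space reported earlier in the thread

# System drive sizes posted for the two servers.
for label, size_gb in [("100GB system drive", 100), ("50GB system drive", 50)]:
    print(label, "fits" if fits_on_datastore(size_gb, FREE_GB) else "does not fit")
```

A 100GB system drive would not fit in 98GB of free space, and even a 50GB clone would use up about half of what is left.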
0
 
LVL 117
ID: 34886939
Upload the LSASS.exe file you have to http://www.virustotal.com/ just to check it's okay and the right one.
0
 
LVL 28

Expert Comment

by:bgoering
ID: 34888094
If you can get to a cmd prompt try

sfc /scannow

See if it will fix any system file inconsistencies.
0
 
LVL 15

Author Comment

by:MarkMichael
ID: 35068118
Sorry guys,

Looks like the reason this occurred was that NetBackup was having issues backing up the servers via the VCB method.

It looks like there were several snapshots of this server hidden in the directory, taking up over 1TB of space, which looks to have stopped the server from being able to read/write correctly. After rebooting it, it looks like it 'lost its way' back to the snapshot.
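
For anyone checking a VM directory for snapshots that Snapshot Manager does not show, a sketch like the following can flag likely snapshot files in a directory listing. It assumes the usual VMware naming pattern (a six-digit suffix such as -000001.vmdk and -000001-delta.vmdk, plus .vmsn state files); the file names and sizes in the example are made up, not taken from this server.

```python
import os
import re

# Assumption: snapshot disks are named like "<disk>-000001.vmdk" (descriptor)
# and "<disk>-000001-delta.vmdk" (data); snapshot state is kept in ".vmsn" files.
SNAPSHOT_DISK = re.compile(r"-\d{6}(-delta)?\.vmdk$")


def is_snapshot_file(name):
    """True if the file name looks like a snapshot disk or snapshot state file."""
    return bool(SNAPSHOT_DISK.search(name)) or name.endswith(".vmsn")


def snapshot_usage(listing):
    """Given (name, size_in_bytes) pairs, return the snapshot files and their total size."""
    found = [(name, size) for name, size in listing if is_snapshot_file(name)]
    return found, sum(size for _, size in found)


def list_directory(path):
    """Build (name, size_in_bytes) pairs for the regular files in a VM directory."""
    return [(e.name, e.stat().st_size) for e in os.scandir(path) if e.is_file()]


# Hypothetical listing of a VM directory, sizes in bytes.
GIB = 1024 ** 3
example = [
    ("exch01.vmdk", 512),
    ("exch01-flat.vmdk", 100 * GIB),
    ("exch01-000001.vmdk", 512),
    ("exch01-000001-delta.vmdk", 400 * GIB),
    ("exch01-000002-delta.vmdk", 650 * GIB),
    ("exch01.vmx", 3000),
]

files, total = snapshot_usage(example)
print(f"{len(files)} snapshot files, {total / GIB:.0f} GiB")
```

On a real host, `list_directory` would be pointed at the VM's folder on the datastore; any delta files found while Snapshot Manager shows only 'You are here' suggest snapshots left behind by a backup job.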

Thanks for all your help everyone, very much appreciate it.
0
 
LVL 28

Expert Comment

by:bgoering
ID: 35069682
My first response (34884609) indicated that it was likely a problem with snapshots filling the disk. Several other experts concurred. At one point (34884747) we even asked for a directory listing that was never received.
0
 
LVL 15

Author Comment

by:MarkMichael
ID: 35071878
Indeed, you are correct.

It's been a tough week. Sorry bgoering, I should have taken longer looking back at the answers.
0
