  • Status: Solved

Server on single ESX node, rebooted and services fail

Hi all,

We've had this customer for about 1.5 years.
When they came to us, they required 2 servers on a dedicated host, so we put in a single VMware ESX node (as they don't need HA) and used local storage to install 2 servers:

1 to run Exchange 2007
1 to run SharePoint

Last week, we rebooted 1 of their servers and it appears that the server came up, but no one could connect to it. After looking at the server, it looks like VMware Tools hadn't loaded, and a lot of the other automatic services hadn't started either. We were unable to start them manually.

To fix this, we had to rebuild the server and restore data.

Today, I have rebooted the Exchange server and this has happened again, to a different server on the same ESX host.

After logging into the console for ESX, I can see an alert for disk usage, which is red.

'Datastore usage on disk' - I'm not sure if this is any cause, but may be helpful.

Has anyone seen this issue before?


Asked by: MarkMichael
 
bgoering commented:
It sounds like you may have snapshots that have filled up the disk. How much free space do you have on your local datastore?
 
MarkMichael (author) commented:
Capacity: 1.63TB
Provisioned Space: 1.54TB
Free space: 98.01GB

Is this possibly not enough space to make a snapshot?

Both servers take up a total of 65GB of used space.

Do you think there is a possibility of finding this snapshot?
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Check the VM properties and Snapshot Manager. Do you have any snapshots listed there? Do you use snapshots for backup?
 
bgoering commented:
That is only about 6% free space on your datastore, which is a little bit tight. I try to keep 15% to 20% free.

As Hanccocka says, check for any snapshots in your administration client, either Virtual Infrastructure Client or vSphere Client, depending on your version of ESX. There will be an icon on the toolbar that looks kind of like a clock with a wrench on it; click that to get into Snapshot Manager.

Also let us know how large the hard drives allocated to your virtual machines are.
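The 6% figure can be checked against the capacity and free space posted earlier in the thread. The following is only a minimal Python sketch; the function name is illustrative, and it assumes binary units (1 TB = 1024 GB), which the thread does not confirm.

```python
# Figures posted earlier in the thread.
capacity_tb = 1.63
free_gb = 98.01

# Assumption: 1 TB = 1024 GB (binary units); not confirmed in the thread.
GB_PER_TB = 1024


def percent_free(capacity_tb: float, free_gb: float) -> float:
    """Return free space as a percentage of total datastore capacity."""
    return free_gb / (capacity_tb * GB_PER_TB) * 100


pct = percent_free(capacity_tb, free_gb)
print(f"{pct:.1f}% free")
```

With these inputs the result comes out just under 6%, well below the 15% to 20% that bgoering prefers to keep free.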
 
MarkMichael (author) commented:
Nothing showing in the snapshot manager.

Just the simple 'You are here.' meaning I'm at the latest.

Could this have possibly made a snapshot and it didn't show up?

Is this something I can try to resolve, do you think?
 
MarkMichael (author) commented:
Server 1:

System drive: 100GB
Disk 2: Pagefile disk of 8GB
Data drive: 256GB
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Have a look at the datastore for snapshots, and post a screengrab here...

Is the datastore used for anything other than VMs?
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Confused now. You said:

"Both servers, take up a total of 65GB of used space."

But server 1 takes 364GB? (Unless the disks are thin provisioned.)
 
MarkMichael (author) commented:
Server 2:

System - 50GB
Pagefile - 8GB
Data disk that uses these 4 drives:

a. 256GB
b. 256GB
c. 256GB
d. 256GB

(There's also an old VM that we kept, with a 100GB system drive, on the same store.)
 
MarkMichael (author) commented:
Sorry, when I said used space, I meant the actual used space showing in Windows, added together.

I think it's all thick provisioned. I'm no ESX expert.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Okay, any snapshots on any VMs?
 
bgoering commented:
As for the alarm: with 6% free space that would be a normal alarm. But if you have no snapshots (let us verify by posting a directory listing for each server), the 98 GB of free space might be enough for now. I can't remember the defaults, but 6% would definitely be a red alarm.

Did you have no alarms before?
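Snapshot Manager can show nothing while snapshot disk files still sit in the VM folders, which is why a directory listing was requested. As a rough sketch only, the Python below walks a folder tree and reports files that look like snapshot disks. It assumes Python 3, run from a machine that can see the datastore files. It also assumes the usual `<disk>-000001.vmdk` and `<disk>-000001-delta.vmdk` naming for snapshot disks. The function name is hypothetical.

```python
import os
import re

# Assumption: snapshot disks are named "<disk>-000001.vmdk" (descriptor)
# and "<disk>-000001-delta.vmdk" (data), with a six-digit sequence number.
SNAPSHOT_DISK = re.compile(r"-\d{6}(-delta)?\.vmdk$")


def find_snapshot_disks(root):
    """Return a sorted list of (path, size_in_bytes) for every file under
    root whose name matches the snapshot disk naming pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if SNAPSHOT_DISK.search(name):
                path = os.path.join(dirpath, name)
                hits.append((path, os.path.getsize(path)))
    return sorted(hits)
```

Calling `find_snapshot_disks("/vmfs/volumes/datastore1")` (a hypothetical path) and summing the sizes would show how much space such files occupy, whether or not Snapshot Manager lists them.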
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Okay, so that totals about 1.5TB for server 1, server 2, and the old VM.

Adding it all up with the free space gives 1.6TB.

That's very tight, and I've not included the swap space needed for each VM, which is equal to its memory.

So that's certainly why you've got a disk alert.

Whatever you do, DON'T start using snapshots, or any backup product that uses them (Veeam, vRanger, vDR, etc.).
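The totals can be reproduced from the disk sizes posted for each VM. This is a small Python sketch rather than anything from the thread; the dictionary layout is illustrative, binary units (1 TB = 1024 GB) are assumed, and the old VM is counted with only the one disk that was mentioned.

```python
# Virtual disk sizes in GB, as posted in the thread.
disks_gb = {
    "server 1": [100, 8, 256],                 # system, pagefile, data
    "server 2": [50, 8, 256, 256, 256, 256],   # system, pagefile, 4 data disks
    "old VM": [100],                           # only disk mentioned
}
free_gb = 98.01
GB_PER_TB = 1024  # assumption: binary units

per_vm = {name: sum(sizes) for name, sizes in disks_gb.items()}
allocated_gb = sum(per_vm.values())

for name, total in per_vm.items():
    print(f"{name}: {total} GB")
print(f"allocated: {allocated_gb} GB ({allocated_gb / GB_PER_TB:.2f} TB)")
print(f"allocated + free: {(allocated_gb + free_gb) / GB_PER_TB:.2f} TB")
```

Server 1 comes to 364GB, server 2 to 1082GB and the old VM to 100GB, so thick-provisioned disks alone account for roughly 1.5TB of the datastore before swap files are considered.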
 
bgoering commented:
It sounds like all of the space can pretty much be accounted for by allocated drives. I am thinking this is possibly a virus problem. Do you have virus protection on your servers?

Go to http://malwarebytes.org, then download and run the free scanner there. You may have to download it on another box, burn it to CD, mount the CD on your Windows box, and boot your troublesome VM into safe mode in order to run it.
 
MarkMichael (author) commented:
Hi there,

You suggest downloading this and creating an ISO to connect it to the server? This will run within Windows, I guess?
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Personally, I would use Microsoft Security Essentials (yes, they do have a free virus and malware checker):

http://www.microsoft.com/security_essentials/

Download and install it directly on the server.
 
bgoering commented:
Yes, it's a malware scanner. It does an install, then tries to download a database, which it won't be able to do in safe mode. Run a full scan in safe mode, reboot, and hopefully it will let you in. Then let Malwarebytes update itself and run another full scan.

Either burn it to CD, or create an ISO and connect the ISO to your VM, in order to get the scanner onto the possibly infected machine.
 
bgoering commented:
I was just thinking: it might be easier to install it on your other VM and scan that one first.

Then power down your troublesome VM, attach its system hard drive to the good VM, and give it a drive letter. You can attach it by going into Edit Settings, Add Hard Disk, browsing to the system disk, and adding it.

At that point you should be able to fully scan the system drive. Finally, remove the hard drive from the helper VM (DO NOT DELETE FROM DISK) and try to bring up your Exchange box again.
 
MarkMichael (author) commented:
I've completed a full scan and nothing found.

Not a single item.

Any other suggestions please?
 
bgoering commented:
If you can boot to safe mode, try looking at the event logs to see if there are any errors.
 
MarkMichael (author) commented:
Hmm.

OK, I have found out what process is causing this.

LSASS.exe (Local Security Authority)

When I kill this process off, all the other services start. However, when I kill it, the server gives a message that it is going to restart in 1 minute due to a server error. I assume this is to stop users from turning the security authority off.
 
Danny McDaniel, Clinical Systems Analyst, commented:
Yes, that's typical behavior for lsass... I used to see it a lot back when the Sasser virus was running rampant.
 
Danny McDaniel, Clinical Systems Analyst, commented:
Don't forget, if you can get in with safe mode, you can use msconfig.exe to select which services to start, and you can even filter out MS services.
 
MarkMichael (author) commented:
I've been through all that, unfortunately.

Looks like the only fix is:

a) Patch lsass somehow. I can't find a way of doing that.
b) Do a fresh install of Exchange using /recover mode, and use the vmdk that contains the Exchange database and logs on the new VM to recover the mail.

Can you think of anything I'll need to do, in case I shoot myself in the foot halfway through b?
 
Danny McDaniel, Clinical Systems Analyst, commented:
If you've got enough space (but I take it from the thread that you don't), you could clone the disk before making changes to it. You can clone it with the datastore browser, or in the console with the command 'vmkfstools -i oldname.vmdk newname.vmdk'.

The alternative to that is to take a snapshot, but that might be dangerous depending upon how much the process rewrites and how much free space you have.
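Whether a clone would fit can be estimated from the figures already posted. The Python sketch below is illustrative only: `can_clone` is a hypothetical helper, binary units are assumed, the clone is assumed to be a full thick copy of the disk, and the 15% default reserve is taken from bgoering's rule of thumb earlier in the thread.

```python
def can_clone(disk_gb, free_gb, capacity_gb, reserve_fraction=0.15):
    """Return True if a full copy of a disk of disk_gb fits in free_gb
    while leaving reserve_fraction of the datastore capacity free."""
    free_after = free_gb - disk_gb
    return free_after >= capacity_gb * reserve_fraction


# Figures from the thread; assumption: 1 TB = 1024 GB.
capacity_gb = 1.63 * 1024
free_gb = 98.01

# A 100GB system disk does not fit at all.
print(can_clone(100, free_gb, capacity_gb, reserve_fraction=0.0))  # False
# A 50GB system disk fits only if no free-space reserve is kept.
print(can_clone(50, free_gb, capacity_gb, reserve_fraction=0.0))   # True
print(can_clone(50, free_gb, capacity_gb))                         # False
```

On these numbers, neither system disk could be cloned onto the same datastore without dropping well below the suggested free-space margin.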
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Upload the LSASS.exe file you have to http://www.virustotal.com/ just to check it's okay and the right one.
 
bgoering commented:
If you can get to a cmd prompt, try

sfc /scannow

and see if it will fix any system file inconsistencies.
 
MarkMichael (author) commented:
Sorry guys,

Looks like the reason this occurred was that NetBackup was having issues backing up the servers via the VCB method.

It looks like there were several snapshots of this server hidden in the directory, taking up over 1TB of space, which looks to have stopped the server from being able to read/write correctly. After rebooting, it looks like it 'lost its way' back to the snapshot.

Thanks for all your help everyone, very much appreciated.
 
bgoering commented:
My first response (34884609) indicated that it was likely a problem with snapshots filling the disk. Several other experts concurred. At one point (34884747) we even asked for a directory listing that was never received.
 
MarkMichael (author) commented:
Indeed, you are correct.

It's been a tough week. Sorry bgoering, I should have taken longer looking back at the answers.
