Leroy Luff

Hyper V machine crashing daily.

Good day,

Background:

We have 2 Dell servers with failover clustering installed.
We have 4 virtual machines configured in this cluster.

One of my Hyper-V virtual machines is crashing on a daily basis. There is no clear reason in the logs as to why this is happening. The only consistent log entry I can find is the following error, which is generated 4 to 10 seconds before every crash:

The description for Event ID 56 from source Application Popup cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

SCSI
000000

the message resource is present but the message is not found in the string/message table



According to Microsoft this error should be ignored, as it is due to CSV disks not getting unique IDs or something along those lines.

Any help?
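The timing pattern described above (Event ID 56 appearing 4 to 10 seconds before each crash) can be checked against timestamps exported from the event logs. The following is only a minimal Python sketch: the function name, the default window and the sample timestamps are hypothetical and not taken from the thread.

```python
from datetime import datetime, timedelta

def events_preceding_crashes(event_times, crash_times, min_gap=4, max_gap=10):
    """Map each crash time to the events logged min_gap..max_gap seconds before it."""
    lo, hi = timedelta(seconds=min_gap), timedelta(seconds=max_gap)
    return {
        crash: [e for e in event_times if lo <= crash - e <= hi]
        for crash in crash_times
    }

# Hypothetical timestamps, for illustration only (not real log data).
events = [datetime(2015, 1, 5, 9, 14, 52), datetime(2015, 1, 5, 13, 0, 0)]
crashes = [datetime(2015, 1, 5, 9, 15, 0)]
matches = events_preceding_crashes(events, crashes)
```

If every crash has a matching event but the same event also appears at times when nothing crashes, the event is more likely a side effect than the cause.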
Robin CM

Is there anything in the event logs on the host that was running the VM when it crashed?
Especially look in Hyper-V-VMMS (all of them but especially the Storage log), Hyper-V-High-Availability, Hyper-V-Worker, FailoverClustering-CsvFs.

What is the stop code from the VM when it crashes?
ASKER CERTIFIED SOLUTION
Zephyr ICT
ASKER

Hi,

@ Robin - It is code 41

@ sprav - The hosts are working fine. It is a virtual machine that is crashing. Apologies for the confusion.

I will attempt the debug on the virtual machine and revert.
This seems like a hardware issue, caused either by a RAID controller or its driver, by a SCSI device that is not properly connected or terminated or is defective, or by bad or incompatible software. Does this host have a tape drive connected to it? If so, could you disconnect it? Are backups running at the time the host crashes?
@ Robin - In the FailoverClustering log (not the FailoverClustering-CsvFs log) I get error 2051:

[API] AccessCheck[AndAuditAlarm] failed.  status = 0x00000005

I doubt it is related, as this error also comes up for other Hyper-V machines and they are not crashing.

Otherwise there are no errors in any of the other logs.
Oops... I read your question wrong. So a VM is crashing, not the host. In this case, could you check whether you have any raw drive mappings, and whether the VM crashes when backups are running?
@ Mohammed - If it were a hardware failure or an iSCSI device issue, surely it would affect the other machines too?

Yes, I am taking backups. They run on a NetApp using SnapManager for Hyper-V (no actual external HDD or tapes). Again, I don't think this is the cause, as the other machines are fine.
@ Mohammed - The VM crashes happen intermittently and are not related to the backup schedule.
I am probably a dummy for asking, but what do you mean by raw mappings?
Have you got the full stop code?
Stop 0x00000041 indicates a driver problem: https://msdn.microsoft.com/en-us/library/windows/hardware/ff558974(v=vs.85).aspx
I think analyzing the .dmp file of the VM will probably shed some more light on what is causing the crash. Most likely, as robincm mentioned, it's a driver issue...
In the VM set your dump options to MiniDump (256KB). When the VM crashes there may be a C:\Windows\MiniDump\*.dmp file to have a look at.

If there is no dump file then one needs to look at hardware as one possible source.
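Checking for a dump file after each crash can be scripted. The following is a minimal Python sketch that lists the newest `.dmp` files in a folder. The function name `newest_dumps` is hypothetical, and the folder is an assumption: it should be whatever dump location is configured in the VM.

```python
from pathlib import Path

def newest_dumps(dump_dir, limit=5):
    """Return up to `limit` .dmp files in dump_dir, newest first."""
    folder = Path(dump_dir)
    if not folder.is_dir():
        return []  # folder missing: no dump has been written there
    dumps = [p for p in folder.iterdir()
             if p.is_file() and p.suffix.lower() == ".dmp"]
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return dumps[:limit]

if __name__ == "__main__":
    # Assumed location; adjust to the dump path configured in the VM.
    for dump in newest_dumps(r"C:\Windows\MiniDump"):
        print(dump.name, dump.stat().st_size, "bytes")
```

An empty result after a crash suggests the dump was never written.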

Event ID 56 in previous experience was an NTFS error indicating corruption of the OS.

Are the logs on storage clear?
I have set it as follows:

Minidump 256 K

C:\Dump\crash.DMP

Will revert after the next crash.
There are freebie tools out there to analyze that .DMP file.
After analyzing the dump file I noticed it is the antivirus causing issues on the web server. I have uninstalled it and so far the system is running stable. I will keep monitoring it over the next 2 days and then reinstall the antivirus.

I have to add this, as others may learn from it: this is a good example of someone overthinking a problem. I thought it had to do with the iSCSI connection at cluster level, when it was just the antivirus (to be confirmed) on the server itself. Perhaps Microsoft, with their unclear events, is to blame :)

Thank you all for participating.

Regards
It's not AVG by any chance?
No, it was ESET NOD32 File Security for servers.