Hyper-V virtual machine crashing daily

Good day,

Background:

We have 2 Dell servers with failover clustering installed.
We have 4 virtual machines configured in this cluster.

One of my Hyper-V virtual machines is crashing on a daily basis. There is no clear reason in the logs as to why this is happening. The only consistent entry I can find is this error, which is generated 4 to 10 seconds before every crash:

The description for Event ID 56 from source Application Popup cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

SCSI
000000

the message resource is present but the message is not found in the string/message table



According to Microsoft, this error can be ignored, as it is due to CSV disks not getting unique IDs, or something along those lines.

Any help?
Asked by: Leroy Luff, Head of IT & Digital

Robin CM, Senior Security and Infrastructure Engineer, commented:
Is there anything in the event logs on the host that was running the VM when it crashed?
Especially look in Hyper-V-VMMS (all of them but especially the Storage log), Hyper-V-High-Availability, Hyper-V-Worker, FailoverClustering-CsvFs.

What is the stop code from the VM when it crashes?
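To make the log check above more systematic, here is a minimal Python sketch that lists which events were logged in the seconds before each crash. It assumes the events have already been exported from the host and the VM as (timestamp, source, event ID) tuples. The function name `events_before_crashes`, the 10-second window, and all sample timestamps are hypothetical and only serve as illustration.

```python
from datetime import datetime, timedelta

def events_before_crashes(events, crashes, window_seconds=10):
    """Map each crash time to the events logged within the window before it."""
    window = timedelta(seconds=window_seconds)
    result = {}
    for crash in crashes:
        result[crash] = [
            (ts, source, event_id)
            for ts, source, event_id in events
            # Keep events at or before the crash, no older than the window.
            if timedelta(0) <= crash - ts <= window
        ]
    return result

# Hypothetical sample data; replace with entries exported from the real logs.
events = [
    (datetime(2016, 1, 5, 14, 2, 51), "Application Popup", 56),
    (datetime(2016, 1, 5, 13, 0, 0), "FailoverClustering", 2051),
]
crashes = [datetime(2016, 1, 5, 14, 2, 58)]

for crash, hits in events_before_crashes(events, crashes).items():
    print(crash, [(source, event_id) for _, source, event_id in hits])
```

An event that shows up before every crash on the affected VM, but also appears on healthy VMs without a crash, is probably not the cause.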
Zephyr ICT, Cloud Architect, commented:
You mean one of the Hyper-V hosts is crashing? If that's the case, I'm thinking it is hardware related, either disk or memory, or maybe even network related. Are the drivers up to date, and are they at the same level on both Hyper-V hosts? In other words, are the hosts completely identical?

Is a dmp file created for each crash? Could you analyse it? You can do that with WinDbg and use this guide to make it somewhat understandable.

Leroy Luff, Head of IT & Digital (author), commented:
Hi,

@ Robin - It is code 41.

@ sprav - The hosts are working fine. It is a virtual machine that is crashing. Apologies for the confusion.

I will attempt the debug on the virtual machine and report back.
Mohammed Khawaja, Manager - Infrastructure: Information Technology, commented:
Seems like a hardware issue caused either by a RAID controller or its driver, by a SCSI device that is defective or not properly connected or terminated, or by bad or incompatible software. Does this host have a tape drive connected to it, and if so, could you disconnect it? Are backups running at the time the host crashes?
Leroy Luff, Head of IT & Digital (author), commented:
@ Robin - In the FailoverClustering log (not FailoverClustering-CsvFs) I get error 2051:

[API] AccessCheck[AndAuditAlarm] failed.  status = 0x00000005

I doubt it is related, as this error comes up for other Hyper-V machines too and they are not crashing.

Otherwise there are no errors in any of the other logs.
Mohammed Khawaja, Manager - Infrastructure: Information Technology, commented:
Oops... I read your question wrong. So a VM is crashing and not the host. In this case, could you check whether you have any raw drive mappings, and whether the VM crashes while backups are running?
Leroy Luff, Head of IT & Digital (author), commented:
@ Mohammed - If it were a hardware failure or an iSCSI device issue, it would surely affect the other machines too?

Yes, I am taking backups. They are taken on a NetApp using SnapManager for Hyper-V (no actual external HDDs or tapes). Again, I don't think this is the cause, as the other machines are fine.
Leroy Luff, Head of IT & Digital (author), commented:
@ Mohammed - The VM crashes happen intermittently and are not related to the backup schedule.
I am probably a dummy for asking, but what do you mean by raw mappings?
Robin CM, Senior Security and Infrastructure Engineer, commented:
Have you got the full stop code?
Stop 0x00000041 indicates a driver problem: https://msdn.microsoft.com/en-us/library/windows/hardware/ff558974(v=vs.85).aspx
Zephyr ICT, Cloud Architect, commented:
I think analyzing the dmp file of the VM will shed more light on what is causing the crash. Most likely, as robincm mentioned, it's a driver issue.
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
In the VM set your dump options to MiniDump (256KB). When the VM crashes there may be a C:\Windows\MiniDump\*.dmp file to have a look at.

If there is no dump file then one needs to look at hardware as one possible source.

In my previous experience, Event ID 56 was an NTFS error indicating corruption of the OS.

Are the logs on storage clear?
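As a rough illustration of the dump check described above, the following Python sketch lists the newest .dmp files in a folder, to be run inside the VM after a crash. The folder is the C:\Windows\MiniDump location mentioned in this comment; the helper name `newest_dumps` and the limit of five files are assumptions made for the example.

```python
from pathlib import Path

def newest_dumps(directory, limit=5):
    """Return up to `limit` .dmp files from `directory`, newest first."""
    folder = Path(directory)
    if not folder.is_dir():
        # No dump folder at all: nothing has been written there yet.
        return []
    dumps = [
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".dmp"
    ]
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return dumps[:limit]

if __name__ == "__main__":
    # Folder taken from the comment above; adjust if the dump path was changed.
    for dump in newest_dumps(r"C:\Windows\MiniDump"):
        print(dump, dump.stat().st_size, "bytes")
```

An empty result after a crash means no dump was written to that folder, which is the case where hardware becomes a more likely suspect.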
Leroy Luff, Head of IT & Digital (author), commented:
I have set it as follows:

Minidump 256 K

C:\Dump\crash.DMP

I will report back after the next crash.
Philip Elder, Technical Architect - HA/Compute/Storage, commented:
There are freebie tools out there to analyze that .DMP file.
Robin CM, Senior Security and Infrastructure Engineer, commented:
For info, here's a very competent person talking about analysing a crash dump: http://blogs.technet.com/b/markrussinovich/archive/2011/01/29/3374563.aspx
Especially note the line "Checking for a newer version of any third-party drivers displayed in this basic analysis often leads to a fix." It is rare that a Microsoft driver causes a bug check, so think about what software you've installed into the VM and if that could be the cause.
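To illustrate the point about third-party drivers, here is a small Python sketch that pulls the suspect module out of debugger analysis text saved to a file. It assumes the text contains the `MODULE_NAME:`, `IMAGE_NAME:` or `Probably caused by :` lines that WinDbg's `!analyze -v` typically prints. The function name `suspect_modules` and the driver name `examplefilter.sys` in the sample are hypothetical and not taken from this thread.

```python
import re

# Field labels assumed to appear in the saved analysis output.
FIELDS = ("MODULE_NAME", "IMAGE_NAME", "Probably caused by")

def suspect_modules(analysis_text):
    """Return the first value found for each known field in the analysis text."""
    found = {}
    for line in analysis_text.splitlines():
        for field in FIELDS:
            match = re.match(rf"\s*{re.escape(field)}\s*:\s*(\S+)", line)
            if match:
                # Keep only the first occurrence of each field.
                found.setdefault(field, match.group(1))
    return found

# Hypothetical excerpt of saved analysis output.
sample = """
Probably caused by : examplefilter.sys ( examplefilter+0x1234 )
MODULE_NAME: examplefilter
IMAGE_NAME:  examplefilter.sys
"""

print(suspect_modules(sample))
```

If the reported image is not a Microsoft component, checking the vendor for a newer driver version is the natural next step, as the quoted line suggests.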
Leroy Luff, Head of IT & Digital (author), commented:
After analyzing the dump file, I noticed that the antivirus is causing the issues on the web server. I have uninstalled it, and so far the system is running stable. I will keep monitoring it over the next 2 days and then reinstall the antivirus.

I have to add this, as others may learn from it: this is a good example of someone overthinking a problem. I thought it would have to do with the iSCSI connection at cluster level, when it was just the antivirus (to be confirmed) on the server itself. Perhaps Microsoft, with their unclear events, are to blame :)

Thank you all for participating.

Regards
Robin CM, Senior Security and Infrastructure Engineer, commented:
It's not AVG by any chance?
Leroy Luff, Head of IT & Digital (author), commented:
No, it was ESET NOD32 File Security for servers.