Solved

A Win 2008 R2 VM BSoD'ed after high CPU/IO utilization during an AV scan

Posted on 2014-09-04
Last Modified: 2014-09-23
We have a VM with guest OS Win 2008 R2 which BSoD'ed after it had been
under high CPU/IO load (CPU utilization of about 50-65%) for more than an hour.

The real-time TrendMicro Deep Security anti-malware scan process
was identified as the main CPU/IO consumer: according to TrendM's diagnostic
logs, it was scanning a folder with several huge zip files (each 0.5 to
1.5 GB in size) when the VM BSoD'ed: refer to the attached blue screen.

TrendM analysed the memory crash dump & didn't find any of its
code in the memory dump.

Q1:
Any idea what the root cause is, based on the attached blue screen?
I may need to convert the event viewer logs to CSV format & sanitize them
before I can attach them here.
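
For the sanitizing step, a minimal Python sketch is shown here. It assumes the event log has already been exported to CSV from Event Viewer, and it masks IPv4 addresses plus any names passed in. The function names, the `REDACTED` token and the file encoding are illustrative assumptions, not part of any tool mentioned in this thread.

```python
import csv
import re

# Matches dotted-quad IPv4 addresses such as 10.1.2.3.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sanitize_rows(rows, secrets):
    """Yield rows with IPv4 addresses and the listed names masked."""
    patterns = [re.compile(re.escape(s), re.IGNORECASE) for s in secrets if s]
    for row in rows:
        clean = []
        for cell in row:
            cell = IPV4.sub("x.x.x.x", cell)
            for pattern in patterns:
                cell = pattern.sub("REDACTED", cell)
            clean.append(cell)
        yield clean


def sanitize_csv(src_path, dst_path, secrets):
    """Copy a CSV export to dst_path with sensitive values masked.

    Assumption: the export is UTF-8 (possibly with a BOM); adjust the
    encoding if your export differs.
    """
    with open(src_path, newline="", encoding="utf-8-sig") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(sanitize_rows(csv.reader(src), secrets))
```

Calling `sanitize_csv("system.csv", "system_clean.csv", ["PRODVM01", "corp.example"])` would be one way to use it. The file names and secrets are placeholders, and the output should still be reviewed by eye before posting.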

Q2:
Can't attach the large memory dump as it may contain sensitive info.
I don't have MS support service.  Is there any other way I can get
the memory dump analysed?

Q3:
Someone told me it's a common occurrence that when a VM (on a VMware
hypervisor) is under high CPU/resource utilization, it will BSoD at some
point in time.  Has anyone encountered this, or is there a KB that shows this?
Question by:sunhux
 

Author Comment

by:sunhux
ID: 40303885
http://esupport.trendmicro.com/solution/en-US/1096120.aspx

The above URL contains a statement (quoted below) indicating that a BSOD can
occur when scanning a large number of files.  How can I determine whether my
BSOD is due to that scenario?  Our VM has 8GB RAM.

With DSVA’s 4GB minimum memory requirement, BSOD sometimes occurs during continuous scanning of a large number of files. This is an issue with the VMware vShield Endpoint Driver. It has been reported to VMware for further investigation.
 

Author Comment

by:sunhux
ID: 40304177
 

Author Comment

by:sunhux
ID: 40305065
 
LVL 34

Accepted Solution

by:
Seth Simmons earned 500 total points
ID: 40305158
Any idea what is the root cause based on the attached blue screen?

A stop 0x24 is NTFS_FILE_SYSTEM.  How many files are we dealing with here?
Have you manually run chkdsk on that volume or has the system done it automatically on boot?

Is there any other way I can get the memory dump analysed?

you can try WhoCrashed or use dumpchk

WhoCrashed Introduction
http://www.resplendence.com/whocrashed

DumpChk
http://msdn.microsoft.com/en-us/library/windows/hardware/ff542776(v=vs.85).aspx

Anyone encounters this or has a KB to show this?

I personally haven't seen it.  At one place where I worked there was an XP guest that had a process causing a CPU spike.  It was like that for months (the owner didn't bother with it) but it never crashed; it just triggered CPU usage alerts in vCenter.

How can I isolate if my situation is similar to what's described in the above links' cases?

That first article is nearly 2 years old so it may have been fixed by now.  The VMware article refers to a different stop code so probably doesn't apply in your case.

What version of VMware are you using?  The article says the issue was resolved with an ESX 5.0 patch.
 

Author Comment

by:sunhux
ID: 40305174
> How many files are we dealing with here?
I can't access the VM, but I wouldn't be surprised if each of the huge
zip files contains hundreds of thousands of files.
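
One way to answer the "how many files" question without extracting anything is to read each archive's central directory. The Python sketch below does that with the standard `zipfile` module. The helper names are hypothetical, and it only counts top-level entries, so files inside nested archives are not included.

```python
import zipfile
from pathlib import Path


def count_entries(zip_path):
    """Return the number of entries listed in the archive's central directory."""
    with zipfile.ZipFile(zip_path) as zf:
        return len(zf.infolist())


def summarize(folder):
    """Map each *.zip file name in folder to its entry count."""
    return {p.name: count_entries(p) for p in sorted(Path(folder).glob("*.zip"))}
```

Running `summarize(r"E:\some\folder")` (placeholder path) on a copy of the archives would show whether any of them really holds hundreds of thousands of entries. A corrupt archive raises `zipfile.BadZipFile`, which is useful information in itself.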

We are not vSphere 5.0 Update 1
 

Author Comment

by:sunhux
ID: 40305176
How could we determine whether TrendM's Deep Security is creating lots of temporary
files while scanning those huge zip files?
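
A rough way to check this on a test VM is to sample the file count of a suspected temp directory while the scan runs. The Python sketch below does only that. Which directory Deep Security uses for temporary files (if any) is not stated in this thread, so the path is an assumption the reader has to supply.

```python
import os
import time


def count_files(root):
    """Count regular files under root, including subdirectories."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def watch(root, interval_s=5.0, samples=12):
    """Return a list of (timestamp, file_count) samples taken interval_s apart."""
    history = []
    for _ in range(samples):
        history.append((time.time(), count_files(root)))
        time.sleep(interval_s)
    return history
```

A count that climbs steeply during the scan and falls back afterwards would suggest heavy temporary-file churn. A flat count only shows that the watched directory is not the one being used.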

No, we did not run chkdsk on that E: volume.

Ever since we disabled the AV scan for that drive, we have not had any high CPU
or BSoD.  Perhaps I should ask my customer to send me those huge zips &
let me attempt to scan them on a test VM with Deep Security, as that customer's
VM is in production.

 

Author Comment

by:sunhux
ID: 40305180
I can't run WhoCrashed on that Prod VM: if I copy the memory dump
over to my laptop, how can I make WhoCrashed read this memory dump
& analyse it?  Or is there any other tool that could do this?
 
LVL 34

Expert Comment

by:Seth Simmons
ID: 40305191
have you tried contacting trend support?
i've used a number of their products over the years but not deep security

you can install the SDK, which includes dumpchk, on your notebook
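
Once the debugging tools are installed on the notebook, a copied dump can be analysed offline with the kernel debugger's `-z` (open dump file), `-y` (symbol path) and `-c` (initial command) options. The Python sketch below only builds and runs that command line. The install path of `kd.exe` and the local symbol cache folder are assumptions and need adjusting to the actual installation.

```python
import subprocess

# Assumption: typical install location of Debugging Tools for Windows (x64).
KD_EXE = r"C:\Program Files (x86)\Windows Kits\8.1\Debuggers\x64\kd.exe"
# Assumption: local cache folder C:\symbols, filled from Microsoft's public symbol server.
SYMBOLS = r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols"


def build_command(dump_path, kd_exe=KD_EXE, symbols=SYMBOLS):
    """Build a kd command line that runs !analyze -v on the dump, then quits."""
    return [kd_exe, "-z", dump_path, "-y", symbols, "-c", "!analyze -v; q"]


def analyze(dump_path):
    """Run kd against the dump and return its text output."""
    result = subprocess.run(
        build_command(dump_path), capture_output=True, text=True, check=False
    )
    return result.stdout
```

For example, `print(analyze(r"C:\dumps\MEMORY.DMP"))` with a placeholder path. The dump never leaves the notebook, which avoids the sensitivity concern raised in Q2, although symbol files are downloaded from Microsoft.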
 

Author Comment

by:sunhux
ID: 40305193
One of my colleagues ran an analyser against the dump & got the following.
I don't know how to interpret it, so I've appended it below.

So what's the root cause & how do we go about fixing it permanently
(i.e. a preventive measure)?

========================================================

BugCheck 24, {1904fb, fffff88001fce4b8, fffff88001fcdd10, fffff88001435bc4}

Probably caused by : Ntfs.sys ( Ntfs!NtfsAddToWorkqueInternal+84 )

Followup: MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

NTFS_FILE_SYSTEM (24)
    If you see NtfsExceptionFilter on the stack then the 2nd and 3rd
    parameters are the exception record and context record. Do a .cxr
    on the 3rd parameter and then kb to obtain a more informative stack
    trace.
Arguments:
Arg1: 00000000001904fb
Arg2: fffff88001fce4b8
Arg3: fffff88001fcdd10
Arg4: fffff88001435bc4

Debugging Details:
------------------


EXCEPTION_RECORD:  fffff88001fce4b8 -- (.exr 0xfffff88001fce4b8)
ExceptionAddress: fffff88001435bc4 (Ntfs!NtfsAddToWorkqueInternal+0x0000000000000084)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 00000000000014a8
Attempt to read from address 00000000000014a8

CONTEXT:  fffff88001fcdd10 -- (.cxr 0xfffff88001fcdd10)
rax=0000000000000002 rbx=0000000000000000 rcx=fffffa80cb53a510
rdx=fffffa80cb53a510 rsi=fffffa80cc7634c0 rdi=0000000000000000
rip=fffff88001435bc4 rsp=fffff88001fce6f0 rbp=00000000000014a8
 r8=0000000000000000  r9=0000000000000001 r10=fffffa80c150ee80
r11=fffffa80cce57090 r12=0000000000000000 r13=0000000017fd0900
r14=fffff80001d11900 r15=fffffa80cb53a510
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
Ntfs!NtfsAddToWorkqueInternal+0x84:
fffff880`01435bc4 817d00e8030000  cmp     dword ptr [rbp],3E8h ss:0018:00000000`000014a8=????????
Resetting default scope

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  0

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  00000000000014a8

READ_ADDRESS:  00000000000014a8

FOLLOWUP_IP:
Ntfs!NtfsAddToWorkqueInternal+84
fffff880`01435bc4 817d00e8030000  cmp     dword ptr [rbp],3E8h

FAULTING_IP:
Ntfs!NtfsAddToWorkqueInternal+84
fffff880`01435bc4 817d00e8030000  cmp     dword ptr [rbp],3E8h

BUGCHECK_STR:  0x24

LAST_CONTROL_TRANSFER:  from fffff8800152954e to fffff88001435bc4

STACK_TEXT:  
fffff880`01fce6f0 fffff880`0152954e : 00000000`00000000 00000000`00000000 fffffa80`cce57300 fffffa80`cce57090 : Ntfs!NtfsAddToWorkqueInternal+0x84
fffff880`01fce760 fffff800`01a8aa77 : fffff800`01c524f0 fffff8a0`17fd09a0 fffff8a0`17fd0900 fffffa80`cb53a510 : Ntfs! ?? ::NNGAKEGL::`string'+0xdf4e
fffff880`01fce7a0 fffff800`01be89f1 : fffff8a0`17fd09a0 00000000`00000000 fffff880`01fceb20 fffff8a0`17fd09a0 : nt!FsRtlpRemoveAndCompleteWaitingIrp+0x123
fffff880`01fce800 fffff800`01e26091 : fffff8a0`17fd09a0 fffffa80`cce573a0 fffff8a0`15d8bb40 00000000`00000000 : nt!FsRtlpAcknowledgeOplockBreakByCacheFlags+0x7d1
fffff880`01fce8e0 fffff880`014b21d6 : fffff8a0`15d8bc70 fffff880`00000000 00000000`00090240 00000000`00000001 : nt! ?? ::NNGAKEGL::`string'+0x39b53
fffff880`01fce970 fffff880`014ebf8e : fffffa80`df915e40 fffffa80`cce57090 fffffa80`00000000 fffff880`00000000 : Ntfs!NtfsOplockRequest+0x176
fffff880`01fcea00 fffff880`014ec33d : fffffa80`df915e40 00000000`00000000 fffff880`01fceb20 00000000`00000000 : Ntfs!NtfsUserFsRequest+0x7e
fffff880`01fcea40 fffff880`01270bcf : fffff880`01fceb90 fffffa80`cce57090 fffff880`01fceb00 fffffa80`df915e40 : Ntfs!NtfsFsdFileSystemControl+0x13d
fffff880`01fceae0 fffff880`0129095e : fffffa80`d00dc7e0 00000000`00000001 fffffa80`d00dc700 fffffa80`cce57090 : fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
fffff880`01fceb70 fffff880`03c19b49 : fffffa80`cce573a0 00000000`00000000 fffffa80`c7a48760 01cfc691`5f03db20 : fltmgr!FltpFsControl+0xee
fffff880`01fcebd0 fffff880`03c0e5ec : fffffa80`d00dc7e0 01cfc691`5f03db20 fffff800`01c7c200 fffffa80`c7a48760 : srv2!Smb2LeaseAcknowledge+0x239
fffff880`01fcec30 fffff800`01ae0261 : fffff880`03c07000 fffff800`01c7c280 fffffa80`c14ef040 fffffa80`00000003 : srv2! ?? ::FNODOBFM::`string'+0x418a
fffff880`01fcecb0 fffff800`01d74bae : 00000318`878b4822 fffffa80`c14ef040 00000000`00000080 fffffa80`c14df890 : nt!ExpWorkerThread+0x111
fffff880`01fced40 fffff800`01ac78c6 : fffff880`009bf180 fffffa80`c14ef040 fffff880`009ca0c0 3070ba0f`00000080 : nt!PspSystemThreadStartup+0x5a
fffff880`01fced80 00000000`00000000 : fffff880`01fcf000 fffff880`01fc9000 fffff880`01fce9e0 00000000`00000000 : nt!KiStartSystemThread+0x16


SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  Ntfs!NtfsAddToWorkqueInternal+84

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: Ntfs

IMAGE_NAME:  Ntfs.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  5167f5fc

STACK_COMMAND:  .cxr 0xfffff88001fcdd10 ; kb

FAILURE_BUCKET_ID:  X64_0x24_Ntfs!NtfsAddToWorkqueInternal+84

BUCKET_ID:  X64_0x24_Ntfs!NtfsAddToWorkqueInternal+84

Followup: MachineOwner
---------

0: kd> lmvm Ntfs
start             end                 module name
fffff880`01432000 fffff880`015d4000   Ntfs       (pdb symbols)          downstreamstore\ntfs.pdb\0842A8FED1C5463FB4078078781F5C622\ntfs.pdb
    Loaded symbol image file: Ntfs.sys
    Image path: \SystemRoot\System32\Drivers\Ntfs.sys
    Image name: Ntfs.sys
    Timestamp:        Fri Apr 12 19:54:36 2013 (5167F5FC)
    CheckSum:         001A27D8
    ImageSize:        001A2000
    File version:     6.1.7601.18127
    Product version:  6.1.7601.18127
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        3.7 Driver
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     ntfs.sys
    OriginalFilename: ntfs.sys
    ProductVersion:   6.1.7601.18127
    FileVersion:      6.1.7601.18127 (win7sp1_gdr.130412-0013)
    FileDescription:  NT File System Driver
    LegalCopyright:   © Microsoft Corporation. All rights reserved.
 
LVL 34

Expert Comment

by:Seth Simmons
ID: 40305199
everything points to the file system driver (ntfs.sys) so it could be something with the zip files
perhaps it is coming across a zip file that is corrupt or that contains a large number of files/folders?
if you're able to set up a test environment it's worth trying to reproduce
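
To see which components were on the call path before Ntfs.sys faulted, the `!analyze -v` text can be reduced to a bugcheck code plus the ordered list of modules in STACK_TEXT. The Python sketch below does this with regular expressions written against the output format pasted above. It is an illustration, not a general WinDbg parser, and the function name is hypothetical.

```python
import re

# "BugCheck 24, {1904fb, fffff88001fce4b8, ...}"
BUGCHECK = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")
# Stack frame: child-SP, return address, four args, then "module!symbol".
FRAME = re.compile(
    r"^[0-9a-fA-F]{8}`[0-9a-fA-F]{8}\s+[0-9a-fA-F]{8}`[0-9a-fA-F]{8}\s+:.*:\s+(\S+?)!",
    re.MULTILINE,
)


def summarize(text):
    """Return the bugcheck code, its arguments and the module order on the stack."""
    m = BUGCHECK.search(text)
    code = int(m.group(1), 16) if m else None
    args = [a.strip() for a in m.group(2).split(",")] if m else []
    modules = []
    for mod in FRAME.findall(text):
        # Collapse consecutive frames that belong to the same module.
        if not modules or modules[-1] != mod:
            modules.append(mod)
    return {"bugcheck": code, "args": args, "modules": modules}
```

Applied to the dump above, the module list starts at Ntfs (the faulting frame) and runs back through nt, Ntfs and fltmgr to srv2, the SMB2 server driver (`srv2!Smb2LeaseAcknowledge`). So the faulting thread was handling an SMB lease/oplock acknowledgement rather than running scanner code, which may be worth raising with the vendors.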
 

Author Comment

by:sunhux
ID: 40305981
Not that easy to clone that VM as it's huge with a few RDM LUNs.
We don't have the luxury of getting so much storage.

I'll post the event viewer logs in 20 hours
 
LVL 34

Expert Comment

by:Seth Simmons
ID: 40319361
any update on this?