Solved

Server 2003: 3 Servers Blue Screen at the Same Time (VM's)

Posted on 2010-08-17
Last Modified: 2012-05-10
This morning around 2:15am, three of our servers crashed, all at the same time (within a minute) and were hung on this blue screen:

http://i210.photobucket.com/albums/bb65/djfrost143/work/Untitled-1.jpg

The event logs on all of the servers that blue screened are pretty clean. The only events leading up to the blue screens around 2:15 were automatic update services, which only started and stopped on all servers. No updates were actually installed.

These servers are VMs. There are 13 VMs, all running Server 2003, on this one ProLiant DL585 G2 ESX server. Only 3 of them crashed.

If you look in VIC, click the Performance tab → Change Chart Options, and sort by “last day”, you can see that around 2:15 – 2:20am all of the servers that blue screened had a sudden spike in CPU usage. If you look at the other servers, their CPU usage remained stable.
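For anyone who wants to make that comparison without eyeballing 13 charts, here is a minimal Python sketch. It assumes the per-VM CPU samples have already been copied out of the performance chart by whatever means is available; the function name `spiked_vms`, the `factor` threshold, and the input layout are all hypothetical, not anything VIC provides.

```python
from datetime import datetime
from statistics import median


def spiked_vms(samples, start, end, factor=2.0):
    """Return names of VMs whose peak CPU inside [start, end] exceeds
    `factor` times their median CPU outside that window.

    samples: dict mapping VM name -> list of (datetime, cpu_percent).
    The layout and the threshold are assumptions made for illustration.
    """
    flagged = []
    for vm, points in samples.items():
        inside = [cpu for t, cpu in points if start <= t <= end]
        outside = [cpu for t, cpu in points if not (start <= t <= end)]
        if not inside or not outside:
            continue  # not enough data to compare this VM
        if max(inside) > factor * median(outside):
            flagged.append(vm)
    return sorted(flagged)
```

Calling it with a window of 2:15 to 2:20am on 2010-08-17 should list only the VMs whose usage jumped in that window, which can then be compared against the three that blue screened.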

It’s hard to tell what caused the blue screens. Many times they are caused by Windows Updates or hardware failures. However, I believe that if it were a hardware issue, all of the other VMs would have crashed as well. But there’s also nothing software related going on in the event logs leading up to the crash.

Any ideas as to what could have caused this, or how to dig deeper and what to look for?

Question by:NoneProfit
14 Comments
 
LVL 4

Expert Comment

by:Joediggity2
ID: 33454604
What time are your Windows updates set to apply (if they are set to apply automatically)?
 
LVL 5

Expert Comment

by:truromeo4juliet
ID: 33454756
Yesterday was the 2nd Tuesday of the month (Microsoft update day)... Chances are, they all took the same update and decided to hose themselves... Try booting to safe mode in each VM and restoring to an earlier time using MS System Restore, or booting to repair mode and performing the same process.
 
LVL 5

Expert Comment

by:Stappmeyer
ID: 33454776
Any chance they are all on the same datastore that has filled up?
 
LVL 4

Expert Comment

by:Joediggity2
ID: 33454795
Also, was it a one time event (servers came back after reboot) or are they still dead?
 
LVL 5

Expert Comment

by:truromeo4juliet
ID: 33454807
I'm sorry, TODAY is the 2nd Tuesday of the month, but the update could've been small and fast, but reckless and deadly like a bullet :( Continue with my advice above.
 

Author Comment

by:NoneProfit
ID: 33454821
They are actually not set to install until 3am nightly, set by a GPO.
In the event log, the events leading up to the crash are as follows:

12:51am  Service Control Manager  7035  The LiveUpdate service was successfully sent a start control
12:51am  Service Control Manager  7036  The LiveUpdate service entered the running state
12:51am  Service Control Manager  7036  The LiveUpdate service entered the stopped state
12:53am  Service Control Manager  7035  The LiveUpdate service was successfully sent a start control
12:53am  Service Control Manager  7036  The LiveUpdate service entered the running state
12:53am  Service Control Manager  7036  The LiveUpdate service entered the stopped state

And the next error was:

8:16am  eventlog  6008  The previous system shutdown at 2:16am on 8/17/2010 was unexpected

And what I am just realizing now is that in the application log, it shows:

12:52am  Symantec Antivirus  7  New virus definition file loaded. Version 120816p.

So the LiveUpdate service is not Windows Update, it's Symantec. Maybe the virus definition caused the blue screen? But that was at about 1am, the crash was at 2:15am, and none of the other servers crashed. ...
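To put numbers on that gap, here is a small Python sketch that lays the quoted log entries out relative to the crash. The timestamps are the ones from the logs above; the sketch assumes they all fall on 8/17/2010, and the helper names `at` and `minutes_from_crash` are made up for illustration.

```python
from datetime import datetime

DAY = "2010-08-17"  # assumption: every quoted event is on the crash date


def at(clock):
    """Parse a clock string such as '12:52am' into a datetime on DAY."""
    return datetime.strptime(f"{DAY} {clock}", "%Y-%m-%d %I:%M%p")


# Events as quoted from the system and application logs.
events = [
    ("12:51am", "LiveUpdate service started and stopped"),
    ("12:52am", "Symantec virus definition file 120816p loaded"),
    ("12:53am", "LiveUpdate service started and stopped"),
    ("2:16am", "unexpected shutdown (time reported by event 6008)"),
    ("8:16am", "event 6008 written at the next boot"),
]

crash = at("2:16am")


def minutes_from_crash(clock):
    """Signed offset in minutes: negative is before the crash."""
    return int((at(clock) - crash).total_seconds() // 60)


for clock, what in events:
    print(f"{clock:>8}  {minutes_from_crash(clock):+5d} min  {what}")
```

The definition file loaded 84 minutes before the crash, and the servers stayed down for 6 hours before the next boot logged event 6008.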
 

Author Comment

by:NoneProfit
ID: 33454837
Oh, and yes, the servers are back up and running after a reboot. I am sorry I forgot to mention that, as important as it is. I am just looking to figure out why it happened to prevent future occurrences.
 
LVL 5

Accepted Solution

by:
truromeo4juliet earned 500 total points
ID: 33454904
Just re-read your original post... If you had a spike in CPU usage, it could be that the 3 VMs ran the scheduled automatic update task and then encountered a paging issue... Try zeroing the paging files in each of the 3 VMs and then re-enabling them... Alternatively, run chkdsk C: /f inside each of the 3 VMs to find errors, I guess. I'm out of ideas.
 

Author Comment

by:NoneProfit
ID: 33454979
@ Stappmeyer, they are all on the same datastore, but there is over 200GB free.
@ Truromeo, that would make sense, only they did not crash until 2:15 and the scheduled updates (Symantec) ran at 1. When you say zero the page files, I am not sure what you mean by that. Isn't that when the page files are cleared out? Would that be accomplished by a restart, or is there a manual way to do it while the server is in production?
 
LVL 5

Expert Comment

by:truromeo4juliet
ID: 33455065
Yes, clear the page files out... it would be accompanied by a restart, then another restart when you re-enable the paging file.
 
LVL 4

Expert Comment

by:Joediggity2
ID: 33455358
Even though Windows updates are set to run at 3:00am, if I remember correctly they actually have a 60 minute randomization in them so all the computers do not get updates at exactly the same time. On the Symantec side, after the LiveUpdate, depending on the settings, a scan is done either on active files or in some cases as a full or partial scan. There is a chance something happened during the scan.
 
LVL 13

Expert Comment

by:leegclystvale
ID: 33455390
hmmmmmm.......I have seen the word "Symantec" in your post....... coincidental?......I don't think so :o)
 
LVL 6

Expert Comment

by:JRoyse
ID: 33466352
 

Author Closing Comment

by:NoneProfit
ID: 33586247
Haven't had any issues since. There was no real resolution other than a reboot. Still not sure why it happened; I guess we will see if it happens again.