Solved

Black screen of death SBS 2003

Posted on 2011-05-01
15
Medium Priority
1,340 Views
Last Modified: 2012-05-11
I have a server which ran for years without any issues and suddenly the other day started freezing with a black screen. It's running SBS 2003. It happens at strange times, so far never at the same time, and not in any sort of pattern I can really define, except always in the middle of the night between 9:30 PM and 6 AM.

I've disabled backup jobs, Shadow Protect on the C:\ drive, VSS and WSUS, as well as all jobs in the task scheduler. Still it locks up. I saw someone suggest that a heavily fragmented C:\ drive could cause this, so I used Diskeeper to defrag my C:\ drive; it took multiple passes all day until all the red fragmentation turned blue. I even freed up some space on the C:\ drive so it went from 15% free to 30%. The black screen persists.

The symptoms are like this: the server's running along great, until suddenly the screen on the monitor just turns black. No mouse, no icons. Num Lock on the keyboard does not work. The CD-ROM drive ejects OK. The server quits responding to PING and loses inbound and outbound network access. No blue screen, no error message. A reboot by holding down the power button on the front brings the server back up fine for a few hours, and then bam, it goes down at random again.

I checked the event logs, hoping to find a smoking gun like a common event ID right before the lockup, but the event logs don't show any errors or warnings around the times of the lockups, and the last event logged is the winhttpautoproxy service logging its standard information, which shows up in my event logs all the time. I looked for memory.dmp but it was empty also. Any advice?
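Since the event logs go quiet before a hard freeze, one way to pin down when each lockup happens is to have the server write a timestamp to a file at a fixed interval and then look for holes in that file. The following is a minimal sketch of the gap-finding step only, assuming a one-minute heartbeat; the function name `find_freezes`, the tolerance, and the sample timestamps are all hypothetical and not taken from this server.

```python
from datetime import datetime, timedelta

def find_freezes(heartbeats, interval=timedelta(minutes=1), tolerance=3):
    """Return (last_seen, next_seen) pairs where two consecutive
    heartbeats are more than tolerance * interval apart."""
    limit = interval * tolerance
    ordered = sorted(heartbeats)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]

# Hypothetical heartbeat log: one entry per minute, then a long hole
# until the machine is power-cycled the next morning.
base = datetime(2011, 5, 1, 21, 0)
beats = [base + timedelta(minutes=m) for m in range(45)]
beats += [base + timedelta(hours=10, minutes=m) for m in range(5)]

for last_seen, next_seen in find_freezes(beats):
    print(f"froze after {last_seen:%Y-%m-%d %H:%M}, "
          f"back at {next_seen:%Y-%m-%d %H:%M}")
```

The last heartbeat before each gap gives an upper bound on how long the machine stayed alive, which can then be compared against scheduled jobs.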
Question by:intelliwyse
15 Comments
 
LVL 3

Expert Comment

by:comphil
ID: 35500700
If the event viewer isn't catching anything, it may well be hardware related, possibly something to do with heat.  Have you checked RAM, disk usage etc. at the time to see if it spikes just before a crash?
0
 
LVL 10

Expert Comment

by:WayneATaylor
ID: 35500702
That sounds like a hardware issue to me.

My guess would be to check memory and processor.

Check things like processor and PSU fans are working OK, as maybe the processor is shutting down if it's overheating.

If it's SBS then there are a few processes that are run overnight, such as Exchange cleanup etc., and these can cause a big increase in processor utilisation and hence can cause the processor to get hot!

Wayne



0
 
LVL 2

Author Comment

by:intelliwyse
ID: 35500855
Yeah, good points. The hardware is the next major thing I'm checking. The server's at a remote site, so to check the hardware I have to load a backup image up as a virtual machine so users can keep working, but then I can pull the hardware and strip it down.
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 35502422
Check power save and screen saver options. Some power save options shut down the NIC as well as the monitor. Hibernate, Sleep, or is it a coma?
0
 
LVL 2

Author Comment

by:intelliwyse
ID: 35698264
So far I've moved the os to a virtual machine to work on the physical machine. I ran extensive tests on memory, hard disk and cpu but didn't come back with anything conclusive. After swinging the OS from Physical to Virtual I haven't had any lockups either
0
 
LVL 3

Expert Comment

by:comphil
ID: 35698432
Any physical signs of failure?  Clogged fans etc?  What about temperature monitoring, can you see how well it's regulating CPU & chassis temperatures?
0
 
LVL 2

Author Comment

by:intelliwyse
ID: 35702389
Yeah I'll do a full internal cleaning of the chassis tomorrow but it's not very dusty at all, so far everything's spinning well and I don't hear any bad bearings when I listen close to the fans... Not getting any heat alarms but I don't know yet what temp the CPU's running at exactly.
Tomorrow I'm swinging the virtual machine back over onto the physical. I also figured out the "black screen of death" I said I was getting is really just because the machine was doing a hard freeze while the monitor was in power save, and since it was blank/off, it didn't wake up after the hardware locked up.
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 35703075
You certain this isn't the other way around? Could the power save features cause hardware problems? They did in Windows Vista. The USB drivers wouldn't allow the computer to come out of power save. They also conflicted with hardware, causing freezes. The resolution was to update the USB bus drivers. For 2003 server, maybe this will help:

http://msdn.microsoft.com/en-us/windows/hardware/gg463430
0
 
LVL 2

Author Comment

by:intelliwyse
ID: 35804967
Ok I've been fighting this ever since I made this post, so about 2 weeks and here's what I've discovered:

We noticed the server would freeze on a pattern, at a random hour, between 6:30 PM and 6:00 AM only, except for 2 out of 20 days when it froze about noon.
Most of the time it froze between 9:30 PM and 2:00 AM tho.

After eliminating the backup system as the problem (it's a Zenith Infotech BDR and I know they cause a VERY similar issue) I figured it had to be either Exchange, anti-virus, or hardware. I couldn't find anything wrong with the hardware, though that doesn't mean anything, so I disabled all the non-Microsoft services, the anti-virus software and even the entire Exchange system for 1 night a couple nights ago. That night was the first time in weeks the server didn't freeze.

I decided to focus on the anti-virus first.

When I disabled deep scans in my anti-virus (Sunbelt Vipre), the lockups quit happening, now for 2 days. They were consistent daily, and the pattern was trending to only during hours when a deep scan would be running.
Deep scans were scheduled to start at 9:00 AM, but I think they are scanning all the SQL and Exchange databases because the scans are taking between 9 and 20 hours to complete! Except they rarely completed because the server would freeze part way through, then when it was rebooted the deep scans would resume and sometimes by the end of the day they would freeze the system again, and I think that's where those 6:30 PM freezes were coming from.

I still have to wait to see if it stays stable for several more days before I issue the blessing that it's fixed but seriously looking like my Anti-Virus was causing the problem.

I've had Sunbelt Vipre installed on this server for over a year and it only started doing this Easter weekend and has been fairly relentless about it since. I have identical servers (Dell PowerEdge 1800, SBS 2003) running Sunbelt Vipre at other sites and I am not experiencing this issue with them yet.

Any thoughts?
0
 
LVL 2

Author Comment

by:intelliwyse
ID: 35804999
Let me correct something: I said deep scans were scheduled to start at 9 AM, but I meant 9 PM! They are scheduled to run at night. I also want to say the backup system actually takes incremental images of the server every hour throughout the daytime, and the backups do NOT run at night, so backups never conflicted with the AV except when the backup server was taking backups during the daytime and the AV was also scanning because it had not completed from the night prior due to a lockup.
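The timing here can be checked with a little arithmetic: a scan that starts at 9 PM and runs for the 9 to 20 hours mentioned earlier in the thread. The following is a rough sketch only; the calendar date is arbitrary, and the 8 AM to 6 PM backup window is an assumption standing in for "throughout the daytime", which the thread does not specify.

```python
from datetime import datetime, timedelta

def scan_window(start, min_hours, max_hours):
    """Return the earliest and latest finish times for a scan."""
    return (start + timedelta(hours=min_hours),
            start + timedelta(hours=max_hours))

def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end

# Deep scan starts at 9 PM; the date itself is arbitrary.
scan_start = datetime(2011, 5, 1, 21, 0)
earliest_end, latest_end = scan_window(scan_start, 9, 20)

# ASSUMPTION: hourly backups run 08:00-18:00 the next day.
backup_start = datetime(2011, 5, 2, 8, 0)
backup_end = datetime(2011, 5, 2, 18, 0)

short_clash = overlaps(scan_start, earliest_end, backup_start, backup_end)
long_clash = overlaps(scan_start, latest_end, backup_start, backup_end)

print(f"9-hour scan ends {earliest_end:%H:%M}, overlaps backups: {short_clash}")
print(f"20-hour scan ends {latest_end:%H:%M}, overlaps backups: {long_clash}")
```

A 9-hour scan finishes at 6:00 AM, before any daytime backup starts, while a 20-hour scan is still running at 5:00 PM the next day, which is when the AV and the hourly backups can collide.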
0
 
LVL 2

Author Comment

by:intelliwyse
ID: 35805524
Well this is INTERESTING. I had just made my posts above talking about the AV being the root of the issue, and I was wrapping up my day. I got a ticket from a user saying that they weren't getting e-mail for the last couple of days, and I knew this user was uniquely using the "POP3 Mail Connector for Exchange" on the server.

When I concluded the AV was the issue, I had determined that by stopping all Exchange services and the AV services one night, and then finding that no lockups happened. I then had one of my techs restart the Exchange services and the anti-virus services, but then I disabled regular deep scans on the AV.
Another night passed (last night) and the server still didn't lock up, so I concluded "AV's the issue"

Today when I went into the server to look at this ticket for this user, the first thing I checked was if my tech had restarted the POP3 Mail Connector for Exchange service. It was in a stopped state. I went ahead and started it back up at about 4:00 PM and did a "retrieve now" in the POP3 Mail Connector manager.

15 minutes later the server was frozen up. It's never frozen up at that time or anywhere near that time before. It had not frozen up in 2 days, making it from my Wednesday 8 AM reboot all the way until Friday at 4 PM, minutes after I started back up this one Exchange service.

I've got the service stopped now and I'm leaving it that way until I can see the server lock up with it stopped.
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 35806547
What AV software and version are you using? There has to be a workaround for email to flow.
0
 
LVL 2

Author Comment

by:intelliwyse
ID: 35819997
I've worked around the issue with the POP3 Mail Connector by rerouting some mail, but that seems to be the trigger here.

The Exchange POP3 Mail Connector is causing the server to hard freeze.

What I do not yet understand is this: during the whole several weeks of troubleshooting, the POP3 Mail Connector for Exchange was running and triggering every 15 minutes, for 1 single mailbox. All the other users get their mail flow the traditional ways, except for this one mailbox, so it's not like the connector was being overloaded. Furthermore, this mailbox isn't a high-volume box; it's just a basic, small-time end user.

Now, the freeze ONLY occurred 75% of the time after 9:00 PM and before 6:00 AM, 20% of the time at 6:30 pm exactly, and 5% of the time, at noon.

So, there must be some other associated function that's also contributing to the trigger.
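To look for that other contributing function, it can help to bucket each freeze by time of day and compare the buckets against known schedules. The following is a minimal sketch of that idea, assuming the freeze times have been written down by hand; the window boundaries and labels are illustrative, and the sample times are hypothetical, not the real log from this server.

```python
from collections import Counter
from datetime import time

def in_window(t, start, end):
    """True if t is in [start, end), allowing windows that wrap past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end

# Illustrative windows loosely based on the times reported in the thread.
WINDOWS = [
    ("deep scan window (21:00-06:00)", time(21, 0), time(6, 0)),
    ("early evening (18:00-21:00)", time(18, 0), time(21, 0)),
    ("midday (11:00-13:00)", time(11, 0), time(13, 0)),
]

def classify(t):
    """Return the label of the first window containing t, else 'other'."""
    for label, start, end in WINDOWS:
        if in_window(t, start, end):
            return label
    return "other"

# HYPOTHETICAL freeze times, for demonstration only.
freezes = [time(21, 40), time(23, 15), time(1, 50),
           time(18, 30), time(12, 5), time(16, 15)]

counts = Counter(classify(t) for t in freezes)
for label, n in counts.most_common():
    print(f"{label}: {n}/{len(freezes)}")
```

Anything that lands in "other", like the 4 PM freeze right after the connector was restarted, is a lockup that the known schedules do not explain and is worth a closer look.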

My anti-virus is Sunbelt Vipre, and I have the latest version and definitions installed. I would go look it up precisely if it's really needed, but does anyone have any thoughts on what I should do from here?

I'm contemplating an Exchange reinstall on SBS 03.

0
 
LVL 3

Accepted Solution

by:
comphil earned 2000 total points
ID: 35821341
Here's a couple of thoughts, don't know how much help they'll be though.  

1. The SBS Exchange POP3 Connector (as you're probably aware) is only a temporary half-way house, it was never designed to be used permanently.  It would be better to have your MX records changed so mail is delivered directly to your server via SMTP.

2. Perhaps there is a disk problem which is affected when the POP3 connector tries to write to its delivery folder?  Unlikely, I know, but the symptoms are very strange.

I've never heard of Sunbelt Vipre so can't offer any advice on that one.  However, as a general rule you could try setting an exception on the POP3 connector delivery folder.
0
 
LVL 2

Author Closing Comment

by:intelliwyse
ID: 36052179
Yes, it was the POP3 Mail Connector for sure. Very strange.
0


Question has a verified solution.
