Solved

Windows Server Rebooting itself

Posted on 2008-10-13
Last Modified: 2012-04-30
We have a Windows Server 2008 machine that reboots itself on an almost daily basis.  The error code in the Reliability Monitor is:
0x00000073 (0xfffffffffc0000005, 0xfffff80001af490e, 0xfffffa600

This started with a particular domain we hosted on the server: every time the domain was accessed, the computer would reboot with the same error message as above.  Now, however, the computer reboots at what appear to be random times.

Our support personnel blame the issue on a compromised server, claiming that on a particular day we suffered a brute-force FTP attack.  However, no log information suggests that the intruder got in, and we have only five valid FTP usernames, none of which is "Administrator" or "User", the only two usernames used in the attack.  Further, we restrict all five usernames to their home directories, so I am skeptical that an attack is the cause of the error.  The support technician also points to "unusual" log activity centered on the Windows Update installer starting at odd times, but I don't see how this is related to an FTP server compromise.  His information is not conclusive.

My research points to a RAM or hard drive problem.  When I run chkdsk, it does report errors, though I have stopped short of letting the system fix them.  I need to rule out that the server was compromised and then determine how to resolve the issue so that we stop having these random reboots.  I am including the error log information from the Event Viewer that is present at each reboot.  Unfortunately, this and the error codes referenced above are all that I have to go on.


shutdown-events-10122008.txt
module-installer-events-10012008.txt
Question by:dageyra
41 Comments
 
LVL 31

Expert Comment

by:James Murrell
Is it a Dell? I ask because we had a similar problem and found an updated video/display driver on the Dell support site that fixed it.
 
LVL 59

Expert Comment

by:Darius Ghassem
Is this a 64-bit system? Do you have Hyper-V running?
 
LVL 10

Expert Comment

by:RubalJ
This seems to be a hardware issue to me. Is it a new system? Did you install or add any new hardware to this machine?
 
LVL 1

Author Comment

by:dageyra
This is a Dell, but all drivers are up to date.  It is a 64-bit system.  I do not know what Hyper-V is; is it enabled by default?

This is not a new system and there have been no hardware changes (we do not have physical access, only remote).

Right now we think it's related to DDoS attacks on the SQL server.  There were about 10 connections per second, non-stop.  I believe this overloaded the network card, and the server failed to handle any further requests.  We have enabled more hardware auditing to determine whether this is the cause, and we have put strict firewall rules in place.  We have to wait a few days to see if the machine goes down again.
 
LVL 59

Expert Comment

by:Darius Ghassem
Can you open the properties of your Local Area Connection, click Configure, and then select the Advanced tab? Please post a screenshot of the Advanced tab for me.
 
LVL 26

Expert Comment

by:lnkevin
First, if you suspect a hardware error, run the Dell diagnostics as described in these instructions:
http://www.cs.utexas.edu/users/deke/laptopsupport/manuals/d600/diag.htm

Hyper-V is neither free nor enabled by default; it requires an additional license, so you probably do not have it set up.
What model is this server? What processor does it have?
What SQL Server version is it, and is it 64-bit as well?
What other applications are on it (including the antivirus software version)?

The event IDs in your attachments are not related to the reboot. Microsoft says these errors can be ignored:
http://technet.microsoft.com/en-us/library/cc727677.aspx
http://technet.microsoft.com/en-us/library/cc774485.aspx

K
 
LVL 1

Author Comment

by:dageyra
Hello lnkevin:

I cannot run the Dell diagnostic utility as per those instructions because they require physical access.  We do not have physical access to the machine, only RDP.

It is a Dell PowerEdge 2950 with 2 Dual-Core Xeon 5130 processors, running Windows Server 2008 x64 and SQL Server 2005.  We use MailEnable Professional, Dell OpenManage Server, Diskeeper 2008 EnterpriseServer, Junction Link Magic 1.0, .NET 2.0, FrontPage Server Extensions 2002, Visual Studio 2005, PowerShell Plus, WinRAR, and Windows Live OneCare.

Hello dariusq:

I have attached the screenshot you requested.

Thanks to everyone for their assistance; this is a major issue for us right now.
network-config-advanced.jpg
 
LVL 26

Expert Comment

by:lnkevin
Here is another way of running diagnostics without physically touching the server. Download, install, and run this program to diagnose your hardware:
http://support.dell.com/support/downloads/download.aspx?c=us&l=en&s=gen&releaseid=R178762&SystemID=PWE_2950&servicetag=&os=LHS64&osl=en&deviceid=15852&devlib=0&typecnt=0&vercnt=2&catid=-1&impid=-1&formatcnt=2&libid=13&fileid=243648

http://www.dell.com/downloads/global/power/ps3q05-20040302-Jayakumar.pdf

Another thing to look at: if the diagnostic process does not show your server temperature (most likely it does), you can check the temperature through Dell OpenManage. A server commonly reboots itself as a result of overheating, failed fans, and so on.
I will dig further into Windows Server 2008 to see whether there are any known issues with it. I will keep you posted.

K
 
LVL 26

Expert Comment

by:lnkevin
Another possibility is mismatched drivers; controller drivers can cause the same reboot issue. Make sure you uncheck the automatic restart option under the Startup and Recovery settings so that the system displays the error (BSOD) instead of rebooting without explanation.
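
For reference, the automatic restart option and the dump type are normally stored in the registry under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl, in the AutoReboot and CrashDumpEnabled values. As far as I know, a dump is still written before the restart whenever CrashDumpEnabled is non-zero, so reading both values shows what a crash should leave behind. Below is a minimal Python sketch, assuming Python 3 is available on the server; the function names are hypothetical, and the live read only works on Windows.

```python
import sys

# Registry location of the Startup and Recovery settings.
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Meaning of the CrashDumpEnabled DWORD on Windows Server 2008.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
}


def describe_crash_settings(auto_reboot, crash_dump_enabled):
    """Summarize what happens on a bug check for the given registry values."""
    dump = DUMP_TYPES.get(crash_dump_enabled, "unknown (%r)" % crash_dump_enabled)
    action = "reboots automatically" if auto_reboot else "halts on the blue screen"
    return "dump: %s; system %s" % (dump, action)


def read_crash_settings():
    """Read the live AutoReboot and CrashDumpEnabled values (Windows only)."""
    import winreg  # Windows-only standard library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        auto_reboot, _ = winreg.QueryValueEx(key, "AutoReboot")
        dump_enabled, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return auto_reboot, dump_enabled


if __name__ == "__main__" and sys.platform == "win32":
    print(describe_crash_settings(*read_crash_settings()))
```

If the summary reports a dump type other than "none" together with an automatic reboot, the server should come back on its own and still leave a dump file for later analysis.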

K
setup.doc
 
LVL 59

Expert Comment

by:Darius Ghassem
Try disabling IP Checksum and IP Large Send Offload.
 
LVL 1

Author Comment

by:dageyra
Hello lnkevin & dariusq:

I will try both of your suggestions this weekend.  dariusq, I am not sure how to proceed with your suggestions; I will use Google but may need some additional help from you.  lnkevin, I am not comfortable unchecking the auto reboot setting, as that would require getting someone to physically reboot the machine. Since we do not have physical access and never know when the reboots will happen, that could take the server down for a very extended period of time (until someone notices it has gone down).
 
LVL 59

Expert Comment

by:Darius Ghassem
Go to the Advanced tab of the Local Area Connection, where I asked for the screenshot, select those options, and disable them.
 
LVL 1

Author Comment

by:dageyra
Hello:

dariusq, I applied your suggested changes, but the machine is still rebooting.

lnkevin, I downloaded the program you suggested, but it requires that the network card be taken down (it reports an error that it does not have direct access to the device because it is in use).  This goes along with the lack of physical access.  I have informed the server folks that I want this test run, and now I must wait for them to get around to it.
 
LVL 26

Expert Comment

by:lnkevin
Thanks for the update. I will wait until you have diag results and ensure that HW is OK.

K
 
LVL 1

Author Comment

by:dageyra
I received the results, though not in as much detail as I'd like.  All hardware that was tested came back OK. It's like pulling teeth to find out what was tested, but the network card was, at a minimum.  They have thrown a re-image into the hat; I am postponing that in order to get a more accurate picture.

Any other ideas on where to look to determine what's causing the reboot?  Is there no way to save the blue screen info other than letting the machine halt completely?
 
LVL 59

Expert Comment

by:Darius Ghassem
You should still be under warranty from Dell. Have you called them? They have a huge knowledge base that the techs can look into to see if your problem is listed. If not they will send a tech to start replacing hardware for free since you are under warranty.
 
LVL 26

Expert Comment

by:lnkevin
You may just have a hardware, software, or driver conflict. Since this server is running 64-bit, make sure all additional hardware and software (anything other than the Dell factory configuration) is compatible with 64-bit.

K
 
LVL 1

Author Comment

by:dageyra
Hey Guys:

Thanks for continuing to look into this for me.  Someone successfully analyzed the crash dump, and the problem was traced as far as we could get.  We believe it's related to some web application creating a directory, and the file in question is http.sys.  The trace ended at the call to create a directory through this file.  This seems to rule out hardware and point to an issue with IIS.  I first noticed the error on a constant basis when trying to migrate a DotNetNuke application, but this behavior was fixed by changing the App Pool to Legacy on the test domain, and the real domain stopped halting the machine when the server admins went to investigate.  There is another test DNN application, plus some web applications for webmail/email administration, but we can't say definitively that they are the problem.

Does this ring a bell, or suggest something we haven't thought of?
 
LVL 26

Expert Comment

by:lnkevin
Which IIS and .NET versions are you running? For 2008 you are better off running IIS 7 and .NET 3.0 or later. Make sure you download the 64-bit package in the middle of the page:
http://www.microsoft.com/downloads/details.aspx?familyid=10CC340B-F857-4A14-83F5-25634C3BF043&displaylang=en

K
 
LVL 1

Author Comment

by:dageyra
We are using IIS 7.  When I tried to install the x64 .NET 3.0, it told me I had to use the Role Manager to install it.  I am not sure whether it has already been installed; the Role Manager says I have ASP.NET installed, but it does not specify which version.  What is the simplest way to verify?  We only use .NET 2.0 applications.
 
LVL 1

Author Comment

by:dageyra
Here is the dump analysis in case it helps.  Right now I'm trying to focus on how to get http.sys updated, if that makes any sense.  I've also disabled all non-critical websites, particularly the DotNetNuke sites.
dump-analysis.txt
 
LVL 59

Expert Comment

by:Darius Ghassem
Go to C:\Windows\Microsoft.NET and look at which versions are listed.
 
LVL 1

Author Comment

by:dageyra
There are 3 folders in that location; Framework and Framework64 are probably the important ones.  We have the following:
- Framework: v1.0.3705, v1.1.4322, v2.0.50727, v3.0, v3.5, VJSharp
- Framework64: v2.0.50727, v3.0
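
The same check can be scripted instead of browsing the folders by hand. This is only a rough Python sketch, assuming Python 3 is available; `installed_framework_versions` and `version_key` are hypothetical helper names. A version folder being present suggests, but does not prove, that the framework is fully installed.

```python
import os
import re

# Folder names such as v2.0.50727 or v3.0; VJSharp and similar are skipped.
VERSION_DIR = re.compile(r"^v\d+(\.\d+)*$")


def version_key(name):
    """Turn 'v2.0.50727' into (2, 0, 50727) so versions sort numerically."""
    return tuple(int(part) for part in name[1:].split("."))


def installed_framework_versions(dotnet_root):
    """Map each Framework* folder under dotnet_root to its version subfolders."""
    found = {}
    for flavor in sorted(os.listdir(dotnet_root)):
        flavor_path = os.path.join(dotnet_root, flavor)
        if not (flavor.startswith("Framework") and os.path.isdir(flavor_path)):
            continue
        versions = [
            name
            for name in os.listdir(flavor_path)
            if VERSION_DIR.match(name)
            and os.path.isdir(os.path.join(flavor_path, name))
        ]
        found[flavor] = sorted(versions, key=version_key)
    return found
```

On the server this would be called as `installed_framework_versions(r"C:\Windows\Microsoft.NET")`; the Framework64 entry is the one that matters for 64-bit application pools.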
 
LVL 59

Expert Comment

by:Darius Ghassem
Version 3.0 is installed.
 
LVL 26

Expert Comment

by:lnkevin
"Right now I'm trying to focus on how to get http.sys updated, if that makes ....."

Update http.sys? If you want to replace it, copy it from your Windows 2008 CD, under the i386 folder, and replace the one in system32\drivers. You can also search your C: drive for copies of http.sys in other places.

You can give it a try, but I don't think that's the issue. I just read your dump logs and suspect that you may have a firmware or driver that is incompatible with 64-bit. Check with Dell for the latest 64-bit firmware update packages.

K
 
LVL 1

Author Comment

by:dageyra
Do you happen to know which driver it may be?  One of the first steps the server admins claim to have taken was to update the firmware, though they did not specify for which component.
 
LVL 26

Expert Comment

by:lnkevin
From the dump, the issue most likely happens at the communication layer, which involves your NIC, DRAC, IIS, and so on. What you can do is update the drivers and firmware for your NIC and DRAC to versions that support Windows Server 2008 and Vista 64-bit. Be careful when you choose the driver and firmware; the latest driver is not always the greatest. You can contact Dell support to obtain compatible firmware and drivers. Otherwise, here is the link:
http://support.dell.com/support/downloads/driverslist.aspx?os=LHS64&osl=EN&catid=-1&impid=-1&servicetag=&SystemID=PWE_2950&hidos=WNET&hidlang=en&TabIndex=

K
 
LVL 26

Expert Comment

by:lnkevin
One more thing I didn't mention is the ISAPI filters in IIS. Do you have any custom ISAPI filters? If you are not using ISAPI filters, go to IIS --> Default Web Site --> Properties --> ISAPI filter, and if you see any failed ISAPI filters, remove them.

K
 
LVL 1

Author Comment

by:dageyra
I looked into the ISAPI filters, but the only two are ASP.Net_2.0.50727.0 and ASP.Net_2.0.50727-64.

I'm wondering, could it be an issue with .NET and 64-bit?  Regarding your suggestion that it could be the firmware for the NIC, how does that correlate with the CreateDirectory call in the dump file?

I will use the Dell support link you provided to see if there are any updates, but when I checked this past weekend, there didn't seem to be any applicable to our system configuration.
 
LVL 26

Expert Comment

by:lnkevin
"I looked into the ISAPI filters, but the only two are ASP.Net_2.0.50727.0 and ASP.Net_2.0.50727-64....."

Does either of them show signs of failure? You don't need any filters set up. You can simply remove them, especially if they have failed. Let us know the status after the firmware upgrade.

K
 
LVL 1

Author Comment

by:dageyra
I'm not sure how to tell if they have failed or not.  Where would I find this information?
 
LVL 26

Expert Comment

by:lnkevin
If it looks normal, it hasn't failed. A failure appears with a red X. You should remove the filter anyway, since it is useless and does more harm than good. The filter is always set up when you upgrade the OS, a service pack, or .NET from one version to another.

K
 
LVL 1

Author Comment

by:dageyra
An update for everyone.  The system has not encountered a BSOD since Sunday evening, when I took down the DotNetNuke websites.  The plan now is to start each one and see when the server starts halting again (if it does).  Too bad the problem can't be reproduced on demand; that would speed things up a bit.
 
LVL 1

Author Comment

by:dageyra
About the ISAPI filters: I do not know much about these, having used them only to allow multiple websites to be active at the same time on another machine.  Since they haven't failed and the server has stabilized, should I leave them alone?  I assume they are defaults, as I wouldn't have touched them.
 
LVL 26

Expert Comment

by:lnkevin
You can leave them alone (removing them would do no harm to your system either). Anyway, if the system is stable for now, don't make changes for the moment. When you have to put additional software back, make sure you document carefully and allow some time for the problem to reproduce. It's a good idea to add new software one piece at a time and allow at least one day before the next one. For now, it looks very much like a conflict between software and the OS. You need some time to sort out the bad apple.

K
 
LVL 1

Author Comment

by:dageyra
All of the sites have been started except one, which halts the machine when I start and access it.  So far, there have been no crashes.  I set up a script to run on startup that emails me when the server comes back up (I know it will create false positives for Windows updates, but that is fine).  If the machine remains stable for a few days, I can show that it is the one DotNetNuke website.

Does anyone have ideas why a DotNetNuke install would halt a system?  I think it's related to IIS and becomes visible through the functions that DotNetNuke calls.
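
For anyone wanting something similar, a startup notification like the one mentioned above could look roughly like this. It is a Python sketch rather than the actual script in use; the function names, addresses, and SMTP host are placeholders, and it assumes a mail server that accepts unauthenticated mail from this machine.

```python
import smtplib
import socket
from datetime import datetime
from email.message import EmailMessage


def build_startup_notice(sender, recipient, host=None, when=None):
    """Build the notification email; host and time default to the local values."""
    host = host or socket.gethostname()
    when = when or datetime.now()
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Server %s came back up" % host
    msg.set_content(
        "%s finished booting at %s." % (host, when.strftime("%Y-%m-%d %H:%M:%S"))
    )
    return msg


def send_notice(msg, smtp_host, smtp_port=25):
    """Hand the message to an SMTP server (placeholder host, no authentication)."""
    with smtplib.SMTP(smtp_host, smtp_port, timeout=30) as smtp:
        smtp.send_message(msg)


# Example call from a startup task (all values are placeholders):
# send_notice(build_startup_notice("server@example.com", "admin@example.com"),
#             "mail.example.com")
```

Registering the script as a startup task means every boot produces a mail, which is also why planned restarts for Windows updates show up as false positives.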
 
LVL 1

Author Comment

by:dageyra
Hello Everyone:

After doing some analysis, there has been another breakthrough.  Though I was able to narrow down the websites causing the problem, there was still a hint of randomness.  This led me to believe that cross-referencing the Event Log times and dates with IIS logs would reveal the culprits and possibly additional info.  The main site that I believed to be the issue had no log files.  I then went back to the crash dump and noticed some additional information along with the CreateDirectory call:

HTTP!UlpCreateDirectory+0xda
...
HTTP!UlQueueLoggingRoutine+0x2b
...
HTTP!UlpCreateLogFile+0x63
...
HTTP!UlLogHttpHit+0x133

I now believe that the error is not specific to any website but lies with IIS logging, most likely as a result of incorrect permissions.  Since we have multiple virtual sites, I have configured IIS to log within the individual site directories.  Does this sound plausible, and which permissions should I look at in the future if the problem persists?  I believe another of the sites has a similar problem, though it does have log files, so I can't be completely sure.  I believe it suffers from the problem because the system still failed when the original site was stopped.
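
The cross-referencing of Event Log times against the IIS logs described above can be sketched in Python as follows. This is a minimal illustration, assuming the logs are in the W3C extended format with `date` and `time` fields and that the reboot times have already been converted to UTC, since IIS writes W3C timestamps in UTC by default. The function names and the 60-second window are arbitrary choices for the example.

```python
from datetime import datetime, timedelta


def parse_w3c_log(lines):
    """Yield (timestamp, record) for each entry of a W3C extended log."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns used by the lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        record = dict(zip(fields, line.split()))
        if "date" not in record or "time" not in record:
            continue
        stamp = datetime.strptime(
            record["date"] + " " + record["time"], "%Y-%m-%d %H:%M:%S"
        )
        yield stamp, record


def requests_before_reboots(log_lines, reboot_times, window_seconds=60):
    """For each reboot time, collect the requests logged in the preceding window."""
    entries = list(parse_w3c_log(log_lines))
    window = timedelta(seconds=window_seconds)
    return {
        reboot: [rec for stamp, rec in entries if reboot - window <= stamp <= reboot]
        for reboot in reboot_times
    }
```

A site with no hits in any window is not necessarily innocent: entries may be buffered and lost in the crash, and a site whose log file could never be created leaves no entries at all.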

Thoughts?
 
LVL 26

Accepted Solution

by:
lnkevin earned 500 total points
For IIS, you just want to make sure that all users have R/E permission. However, you may want to make sure the users of the .NET folder have R/W/E permission. That's from my experience, and it works most of the time.
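
One rough way to spot a log directory where files cannot be created is to try creating one. The Python sketch below uses hypothetical helper names and only tells you about the account the script runs under, which may differ from the account that http.sys uses when it writes the logs, so treat a passing result as a hint rather than proof.

```python
import os
import tempfile


def can_create_files(directory):
    """Return True if the current account can create and delete a file there."""
    try:
        fd, path = tempfile.mkstemp(prefix="logprobe_", dir=directory)
        os.close(fd)
        os.remove(path)
    except OSError:
        # Covers missing directories as well as permission errors.
        return False
    return True


def unwritable_log_dirs(directories):
    """Return the directories in which the probe file could not be created."""
    return [d for d in directories if not can_create_files(d)]
```

Running `unwritable_log_dirs` over the per-site log folders gives a short list of directories whose permissions are worth inspecting by hand.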

K
 
LVL 1

Author Comment

by:dageyra
It is not abandoned, it has been partially resolved.  With time constraints, I cannot continue testing at this point, but since part of the problem has been resolved, it may be prudent to close this question and start anew after the holidays.