BitsBytesandMore asked:

Server 2003 Standard Goes Offline Daily

Hello Experts, I need help.... I am so saturated with this issue I cannot think anymore.
My customer has a small domain: one MS Server 2003 Standard machine and 7 workstations. The only application running on the server is Quickbooks Enterprise 2009. The server roles are domain controller, file server, and DNS. It DOES NOT have an Exchange Server.

About a week ago the server started going offline. Users can still connect to the shares and work, but I cannot log on to the server. Eventually it locks up and users start getting errors that they have not written down. There are no errors in the event log except for the W32Time one, which self-corrects, and some Quickbooks errors that have always been there (apparently Quickbooks does not provide enough information for the event log to register the reason) and have never caused any issue in the past 2 years.

Once it goes offline I have no choice but to hard boot it. I have tried to access it remotely, without success.

I ran a dcdiag and this is the result:

Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\SERVER
      Starting test: Connectivity
         ......................... SERVER passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\SERVER
      Starting test: Replications
         ......................... SERVER passed test Replications
      Starting test: NCSecDesc
         ......................... SERVER passed test NCSecDesc
      Starting test: NetLogons
         ......................... SERVER passed test NetLogons
      Starting test: Advertising
         ......................... SERVER passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... SERVER passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... SERVER passed test RidManager
      Starting test: MachineAccount
         ......................... SERVER passed test MachineAccount
      Starting test: Services
         ......................... SERVER passed test Services
      Starting test: ObjectsReplicated
         ......................... SERVER passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... SERVER passed test frssysvol
      Starting test: frsevent
         ......................... SERVER passed test frsevent
      Starting test: kccevent
         ......................... SERVER passed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0x80001778
            Time Generated: 11/02/2009   16:14:26
            Event String: The previous system shutdown at 4:09:21 PM on
         An Error Event occured.  EventID: 0xC1010020
            Time Generated: 11/02/2009   16:16:06
            Event String: Dependent Assembly Microsoft.VC80.MFCLOC could
         An Error Event occured.  EventID: 0xC101003B
            Time Generated: 11/02/2009   16:16:06
            Event String: Resolve Partial Assembly failed for
         An Error Event occured.  EventID: 0xC101003B
            Time Generated: 11/02/2009   16:16:06
            Event String: Generate Activation Context failed for
         An Error Event occured.  EventID: 0xC1010020
            Time Generated: 11/02/2009   16:16:59
            Event String: Dependent Assembly Microsoft.VC80.MFCLOC could
         An Error Event occured.  EventID: 0xC101003B
            Time Generated: 11/02/2009   16:16:59
            Event String: Resolve Partial Assembly failed for
         An Error Event occured.  EventID: 0xC101003B
            Time Generated: 11/02/2009   16:16:59
            Event String: Generate Activation Context failed for
         An Error Event occured.  EventID: 0xC1010020
            Time Generated: 11/02/2009   16:16:59
            Event String: Dependent Assembly Microsoft.VC80.MFCLOC could
         An Error Event occured.  EventID: 0xC101003B
            Time Generated: 11/02/2009   16:16:59
            Event String: Resolve Partial Assembly failed for
         An Error Event occured.  EventID: 0xC101003B
            Time Generated: 11/02/2009   16:16:59
            Event String: Generate Activation Context failed for
         ......................... SERVER failed test systemlog
      Starting test: VerifyReferences
         ......................... SERVER passed test VerifyReferences

   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidatio

      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom

   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidatio

      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom

   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom

   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom

   Running partition tests on : mydomain
      Starting test: CrossRefValidation
         ......................... mydomain passed test CrossRefValidatio

      Starting test: CheckSDRefDom
         ......................... mydomain passed test CheckSDRefDom

   Running enterprise tests on : mydomain.local
      Starting test: Intersite
         ......................... mydomain.local passed test Intersite
      Starting test: FsmoCheck
         ......................... mydomain.local passed test FsmoCheck
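
A long dcdiag report like the one above is easier to read once it is reduced to its pass/fail lines. The following is a minimal sketch, not part of the original thread: it assumes the report has been saved as text and that result lines keep the dotted "passed test" / "failed test" shape shown above. The function name `summarize_dcdiag` and the `SAMPLE` excerpt are illustrative only.

```python
import re

# Matches dcdiag result lines such as:
#   "......................... SERVER failed test systemlog"
RESULT_LINE = re.compile(r"^\s*\.{5,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def summarize_dcdiag(text):
    """Return (passed, failed) lists of (target, test_name) tuples."""
    passed, failed = [], []
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            target, outcome, test_name = match.groups()
            bucket = passed if outcome == "passed" else failed
            bucket.append((target, test_name))
    return passed, failed


# Short excerpt in the same shape as the report above (illustrative sample).
SAMPLE = """\
      Starting test: kccevent
         ......................... SERVER passed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0x80001778
         ......................... SERVER failed test systemlog
      Starting test: VerifyReferences
         ......................... SERVER passed test VerifyReferences
"""

if __name__ == "__main__":
    passed, failed = summarize_dcdiag(SAMPLE)
    print(f"{len(passed)} passed, {len(failed)} failed")
    for target, test_name in failed:
        print(f"FAILED: {test_name} on {target}")
```

Run against the full report, this would single out systemlog as the only failing test, which points at the event entries rather than at Active Directory itself.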
Neale Williams

What antivirus are you using? You could have a memory leak, especially if it is McAfee.

ASKER

No McAfee; that is taboo for me. Two days ago I uninstalled TrendMicro and left the server with the free ClamWin just while I test. It keeps going offline.
Are you getting any application errors in the event log?
ASKER CERTIFIED SOLUTION
Neil Russell
This solution is only available to members.
nealerocks: no errors, but notice the "SERVER failed test systemlog" line in the dcdiag above; that could be part of the reason why I am not getting errors.
Hey Neilsr :-) All updates are current. I'll take a look at the article. The strange thing is that the server has never needed it before, so why does it suddenly need it, out of the blue, or else it hangs?
No errors on the .NET Framework; all .NET updates are current.
SOLUTION
This solution is only available to members.
nealerocks: yes. Actually it is one of the cleanest system logs I've ever seen (it's a Dell PowerEdge something, I can't remember the model). They bought it 3 years ago and it's been a sweetheart to support: never any issues. The application logs are totally clean, and so are the system logs except for the Quickbooks SideBySide errors, which have been there from day 1 and which Intuit explains are normal. They have never been a problem before, but maybe they are now. This is the error:

Event Type: Error
Event Source: SideBySide
Event Category: None
Event ID: 59
Date:  10/30/2009
Time:  5:08:02 PM
User:  N/A
Computer: SERVER
Description:
Generate Activation Context failed for C:\WINDOWS\WinSxS\x86_Microsoft.VC80.MFC_1fc8b3b9a1e18e3b_8.0.50727.42_x-ww_DEC6DDD2\MFC80.DLL. Reference error message: The referenced assembly is not installed on your system.
.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
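
The Event ID 59 shown here and the 0xC101003B value in the earlier dcdiag output appear to be the same event written two ways: dcdiag prints the full 32-bit event identifier, while Event Viewer shows only its low 16 bits. The following is a small sketch of that conversion, added for illustration. The helper name `event_code` is an assumption, and the three hex values are the ones from the dcdiag report in this thread.

```python
def event_code(event_identifier):
    """Return the low 16 bits of a 32-bit event identifier."""
    return event_identifier & 0xFFFF


# Event identifiers exactly as printed by the dcdiag systemlog test above.
DCDIAG_EVENT_IDS = ["0x80001778", "0xC1010020", "0xC101003B"]

if __name__ == "__main__":
    for raw in DCDIAG_EVENT_IDS:
        print(f"{raw} -> Event ID {event_code(int(raw, 16))}")
```

Under that reading, 0x80001778 is Event ID 6008, which matches the "previous system shutdown" text logged after each hard boot, and the other two are the SideBySide events 32 and 59.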
 
Did you click on the link? Did it give you an idea of what the error is about?
I don't know a lot about that error, but if the server is going offline then you may have some hardware issues. I thought a memory leak was a possibility, but I think you would get more errors logged.
I'm trying to avoid introducing new variables. I could update to the latest network card drivers, but I don't understand why they would start misbehaving out of the blue; they've been working flawlessly for 3 years.
SOLUTION
This solution is only available to members.
Thanks lrbarrios. I'm connecting remotely right now and I'm planning to test the memory and hard drives tomorrow, but the users will have a fit because they can't work while I'm testing, and they work in shifts around the clock.
lrbarrios

Yes, it's an inconvenience to the users, but when they understand that it's necessary to isolate the problem (and that you're doing your best to help them) they'll probably be less hostile.  The alternative is to continue to have daily problems.  :)  You might get lucky when you get into the BIOS on the RAID controller (if you've got one) and find that it has reported the faulty drive(s) in its log.  I would still test all of the drives anyway.  I'm interested to see what you find.
Ok guys, I'm back. I was cleaning up the event logs to eliminate variables. The only error left is the Quickbooks Enterprise one that everyone seems to be fighting with on the Quickbooks forums (I'm not really worried about it at this time, since Intuit support says to ignore it):
Event Type: Error
Event Source: QuickBooks
Event Category: Error
Event ID: 4
Date:  11/4/2009
Time:  8:31:28 AM
User:  N/A
Computer: Server
Description:
An unexpected error has occured in "QuickBooks":
Got unexpected error 5 in call to NetShareGetInfo for path \\server\Quickbooks\MyQuickbooksCompanyName.QBW
 
The server keeps going offline. This is the latest dcdiag:
 

Domain Controller Diagnosis
Performing initial setup:
   Done gathering initial info.
Doing initial required tests
   Testing server: Default-First-Site-Name\SERVER
      Starting test: Connectivity
         ......................... SERVER passed test Connectivity
Doing primary tests
   Testing server: Default-First-Site-Name\SERVER
      Starting test: Replications
         ......................... SERVER passed test Replications
      Starting test: NCSecDesc
         ......................... SERVER passed test NCSecDesc
      Starting test: NetLogons
         ......................... SERVER passed test NetLogons
      Starting test: Advertising
         ......................... SERVER passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... SERVER passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... SERVER passed test RidManager
      Starting test: MachineAccount
         ......................... SERVER passed test MachineAccount
      Starting test: Services
         ......................... SERVER passed test Services
      Starting test: ObjectsReplicated
         ......................... SERVER passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... SERVER passed test frssysvol
      Starting test: frsevent
         ......................... SERVER passed test frsevent
      Starting test: kccevent
         ......................... SERVER passed test kccevent
      Starting test: systemlog
         ......................... SERVER passed test systemlog
      Starting test: VerifyReferences
         ......................... SERVER passed test VerifyReferences
   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
   Running partition tests on : mydomain
      Starting test: CrossRefValidation
         ......................... mydomain passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... mydomain passed test CheckSDRefDom
   Running enterprise tests on : mydomain.local
      Starting test: Intersite
         ......................... mydomain.local passed test Intersite
      Starting test: FsmoCheck
         ......................... mydomain.local passed test FsmoCheck
 
All tests are passing and I am not getting any other event errors. The server just goes offline and you cannot log on to it without a hard boot, yet it keeps managing the users and providing the shares without any other problems.
Any ideas?
The problem turned out to be a bad router. It would hang the server whenever ntbackup tried to run a backup to a remote computer. (In retrospect, I realize the server was probably not really hung, just too busy to let me log in. If I had waited, maybe several hours, lol, at some point I would have got a login screen.)

What gave it away: while I was cleaning up the event logs, I saw the ntbackup window pop up and never progress. The strange part is that it did not log the ntbackup job as failed (my guess is that we never gave it the chance to finally fail; it was set to retry for 72 hours and then fail).
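
Since the backup retried silently for days, one way to make this kind of failure visible sooner would be a quick, time-limited reachability check against the backup target before the job starts. The following is only a sketch of that idea, not what was done in this case. It assumes the remote computer is reached over SMB on TCP port 445, and the helper name `can_reach` is hypothetical. A successful connect only shows the path is up at that moment; it would not catch a router that passes the connection and then stalls under load.

```python
import socket
import sys


def can_reach(host, port=445, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Usage (assumed): python probe.py <backup-host>
    if len(sys.argv) > 1:
        target = sys.argv[1]
        if can_reach(target):
            print(f"{target} is reachable on port 445")
        else:
            print(f"{target} is NOT reachable on port 445; check the network path first")
```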

I am awarding points because of the moral support. When you are "in the box", advice from "out of the box" helps you think more clearly, and that is what I feel I got from you: by forcing me to address and tackle the issues that Intuit had told me to "ignore", you got me working on something that allowed me to "witness" the issue.

Thanks guys.