nysflyboy asked:

SBS2003 server hangs, with Event ID 1058 starting things off

Twice in the past two weeks, a client's SBS2003 server has "hung" (see below). In the event log, the first abnormal event is a 1058, followed by multiple 1058 and 1030 errors.

1058: Windows cannot access the file gpt.ini for GPO CN={31B2F340-016D-11D2-945F-00C04FB984F9},CN=Policies,CN=System,DC=paulcuddycpa,DC=local. The file must be present at the location <\\paulcuddycpa.local\sysvol\paulcuddycpa.local\Policies\{31B2F340-016D-11D2-945F-00C04FB984F9}\gpt.ini>. (Configuration information could not be read from the domain controller, either because the machine is unavailable, or access has been denied. ). Group Policy processing aborted.

1030: Windows cannot query for the list of Group Policy objects. Check the event log for possible messages previously logged by the policy engine that describes the reason for this.
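The 1058 message names the GPO by its distinguished name and gives the SYSVOL path it failed to read. The GUID {31B2F340-016D-11D2-945F-00C04FB984F9} is the well-known GUID of the Default Domain Policy. As a minimal sketch (not a Microsoft tool), the Python below rebuilds that UNC path from the DN so it can be checked from the DC or a workstation. It assumes a simple DN with no escaped commas; the function name is hypothetical.

```python
import os
from pathlib import PureWindowsPath


def gpt_ini_from_dn(gpo_dn: str) -> PureWindowsPath:
    """Build the SYSVOL path of a GPO's gpt.ini from its distinguished name.

    Assumes a plain DN with no escaped commas (naive split on ",").
    """
    parts = [p.strip() for p in gpo_dn.split(",")]
    guid = parts[0].split("=", 1)[1]  # first RDN is CN={GUID}
    # The DC= components, joined with dots, give the DNS domain name.
    domain = ".".join(p.split("=", 1)[1] for p in parts if p.upper().startswith("DC="))
    share = PureWindowsPath(f"//{domain}/sysvol")  # UNC share \\domain\sysvol
    return share / domain / "Policies" / guid / "gpt.ini"


dn = "CN={31B2F340-016D-11D2-945F-00C04FB984F9},CN=Policies,CN=System,DC=paulcuddycpa,DC=local"
path = gpt_ini_from_dn(dn)
print(path)
# Only meaningful on a domain member or the DC itself; prints False elsewhere.
print(os.path.isfile(str(path)))
```

Checking the path while the problem is happening would show whether SYSVOL itself is unreachable or only Group Policy processing is failing.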

Once this happens, no one can log in to the server, including at the console. The event log keeps recording and shows no other abnormalities, but as soon as I try to log in at the console the GUI freezes, and eventually I have to hit the power switch. After a reboot everything is always fine, with no further GPO-related errors, for a week or so.

Both times this happened very early in the morning, around 1 AM.

The server is configured as follows:

One NIC to the internet, running only TCP/IP
One NIC to the internal network (10.0.1.1), running File and Printer Sharing, etc.
One NIC to a SAN (Netgear) via crossover cable (10.0.10.1), running File and Printer Sharing, etc. (same as the internal NIC)

DNS entries (A records) for both 10.0.1.1 and 10.0.10.1 point to the server itself.
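With A records for both addresses, a workstation that looks up the server's name can be handed 10.0.10.1, which sits on the crossover link to the SAN and is not on the client LAN. A minimal sketch using Python's standard `ipaddress` module shows which records fall outside the client subnet. The /24 mask is an assumption (the thread does not state subnet masks), and the function name is illustrative.

```python
import ipaddress


def off_subnet_records(a_records: list[str], client_subnet: str) -> list[str]:
    """Return the A-record addresses that are not inside the client's subnet."""
    net = ipaddress.ip_network(client_subnet)
    return [ip for ip in a_records if ipaddress.ip_address(ip) not in net]


# Both A records from the post; the LAN is assumed to be 10.0.1.0/24.
records = ["10.0.1.1", "10.0.10.1"]
print(off_subnet_records(records, "10.0.1.0/24"))  # -> ['10.0.10.1']
```

Any address this reports is one a LAN client could receive from DNS but probably cannot use, since the SAN link is a point-to-point cable rather than a routed network.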

I can ping both the domain and the server, by name and by IP.

\\domain.local brings up the domain, with all shares visible.

Server points to itself for DNS (10.0.1.1) and has a forwarder configured to the internet DNS.

In a comment below I noted that the Default Domain Policy is the one named in the error; that is true. However, I also said that folder redirection is configured in that policy, which is false: we use a separate policy for that.

I have not disabled the Computer Browser service on the XP workstations. There are about 20 of them, all XP SP3 with all current patches. The server is on SP2 and fully patched (both SBS and Windows Server).

Backup is generally running when this happens, but it runs every night without issue. We use only MS Backup on a schedule.

AV is provided by Trend Micro (Worry-Free Business Security Advanced), which we have been using for years.

The only possible change: each time this happened, we had added a new workstation the night before or thereabouts. The new workstations were built from scratch from the XP CD and patched immediately, then put on the network, joined to the domain, etc. They use the same hardware as all the others.

I would really appreciate any help and direction, as this client is VERY busy this time of year, and coming in to find their only server down at 6am is very bad.

Thanks!
nysflyboy

ASKER

I just wanted to add a few other things:

Power saving, suspend, etc. are all disabled; the server and drives are always on.

The GPO it seems unable to access is the Default Domain Policy.

Since some of the research I have done indicates it could be a problem: I DO have folder redirection enabled for the users (in the Default Domain Policy).
-- First, I would check the health of the domain controller and see whether there are any errors under the File Replication Service.

-- Run DCDIAG and NetDiag to look for potential issues.

-- Make sure the File Replication Service log has no errors; its latest event should be 13516. Also make sure the NETLOGON and SYSVOL shares exist.
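The share check in the last step can be scripted. A minimal sketch, assuming you paste in the text printed by `net share` on the DC (or capture it with Python's `subprocess.run(["net", "share"], capture_output=True, text=True)` on the server): it treats the first word of each line as a share name, which is enough to test whether NETLOGON and SYSVOL are present. The sample output below is illustrative, not taken from the asker's server.

```python
REQUIRED_SHARES = {"NETLOGON", "SYSVOL"}


def missing_dc_shares(net_share_output: str) -> set[str]:
    """Return the required DC shares that do not appear in `net share` output."""
    found = set()
    for line in net_share_output.splitlines():
        tokens = line.split()
        if tokens:
            # Header and footer lines add harmless words; only membership matters.
            found.add(tokens[0].upper())
    return REQUIRED_SHARES - found


# Illustrative output; real column widths and remarks will differ.
sample = r"""Share name   Resource                        Remark

-------------------------------------------------------------------------------
C$           C:\                             Default share
IPC$                                         Remote IPC
NETLOGON     C:\WINDOWS\SYSVOL\sysvol\paulcuddycpa.local\SCRIPTS   Logon server share
SYSVOL       C:\WINDOWS\SYSVOL\sysvol        Logon server share
The command completed successfully.
"""

print(missing_dc_shares(sample) or "NETLOGON and SYSVOL are shared")
```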

Thanks .
ChiefIT has written a good article about these generic events.

https://www.experts-exchange.com/articles/OS/Microsoft_Operating_Systems/Server/2003_Server/Diagnosing-and-repairing-Events-1030-and-1058.html

(If the article helps you, please give him credit by clicking "Was this article helpful".)

SG
Ran DCDIAG. No errors, except that it complains about these events:

     Starting test: systemlog
        An Error Event occured.  EventID: 0x00000457
           Time Generated: 01/28/2010   10:55:49
           (Event String could not be retrieved)
        An Error Event occured.  EventID: 0x00000457
           Time Generated: 01/28/2010   10:55:55
           (Event String could not be retrieved)
        An Error Event occured.  EventID: 0x00000457
           Time Generated: 01/28/2010   11:40:42
           (Event String could not be retrieved)
        ......................... PCUDDY01 failed test systemlog

These are Terminal Services events, logged because I'm remoted in and the server doesn't have the Adobe PDF print driver:

Driver Adobe PDF Converter required for printer Adobe PDF is unknown. Contact the administrator to install the driver before you log in again.


The last FRS event is 13516; no errors noted.
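DCDIAG prints event IDs in hexadecimal, while Event Viewer shows them in decimal, so 0x00000457 has to be converted before you can look it up (0x457 = 1111). A minimal sketch that pulls the IDs out of the systemlog section quoted above; the regular expression is written against this output and is an assumption about other DCDIAG versions.

```python
import re

# Matches DCDIAG lines such as "An Error Event occured.  EventID: 0x00000457"
EVENT_RE = re.compile(r"EventID:\s*0x([0-9A-Fa-f]+)")


def dcdiag_event_ids(report: str) -> list[int]:
    """Return the decimal event IDs found in DCDIAG systemlog output."""
    return [int(hex_id, 16) for hex_id in EVENT_RE.findall(report)]


# Excerpt of the DCDIAG output from this thread.
sample = """
     Starting test: systemlog
        An Error Event occured.  EventID: 0x00000457
           Time Generated: 01/28/2010   10:55:49
        An Error Event occured.  EventID: 0x00000457
           Time Generated: 01/28/2010   10:55:55
        An Error Event occured.  EventID: 0x00000457
           Time Generated: 01/28/2010   11:40:42
"""

print(sorted(set(dcdiag_event_ids(sample))))  # -> [1111]
```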
ASKER CERTIFIED SOLUTION
Justin Owens

This solution is only available to members.
I don't think events 1030 and 1058 would cause a system to hang. They just say that Group Policy could not be applied because gpt.ini was not accessible.
-- Hangs are generally caused by drivers. Check whether any device drivers need updating, especially NIC drivers and disk drivers (SCSI or RAID).
-- Then run CHKDSK /F to fix any file system errors (check the report to see whether any errors were found).
-- Check the NIC bindings.

Regards,
Arun.

SOLUTION
This solution is only available to members.
Thank you for all the help so far. I wanted to clarify a few things:

- When I say "hung" I do not mean hung in the traditional sense. The server itself keeps chugging along, with no errors in the event logs other than those I listed. What does happen is that every client is denied access; since they cannot reach any resources, they basically have to reboot, and once they reboot they cannot log in. At the server console, even pressing Ctrl+Alt+Del to bring up Task Manager grinds to a halt (the GUI basically hangs), so it is impossible to diagnose anything. However, the server keeps running (judging by the logs after the fact). Both times this happened in the middle of the night (sometime between 1 AM and 3 AM).

- This machine did "recently" (6 months ago) get a new NIC: an Intel Gigabit Ethernet server adapter, used as the crossover adapter for the new SAN. The SAN hosts an iSCSI volume that holds 90% of the customer's data.

- I did not change the adapter binding order (never thought of it). Come to think of it, after moving to iSCSI I originally had that adapter set to TCP/IP only (no client or sharing bindings). I had to re-enable File and Printer Sharing because of AD errors and clients randomly being denied login. I wonder whether changing the binding order so that 10.0.1.1 is first would let me remove everything but TCP/IP from that adapter, and perhaps solve this issue?

I have changed the binding order: 10.0.1.1 (LAN) first, 10.0.10.1 (SAN) second, and the internet NIC third. Does that sound correct?
One other thing I forgot: I checked, and I was wrong. I did not make any changes to the Default Domain Policy; the folder redirection settings are in the separate folder redirection policy. But I may reset the default policy using that tool anyway if this happens again.

Chkdsk is clean on all volumes
DCDIAG is clean (run after the fact; I cannot run it during the hang)
BIOS/Firmware and NIC drivers are all the latest (and have not changed in 6 months)
Server is latest SP and patched to all recent patches
SOLUTION
Rob Williams

This solution is only available to members.
Yes, the binding order does matter.

The NIC is used for the LAN, but since you have an iSCSI initiator card attached to your server, traffic to the SAN goes over that adapter, which is much faster and makes the data appear to the server as if it were local.

Please use this binding order: LAN NIC first, SAN second, and internet last.
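On Windows Server 2003 the order itself is set in Network Connections under Advanced > Advanced Settings > Adapters and Bindings. To make the recommendation concrete, here is a small, hypothetical checker that compares an adapter order against LAN, SAN, Internet and reports each position that differs. The adapter labels are illustrative, and the "before" order is an assumption: the thread only says the SAN NIC was bound first.

```python
RECOMMENDED = ["LAN", "SAN", "Internet"]


def binding_order_problems(current: list[str], recommended: list[str] = RECOMMENDED) -> list[str]:
    """List each position where the current binding order differs from the recommended one."""
    if sorted(current) != sorted(recommended):
        raise ValueError("current and recommended must name the same adapters")
    return [
        f"position {pos}: {have} (expected {want})"
        for pos, (have, want) in enumerate(zip(current, recommended), start=1)
        if have != want
    ]


# Assumed original order: SAN first; the order of the other two is a guess.
before = ["SAN", "LAN", "Internet"]
for problem in binding_order_problems(before):
    print(problem)
```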

As you said, pressing Ctrl+Alt+Del hangs the server; that is still a type of server hang, and you need to find out which application is actually using most of the memory.

Did you install any new software?
Can you check the server for viruses or spyware? Sometimes AV doesn't detect them.

Use hackjack to find which application is consuming most of the memory.
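If no third-party tool is at hand, the built-in `tasklist /FO CSV` command prints one quoted row per process, with memory in a "Mem Usage" column such as "24,120 K". A minimal sketch that finds the largest consumer in that output, assuming an English-locale header row and the default columns (thousands separators may be "," or "." depending on locale). The sample rows and figures are invented for illustration.

```python
import csv
import io


def top_memory_process(tasklist_csv: str) -> tuple[str, int]:
    """Return (image name, KB) of the process with the largest Mem Usage."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    next(rows)  # skip the header row (omit this if /NH was used)
    best = ("", -1)
    for row in rows:
        if len(row) < 5:
            continue  # ignore blank or malformed lines
        name, mem = row[0], row[4]
        # "512,344 K" -> 512344; strip either thousands separator
        kb = int(mem.replace(",", "").replace(".", "").split()[0])
        if kb > best[1]:
            best = (name, kb)
    return best


# Invented sample in the tasklist /FO CSV layout.
sample = '''"Image Name","PID","Session Name","Session#","Mem Usage"
"System Idle Process","0","Console","0","16 K"
"example.exe","2100","Console","0","512,344 K"
"svchost.exe","880","Console","0","24,120 K"
'''

print(top_memory_process(sample))  # -> ('example.exe', 512344)
```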
OK, thanks again to everyone for the additional clarification.

As I recall, when the SAN and the additional NIC were added, I initially bound only TCP/IP to the iSCSI NIC. I had to allow the DNS server to work on that NIC as well, or clients dropped randomly. This now makes sense, since the first binding was the SAN NIC, not the primary client NIC.

We rebooted the box this morning to put the new binding order into effect. So far, all seems good. Tomorrow I will test removing the DNS server from the SAN NIC. If that works, I will remove the File and Printer Sharing binding from that NIC as well and test again.

There are NO additional NICs in this box, virtual or otherwise.

I have not installed any new software on the server recently, other than a new AV version (same brand, version update) several months ago.
Thanks. It appears to have been stable for a week now, and I attribute that to the binding order and DNS changes.