SBS2003 server hangs, starting with Event ID 1058

nysflyboy
Twice in the past two weeks, a client's SBS2003 server has "hung" (see below). In the event log, the first abnormal event is a 1058, followed by multiple 1058 and 1030 errors.

1058: Windows cannot access the file gpt.ini for GPO CN={31B2F340-016D-11D2-945F-00C04FB984F9},CN=Policies,CN=System,DC=paulcuddycpa,DC=local. The file must be present at the location <\\paulcuddycpa.local\sysvol\paulcuddycpa.local\Policies\{31B2F340-016D-11D2-945F-00C04FB984F9}\gpt.ini>. (Configuration information could not be read from the domain controller, either because the machine is unavailable, or access has been denied. ). Group Policy processing aborted.

1030: Windows cannot query for the list of Group Policy objects. Check the event log for possible messages previously logged by the policy engine that describes the reason for this.
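The 1058 text spells out exactly where the client expects gpt.ini: the domain's SYSVOL share, under Policies\{GUID}. As a rough way to test that access by hand, here is a minimal Python sketch. The helper names `gpt_ini_path` and `can_read_gpt_ini` are hypothetical, and it assumes it is run on a domain member under an account that should be able to read SYSVOL. It only approximates the read that the Group Policy engine reports as failing; it is not how Windows itself performs the check.

```python
def gpt_ini_path(domain: str, gpo_guid: str) -> str:
    """Build the UNC path to a GPO's gpt.ini, using the layout shown in event 1058."""
    guid = gpo_guid.strip("{}").upper()
    # \\<domain>\sysvol\<domain>\Policies\{GUID}\gpt.ini
    return "\\\\" + "\\".join([domain, "sysvol", domain, "Policies", "{" + guid + "}", "gpt.ini"])


def can_read_gpt_ini(path: str) -> bool:
    """Return True if the file can be opened and read; any OSError counts as a failure."""
    try:
        with open(path, "rb") as f:
            f.read(1)
        return True
    except OSError:
        return False
```

For this domain, `can_read_gpt_ini(gpt_ini_path("paulcuddycpa.local", "{31B2F340-016D-11D2-945F-00C04FB984F9}"))` run from a workstation during an outage would help separate a plain SYSVOL access problem from other causes.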

Once this happens, no one can log in to the server, including from the console. Although the logs continue and show no other abnormalities, once I try to log in at the console, the GUI freezes and I eventually have to hit the power switch. Upon reboot, everything is always fine, with no more GPO-related errors, for a week or so.

Both times, this happened very early in the morning, around 1am.
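Because both incidents start at roughly the same hour, it can help to pin down exactly when the first 1058 appeared and how many 1058/1030 events followed it. A minimal Python sketch, assuming the events have already been exported (for example from Event Viewer) into hypothetical `Event(time, event_id)` records; the export step itself is not shown.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class Event(NamedTuple):
    """One event log record (hypothetical export format)."""
    time: datetime
    event_id: int


GPO_FAILURE_IDS = (1030, 1058)


def summarize_gpo_failures(events: Iterable[Event]) -> Optional[dict]:
    """Locate the first 1058 and count the 1030/1058 events logged from then on.

    Returns None when the log contains no 1058 at all.
    """
    ordered = sorted(events, key=lambda e: e.time)
    first = next((e for e in ordered if e.event_id == 1058), None)
    if first is None:
        return None
    after = [e for e in ordered if e.time >= first.time]  # includes the first 1058 itself
    counts = {eid: sum(1 for e in after if e.event_id == eid) for eid in GPO_FAILURE_IDS}
    return {"first": first.time, "hour": first.time.hour, "counts": counts}
```

Comparing `hour` across incidents, and against the backup schedule, is the kind of correlation this thread is after; the sketch does not diagnose anything by itself.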

The server is configured as follows:

One NIC to the internet, running just TCP/IP
One NIC to the internal network, running F&P sharing, etc. 10.0.1.1
One NIC to a SAN (Netgear) via crossover cable, running F&P sharing, etc. (same as internal). 10.0.10.1

DNS entries (A records) for both 10.0.1.1 and 10.0.10.1 point to the server itself.

I can ping the domain, and server by name and IP

\\domain.local brings up the domain, with all shares visible.

Server points to itself for DNS (10.0.1.1) and has a forwarder configured to the internet DNS.

I noted in a comment (below) that the Default Domain Policy is the one listed in the error; this is true. However, I also noted that folder redirection is done in that policy, and that is false: we use a separate policy for that.

I have not disabled the browser service on all the XP workstations. There are about 20 of them, all XP SP3 with all current patches. The server is SP2 and is fully patched (both SBS and server).

Backup is generally running when this happens, but it runs every night without issue. We use just MSbackup on a schedule.

AV is provided by Trend Micro (Worry-Free Business Security Advanced), which we have been using for years.

The only thing that possibly changed: each time this happened, we had added a new workstation the night before or so. New workstations were built from scratch from the XP CD and patched immediately, then put on the network, joined to the domain, etc. Same hardware as all the others.

I would really appreciate any help and direction, as this client is VERY busy this time of year, and coming in to find their only server down at 6am is very bad.

Thanks!

Author

Commented:
I just wanted to add a few other things:

Power saving/suspend/etc. is all disabled; the server and drives are always on.

The GPO that it seems to not be able to access is the Default Domain Policy

Since some of the research I have done indicates it could be a problem: I DO have folder redirection enabled for the users (in the Default Domain Policy).

Commented:
-- First, I would check the health of the domain controller and check whether there are any errors under the File Replication Service.

-- Run DCDIAG and NETDIAG to look for potential issues.

-- Please make sure the File Replication Service has no errors; its latest event should be 13516. Also make sure NETLOGON and SYSVOL are shared.

Thanks.

Commented:
ChiefIT has written a good article about these generic events.

http://www.experts-exchange.com/articles/OS/Microsoft_Operating_Systems/Server/2003_Server/Diagnosing-and-repairing-Events-1030-and-1058.html

(If the article helps you, you should give him credit via "Was this article helpful?")

SG

Author

Commented:
Ran DCDIAG. No errors, except that it complains about these events:

     Starting test: systemlog
        An Error Event occured.  EventID: 0x00000457
           Time Generated: 01/28/2010   10:55:49
           (Event String could not be retrieved)
        An Error Event occured.  EventID: 0x00000457
           Time Generated: 01/28/2010   10:55:55
           (Event String could not be retrieved)
        An Error Event occured.  EventID: 0x00000457
           Time Generated: 01/28/2010   11:40:42
           (Event String could not be retrieved)
        ......................... PCUDDY01 failed test systemlog

These are TermSvc events, logged because I'm remoted in and the server doesn't have the Adobe PDF print driver:

Driver Adobe PDF Converter required for printer Adobe PDF is unknown. Contact the administrator to install the driver before you log in again.


Last event for FRS is 13516, no errors noted.
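DCDIAG prints event IDs in hexadecimal: 0x00000457 is 1111 in decimal, which is the ID TermServDevices uses for the missing-printer-driver message quoted above, so the systemlog failure is consistent with that explanation. A small Python sketch that pulls the IDs, timestamps, and failed test names out of saved `dcdiag` text output; the regular expressions assume the exact line layout shown above and may need adjusting for other DCDIAG versions.

```python
import re

# "occured" is dcdiag's own spelling; the pattern only relies on "EventID: 0x...".
EVENT_RE = re.compile(r"EventID:\s*0x([0-9A-Fa-f]+)")
TIME_RE = re.compile(r"Time Generated:\s*(.+?)\s*$")
FAILED_RE = re.compile(r"(\S+) failed test (\S+)")


def parse_systemlog(output: str) -> list[tuple[int, str]]:
    """Return (decimal event ID, time string) for each error event in the systemlog test."""
    results = []
    pending_id = None
    for line in output.splitlines():
        event = EVENT_RE.search(line)
        if event:
            pending_id = int(event.group(1), 16)  # hex -> decimal, e.g. 0x457 -> 1111
            continue
        stamp = TIME_RE.search(line)
        if stamp and pending_id is not None:
            results.append((pending_id, stamp.group(1)))
            pending_id = None
    return results


def failed_tests(output: str) -> list[tuple[str, str]]:
    """Return (server, test name) pairs for every 'failed test' line."""
    return FAILED_RE.findall(output)
```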
ITIL Problem Manager
Commented:
nysflyboy,
There are a few reasons that your error pairing will show up:
  • One is if the password on the administrator account doesn't meet the complexity requirements (in the event that complexity requirements were changed after the password was set).
  • Another is that Windows Activation has, for whatever reason, lost its key and needs to be reactivated. You would see an Activate Now balloon when logging in if that was the case.
  • If you recently replaced a NIC or added a new one, your NIC binding order could be wrong. Go to Network Connections and choose Advanced -> Advanced Settings. On the Adapters and Bindings tab, make sure that the 10.0.1.1 NIC is at the top.
  • Permissions on C:\windows\SYSVOL\sysvol\<domain name>\scripts might be messed up. "Everyone" needs read access.
In my experience, the most common reason is the NIC binding order.
Justin
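The first bullet refers to the domain's "Password must meet complexity requirements" policy. As documented for Windows Server 2003, a compliant password is at least six characters long, does not contain the account name, and uses characters from three of four categories (uppercase, lowercase, digits, non-alphabetic symbols). A rough Python sketch of that rule for sanity-checking a candidate administrator password offline. It is an approximation, not the actual Windows password filter: the account-name check is simplified to a case-insensitive substring test (applied only to names of three or more characters, an assumption), and the full-name rule is not modeled.

```python
import string


def meets_complexity(password: str, account_name: str = "") -> bool:
    """Approximate the Windows Server 2003 password complexity rule (simplified)."""
    if len(password) < 6:
        return False
    # Simplified account-name rule; the 3-character threshold is an assumption.
    if len(account_name) >= 3 and account_name.lower() in password.lower():
        return False
    categories = [
        any(c in string.ascii_uppercase for c in password),
        any(c in string.ascii_lowercase for c in password),
        any(c in string.digits for c in password),
        any(not c.isalnum() for c in password),  # non-alphabetic symbols
    ]
    return sum(categories) >= 3
```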

Commented:
I don't think events 1030 and 1058 would cause a system to hang. They just say that Group Policy could not be applied because gpt.ini was not accessible.
-- Hangs are generally caused by drivers. Please check whether any device drivers need to be updated (especially NIC drivers and disk drivers, e.g. SCSI or RAID).
-- Then you can run CHKDSK /F to fix any file system errors (check the report to see whether any errors were found).
-- Check the NIC binding order.

Regards,
Arun.

Commented:
First, Microsoft does not recommend making changes to, or configuring anything in, the Default Domain Policy and the Default Domain Controllers Policy. If anything needs to be defined for the domain, create a new GPO, define the settings there, and link it to the OU.

The reason is that the Default Domain and Default Domain Controllers policies are system defaults; changing anything in them may corrupt the policy, and the only ways to correct that are restoring a healthy backup or running dcgpofix.

The DCGPOFIX tool resets the policy to its default state, wiping out any manually configured settings in that policy.

You can configure the folder redirection policy in a separate GPO and reset the Default Domain and Default Domain Controllers policies using dcgpofix.

A hang can have many causes:

-Run chkdsk, look for disk-related errors, and try to fix them with the /f parameter.
-Check system utilization and which process is consuming memory.
-Check that the BIOS, firmware, and NIC drivers are up to date on the server.
-Make sure the server has the latest service pack and hotfixes.
-Check that the NTFS file system driver is up to date.
-Check that a page file has been set, and analyze any memory dump.

If you have two NICs, check the binding order.

There are also tools such as Procmon, a debugger, etc.

Author

Commented:
Thank you for all the help so far. I wanted to clarify a few things:

- When I say "hung", I do not mean hung in the traditional sense. The server itself keeps chugging along, with no errors in the event logs other than those I stated. What does happen is that every client is denied access and basically has to reboot, since they cannot access any resources. Once they reboot, they cannot log in. On the server console, even pressing CTRL-ALT-DEL to bring up Task Manager grinds to a halt (the GUI basically hangs), and it is not possible to diagnose anything. However, the server keeps running (going by the logs after the fact). Both times, this happened in the middle of the night (sometime between 1am and 3am).

- This machine did "recently" (6 months ago) have a new NIC installed: an Intel Gig-E server adapter, which is used as the crossover adapter for the new SAN. The SAN hosts an iSCSI volume, which holds 90% of the customer's data.

- I did not change the adapter binding order (I never thought of it). Come to think of it, after moving to iSCSI, I originally had that adapter set to TCP/IP only (no client or sharing bindings). I had to re-enable File and Print Sharing because of AD errors and clients being randomly denied login. I wonder whether changing the binding order so that 10.0.1.1 is first will allow me to remove everything but TCP/IP from that adapter, and perhaps solve this issue?

I have changed the binding order, moving 10.0.1.1 first, 10.0.10.1 (SAN) second, and the internet NIC third. Does that sound correct?

Author

Commented:
One other thing I forgot: I checked, and I was wrong. I did not make any changes to the Default Domain Policy; I had made the folder redirection changes in the folder redirection policy. But I may reset the default policy using that tool anyway if the situation happens again.

Chkdsk is clean on all volumes
DCDIAG is clean (after the fact; I cannot run it during the hang)
BIOS/Firmware and NIC drivers are all the latest (and have not changed in 6 months)
Server is latest SP and patched to all recent patches
Top Expert 2013
Commented:
Keep in mind this is SBS, not Server Standard, and it has two very specific network configurations: single NIC and dual NIC. The latter is designed to use the SBS as a gateway for your LAN, not as a second LAN NIC. Any other configuration can cause "odd" results, and often the wizards (which are critical) may not run.

Having said that, have you tried, in the DNS management console, opening the properties of the DNS server and, on the Interfaces tab, selecting ONLY the appropriate LAN adapter? This way DNS is only 'served' from that adapter.

Because the clients can reconnect after a reboot, it sounds like their primary DNS server briefly becomes unavailable for some reason and they switch to an alternate. The PC's DNS client may not try the primary again as long as the alternate is available, except on reboot. Make sure the clients receive ONLY the one SBS LAN adapter IP as a DNS entry, whether through DHCP or static configuration.
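The advice here boils down to two checks per workstation: its DNS server list should contain only the SBS LAN address, and every A record returned for the server or domain name should be an address the workstation can actually reach. The earlier note that A records exist for both 10.0.1.1 and 10.0.10.1 is relevant to the second check. A minimal Python sketch, assuming the inputs are copied by hand from `ipconfig /all` and `nslookup` on a workstation, and assuming the LAN is 10.0.1.0/24 (the thread gives only the server's address, not the mask).

```python
import ipaddress

# The SBS LAN address comes from the thread; the /24 mask is an assumption.
SBS_LAN_IP = ipaddress.ip_address("10.0.1.1")
CLIENT_SUBNET = ipaddress.ip_network("10.0.1.0/24")


def dns_problems(client_dns_servers: list[str], server_a_records: list[str]) -> list[str]:
    """List problems with one workstation's view of DNS.

    client_dns_servers: DNS servers configured on the workstation, in order.
    server_a_records: addresses DNS returns for the server/domain name.
    """
    problems = []
    servers = [ipaddress.ip_address(s) for s in client_dns_servers]
    if servers != [SBS_LAN_IP]:
        problems.append(f"DNS servers {client_dns_servers}: expected only {SBS_LAN_IP}")
    for record in server_a_records:
        addr = ipaddress.ip_address(record)
        if addr not in CLIENT_SUBNET:
            problems.append(f"A record {addr} is outside the client subnet {CLIENT_SUBNET}")
    return problems
```

Under the /24 assumption, the configuration described at the top of the thread would have its 10.0.10.1 A record flagged, which lines up with the later decision to stop serving DNS on the SAN NIC.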

Are there any other NICs present, such as a virtual adapter (VMware/Hyper-V/Virtual Server) or a virtual VPN adapter?

Commented:
Yes, the binding order does matter.

The NIC is used for the LAN, but since you have an iSCSI initiator card attached to your server, traffic to the SAN goes over that card, which is much faster and makes the data appear to the server as if it were local.

Please use this binding order: LAN NIC first, SAN second, and internet last.
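The recommended order is easy to state as a rule: LAN first, SAN second, internet last. A small Python sketch that checks a list of adapters, entered top to bottom as they appear on the Adapters and Bindings tab, against that order and proposes the corrected one. The `Adapter` type and its `role` labels are hypothetical; Windows does not expose roles like this, so they would have to be assigned by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adapter:
    name: str  # connection name as shown in Network Connections
    ip: str
    role: str  # hypothetical label: "lan", "san" or "internet"


RECOMMENDED_ROLES = ("lan", "san", "internet")


def binding_order_ok(adapters: list[Adapter]) -> bool:
    """True if the adapters, listed top to bottom, follow LAN, SAN, internet."""
    return tuple(a.role for a in adapters) == RECOMMENDED_ROLES


def recommended_order(adapters: list[Adapter]) -> list[Adapter]:
    """Return the adapters sorted into the recommended binding order."""
    rank = {role: i for i, role in enumerate(RECOMMENDED_ROLES)}
    return sorted(adapters, key=lambda a: rank[a.role])
```

Fed the original layout, with the SAN NIC on top, the sketch would flag the order and put 10.0.1.1 first, matching the order the author settled on.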

As you said, pressing Ctrl-Alt-Del hangs the server; this is still a type of server hang, and you need to find out which application is actually using most of the memory.

Did you install any new software?
Can you check the server for viruses or spyware? Sometimes AV doesn't detect them.

Use hackjack to find which application is consuming most of the memory.

Author

Commented:
OK, thanks again for the additional clarification (everyone).

As I recall, when the SAN and the additional NIC were added, I initially bound only TCP/IP to the iSCSI NIC. I had to allow the DNS server to work on that NIC as well, or clients would randomly drop. This now makes sense, as the first binding was the SAN NIC, not the primary client NIC.

We rebooted the box this morning to apply the new binding order. So far, all seems good. Tomorrow I will test removing the DNS server from the SAN NIC. If that works, I will remove the File and Print Sharing binding from that NIC as well and test.

There are NO additional NICs in this box, virtual or otherwise.

I have not installed any new software on the server recently, other than a new AV version (same brand) several months ago.

Author

Commented:
Thanks. It has been stable for a week now, and I have to attribute that to the binding order and DNS.
