Server hangs, Event ID 1030 and 1053

Hi,
I have encountered two problems on one of my client's servers, which is running Win 2003 R2 Std, SP2.

The first problem is:
I get one or two errors with Event ID 1030:
“Windows cannot query for the list of Group Policy objects. Check the event log for possible messages previously logged by the policy engine that describes the reason for this.”
After that I get several errors with Event ID 1053:
“Windows cannot determine the user or computer name. (Not enough storage is available to complete this operation. ). Group Policy processing aborted.”
I have 55GB of free space on C:, 4GB of RAM and a 6GB swap file. When the error has occurred, I have checked the memory usage, and it is just under 2GB. The only disk that is close to running out of space is a USB drive that is used for a secondary backup every night (the main backup goes to a tape drive).

When I get these errors, I have noticed the following things on the server:
•      When I try to open “Active Directory Users and Computers” I get an error message:
“Naming information cannot be located for the following reason: The server is not operational.”
•      It is not possible to reach any web page through IE, but I can ping the web server by its name (ping www.download.com)
•      A business program using MS SQL Express does not work. When trying to use it, I get error 18456 with MSSQL as the source:
“Login failed for user…”
When I restart the server, everything works fine, usually for about a week.

The second problem is:
The server freezes and does not respond to ping. This has always happened at weekends, when I only have remote access to the server, so I have not been able to check whether Caps Lock responds on the keyboard. The server does have Fujitsu's built-in remote management (iRMC Advanced Pack), so I have been able to view the screen remotely: the screen saver is locked and the server does not respond to Ctrl+Alt+Del. After a hard reset from the iRMC, the server starts up as usual and works again.

One strange aspect of both these problems is that they always occur outside office hours, and almost every time they have happened on weekends. The only difference I have found on weekends is that Backup Exec runs backups only to the USB drive; Monday to Friday it also backs up to tape.
The second problem started a couple of weeks after the first problem appeared for the first time. Since the problems started at almost the same time, they seem related, but I obviously don’t know that for a fact. Before these problems appeared, the server had run for 2 years without any problems. The only thing that has changed in the last couple of months is that the business program frequently gets updates.

The system:
PRIMERGY TX200 S4, running Win 2003 R2 Std, SP2. This is the only server on the network.
BackupExec 12
Norman Virus Control
A Business program using MS SQL Express
LanSafe (for the UPS)

Some of the things I have tried so far:
•      Ran Windows Update (just in case I got lucky and the updates fixed the error)
•      Moved and increased the pagefile
•      Updated BIOS
•      Scanned for viruses

Best Regards
Andreas

-andreas- (author) commented:
Hi Shree,
Thanks for your quick answer.

I only have one network adapter in the server, though.
File and Printer Sharing for Microsoft Networks is checked.
I only have one server, so I guess the rest of the things mentioned in the other thread don’t apply either.

It is worth mentioning that two years ago, when this server was new, AD was migrated to it from the old server. But as I mentioned, it has worked without any problems since then, until about a month ago.

Best Regards
Andreas
Shreedhar Ette commented:
Check the Directory Services event log for any errors. Also check whether event ID 13568 (a journal wrap error) is present.

-andreas- (author) commented:
No errors at all in Directory Services.
Event 13568 does occur, though, in the File Replication Service log each time I restart the server. As you mentioned in the other thread, the event says that I should change a registry value, and then it explains the following:
"Setting the "Enable Journal Wrap Automatic Restore" registry parameter to 1 will cause the following recovery steps to be taken to automatically recover from this error state.
 [1] At the first poll, which will occur in 5 minutes, this computer will be deleted from the replica set.
...
 [2] At the poll following the deletion this computer will be re-added to the replica set. The re-addition will trigger a full tree sync for the replica set."
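The registry change the event text asks for can be written as a .reg file. The sketch below is a minimal Python helper, assuming the value is a REG_DWORD named "Enable Journal Wrap Automatic Restore" under HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters (the location the 13568 event text points to); the helper name `reg_dword_file` and the output filename are hypothetical. As the rest of this thread shows, on this single-DC network the automatic restore made the Netlogon share and the Policies folder disappear, so back up SYSVOL and the System State before importing anything like this.

```python
# Sketch: build a .reg file that sets the NtFrs journal-wrap parameter
# named in event 13568. Key path and value name come from the event text;
# the helper name and output filename are hypothetical.

NTFRS_PARAMETERS = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
)
VALUE_NAME = "Enable Journal Wrap Automatic Restore"


def reg_dword_file(key: str, name: str, value: int) -> str:
    """Return the text of a .reg file that sets a single REG_DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD must fit in 32 bits")
    return "\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"{name}"=dword:{value:08x}',  # regedit exports DWORDs as 8 hex digits
        "",
    ])


if __name__ == "__main__":
    # Save the output as e.g. journal_wrap.reg (hypothetical name) and
    # import it on the DC with: regedit /s journal_wrap.reg
    print(reg_dword_file(NTFRS_PARAMETERS, VALUE_NAME, 1))
```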

Since I don't have any other server in the network, is the server able to do this all by itself?

Two other errors that I should have mentioned earlier are in the DNS Server event log. They occur when the server is in its "1030 & 1053 state":
404:
The DNS server could not bind a Transmission Control Protocol (TCP) socket to address 0.0.0.0.  The event data is the error code.  An IP address of 0.0.0.0 can indicate a valid "any address" configuration in which all configured IP addresses on the computer are available for use.
Restart the DNS server or reboot the computer.
408:
The DNS server could not open socket for address 0.0.0.0.
Verify that this is a valid IP address for the server computer.  If it is NOT valid use the Interfaces dialog under Server Properties in the DNS Manager to remove it from the list of IP interfaces.  Then stop and restart the DNS server. (If this was the only IP interface on this machine and the DNS server may not have started as a result of this error.  In that case remove the DNS\Parameters\ListenAddress value in the services section of the registry and restart.)

These two errors occur several times until the server is restarted.
Shreedhar Ette commented:
For event ID 13568: the server will do this by itself.

Note: Back up the SYSVOL folder and the System State first.
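A minimal sketch of that backup, assuming the built-in Windows Server 2003 tool ntbackup.exe is used rather than BackupExec. On a domain controller the System State includes Active Directory and SYSVOL, so one System State job covers both items in the note. The target folder E:\Backup, the job name and the helper name are hypothetical.

```python
# Sketch: build the ntbackup command line for a System State backup on a
# Windows Server 2003 DC. Target folder, job name and helper name are
# hypothetical; the script only prints the command for pasting into cmd.exe.
import subprocess
from datetime import date


def systemstate_backup_args(target_dir: str, when: date) -> list:
    """Return ntbackup arguments that write the System State to a .bkf file."""
    stamp = when.strftime("%Y-%m-%d")
    bkf = target_dir.rstrip("\\") + "\\SystemState-" + stamp + ".bkf"
    return [
        "ntbackup", "backup", "systemstate",
        "/J", "System State " + stamp,  # job name shown in the backup report
        "/F", bkf,                       # back up to this .bkf file
    ]


if __name__ == "__main__":
    args = systemstate_backup_args(r"E:\Backup", date.today())
    # list2cmdline quotes arguments containing spaces the way cmd.exe expects.
    print(subprocess.list2cmdline(args))
```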
-andreas- (author) commented:
Now I’m panicking…

Last night I made the change you suggested. I got to the following step in your list:
13553 – The DC is performing the recovery process.
But then I did not see any other message for a while, so I thought that I would wait until today and see what happened.

What has happened is that no one can connect to any share on the server, and when I try to connect through RDP to the server I get the error message:
"The specified domain either does not exist or could not be contacted"

Right now I am not able to try logging on locally, but I will be in a while, and I guess (hope) that will work.
What has happened? What should I do?
I have not tried to restart the server yet, should I try to do that or maybe restore something from the backup first??
-andreas- (author) commented:
Some additional information:
It is possible to connect to the shares by IP address, but not by name (\\192.168.10.10 works).

The server asks for a username and password when I connect by IP, though. If I enter the same account I used to log on to the computer, it says it has already tried that one, but if I use a different user that exists in the domain, the connection works.
-andreas- (author) commented:
A couple of things are solved.

After logging on locally to the server, I applied the WORKAROUND described in http://support.microsoft.com/kb/902336.
Then I was able to log in through RDP again.

After that I used the "BurFlags registry key to reinitialize File Replication Service replica sets" procedure:
http://support.microsoft.com/kb/290762
I used D4 (authoritative mode restore) since I have a single DC.
After that, everyone could log on again and use the shares, and I no longer got the usual error when restarting NtFrs.
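The KB 290762 procedure used here comes down to three commands on the DC: stop FRS, set BurFlags, start FRS. The sketch below only prints them, assuming BurFlags is a REG_DWORD under NtFrs\Parameters\Backup/Restore\Process at Startup as KB 290762 describes. D4 (authoritative) matches this single-DC case; D2 (non-authoritative) is the choice when other DCs hold a good copy of SYSVOL to pull from. The helper name is hypothetical.

```python
# Sketch of the KB 290762 sequence as plain commands: stop FRS, set BurFlags,
# start FRS. reg.exe is given the value in decimal. Helper name is hypothetical.

BURFLAGS_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
                r"\Backup/Restore\Process at Startup")
AUTHORITATIVE = 0xD4      # 212: this DC's SYSVOL becomes the reference copy
NON_AUTHORITATIVE = 0xD2  # 210: pull SYSVOL from a replication partner


def burflags_commands(authoritative: bool) -> list:
    """Return the commands to run, in order, on the domain controller."""
    value = AUTHORITATIVE if authoritative else NON_AUTHORITATIVE
    return [
        "net stop ntfrs",
        f'reg add "{BURFLAGS_KEY}" /v BurFlags /t REG_DWORD /d {value} /f',
        "net start ntfrs",
    ]


if __name__ == "__main__":
    for line in burflags_commands(authoritative=True):
        print(line)
```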

I don't know if this was the right way of solving the problem, but I ran out of time, and had to do something fast.

One problem now is that I still don't see the Netlogon share that disappeared when I made the "Enable Journal Wrap Automatic Restore"-fix.
Now I wonder how to get that back, and I also wonder if my D4-solution has caused any other problems that I just have not seen yet?
Shreedhar Ette commented:
Hi Andreas,

I am sorry I could not reply to your comments; my internet connection was down.

So far you have taken the proper steps.

For the missing Netlogon share, refer to this:
http://www.experts-exchange.com/Operating_Systems/Windows_Server_2003/Q_21494487.html

Note: Back up the System State now.

Hope this helps,
Shree
-andreas- (author) commented:
Hi Shree,
I understand; anyway, I’m happy that you help when you can.

Without making any more changes, I restarted the server this weekend. I then got several errors, starting with one saying that the SCRIPTS folder was missing, which was kind of logical. So I made the change you suggested above.
After the restart I still got a few errors very similar to the errors I got in the beginning:

1058:
Windows cannot access the file gpt.ini for GPO CN={31B2F340-016D-11D2-945F-00C04FB984F9},CN=Policies,CN=System,DC=xxx,DC=yyy,DC=zzz. The file must be present at the location <\\xxx.yyy.zzz\sysvol\xxx.yyy.zzz\Policies\{31B2F340-016D-11D2-945F-00C04FB984F9}\gpt.ini>. (The system cannot find the path specified. ). Group Policy processing aborted.
1030:
Windows cannot query for the list of Group Policy objects. Check the event log for possible messages previously logged by the policy engine that describes the reason for this.
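The 1058 event names the GPO by its distinguished name; the file it wants is derived from that DN. A small sketch of the mapping, assuming the usual \\domain\sysvol\domain\Policies\{GUID}\gpt.ini layout shown in the event text itself. {31B2F340-016D-11D2-945F-00C04FB984F9} is the well-known GUID of the Default Domain Policy; the function name is hypothetical.

```python
# Sketch: turn the GPO distinguished name from event 1058 into the UNC path
# of the gpt.ini that Group Policy expects, so the path can be checked by hand.
from pathlib import PureWindowsPath

DEFAULT_DOMAIN_POLICY = "{31B2F340-016D-11D2-945F-00C04FB984F9}"


def gpt_ini_path(gpo_dn: str) -> PureWindowsPath:
    """Map a GPO DN (CN={GUID},...,DC=a,DC=b) to the UNC path of its gpt.ini."""
    parts = [p.strip() for p in gpo_dn.split(",")]
    attr, guid = parts[0].split("=", 1)
    if attr.upper() != "CN" or not guid.startswith("{"):
        raise ValueError("expected the DN to start with CN={GUID}")
    # DC=xxx,DC=yyy,DC=zzz -> xxx.yyy.zzz
    domain = ".".join(p.split("=", 1)[1] for p in parts if p.upper().startswith("DC="))
    share = PureWindowsPath("\\\\" + domain + "\\sysvol")
    return share / domain / "Policies" / guid / "gpt.ini"


if __name__ == "__main__":
    dn = ("CN=" + DEFAULT_DOMAIN_POLICY +
          ",CN=Policies,CN=System,DC=xxx,DC=yyy,DC=zzz")
    print(gpt_ini_path(dn))
```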

Should I create a new Group Policy or something? I guess it is gone after the fixes we have made. I had not made many changes to the old Group Policy; most of it was just the default.

Best Regards
/Andreas


Shreedhar Ette commented:
Hi,

- Under the \\xxx.yyy.zzz\sysvol\xxx.yyy.zzz path, do the Policies and Scripts folders exist, with their contents?

- Check whether the Netlogon service has started.

- Restart Netlogon once and run gpupdate /force.

- Run netdiag /fix and dcdiag /fix.
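The first check in that list can be scripted against the local copy of SYSVOL. A sketch, assuming the default location %SystemRoot%\SYSVOL\sysvol\<domain> with Windows installed in C:\WINDOWS, and using the anonymised domain name xxx.yyy.zzz as a placeholder; the function name is hypothetical. An empty folder is treated the same as a missing one.

```python
# Sketch: report whether the local SYSVOL domain folder still has non-empty
# Policies and scripts folders. Path and domain name are assumptions.
from pathlib import Path

# Assumed default location on a Server 2003 DC; xxx.yyy.zzz is the
# anonymised domain name from the thread, not a real value.
DOMAIN_DIR = Path(r"C:\WINDOWS\SYSVOL\sysvol") / "xxx.yyy.zzz"


def missing_sysvol_folders(domain_dir: Path) -> list:
    """Return the names of the Policies/scripts folders that are absent or empty."""
    missing = []
    for name in ("Policies", "scripts"):  # Windows matches these case-insensitively
        folder = domain_dir / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_sysvol_folders(DOMAIN_DIR)
    print("Missing or empty:", ", ".join(gaps) if gaps else "none")
```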

---------
Shree

Shreedhar Ette commented:
Also check whether event ID 13568 is still occurring in the File Replication Service event log.
-andreas- (author) commented:
Hi,

No. I created the Scripts folder as you suggested, but the Policies folder also disappeared when I first applied the "Enable Journal Wrap Automatic Restore" fix.
I have the old folders backed up, though. Should I just copy them back? I’m just afraid that some of the problems will come back too.

The 13568 error disappeared with the "Enable Journal Wrap Automatic Restore" fix.

I have now tried the fixes mentioned above. It seems both netdiag and dcdiag passed most of the tests, except for the dcdiag systemlog test:
      Starting test: systemlog
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 04/25/2010   19:52:42
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 04/25/2010   19:52:42
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 04/25/2010   19:52:43
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 04/25/2010   19:52:43
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 04/25/2010   19:52:44
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 04/25/2010   19:52:44
            (Event String could not be retrieved)
         ......................... Server1 failed test systemlog  
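dcdiag prints the failing event IDs in hex, which makes them awkward to look up. A small sketch that summarises the systemlog output, assuming (as Event Viewer does) that the displayed event ID is the low 16 bits of the value dcdiag prints; 0x00000457 works out to 1111, the number to search for in the System log. The function name is hypothetical.

```python
# Sketch: summarise the "systemlog" section of dcdiag output by event ID.
import re
from collections import Counter

# Matches lines such as "An Error Event occured.  EventID: 0x00000457"
EVENT_ID = re.compile(r"EventID:\s*0x([0-9A-Fa-f]+)")


def event_id_counts(dcdiag_output: str) -> Counter:
    """Count events by the ID Event Viewer shows (low 16 bits of dcdiag's hex value)."""
    return Counter(int(h, 16) & 0xFFFF for h in EVENT_ID.findall(dcdiag_output))


if __name__ == "__main__":
    sample = "An Error Event occured.  EventID: 0x00000457\n" * 6
    for event_id, count in event_id_counts(sample).items():
        print(f"Event ID {event_id}: {count} occurrence(s)")
```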

The 1030 and 1058 errors are still there, though, and there is still no Policies folder.

Best Regards
Andreas
Shreedhar Ette commented:
Hi,

Since you have a backup of the Policies folder, take the following steps:
- Stop the File Replication Service.
- Copy the Policies folder to %SystemRoot%\Sysvol\Sysvol\DomainDirectory\Policies (DomainDirectory being the folder named after your domain).

Reboot the server.

It should fix the problem.

---------
Shree
-andreas- (author) commented:
Hi,

It seems the server is getting fewer errors. After I restored the Policies folder and restarted, I no longer get the 1030 and 1058 errors. I got two other errors, though, and the restart took about 40 minutes.
First 3012:
The performance strings in the Performance registry value is corrupted when process Performance extension counter provider. BaseIndex value from Performance registry is the first DWORD in Data section, LastCounter value is the second DWORD in Data section, and LastHelp value is the third DWORD in Data section.
Then 3011:
Unloading the performance counter strings for service WmiApRpl (WmiApRpl) failed. The Error code is the first DWORD in Data section.
Then, after those two errors, I get an Information event from the same source (LoadPerf):
1000:
Performance counters for the WmiApRpl (WmiApRpl) service were loaded successfully. The Record Data contains the new index values assigned to this service.
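The 3012 text says its data section starts with three DWORDs: BaseIndex, LastCounter and LastHelp. A small decoder for reading those values, assuming the raw bytes are copied from the event's Data field and are little-endian (as on this x86 hardware); the class and function names are hypothetical.

```python
# Sketch: decode the first three DWORDs of the event 3012 data section.
import struct
from typing import NamedTuple


class PerfIndexes(NamedTuple):
    base_index: int
    last_counter: int
    last_help: int


def decode_3012_data(data: bytes) -> PerfIndexes:
    """Read BaseIndex, LastCounter and LastHelp as little-endian 32-bit values."""
    if len(data) < 12:
        raise ValueError("need at least three DWORDs (12 bytes)")
    return PerfIndexes(*struct.unpack_from("<3I", data))


if __name__ == "__main__":
    raw = bytes.fromhex("37070000bc080000bd080000")  # hypothetical sample bytes
    print(decode_3012_data(raw))
```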

Best Regards,
Andreas
Shreedhar Ette commented:
Hi,

That looks good.

Since event ID 1000 shows that the performance counters for the WmiApRpl (WmiApRpl) service were loaded successfully, there is no need to worry about events 3012 and 3011.

-------
Shree
-andreas- (author) commented:
Hi again,

This is a nightmare... This morning the server froze again.
I don't know whether this has anything to do with the Policies restore I made last night, or whether this part of the old problem simply isn't solved yet.

Last week I had enabled CrashOnCtrlScroll so that I could force the hung system to crash and generate a crash dump, but the server did not respond to the key combination. So I tried the NMI button. That produced a blue screen, but unfortunately no minidump. I have just learned that the NMI dump also has to be enabled in the registry, so I have now made that change in case the freeze occurs again.
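Both crash triggers mentioned above can go in one .reg file. A sketch, assuming the commonly documented locations: CrashOnCtrlScroll under the parameters of i8042prt (the PS/2 keyboard port driver) and NMICrashDump under CrashControl. Whether this Server 2003 build honours NMICrashDump, and whether the iRMC remote keyboard passes through i8042prt at all, are assumptions to check against Microsoft's and Fujitsu's documentation. Both settings need a reboot, and a dump is only written if a dump type is configured under Startup and Recovery. The helper name is hypothetical.

```python
# Sketch: one .reg file enabling the keyboard and NMI crash triggers.
CRASH_SETTINGS = {
    # PS/2 keyboards only: right Ctrl + Scroll Lock twice forces a crash.
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters":
        {"CrashOnCtrlScroll": 1},
    # Make the NMI button produce a crash dump instead of just a bugcheck.
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl":
        {"NMICrashDump": 1},
}


def reg_file(settings: dict) -> str:
    """Return .reg text setting each {key: {value_name: dword}} entry."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in settings.items():
        lines.append(f"[{key}]")
        for name, value in values.items():
            lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(reg_file(CRASH_SETTINGS))
```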

The restart did show one interesting error in the event log: 13568 again. Maybe this is just because of the freeze?

The only clues I can think of right now are that the event log shows no events after 07:11 this morning, and that BackupExec was running a backup at the time. The job log in BackupExec tells me that it was verifying D:, which holds only data, at 7:15:20 AM. After that there is nothing more in the BackupExec log either.

On weekdays we have two backup jobs running. The tape backup starts at 23:30 and runs until about 5:30 AM; then, because the data has grown since the server was first installed, it waits for a second tape to be inserted. At 5:00 AM a USB backup job starts and runs for 3 hours. I know something has to be done about this: I don’t like the fact that two jobs now run in parallel and that we have to change tapes to get the whole backup, but right now I have put all my energy into solving the things discussed in this thread instead.
The interesting part is that before this, the freeze had only occurred on weekends, at approximately the same time. The difference on weekends is that no tape backup is running.
Last night, only one job was actually running, since I interrupted the tape backup job when restarting the server.
I don’t know whether this has anything to do with it or is just a coincidence, though.
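The weekday overlap can be worked out from the times above. A small worked example using only those clock times (the calendar date is arbitrary): with the first tape finishing around 05:30 and the USB job running 05:00 to 08:00, the two jobs overlap by at least 30 minutes, and by the full 3 hours if the tape job is still waiting for its second tape at 08:00. The `overlap` helper is hypothetical.

```python
# Sketch: overlap between the weekday tape job and the USB job.
from datetime import datetime, timedelta


def overlap(a_start: datetime, a_end: datetime,
            b_start: datetime, b_end: datetime) -> timedelta:
    """Length of time two [start, end) intervals have in common."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(earliest_end - latest_start, timedelta(0))


# The date is arbitrary; only the clock times from the post matter.
evening = datetime(2010, 4, 26)
tape_start = evening + timedelta(hours=23, minutes=30)               # 23:30
first_tape_done = evening + timedelta(days=1, hours=5, minutes=30)   # ~05:30
usb_start = evening + timedelta(days=1, hours=5)                     # 05:00
usb_end = usb_start + timedelta(hours=3)                             # 08:00

if __name__ == "__main__":
    print("Overlap with the first tape:",
          overlap(tape_start, first_tape_done, usb_start, usb_end))
    # If nobody swaps the tape, the tape job is still open when the USB job ends.
    print("Overlap if still waiting at 08:00:",
          overlap(tape_start, usb_end, usb_start, usb_end))
```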

Best Regards
/Andreas
Shreedhar Ette commented:
Hi,

- The server freezing issue is not related to event IDs 1030 and 1053.
- Exclude the SYSVOL, NTDS and Exchange directories from antivirus scanning.
- Run SBS 2003 Best Practice Analyzer and fix the errors reported.
- Also Run Exchange 2003 Best Practice Analyzer and fix the errors reported.
- Run chkdsk in read-only mode on the disk. If any errors reported, then backup the entire data on the server and run chkdsk /f to fix the errors.
- Check with your hardware vendor for any updates available for server BIOS, RAID controller and NIC drivers.
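The "read-only first, /f only if needed" chkdsk routine above can be sketched as follows. The exit-code meanings are the ones documented for chkdsk on Windows Server 2003; the drive letter E: and the helper names are hypothetical, and the script only calls chkdsk when actually running on Windows.

```python
# Sketch of a read-only chkdsk run followed by a decision on chkdsk /f.
import subprocess
import sys

CHKDSK_EXIT = {
    0: "No errors were found.",
    1: "Errors were found and fixed.",
    2: "Performed disk cleanup, or did not perform cleanup because /f was not specified.",
    3: "Could not check the disk, errors could not be fixed, "
       "or errors were not fixed because /f was not specified.",
}


def needs_repair(exit_code: int) -> bool:
    """After a read-only run, code 3 means problems were left unfixed (or the check failed)."""
    return exit_code == 3


def check_read_only(drive: str) -> int:
    # Without /f, chkdsk only reports; it does not change the volume.
    return subprocess.run(["chkdsk", drive]).returncode


if __name__ == "__main__" and sys.platform == "win32":
    code = check_read_only("E:")
    print(CHKDSK_EXIT.get(code, f"Unexpected exit code {code}"))
    if needs_repair(code):
        print("Back up the data first, then run: chkdsk E: /f")
```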

Hope this helps,
Shree
-andreas- (author) commented:
Hi Shree,

I don't have SBS; we run 2003 Standard with no Exchange. Is there any other "Best Practice Analyzer" that I should run instead?

/Andreas
Shreedhar Ette commented:
A Best Practice Analyzer is not available for Windows Server 2003.

- What is the role of the server?
- How much RAM is installed on the server?
- Is this also a Global Catalog server?
- Which antivirus application is installed?

----------
Shree
-andreas- (author) commented:
This server is the only server on the network. Two years ago the old server's AD was migrated to this one.
- It is running 2003 R2 Standard SP2
- It is configured with the following roles:
File Server, Print Server, Domain Controller (Active Directory), DNS Server, DHCP Server, WINS Server
- It is also the Global Catalog Server
- 4GB RAM
- Hardware RAID 5 (LSI MegaRAID based).
- Antivirus Application: Norman Virus Control
- It also runs a Business program using MS SQL Express
- BackupExec 12

/Andreas
Shreedhar Ette commented:
Hi,

- Are the /3GB and /USERVA switches set in boot.ini? If so, remove them and reboot the server. They are not recommended because this server is a DC and GC.

- Configure the page file with the initial and maximum sizes equal to 492. This will reduce page file fragmentation.

- Run chkdsk in read-only mode on all the drives. If it reports any errors, back up all the data on the server and run chkdsk /f to fix the errors.

- Update the server BIOS, RAID and NIC drivers if updates are available.

- Also update Backup Exec and Windows with the latest service packs.
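The first item in that list can be checked mechanically. A sketch that detects and strips /3GB and /USERVA=nnnn from boot.ini text, assuming the file is read from the root of the system partition (usually C:\boot.ini, which is hidden, system and read-only by default, so edit a copy and keep the original). The function name is hypothetical.

```python
# Sketch: detect and remove the /3GB and /USERVA=nnnn switches from boot.ini text.
import re

# A switch preceded by spaces/tabs and followed by whitespace or end of text.
MEMORY_SWITCHES = re.compile(r"[ \t]+/(?:3GB|USERVA=\d+)(?=\s|$)", re.IGNORECASE)


def strip_memory_switches(boot_ini: str):
    """Return (cleaned_text, changed) with /3GB and /USERVA removed."""
    cleaned, count = MEMORY_SWITCHES.subn("", boot_ini)
    return cleaned, count > 0


if __name__ == "__main__":
    sample = ('multi(0)disk(0)rdisk(0)partition(1)\\WINDOWS='
              '"Windows Server 2003, Standard" /fastdetect /3GB /USERVA=3030\r\n')
    text, changed = strip_memory_switches(sample)
    print("changed" if changed else "unchanged", repr(text))
```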

Hope this helps,
Shree
-andreas- (author) commented:
Hi,

Thanks for all your help, Shree.

The main problem, which seems to have caused everything, was found when I ran chkdsk. Both partitions on the RAID 5 array were fine, with no problems at all. But on the USB backup drive, chkdsk took 40 minutes to reach 1% in the first stage, and at the same time it reported an error. I then disconnected the drive and deleted the job in BackupExec.
To me it seems quite strange that a USB drive used only for backup could cause this many errors, but since then none of the problems have occurred.

The first fix for the 13568 error, setting the "Enable Journal Wrap Automatic Restore" registry parameter to 1, did not seem to be the best way of solving my problem, though, since it caused some problems of its own. See my post at 22/04/10 01:38 AM.
When I got the error again after the next crash, I applied only the D4 fix ("BurFlags registry key to reinitialize File Replication Service replica sets"):
http://support.microsoft.com/kb/290762 
That fix also made the error-message go away with no other problems occurring. Of course I don’t know though if that fix would have been sufficient the first time.

Thanks again Shree. Without your help I don’t know how I would have solved all the problems.

Best Regards,
Andreas
-andreas- (author) commented:
This was a whole series of problems that had to be solved. They had haunted me for a long time, and thanks to Shree I finally got them solved!