Solved

Nonpaged Pool on 2003 Server is empty.

Posted on 2006-06-21
Last Modified: 2012-06-21
Hello,

We have recently begun to have some problems with our 2003 Server DC.  The Server was working just fine but over the past week or so we have begun to have to forcefully reset the Server nearly every day.

The problem is first realised when our Insight Management System reports that the server is no longer responding to pings.  We then visit the Server to find that the machine appears to be booted - but when we try to unlock the console it reports that it is unable to do so because insufficient resources are available.  In the end the only way to reboot the server is:

i) Forcefully reset using the power button.
ii) Use the ILO to remotely connect and force a warm reboot.

Upon rebooting the Server and checking the Event Log we find the following error has been logged many times prior to the machine becoming non-responsive:

Source: Srv
Category: None
Event ID: 2019
Type: Error
Description: The server was unable to allocate from the system nonpaged pool because the pool was empty.

As an example, the first of these errors was recorded at 02:36AM this morning.  I then received an error from the Insight Manager at 02:43AM saying that the Server was no longer responding.  Looking further on through the event log we then see these errors:

Source: NAVAP
Category: None
Event ID: 1001
Type: Warning
Description: System memory is running very low.  Norton AntiVirus Realtime Protection may not be able to function properly.

Source: Application Popup
Category: None
Event ID: 333
Type: Error
Description: An I/O operation initiated by the Registry failed unrecoverably. The Registry could not read in, or write out, or flush, one of the files that contain the system's image of the Registry.

However I believe that these errors are both related to a lack of memory on the system so if we can resolve these issues then we shouldn't see these messages again.  The message from Srv then repeats itself every 60 seconds whilst the one raised by Application Popup occurs every 20 seconds or so.

Of course it seems that something is consuming the nonpaged pool, but the strange thing is that this issue always seems to occur at the same time in the middle of the night.  Here are a few of the things we have done.

1. Disabled the Volume Shadow Copy and Microsoft Software Shadow Copy Provider Services.  This is in relation to a KB I read about backup software causing problems with the VSC Service which resulted in a memory leak which consumed the non-paged pool.

2. Disabled the backup and all ArcServe related Services (using BrightStor 11 IIRC).

3. Set up PerfMon to collect statistics from Memory\Pool Nonpaged Allocs and Memory\Pool Nonpaged Bytes.  Also set up monitoring of Process\Pool Nonpaged Bytes.  These counters when graphed in Perfmon don't show anything of interest - but I'm only monitoring selected processes, so if the process causing the problem is spawned at night then it could be that Perfmon is missing it.

4. Created a scheduled task to run PoolMon and dump the output to file every 15 minutes.  It seems that a Tag called THRE is taking up a significant amount of non-paged pool space and it gradually increases in size throughout the day.  Just prior to the problem this morning the Tag had a byte size which equated to 202MB.
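
The snapshot comparison in step 4 could be automated along the following lines. This is only a minimal Python sketch, not the script that was actually scheduled: it assumes each PoolMon dump has one row per tag with the columns in the order Tag, Type (Nonp or Paged), Allocs, Frees, Diff, Bytes, and the function names and sample figures are hypothetical.

```python
import re

# Assumed row layout of a PoolMon dump: Tag, Type, Allocs, Frees, Diff, Bytes, ...
# Parenthesised per-interval deltas, if present, are dropped before parsing.
ROW = re.compile(r"^\s*(\S{1,4})\s+(Nonp|Paged)\s+(.*)$")


def parse_snapshot(text):
    """Return {(tag, pool_type): bytes_in_use} for one PoolMon dump."""
    usage = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, blank line, or summary text
        tag, pool_type, rest = match.groups()
        rest = re.sub(r"\([^)]*\)", " ", rest)
        numbers = re.findall(r"\d+", rest)
        if len(numbers) < 4:
            continue  # not a data row in the assumed layout
        usage[(tag, pool_type)] = int(numbers[3])  # 4th number = Bytes
    return usage


def tag_growth(snapshots, tag, pool_type="Nonp"):
    """Byte count for one tag across snapshots, oldest first."""
    return [parse_snapshot(s).get((tag, pool_type), 0) for s in snapshots]


# Hypothetical sample data, not taken from the affected server.
earlier = """
 Tag  Type     Allocs      Frees       Diff      Bytes   Per Alloc
 Thre Nonp     150000     100000      50000   32000000        640
 File Nonp      90000      89000       1000     160000        160
"""
later = """
 Tag  Type     Allocs      Frees       Diff      Bytes   Per Alloc
 Thre Nonp     400000     100000     300000  192000000        640
 File Nonp      95000      94000       1000     160000        160
"""

series = tag_growth([earlier, later], "Thre")
print(series, "growth:", series[-1] - series[0], "bytes")
```

A tag whose byte count only ever rises between dumps, while Frees stays flat, is the usual signature of a leak; a tag that rises and falls is more likely normal load.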

I'm running out of ideas so any suggestions which can be offered would be greatly appreciated.

Pete
Question by:peterjwest

Expert Comment

by:sofestibeest
ID: 16950465
Hi,

It could be a memory leak.
Memory leaks can have this effect on any machine.

Check what updates or apps were last installed.
Perhaps an update is causing the problem.

Second, you can use msconfig and find the tag and set it to prevent starting during bootup.

Good luck


Author Comment

by:peterjwest
ID: 16960014
I've ended up using MemTriage to diagnose the problem and it looks like the issue is related to a process called HPBPRO.EXE, which belongs to an HP Printer Driver.

We have now obtained a fix from the HP Website and initially we were seeing a lot of DCOM Errors in the system log.  This pointed to one of our technical staff who also had the driver installed locally - he has now removed the driver and instead of seeing a long-lived instance of the process it appears for a short period and then vanishes again.

We are now seeing that the process consumes very little memory and the problem with it creating huge numbers of threads has also ceased.
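
Because the symptom here was a single process creating huge numbers of threads, a per-process thread count is a quick way to confirm whether it has come back. The following is a minimal Python sketch, assuming the output of `wmic process get Name,ThreadCount` has been saved as text (Name and ThreadCount are properties of the WMI Win32_Process class); the helper name, the process name leaky.exe and all figures in the sample are hypothetical.

```python
def top_thread_counts(wmic_text, limit=5):
    """Return [(process_name, thread_count)], highest thread count first."""
    rows = []
    for line in wmic_text.splitlines():
        # The thread count is the last column; names may contain spaces.
        parts = line.rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # header or blank line
        rows.append((parts[0].strip(), int(parts[1])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Hypothetical sample output, not captured from the affected server.
sample = """\
Name                 ThreadCount
System Idle Process  2
System               58
leaky.exe            31250
lsass.exe            44
"""

for name, threads in top_thread_counts(sample, limit=3):
    print(name, threads)
```

Running this against output captured at intervals would show whether one process's thread count keeps climbing between samples.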

I'll leave this question open for a while longer just to confirm the issue has definitely been resolved - but at the moment it's looking hopeful.

Accepted Solution

by:DarthMod
ID: 17160253
PAQed with points refunded (500)

DarthMod
Community Support Moderator