Suburb-Man

Memory Leak 2mb per hour Pool Nonpaged Bytes

Memory Leak, Pool Nonpaged Bytes 2mb per hour

SBS (Windows 2000 SP4)
  Running: ISA, SQL Server 2000 SP3a, DNS, IIS, H323 Gatekeeper, RRA, TS admin. mode.
  Not running: Exchange, DHCP.

Some stats collected:
  Server start time 10-28 5:48
  1GB Phy.Memory, 590MB available
  5.5GB static pagefile / swapfile (c: =1500, d: =4000)
      Time   Date    Pool Nonpaged Bytes
      6:30   10-28   27,095,040
      6:35   10-28   27,262,976
      7:15   10-28   29,212,128
      5:20   10-29   74,764,288   (515MB available, ~2MB growth per hour)
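
The ~2MB per hour figure can be checked from the first and last samples above. A minimal Python sketch, assuming the 5:20 sample was taken the morning after the 6:30 one (the names are illustrative only):

```python
from datetime import timedelta

# (day offset from 10-28, hour, minute, Pool Nonpaged Bytes), copied from the stats above
samples = [
    (0, 6, 30, 27_095_040),
    (0, 6, 35, 27_262_976),
    (0, 7, 15, 29_212_128),
    (1, 5, 20, 74_764_288),
]

def growth_per_hour(samples):
    """Average growth in bytes per hour between the first and last sample."""
    d0, h0, m0, b0 = samples[0]
    d1, h1, m1, b1 = samples[-1]
    elapsed = timedelta(days=d1 - d0, hours=h1 - h0, minutes=m1 - m0)
    return (b1 - b0) / (elapsed.total_seconds() / 3600)

rate = growth_per_hour(samples)
print(f"{rate / 2**20:.2f} MiB per hour")
```

Over the 22 hours 50 minutes between those two samples this works out to roughly 2 MiB per hour, which matches the estimate.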


Eventually "The computer has rebooted from a bugcheck"...ouch.
I temporarily have it scheduled to reboot gracefully every 2 days via a script that contains:
  NET.EXE pause mssqlserver, wait 60, NET.EXE stop mssqlserver, wait 60,
  TSSHUTDN.EXE 30 /restart /delay: 30 /v
  I may have to change it to daily!
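
The same sequence could be wrapped in a small Python script with a dry-run switch, which makes it easier to test before scheduling. This is only a sketch: the function name and the dry-run flag are my own, and the TSSHUTDN arguments are reproduced exactly as written above without being checked against the tool's documentation.

```python
import subprocess
import time

# Steps copied from the script above: (command, seconds to wait afterwards).
# The TSSHUTDN arguments are reproduced as written in the post (unverified).
STEPS = [
    (["NET.EXE", "pause", "mssqlserver"], 60),
    (["NET.EXE", "stop", "mssqlserver"], 60),
    (["TSSHUTDN.EXE", "30", "/restart", "/delay:", "30", "/v"], 0),
]

def graceful_reboot(dry_run=True, sleep=time.sleep):
    """Run each step in order; in dry-run mode only report what would run."""
    log = []
    for command, wait_seconds in STEPS:
        log.append(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failing step
            sleep(wait_seconds)
    return log

for line in graceful_reboot(dry_run=True):
    print(line)
```

With dry_run=True nothing is executed; the commands are only listed in order.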

SOLUTIONS Attempted: (reverse chronological order)

  Changed Reg. setting to start memory clean up of PoolUsageMax at 40% instead of 80%.
  See MS article: http://support.microsoft.com/?kbid=Q312362
  I reverted PagedPoolSize to 0 from the suggested ffffffff (4,294,967,295), because it caused multiple errors.

  Before I noticed the memory problem, I got some errors with performance counters.
  So I applied MS Q267831 ~unload dll’s and reload them.
  http://support.microsoft.com/?kbid=267831

  Applied Symantec’s patch, Document ID:2000050108464148,
  then “How to update the Symevent files”, Document ID:1998092408260848,
  then completely uninstalled pcAnywhere per “How to uninstall pcAnywhere from Windows NT/2000/XP”, Document ID:1996123152913.

Tracking attempts:

  Created Performance Monitor, Counter Log, and Alert of “Pool nonpaged bytes” (PNB)
  I based them upon what I found in SBS’s Health Monitor samples.
    1 Counter: nonpaged pool, C:\PerfLogs\nonpaged_pool_0000xx.blg
    4 Alerts: PNB < 32Mb warning, PNB > 128MB, PNB > 256Mb, and Avail Mem. < 32Mb.
  Compared Taskmgr.exe and Performance Monitor.
    In Taskmgr.exe > Processes I added the Paged Pool and Non-paged Pool (NPP) columns.
    They don’t match Performance Monitor's Pool Nonpaged Bytes.
    97MB current NPP shown in Perfmon, however
    Taskmgr.exe listed processes total NPP = 2.617MB
    A ~94MB difference.
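
A gap like this is what I would expect if the leak is in kernel-mode code: the per-process NPP column only covers pool charged to each process, while the Perfmon Memory counter covers the whole system, drivers included. A quick sketch of the arithmetic with the figures above (variable names are illustrative, and "Mb" is read as megabytes):

```python
# Figures from the comparison above, in megabytes.
system_npp_mb = 97.0        # Perfmon: Pool Nonpaged Bytes (system-wide)
per_process_npp_mb = 2.617  # Taskmgr: NPP column summed over all processes

# Whatever is left over is not charged to any user-mode process.
unattributed_mb = system_npp_mb - per_process_npp_mb
share = unattributed_mb / system_npp_mb
print(f"{unattributed_mb:.1f} MB ({share:.0%}) not charged to any process")
```

So nearly all of the nonpaged pool is invisible in the Taskmgr columns, and the per-process view alone cannot point at the culprit.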

Solutions planned:

  http://support.microsoft.com/?kbid=130926 and http://support.microsoft.com/?kbid=177415


Any suggestions very much appreciated.



Suburb-Man

ASKER

Whoa nobody touched this question...Wa ha ha ha.
Guess I don't blame anyone after looking at the length of the question.

Using: http://support.microsoft.com/?kbid=130926
Section:
  “Using Performance Monitor to Identify a Pool Leak”

Checked Process Handles first, glad I did.
The W2K server's total handle count was ~530,000 across all ~56 processes.
Looked for Process with 10,000 or more Handles, found two hogs.
  SNMP = 250k
  MsgAgt (Promise RAID message Agent) = 250k.
  Stopped and Disabled MsgAgt and 250,000 handles freed,
  Stopped and Restarted SNMP and 250,000 more freed.
  Current Total Process Handle Count = 30,000.
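
The check I did by hand amounts to filtering per-process handle counts against a threshold. A minimal Python sketch using the rounded numbers above; the function name is made up for illustration, and the counts would normally come from a Perfmon export rather than being typed in:

```python
# Rounded handle counts from the post for the two processes named above.
known = {"SNMP": 250_000, "MsgAgt": 250_000}
remaining_total = 30_000  # all other processes combined, per the post

def find_handle_hogs(counts, threshold=10_000):
    """Return (name, handles) pairs at or above the threshold, largest first."""
    hogs = [(name, n) for name, n in counts.items() if n >= threshold]
    return sorted(hogs, key=lambda item: item[1], reverse=True)

total = sum(known.values()) + remaining_total
for name, handles in find_handle_hogs(known):
    print(f"{name}: {handles:,} handles ({handles / total:.0%} of total)")
```

Each of the two processes accounts for close to half of all handles on the box, which is why they stood out immediately.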

It appears Promise's RAID Message Agent has a HUGE leak.

Size of “Pool non-paged bytes” started dropping immediately; it had been up to 62MB.
The stair stepping size growth is reversing.

I tried the second section first yesterday but didn’t find it,  
  “An Alternate Method for Identifying a Process that is Leaking Memory”.
I probably thought that I could not stop Promise RAID, worried it would crash the array.
And/or I did not stop SNMP because eventlog is listed as its dependent; I thought eventlog was needed for the event monitoring I am doing.  If I had tried them, I would have caught it yesterday.

I’ll give a final update after more testing.
Are you actually using SNMP?  If not, shut it off--have you looked at the SNMP log to see if it's actually logging errors?  Might be getting some from the raid card.

Have you looked to see if there is a driver and bios upgrade for the RAID card???
I didn't find a specific SNMP log, however the service showed that eventlog was dependent upon SNMP.
Where should I look for an SNMP log?

Anyway, I uninstalled SNMP protocol (Management and Monitoring Tools), IIS, and ISA
as you suggested in: https://www.experts-exchange.com/questions/20802304/SQL-Server-2kSP3a-Memory-problems.html
(thanks again)

Maybe I'm confused between the SNMP protocol and SNMP Service, I assume they need each other.

Doesn’t other monitoring: SBS's Health Monitor, Performance Monitor, and Network monitoring all need both the SNMP protocol and Service?

I’m concerned that some events will not be logged now.
ASKER CERTIFIED SOLUTION
arbert

I did find a Promise FastTrak S150 TX4 update for both FW and Win Driver to v1.00.0.37
and Intel put out another one for the D875PBZ mainboard, BIOS v P17, too.
Installed them all and left SNMP uninstalled; since Promise's PAM utility didn't change, I didn't bother even trying to install it.

Thanks again arbert, you've been a real pal.
Also, if you're in a production environment (and I'm sure you know this) be careful with those driver and BIOS upgrades if you can't afford down time.  We've had BIOS upgrades that totally made network cards and other hardware stop working!!!

Glad things look good....I'm actually looking at a Promise card, have you been happy with its performance?
Yes using RAID10, see my answer https://www.experts-exchange.com/questions/20693896/Promise-SX4000-Raid-5-extremely-slow.html and the ending comment.
Also my multiple answers in:
https://www.experts-exchange.com/questions/20773946/Raid-card-or-what.html
especially the end.

Promise's PAM seems to be more for remote access than anything.
All true array management is in the PCI card's EPROM; ctrl-p during boot.

I have a Highpoint RocketRAID454 at home, RAID5 4x40GB Maxtors; the XOR (write) is terrible.
70% CPU usage and 2-3Mb sustained. As opposed to RAID10's 30Mb sustained.

RAID10 is better for backup/restore of drive images; for example, Ghost will work (copy and restore) with mirrored drives. And RAID10 can be made from one single drive (a restored image).

The other bottleneck is in the PCI 32bit bus itself; that is why we're starting to see true EISA 64bit bus systems coming out.  I worked on a 486 64bit EISA IBM OS2 Server 10 years ago, but the EISA 64bit bus didn't take. (Lots of bugs). Maybe they got them figured out now.
I was reading that the 32bit PCI true throughput is only about 130Mbps; funny how a high-speed USB is rated for 480Mbps but is connected to the same bus.  It seems server MBs have 64bit PCI buses and you will pay for it too. We have a 3 year old Compaq ML530 server with two 933MHz Xeons, 64Bit SCSI controller RAID5 3x10Gb7200 that gets ~25Mbs read and write sustained.
Ya, I understand the RAID10 performance--we have 7 terabytes of RAID10 online at work :)

Damn, sounds like you've got quite the setup at home...I just ordered a new Dell server last night--they're having unbelievable deals!!!
Has anyone found an answer to the problem with the Promise RAID message agent leaking memory?  I just downloaded the latest one from ASUS for my motherboard and it still leaks memory.  I am using version 3.2.1 build 11

I don't need it to run, but I would like to have it available to monitor the array.
Nope. I also found SNMP Service was leaking memory, windows 2000.
I still wonder if PAM altered SNMP Service for its monitoring needs.

I did inform Promise's Tech Support that their PAM needs to be WHQL certified, and gave them all my research into the matter.
>Promise's PAM seems to be more for remote access than anything.
>All true array management is in the PCI card's EPROM; ctrl-p during boot.
JordanNolan, since this question is closed, you should open your own.