Pervasive ntdbsmgr.exe not freeing up memory

HouseofFara
HouseofFara used Ask the Experts™
on
We have a Pervasive database running AccPac.   I use RBase with ODBC connections connecting to the Pervasive database to a great extent.  Both to retrieve and update data in the Pervasive database.

This has been working for 3 years without any issues and running quite well.

Two weekends ago, I did a dump and reload of the AccPac data.  I also upgraded the version of Rbase that is being used.  (I would not have thought this would make a difference, since it is using the same Pervasive ODBC driver)

Now the file server running ntdbsmgr.exe starts eating memory and eventually crashes.
Using PERFMON and virtual bytes monitor on ntdbsmgr.exe, all starts out OK.  I will then
see occasional spikes on the monitor and it never releases.   If I stop and restart Pervasive services, the memory resets to zero.

The unfortunate part is I cannot track what is causing the spikes but more importantly why Pervasive is not releasing the memory.  I have went a day without issue, but more often I have to stop and restart Pervasive 2-3 times per day.   This never happened before the dump/reload/Rbase upgrade.   (No programs changes in the Rbase, just a database engine upgrade)


We have less than 12 users accessing the Pervasive via AccPac at any given time.
We have up to 45 users connected via ODBC to the Pervasive database.

Pervasive version 9.71 with compatibility 9.0 checked.
Transaction durability turned on
System cache turned on
Allocate at startup not turned on
Back to minimal with no activity turned on
Microkernel turned off on all machines other than the file server.

File server does not run any other software except the Pervasive server engine.
File server has 4 gig memory.

Once the Perf. Monitor shows ntdbsmgr,exe hitting 2 Gig, it crashes, giving a program exception.

Any thoughts?

Thanks
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Bill BachPresident and Btrieve Guru

Commented:
You are correct -- one Virtual Bytes hits 2GB, the engine will fail.  This is a hard limitation of ANY 32-bit process on a 32-bit OS.  Here's some more details on the problem:
  http://www.goldstarsoftware.com/papers/UnderstandingServerMemoryCounters.pdf
  http://www.goldstarsoftware.com/papers/InterpretingServerMemoryCounters.pdf
  http://www.goldstarsoftware.com/papers/CapturingVirtualBytesToALogFile.pdf

If the engine is leaking memory, and if you've already patched to v9.71 Update 4, then there may not be many other options for this unsupported platform.  PSQLv9 had/has known memory leaks, and nothing else will be fixed in this code base at this time.  Your options will be:
Upgrade to a 64-bit OS, which now gives you up to 4GB for the 32-bit application.
Upgrade to PSQLv11 with a 64-bit engine on a 64-bit OS, which increases the memory space to 8TB
Figure out why the memory is being leaked, and either change the code to release resources properly, or release memory used by Pervasive to leave more room for the leak.

The latter can be done by killing the L2 cache (Max Microkernel Memory Usage) and lowering L1 cache until you have sufficient run-time to make it through a whole day, then restarting the engine nightly.

Author

Commented:
Thanks for the reply.
Any thoughts why this might have worked for several years and then started after a  data dump/reload?

Since I upgraded another program the same weekend, that accesses this database, but uses the exact same ODBC driver,  I would not think that this would be the issue, but I have been wrong before.  I would not think that an ODBC connection would work any different?

Again, this setup worked fine on this equipment for a long time.  It is either related to the data dump/reload or the upgrade of the external database engine.  Any thoughts on these two events?

Thanks.
Bill BachPresident and Btrieve Guru

Commented:
I can think of no logical explanation.  Data files are data files, so any changes at that level  shouldn't matter.  It is possible that the files grew in size, but since there is never a need to perform a "data dump/reload" in the Pervasive database world, I cannot imagine what predicated it.

The problem is usually not the ODBC driver, but rather the application that is calling that driver.  The ODBC environment is fairly complex, and you MUST abide by the rules -- allocate a connection handle, and when you are done, free it up.  While you have an active connection, you can allocate a statement handle, and when you are done, free it up.  If the application fails to free up the resources, this could be causing the memory leak.  Further, some applications generate temporary files and other things at the SQL level that use more memory, so changes to the queries being submitted is possible to be leaking memory, too.

I've worked in environments where the application runs for a year or more before running out of RAM, and others where the system fails after less than 3 days -- both running the SAME application.  Turns out the differences were in the way the users used the application -- one used a particular SQL module a LOT more, and that module leaked more memory.  I'd suspect the application itself as the prime suspect.  try rolling back the application & see if the problems subside.  If they do, contact that app developer for help.

Author

Commented:
I will look into this.   I am the app developer so fortunately,  I can dig into that aspect.  I did not change any code when I upgraded, but I did upgrade the Rbase database engine.  I hope this upgrade is not the problem.   The engine upgrade is significant in that it's own datafile structure is different.   So it would be really difficult to go back to the old version, as significant data has been added over the recent days that is not in a "roll back" format.

Since my app code did not experience this problem with the previous version of Rbase, if it is Rbase, then it would have to be at the engine level and how it handles the ODBC driver differently.   That may be hard to find, prove and submit.

We have to do a data dump and reload in Pervsive / AccPac as when we delete old, uneeded records, the file space is not recaptured.  I.E.  A file approaching 1 gig in size, remains that size when half the data is deleted until a dump and reload is done.

I was wondering if any one had experienced a possible dump/reload issue that changed the database settings somehow.  Not likely but I thought I would ask.

Thanks again.
Bill BachPresident and Btrieve Guru

Commented:
If it is your code, then you can monitor the PSQL memory and simply call the same function over and over again (for an hour or so) and see if memory is being leaked in that call or not, then go on to the next one.

The "right" way of reclaiming free space is by using the Pervasive Rebuild Utility.  I don't know if Sage includes this tool in their custom build or not.  If so, then you can use that utility directly, and the end result will be the smallest possible file -- which USUALLY performs best, too.  If you try to simulate this with another process, then the end result depends much on the process itself.  If they do not include it, then you can get it by performing a full install of the same PSQL version to another machine, and then doing the rebuilds there.

Author

Commented:
I did some testing that posed interesting results.
It is past work hours and everyone has gone home.

Perfmon shows ntdbsmgr.exe using 1.9+ gig.  Almost ready to crash.
I shut down all programs on the network that use Pervasive such as AccPac.   No change
shown by the perfmon.

I exited out all programs on the network that use the ODBC driver.  Again no change,
after 10 minutes.

I shut down all computers on the network except the file server.   After 10 minutes, no change of memory shown on perfmon.   The ntdbsmgr.exe is definitely not releasing memory.   With all other computers shut down, would not this memory free up regardless
of any connecting program or ODBC  connection ?

I stopped and restarted the Pervasive service and memory immediately dropped to zero.   I started one PC up and no memory change.   I started AccPac on that PC and memory went to about 40 meg. (About standard I think)  I exited without doing anything in the program, simply started and exited.   Memory did not drop.

I started AccPac up again and did some reports and editing.  Memory did not increase, but did not drop on exit.

I started an ODBC program and no memory increase shown.   I did some heavy selects and other data manipulations and memory increased very slightly.   After several minutes of heavy usage, no memory increase.   Shut down the ODBC program, no memory change.  Shut down AccPac again, no memory change.   Still at about 50 meg.

Can this be hardware that somehow malfunctioned the exact time I did an upgrade?  Or is this behaviour expected?   How can I test if Pervasive itself is releasing memory?

Called to other tasks, so I cannot answer or check stuff until tomorrow.

Thanks
Bill BachPresident and Btrieve Guru

Commented:
With respect to the first questions, NO.  The engine is using memory for a series of items.  First, there is the static code and data structures -- this will not change.  Second, each thread will have its own memory space, including a 1MB stack space.  Third, the dynamic memory blocks used to track files, handles, locks, and other such components are increased slowly as the resources increase, but will never be released.  Fourth, the L1 cache is a fixed size that is allocated when the FIRST database request comes in, but this memory is never released.  However, this memory is allocated as VIRTUAL MEMORY first, so you will see it in "Commit Size/VM Size" and "Virtual Bytes", but not in "Working Set/Mem Usage".  Fifth, if you have L2 cache enabled, the database engine will increase and manage this, too.  It may shrink, but only if memory constraints on the server force it to shrink.  Sixth, you have any memory from SQL queries and other processes, including communications buffers & the like.  Finally, you have all of the memory that was "leaked" during the time the engine was running.  Out of ALL of this, only the SQL threads are ever really "freed up" when the users exit.

If you read through the first paper on memory counters, then please use specific terms to describe what you are seeing.  Without a common frame of reference and a working vocabulary, explaining memory usage is like trying to tell someone what a hot dog tastes like.  (Try doing THAT.)

As for why the memory did not leak, again, it is not about how difficult a given query was.  The engine is pretty good about not leaking memory on its own -- or at least not a lot.  It is a cumulative effect of many queries running, some simultaneously, that seems to impact the system the worst.  Try putting your code in a loop and call it 10,000 times in a row and see what happens then.  It is very unlikely that a single person is going to be able to duplicate this without some automation, or some idea of WHERE the memory is leaking from.

Better yet, just see if you can manage your memory.  Disable the L2 cache -- this is the BEST recommendation of all.  If you still need more headroom for leaks, try shrinking the L1 cache.  (By the way, you never mention the L1 or L2 c ache sizes -- this might help, too.)

Author

Commented:
Again, thanks for the assistance on this.

L1 cache is set to the default 429469696
L2 (Max MicroKernel) is set to 30%.  Default is 60%
Sort buffer is set to 0, whicih is to my understanding use as much as needed.

You comments about one user being difficult to find the issue is correct.  During the day, we have as many as 60 sessions actively accessing the database.

However, another point of interest I have observed that may be of significance.  I can start with a fresh network reboot and virtual memory with 40 users looks OK and remains stable.   It runs about 1 gig.   I have watched it run for 2 hours without any movement and then it will jump 50-100 meg.   What I have confirmed by watching  is that there is no "slow" increase in memory, but rather jumps.   It will stay stationary for some indetermined amount of time.  Sometime a couple of hours others only a few minutes.
But then a jump.  I would not think that a typical leak would cause jumps like this?

I am convinced it must be something to do with the Rbase engine upgrade.  Code has not changed, the system ran fine for years before this, never having a memory problem.  I would literally run weeks without any server restarts etc. and then it was not to flush memory.    The number of users has not changed nor the amount of data.   On a Friday, all was well.  Upgraded over the weekend and the memory problems started immediately.

So now my task is to try and determine if it is a bug in the Rbase engine or some snippet of code that the newer engine handles differently than the old one did.  Problem is that I have thousands of lines of code is many applications that run this system.  It will be quite a task!  Is there any monitor that you are aware of that may be able to tell me which session is making the ODBC call causing the memory jump?
President and Btrieve Guru
Commented:
I agree with your evaluation.  The Rbase upgrade is likely involved, and bad code inside it may be causing the leak.  If you can find out EXACTLY what is happening when the memory jumps, it may yield some more clues to finding it. (If you read the paper on graphing the Virtual Bytes, then you've seen what the jumps can look like at other sites)  I know of no way to pin it down any more accurately, though if you can capture network traffic to TCP port 1583 with a too like Wireshark, you'll have a timestamp on each SQL query coming in, then you can correlate that with the jumps in the memory.  The problem is that the memory monitoring is likely to be in minutes, while the network packets are measured in milliseconds -- makes it VERY hard to correlate.

Try dropping the L2 cache to 0 to disable it, and increase the L1 cache to at least 600-800MB to compensate for performance.  This will give you a static memory allocation around 1GB for the database and the rest of its constructs, which may make it easier to monitor.

Beyond that, I might recommend looking at moving towards PSQLv11.  In addition to being available in a 64-bit  flavor (with an 8TB memory addressing space), it may offer better tools to find out where the memory is being leaked, as Pervasive would be available to analyze a core dump to see where the memory has been allocated.

Author

Commented:
Thanks for the assistance.  Although I have not resolved the task yet, the points go to you for your good advise.   I am afraid I have an arduous task ahead in trying to find some small section of problem code among thousands.  Quite truthfully, sometimes upgrading causes way more trouble than it is worth!

Author

Commented:
After further intensive research, I found the culprit.   In certain places of my RBase programs, I would use a view that consisted of two Pervasive tables via ODBC.   This worked without issue in the previous version of Rbase.

However, with the upgrade, any place that I used a view in Rbase that consisted of two or more ODBC Pervasive tables, whenever the view was called, memory would increase in the ntdbsmgr.exe and not free up.

I changed code (unfortunately in several places) to down load the Pervasive data to Rbase temporary tables and create the view on those instead.   A positive benefit was that the new method now runs probably 80% faster than the old and the memory problem is resolved.

Everything looked proper in code, but did not function as such.   I do not know if something in the new Rbase version is faulty or if Rbase actually tightened up on it's ODBC specs and Pervasive did not like it.   Who knows as I am sure the finger pointing would probably begin!   Rbase has been very depenable for 10 years and I certainly have benefitted much from it's ease of use.  As of now I do not care, my problem is resolved.
Bill BachPresident and Btrieve Guru

Commented:
Awesome news.  You may not care, but do you know if the same type of view could be created in the PCC and, if so, if the memory increased in the same way?

Author

Commented:
A quick test showed:

Creating a view in Pervasive PCC with two tables and executing in the grid did not cause any memory surge.

Connecting Rbase to the Pervasive view via ODBC worked fine and did not cause any memory surge when called upon.

I guess I could have setup the views in Pervasive instead of Rbase and not used the temp tables, but the method I used worked so fast and this way I keep all my program code in one place.   Some views were based on one Pervasive table and one Rbase table.  Probably best to down load the data and keep it in one system of those scenarios.

Pros and Cons to both ways.  I will keep it in mind for sure.

Thanks.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial