Solved

Windows 2000 file-caching bug

Posted on 2002-06-07
Last Modified: 2013-12-03
We are running Windows 2000 SP2 on several computers and writing an application that seeks into and reads parts of several large files. The files total about 60GB. A given "query" into these files typically reads a total of 500MB or so in large chunks.

Here's the problem: After performing 20 or so of these operations, the "Available Memory" in the system (as indicated by the task manager) drops by 500MB or so, and continues to drop steadily as more operations are performed.  On a system with 512MB of memory, operations grind to a halt.

I expect file-caching to take up memory, but it is normally released when processes need memory for other purposes.  However, this memory is not released until the process terminates (I haven't tested to see if it is released when the file is closed).

This is not a memory leak in the application!  We know this because:

1) Changing the file-read mode from buffered to unbuffered (in the CreateFile call) makes the problem go away. But then we lose the benefits of OS caching altogether.

2) The VM size reported for our process in the Task Manager hovers around 50MB consistently.

NT/2000 has a history of file-caching problems.  But here's another wrinkle: this issue only seems to show up on dual-CPU machines and not single-CPU machines.  

We have three quite different systems exhibiting the problem. The first is a Dell 650 workstation with a SCSI stripe set. The second is a SuperMicro workstation with a 3Ware RAID controller. The third is an Intel Pro motherboard system with a Mylex SAN controller. Because the disk subsystems are independent of one another, I conclude that this is not a disk-driver issue.

I'm looking for an OS patch or programming workaround to avoid the issue.
Question by:jlilley
 

Expert Comment

by:jkr
Have you tried setting FILE_FLAG_SEQUENTIAL_SCAN (if applicable)?
 

Author Comment

by:jlilley
That makes no difference.
 

Expert Comment

by:fl0yd
Are you using memory mapped files for reading your files?
 

Author Comment

by:jlilley
I am using ReadFile()
 

Expert Comment

by:fl0yd
That's not the best approach by any means. Do you have a specific reason not to use CreateFile(...)/CreateFileMapping(...)/MapViewOfFile(...)? Are you also using OpenFile(...), which is only there for compatibility with older Windows versions?
 

Author Comment

by:jlilley
A memory-mapped approach is harder to manage because the total file size is 60GB.  It would require using a "special" memory model or moving-window mapping techniques.

But would it actually solve the problem?  I mean, have you seen these symptoms and know this solves it, or are you making suggestions for experiments that can be run as tests?
 

Author Comment

by:jlilley
The other reason that memory-mapping makes the problem more complex is that our data is compressed, so it's easier to read it as a stream and decompress it as it comes in than to map it and convert buffers, although a stream model can certainly be built over the memory-mapped model. But would it help?
 

Expert Comment

by:fl0yd
You're right, I'm basically suggesting experiments - as you call it - that can be run. I haven't seen these symptoms myself but whenever I had to deal with large files I used memory mapping. My code did run on SMP-systems with no problems whatsoever. Personally I don't think it makes it any more complex, but you will have to decide for yourself. Same is true for building a stream model on top of the memory mapped structure.
Would it help? I don't know, but it would certainly help narrow down the cause of the error. If it makes a difference, then your original code was probably, but not necessarily, erroneous. If the symptoms remain, your code was OK. I'd think it's worth a try anyway.
My second point: if you actually use OpenFile(), you should definitely replace it with CreateFile() and see if it makes a difference. OpenFile() is from a time when 33MHz was about as fast as it gets...
 

Author Comment

by:jlilley
fl0yd, thanks for the insights.  I was really hoping to find someone who had seen THE PROBLEM before and knew the answer.  Perhaps that is wishful thinking :-)

We do call CreateFile() with:
FILE_SHARE_READ | FILE_SHARE_WRITE
GENERIC_READ
and always read on 64k boundaries (because the same code is used for non-buffered mode).

Incidentally, simply adding FILE_FLAG_NO_BUFFERING completely erases the problem.  Go figure.

I'd like to leave the question open to see if I can get feedback from someone with direct experience on this problem.
 

Expert Comment

by:fl0yd
No problem -- since I didn't provide you with a real answer that is perfectly ok.

Something that comes to mind: is it necessary for you to open the file with the FILE_SHARE_WRITE flag? Like I said before, I'm kinda lost here, so I'm just guessing. But allowing other handles to write to the file while you are reading it could cause the system to make a copy of portions of the file. I'm not sure how this is handled, though, since I have never come across a similar situation.

On the other hand, the problem is very likely connected to the dual-CPU environment. If you read through Intel's PIII white papers, about 98% of the errors concern two CPUs running in parallel. Two more things to test: a BIOS update (the motherboard BIOS, not the SCSI BIOS) might eliminate the problem, but check with the manufacturer first. Using the newest compiler available is also highly recommended and probably less risky than a BIOS update.

Bear with me -- I've finally reached the point where I need to know. A psychologist might call it obsession, but then again, I don't really care ;)
 

Author Comment

by:jlilley
I'll check on FILE_SHARE_WRITE to see if that matters.  My summary of the problem is currently this:
1) It is definitely linked to dual CPUs -- the problem does not appear at all on single-CPU machines.  
2) It is independent of disk driver and manufacturer because we have three very different computers (Intel, SuperMicro, Dell) all exhibiting the symptoms.  
3) Turning on FILE_FLAG_NO_BUFFERING eliminates the problem, so it is clearly related to file caching in the OS.
4) I've written a 50-line test case that just opens a bunch of big files using fopen, and randomly seeks and reads them using fseek/fread.  It also shows the symptoms, so this is not a subtle application issue.
5) The locked-down memory use seems to stabilize at around 0.7% of the total file size.  Perhaps there is some data structure used in file caching (page table?) that must remain locked in RAM for dual CPUs to access it.

I know that I can always turn on FILE_FLAG_NO_BUFFERING and build up some simple caching to make this work, so it is "solved" in a limited sense.

My current experiments:
1) See if SetProcessWorkingSetSize() makes any difference.
2) Turn off FILE_SHARE_WRITE
 

Expert Comment

by:robpitt
Sounds to me like you may well have found a genuine problem!

Have you tried it under XP?
If it's an unknown bug, it'll probably be present in XP as well (XP = NT 5.1).

Anyway the following link may be of interest to you...
http://www.sysinternals.com/ntw2k/source/cacheset.shtml
 

Author Comment

by:jlilley
I suspect this is more of a "feature" than a bug.  It is probably file-cache page tables or some such.  By the way, we have reproduced this on a single-CPU machine.  Unfortunately we have no XP box with enough disk to test.
 

Expert Comment

by:cwrea
You'll find this interesting and relevant:

http://www.heise.de/ct/english/97/01/302/

 

Accepted Solution

by:cwrea
Given the length of time this problem has existed, and that Microsoft hasn't provided any remedy other than letting you turn caching entirely off (which kills performance), I suggest you write your own buffered reading and writing API.  Encapsulate your logic so that if/when the problem is ever fixed, you can change implementations easily.  Perhaps provide an override to users of your app so they can choose your implementation or the system implementation.

 

Author Comment

by:jlilley
Excellent!  This is the "official" confirmation I was hoping for.  So it is indeed an NT/2000 bug!

I was already on the way to using unbuffered I/O and writing my own cache manager, so this confirms I'm on the right track.