
RFC: Allocating a large image buffer


RFC on the preferred method of allocating a buffer 513 megabytes in size.
Assume the platform is a Windows NT machine with 768 MB of RAM.

The buffer is to be used to hold many 14 MB bitmap images, which will be manipulated as well as output to given window DCs for viewing.  In addition, an image buffer may be created with smaller versions of these images, to be used to render thumbnails from.

Do I:
a) Create a memory device context and load a 513 MB bitmap into it? (I'm not sure how Windows allocates/manages the memory for bitmaps selected into DCs.)
b) Allocate the memory using the memory-mapped-file method (with the memory-mapped file mapped in RAM)?
c) Use the VirtualAlloc() function?
d) Use C++ new?

My take is to create a class which, when instantiated, allocates a large enough buffer using VirtualAlloc().  This class will be instantiated as a global object when the application initializes.  Another class it contains will load the image(s) from files into the buffer.
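To make that concrete, the class would look roughly like this (an illustrative sketch only - the names are made up and error handling is abbreviated):

    #include <windows.h>

    class ImageBuffer {
    public:
        explicit ImageBuffer(SIZE_T bytes)
            : size_(bytes),
              base_(VirtualAlloc(NULL, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE))
        {
            // 513 MB may simply not be available; a real class must handle base_ == NULL.
        }
        ~ImageBuffer() { if (base_) VirtualFree(base_, 0, MEM_RELEASE); }

        void*  data() const { return base_; }
        SIZE_T size() const { return size_; }

    private:
        SIZE_T size_;
        void*  base_;
    };

    // Instantiated once as a global when the application initializes:
    // ImageBuffer g_imageBuffer(513 * 1024 * 1024);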
1 Solution
 
nietodCommented:
VirtualAlloc() is good for a case where you have sparsely used memory, like a sparse array.  I don't think you have such a case.

Memory mapping is great if you don't want to consume too much RAM at one time and can reasonably control what portions of the data you want mapped into RAM at a time.  Since the Windows OS will be using the memory directly, you will have to have it all mapped in, so I can't see how this helps.

It sounds like you have to allocate all the memory in one chunk and let Windows swap it in and out if needed according to its usual virtual-memory scheme.  So you might as well use new to allocate the memory (or GlobalAlloc() if you are afraid your heap might not be able to expand enough).
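Either way it is a one-liner (sketch; in real code, test the result for NULL):

    #include <windows.h>
    #include <new>

    const SIZE_T kBufferBytes = 513 * 1024 * 1024;

    // Plain C++ allocation; the nothrow form makes failure a NULL test
    // rather than an exception:
    BYTE* buffer = new (std::nothrow) BYTE[kBufferBytes];

    // Or, bypassing the C++ heap entirely.  GMEM_FIXED means the returned
    // HGLOBAL is itself a usable pointer:
    BYTE* bits = (BYTE*)GlobalAlloc(GMEM_FIXED, kBufferBytes);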

Can Windows even handle bitmaps this large?  I don't know.  You might consider alternative algorithms...
 
dwaynenCommented:
When manipulating huge amounts of memory like this, it is important to design your software with care.  For instance, even though you need to store this vast amount of data, how much of it is needed at any one time?  As an example, pretend you are writing a paint program that is manipulating these huge image buffers.  If a user minimizes or iconifies one of the images, you should immediately shove the data for the image back to disk and free up the memory it was consuming.  This is to allow other programs (or indeed, even your own program) to operate more smoothly and avoid resource contention.

OK.  So Windows provides a set of facilities for this.  The CreateFileMapping API should be used to create your 513 MB file and commit it to disk.  You should probably use your own file for the hFile parameter, since most paging-file settings would not accommodate your needs - but it might, so you could experiment with that.  Then you should use the MapViewOfFile(Ex) API to map only certain regions of this huge file into memory.

Obviously, this will take some memory-management infrastructure that you will probably need to write: keeping track of which pieces are in memory and at what addresses they are mapped.  As soon as the memory is no longer being actively used, use the UnmapViewOfFile API to release it.  This will keep your working set to a minimum and hopefully help your application's performance.
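In outline, it looks like this (a sketch only - the file name is made up, and every handle and pointer must be checked in real code):

    #include <windows.h>

    const DWORD kTotalBytes = 513 * 1024 * 1024;
    const DWORD kViewBytes  = 14 * 1024 * 1024;   // e.g., one image at a time

    // Back the mapping with our own file rather than the system paging file:
    HANDLE hFile = CreateFile("images.dat", GENERIC_READ | GENERIC_WRITE,
                              0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE hMap  = CreateFileMapping(hFile, NULL, PAGE_READWRITE,
                                     0, kTotalBytes, NULL);

    // Map only the region currently being worked on; the offset must be a
    // multiple of the system allocation granularity (64 KB on x86):
    DWORD offset = 0;
    void* view = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, offset, kViewBytes);

    // ... use the view ...

    UnmapViewOfFile(view);
    CloseHandle(hMap);
    CloseHandle(hFile);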

If you want to play with the cutting edge, check out the "sparse file" capabilities of NTFS 5 on MSDN.  Kind of cool, though it may not apply directly to your problem.
 
dwaynenCommented:
nietod...
I just posted an answer that contains much of the same information.  Want to share the points? :^)
 
nietodCommented:
Actually, I'm not sure I agree with you.  I agree with you in general, but not in this specific case.  The memory used for a bitmap is managed by Windows.  What are you going to put in the file mapping?  You really don't have options other than to live with it or redesign.  The only time the memory is under your control is when you initialize the bitmap, like when you use SetDIBits(), or when you get the bits back with GetDIBits().  And for those instances, I don't see how file mapping is going to help you.
 
TaurusAuthor Commented:
The buffer cannot be allowed to swap out to disk.  I want the contents of the buffer resident and waiting in RAM as long as the application is running.  This is a dedicated application.  As for other applications, they can use any memory on the system above 513 MB.  This is a requirement.
 
dwaynenCommented:
As you point out, we don't have many options about how memory for bitmaps is managed.  This seems to be a Windows-internal detail that isn't exposed to the user.  Hence we have to use APIs like CreateCompatibleBitmap and its relatives.

But what IS under our control is how we back those bitmaps with storage.  So as soon as we don't need to be displaying an image, we should deselect it from the DC, stream the bits into storage that we can control, and destroy the actual HBITMAP handle.  Especially when the bitmaps are this big!
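The sequence would look something like this (sketch only; bmi has to be filled in to describe the bitmap's real format, and the function name is made up):

    #include <windows.h>

    void ParkBitmap(HDC memDC, HGDIOBJ hOldBm, HBITMAP hbm,
                    BITMAPINFO* bmi, void* backingStore, int height)
    {
        SelectObject(memDC, hOldBm);          // deselect our bitmap first
        HDC screen = GetDC(NULL);
        GetDIBits(screen, hbm, 0, height,     // stream the pixels out...
                  backingStore, bmi, DIB_RGB_COLORS);
        ReleaseDC(NULL, screen);
        DeleteObject(hbm);                    // ...then destroy the GDI object
    }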

It's unfortunate that we can't just tell Windows "display the bits in this buffer".  But I understand how that would complicate the interfaces and introduce ownership complications.
 
TaurusAuthor Commented:
I would add that I may allow the user to change the size of the buffer.  I know I can do this with memory-mapped files, and perhaps with VirtualAlloc() and VirtualLock().  However, I probably can't rely on new or the device-context method.
 
nietodCommented:
>> As for other applications they can use
>> any memory on the system above 513 MB.
>> This is a requirement
It's very unlikely that they will use only the ~200 MB left.  Windows uses virtual memory, so parts of the memory used by your bitmap will probably be swapped out as other programs need memory, and then swapped back in as you use it again.  You have no control over this, nor would you really want control over it.
 
nietodCommented:
>> I would add that I may allow the user to
>> change the size of the buffer.
Based on what?  What is it that you are doing with this huge image?
 
dwaynenCommented:
Rejected answer...

To keep a buffer in memory at all times you must do the following.

1) Crank up the allowed working set for your process via the SetProcessWorkingSetSize API.  I cannot emphasize enough the care that should be taken here.  From MSDN:

Using the SetProcessWorkingSetSize function to set an application's minimum and maximum working set sizes does not guarantee that the requested memory will be reserved, or that it will remain resident at all times. When the application is idle, or a low-memory situation causes a demand for memory, the operating system can reduce the application's working set. An application can use the VirtualLock function to lock ranges of the application's virtual address space in memory; however, that can potentially degrade the performance of the system.

When you increase the working set size of an application, you are taking away physical memory from the rest of the system. This can degrade the performance of other applications and the system as a whole. It can also lead to failures of operations that require physical memory to be present; for example, creating processes, threads, and kernel pool. Thus, you must use the SetProcessWorkingSetSize function carefully. You must always consider the performance of the whole system when you are designing an application.

2) Use the VirtualAlloc API to allocate your big buffer.  I think you could actually use any memory allocator, but VirtualAlloc is probably safest.

3) Use the VirtualLock API to lock the buffer in memory.  Again, this is a fairly mean thing to do to an OS, especially with buffers of this size.  Again from MSDN:

Locking pages into memory may degrade the performance of the system by reducing the available RAM and forcing the system to swap out other critical pages to the paging file. By default, a process can lock a maximum of 30 pages. The default limit is intentionally small to avoid severe performance degradation. Applications that need to lock larger numbers of pages must first call the SetProcessWorkingSetSize function to increase their minimum and maximum working set sizes. The maximum number of pages that a process can lock is equal to the number of pages in its minimum working set minus a small overhead.

Pages that a process has locked remain in physical memory until the process unlocks them or terminates.
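Put together, the three steps are only a few calls (sketch; every one of them can fail and must be checked, and the working-set numbers are illustrative):

    #include <windows.h>

    const SIZE_T kBytes = 513 * 1024 * 1024;

    // 1) Grow the working set so the lock below is even permitted.
    //    Buffer size plus some slack for the rest of the process:
    SetProcessWorkingSetSize(GetCurrentProcess(),
                             kBytes + 16 * 1024 * 1024,   // minimum
                             kBytes + 64 * 1024 * 1024);  // maximum

    // 2) Allocate page-aligned memory:
    void* buf = VirtualAlloc(NULL, kBytes, MEM_RESERVE | MEM_COMMIT,
                             PAGE_READWRITE);

    // 3) Pin it; the pages now stay resident until VirtualUnlock()
    //    or process exit:
    VirtualLock(buf, kBytes);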

*******************

But even if you do this, as nietod points out, you don't have control over the internals of GDI.  When you allocate bitmaps via the CreateCompatibleBitmap API, Windows is free to do its own thing - and all you get back is a handle, so you can't even lock the bitmap's memory in place.  So the best you can do is keep a duplicate copy of your bitmaps in locked RAM, and create/destroy the actual HBITMAP GDI objects as needed.  This has the sad effect of essentially doubling the required amount of RAM if all of your bitmaps are being displayed.


 
TaurusAuthor Commented:
dwaynen,

I thought that was what DirectX was about - displaying bitmaps directly?

Per my intentions, it may be that swapping single images into and out of a DC from my 513 MB RAM buffer will be OK performance-wise.  If not, then I'll have to look into DirectX.  However, when you create a bitmap with CreateDIBitmap(), does it copy the bits to a new buffer?  I believe it does, which is unfortunate.

My question is still geared towards the advantages and/or disadvantages of any particular method of allocating the 513 MB buffer, assuming that it will be non-paged.  I forget whether I have to do anything more than increase my working-set size when using VirtualAlloc?
 
TaurusAuthor Commented:
See my comment about DirectX.  Also, what I am still looking for is some comment about a preferred method of allocating the 513 MB non-paged.

One part of this question concerns performance, and another concerns using a common method that is not too attached to the Windows platform.  When you allocate with "new", is the memory paged?  I read that the C++ spec doesn't specify the implementation, and that "new" could allocate in a variety of ways, including but not limited to calling malloc.
 
dwaynenCommented:
Ah.  Well, if you are doing DirectX, then the previously mentioned GDI limitations are moot.  I apologize if I missed the DirectX nature of the question somewhere along the way.

Yes, with DirectX you can do this.  Use the SetProcessWorkingSetSize/VirtualAlloc/VirtualLock steps described in my answer.  Then use the DirectDraw support for "client memory surfaces".  Essentially, this involves setting the lpSurface member of the DDSURFACEDESC2 structure in your call to DirectDraw::CreateSurface.
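In code, roughly (a sketch - I'm assuming a 24-bit RGB format; imgWidth, imgHeight, imageBits, and an already-created IDirectDraw7* pDD are all placeholders):

    #include <ddraw.h>

    DDSURFACEDESC2 ddsd;
    ZeroMemory(&ddsd, sizeof(ddsd));
    ddsd.dwSize    = sizeof(ddsd);
    ddsd.dwFlags   = DDSD_WIDTH | DDSD_HEIGHT | DDSD_PITCH |
                     DDSD_PIXELFORMAT | DDSD_LPSURFACE | DDSD_CAPS;
    ddsd.dwWidth   = imgWidth;
    ddsd.dwHeight  = imgHeight;
    ddsd.lPitch    = imgWidth * 3;      // bytes per scan line, 24-bit RGB
    ddsd.lpSurface = imageBits;         // points into YOUR buffer
    ddsd.ddsCaps.dwCaps = DDSCAPS_OFFSCREENPLAIN | DDSCAPS_SYSTEMMEMORY;

    ddsd.ddpfPixelFormat.dwSize        = sizeof(DDPIXELFORMAT);
    ddsd.ddpfPixelFormat.dwFlags       = DDPF_RGB;
    ddsd.ddpfPixelFormat.dwRGBBitCount = 24;
    ddsd.ddpfPixelFormat.dwRBitMask    = 0x00FF0000;
    ddsd.ddpfPixelFormat.dwGBitMask    = 0x0000FF00;
    ddsd.ddpfPixelFormat.dwBBitMask    = 0x000000FF;

    LPDIRECTDRAWSURFACE7 pSurf = NULL;
    HRESULT hr = pDD->CreateSurface(&ddsd, &pSurf, NULL);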

This should provide the solution you are looking for.
 
dwaynenCommented:
In general, memory allocations are handled in a manner that the OS designers determined to be most advantageous to the health of the ecology of programs co-inhabiting the OS.  As such, almost all operating systems implement paged memory so that it can be safely and easily swapped out.  Circumventing this facility is likely to be an OS-dependent affair.  On Win32 it is VirtualLock.  I cannot speak for other operating systems.

To answer your questions:
1) Performance is going to be degraded for everyone but yourself.
2) Languages like C++ do not expose such low-level, OS-specific functionality, since they are designed to work on a wide range of environments - PCs, Macs, workstations, supercomputers, Amigas, embedded controllers, etc.
3) It's not that there is a preferred way to allocate unpaged memory; there is only one way - at least from user-level applications - and that is VirtualLock on Win32.  You could actually use C++ new to allocate the memory and then use VirtualLock to pin it in place.  And you should note that the memory is still backed by storage - namely the system paging file - but the OS guarantees that it won't page the memory out until it is unlocked.
4) There have been many requests for memory that is not backed at all by storage.  The main reason is to store passwords and other critical security data that could pose a security problem if ever written to disk.  At present Win32 does not support this feature, as the underlying virtual-memory machinery wasn't designed for it.

 
TaurusAuthor Commented:
I haven't used DirectX.  Can client memory surfaces be kept within window extents automatically, or do I have to handle all of that myself, i.e. tell the DirectX API where in screen coordinates the window extents are, etc.?
 
dwaynenCommented:
I'm not an expert on DirectX - I've used it just enough to get a little ball to bounce around the screen.  And yes, it can be "windowed", though with a performance hit.  Handling being windowed takes some work, but I don't think it is too hard.

Check out the DirectX Developer Center at http://www.msdn.microsoft.com/directx/default.asp

For an example of windowing, check out the "Switcher Sample" at http://msdn.microsoft.com/library/psdk/directx/ddsamp_3mlz.htm

DirectX (or more explicitly, the DirectDraw component) allows for much greater control than GDI but is also a little more work.  For instance, you have to handle some extra cases where your surfaces get nuked!  This is because surfaces are commonly allocated out of video RAM (nice and fast), and obviously other applications need video RAM if they want to render.  Still, DirectDraw is really cool and might do exactly what you need.  It does place limitations on the supporting OS, though.  The current version of DirectX is 7.0, and I think NT4 only supports DirectX 2.0 or something.  You would have to check your requirements against what is available.  Windows 2000 greatly improves DirectX support, and Win95+ can all install DirectX upgrades; NT tends to frown on that, not sure why.
 
nietodCommented:
>> So the best you can do is
>> have a duplicate copy of your
>> bitmaps in locked RAM
No, that would be the worst thing you could do.

First of all, it would mean having two copies of something that you need one copy of.

Second of all, where do you get the idea that locking memory improves performance?  (You are not alone in this; I've seen it time and time again.)  The logic is that "the memory can't be swapped out, so it can be accessed faster," so performance increases.  That is not what happens.  Windows swaps memory based on the frequency and recency of use.  If memory is being used a lot, it doesn't get swapped out; if it isn't, it does.  When you lock memory, you may be locking memory that would be a good candidate for swapping out, forcing the OS to swap out memory that would be better left in.  You are very likely to degrade performance this way.
 
nietodCommented:
>> However, when you create
>> a bitmap with CreateDIBitmap() does
>> it copy the bits to a new buffer?  
Yes.

>> My question still is geared towards the
>> advantages and or disadvantages of any
>> particular method of allocating the 513MB
>> buffer assuming that it will be non-paged.
I don't know what you mean by "non-paged".

But the best method depends on what you will be doing with the buffer.

My guess is that the times you will need such a buffer will be very infrequent and short-lived.  Basically just when you need to load or save the bitmap, right?  In that case, GlobalAlloc() will be fine.

>> When you allocate with "new" is the memory paged?
Yes.

>>  I read that the C++ spec. doesn't specify the
>> implementation and that "new"
But in Windows, all "user" memory is paged.  You can prevent memory from being swapped out, but this is usually a bad idea.  The goal is to improve performance; it is rare that it doesn't degrade performance instead.

>> 1) Performance is going to be degraded
>> for everyone but yourself.
If you lock memory indiscriminately, there is a very good chance your program's performance will degrade too.  And considering the size of the lock, I would guess it might be very noticeable.
 
dwaynenCommented:
>>>> So the best you can do is
>>>> have a duplicate copy of your
>>>> bitmaps in locked RAM
>> No, that would be the worst thing you could do.
If he moves to DirectDraw, he can avoid this duplication and still have his bitmaps (surfaces) under his control.

I completely agree with nietod on the issue of not locking memory.  You would have to work very hard at convincing me that such a design was necessary.
 
TaurusAuthor Commented:
>>I completely agree with nietod on the issue of not locking memory.  You would have to work very hard at convincing me that such a design was necessary.

Ok, try this on for size.  The application is the integral software portion of a dedicated NT system which has one purpose: running the application and doing what the application does.  One of the most important aspects of the application is being able to load and view up to - and most often exactly - 36 14 MB images.  The images don't always come in from files; they often come in via the driver for custom PCI or FireWire hardware.  This hardware is very application-specific, not a general-purpose board sold for other purposes.

Each image gets tiled as a thumbnail in a proof-sheet window.  The thumbnail size can be changed to any one of five sizes.  When the user clicks on any one of the thumbnail images, we want to display a portion of the 14 MB image in a view window.  We don't want the user to wait to view while the images load from or save to a file.  Keep in mind that the user will, 90% of the time, be viewing most or all of the images in rapid succession.  The viewing may occur in order or out of order.  Specific routines to process the images will be running concurrently.

Put another way: the images come in rapid-fire from hardware.  They are displayed as thumbnails.  The user will immediately view one or more images in quick succession.  Concurrent threads will run algorithms on each image.  Some of the images will get saved to disk.  Saving images to disk should run most of the time as a background operation, so as to minimally impede the workflow of viewing and applying algorithms to the images.

Still think I shouldn't lock the memory?

 
dwaynenCommented:
OK, here is what I would suggest.  Don't worry about locking the memory down just yet.  Just allocate your gigantic buffer from the process heap.  I would recommend that you stay away from allocators such as new or even malloc, since they are managed by the run-time and the size you are requesting may be burdensome to them.  Use DirectDraw for your application.  This should work well for you and will allow you to use the buffers you allocated directly.  Also, you'll enjoy a significant performance boost in your graphics routines.
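"From the process heap" in code is just this (sketch; HeapAlloc may of course fail for a request this size, so check the result):

    #include <windows.h>

    const SIZE_T kBytes = 513 * 1024 * 1024;

    void* buf = HeapAlloc(GetProcessHeap(), 0, kBytes);
    // ... load images, create DirectDraw surfaces over the buffer ...
    HeapFree(GetProcessHeap(), 0, buf);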

When done, profile your application - especially looking for page faults.  If the page faults are excessive, you might want to consider locking down your memory.  But the point is: if you are using the memory, Windows will keep it resident anyway.  If you aren't using it, then it's wasting memory space.

I would be interested in the performance results.
 
nietodCommented:
>> Just allocate your gigantic buffer from
>> the process heap.  I would recommend
>> that you stay away from allocators
>> such as new or even malloc, since they
>> are managed by the run-time and the size
>> you are requesting
>> may be burdensome to them
I definitely agree, except I don't see any reason for having a single huge buffer - why not 36 separate 14 MB allocations?

>> If the page faults are excessive,
>> you might want to consider locking
>> down your memory
What memory do you lock?  The memory that tends to get paged out - that is, the memory that is used less frequently?  Do you really want to keep the less frequently used memory in and swap out the more frequently used?

While Windows' system for tracking memory usage has limitations, it almost always yields results at least as good as, if not better than, a programmer's intuition.

Typically, locking works best for small allocations that need to be accessed unusually quickly - for example, information that must be read during an interrupt from a hardware device.
 
TaurusAuthor Commented:
In certain instances, when pictures come in from hardware, it is not OK to have a pause between the pictures.  This is the case even if the hardware supports it.

Further, if I have 36 pictures and one of them has not been selected for a while and gets partially paged out for any reason, it is not OK to have extra time elapse for paging when the user selects it for viewing - unless it's an imperceptible time, like tens of milliseconds.  I don't think that is the case with 14 MB images.

Again, the machine is dedicated.  If there are other data-processing routines that need memory, then the machine will be configured with extra memory above 513 MB to accommodate them.  If the user wants to run Photoshop in parallel and load big images, then he will need enough memory beyond 513 MB (and what NT needs) to accommodate this without impacting the dedicated application.  A typical configuration will have at least 768 MB.

If a particular algorithm (or algorithms) of the dedicated application requires lots of memory, then our basic memory requirements will be adjusted upwards.
 

 
dwaynenCommented:
I would still encourage you not to perform memory locking at this point in the game.  Rather, get your application working first, and then consider what tweaks you need to meet customer requirements.  It must be nice to be able to require 3/4 of a gig of memory for an app!  :^)

Rest assured that you can lock down your memory if absolutely required.  But this is a fairly heavy-handed approach, so you should only do it if absolutely needed.  And as nietod points out, the OS may swap out your application code itself if it goes idle for a while.  Doh!  However, I'm afraid I don't know an extensive amount about the subtleties of how NT implements virtual memory and how tweaking it can affect performance.

 
KangaRooCommented:
In all this it is consistently assumed that you actually need to keep the full 14 MB images around for *thumbnail* viewing?
36 x 14 MB is, eh, roughly a 13,000 x 13,000 pixel image (24-bit RGB) being displayed on, say, a 1000 x 1000 pixel view.  So GDI needs to scale this 500+ MB of bitmapped image back to approximately 2 MB every time the viewing window needs to be redrawn.  That process in itself will take quite a few seconds (what was the throughput for memory again?)...

Seems better to create a scaled-down thumbnail, fitting its 1/36 area of the actual client view, and redraw that when it is time.
Scaling down takes time, but it has to be done anyway, either by you or by Windows (no graphics card can handle 500 MB images).

A simple DDA scaling could be done, probably even while the image data pours in from your hardware (DDA simply skips pixels).  This is the type of scaling GDI uses anyway.
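A pixel-skipping scaler really is just a few lines (sketch, assuming tightly packed 24-bit RGB rows):

    void ScaleDownDDA(const unsigned char* src, int srcW, int srcH,
                      unsigned char* dst, int dstW, int dstH)
    {
        for (int y = 0; y < dstH; ++y) {
            int sy = y * srcH / dstH;              // step through source rows
            for (int x = 0; x < dstW; ++x) {
                int sx = x * srcW / dstW;          // ...and source columns
                const unsigned char* s = src + (sy * srcW + sx) * 3;
                unsigned char*       d = dst + (y  * dstW + x ) * 3;
                d[0] = s[0]; d[1] = s[1]; d[2] = s[2];
            }
        }
    }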
 
nietodCommented:
I think the 14 MB images are themselves the thumbnails.  That's how I understood it, anyway - i.e., those won't be scaled.  However, a square 14 MB 32-bit color image is only about 1,900 pixels on a side, so maybe not.  (After thinking about a 1/2 GB image, a 14 MB image seemed like such an improvement that I didn't think to check its size.  :-)  )
 
TaurusAuthor Commented:
The images are 2384 x 2040.  These images, when thumbnailed, can come from a smaller pre-scaled image.  However, when a thumbnail is selected to be viewed, the view will be a 1:1 window on the 14 MB image.  The user will be able to scroll and zoom.  If the user zooms, then the interpolation will happen on the 14 MB image using bi-cubic or bi-linear interpolation.
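For the zoom path, each output pixel would be sampled something like this (rough bi-linear sketch, assuming packed 24-bit RGB; the bi-cubic case is the same idea over a 4x4 neighbourhood):

    void SampleBilinear(const unsigned char* img, int w, int h,
                        double fx, double fy, unsigned char out[3])
    {
        int x0 = (int)fx;  if (x0 > w - 2) x0 = w - 2;
        int y0 = (int)fy;  if (y0 > h - 2) y0 = h - 2;
        double ax = fx - x0, ay = fy - y0;
        for (int c = 0; c < 3; ++c) {
            // Weighted average of the 2x2 neighbourhood:
            double top = img[(y0*w + x0)*3 + c]         * (1 - ax)
                       + img[(y0*w + x0 + 1)*3 + c]     * ax;
            double bot = img[((y0+1)*w + x0)*3 + c]     * (1 - ax)
                       + img[((y0+1)*w + x0 + 1)*3 + c] * ax;
            out[c] = (unsigned char)(top * (1 - ay) + bot * ay + 0.5);
        }
    }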

>>DDA
?
>>This is the type of scaling the GDI uses anyway.
 
Is it?  I was told by someone else that StretchDIBits() does an excellent job of scaling.  I don't personally know what it uses, because I have only used Intel's free Image Processing Library for scaling.

Notwithstanding, StretchDIBits() needs a DC, and if the scaling is to come from a 14 MB image, then copying it (all 14 MB) into a DC doesn't seem like the way to go, because you're copying 14 MB twice.
 
nietodCommented:
>> stretchDIBits() does an excellent
>> job of scaling.
To the best of my knowledge, it does it by removing rows/columns of pixels when scaling down and by adding duplicate rows/columns when scaling up.  This is the easiest way to scale, but it doesn't yield the best results.  There are many other ways to scale, but they can be considerably slower.  (And no one way is best for all cases.)

>> if the scaling is to come from a 14MB
>> image then copying it (14MB)into a DC
>> doesn't seem like the way to go because
>> your copying 14MB twice.
I'm not following this.  Why the two DCs?  Where are you copying from?  To?
 
TaurusAuthor Commented:
>>To the best of my knowledge it does it by removing rows/columns

I'll have to check this out for myself, as I can't believe that the person who told me about StretchDIBits would use subsampling and oversampling methods.  They're just not good enough.

To clarify the copying question: 14 MB pictures come in rapid-fire and are placed into their respective memory buffers.  To scale them for viewing (if they are being scaled), why copy them into a DC prior to performing the scaling?  If I were to use StretchDIBits() for scaling, I would have to create a compatible bitmap, copy the bits, and select it into the DC.  Hence an extra copy of 14 MB.

The person who implemented this before, I believe, allocated the 513 MB buffer as a bitmap selected into a DC and then used StretchDIBits() to scale and otherwise render the views.  In addition, she created a secondary buffer with pre-scaled images for rendering the thumbnails.  However, keep in mind that the views, unlike the thumbnails, were always rendered from the 14 MB images.

I don't think using a DC this way is the optimal approach, because it leaves it up to Windows to manage the buffer.  So what I think I am saying is: first allocate 36 14 MB buffers, then allocate a smaller buffer holding 36 smaller images from which the thumbnails are scaled/rendered.  Then, when the user selects a thumbnail to view, copy only the portion of the 14 MB image that will fit the view into a compatible bitmap.  If it needs to be scaled first, then I scale it by reading from the 14 MB buffer, rather than copying the whole 14 MB into a compatible bitmap and then scaling with StretchDIBits().
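(And if it turns out StretchDIBits() will accept the source bits and a BITMAPINFO directly, without a source DC, the view-rendering step could be as simple as this sketch - bmi would describe the 14 MB image's DIB format, and the src/view variables are placeholders:)

    // Draw the visible portion of the image straight from our buffer to
    // the window DC - no intermediate compatible bitmap:
    StretchDIBits(hdcWindow,
                  0, 0, viewW, viewH,        // destination rectangle
                  srcX, srcY, srcW, srcH,    // source rectangle (DIB coords)
                  imageBits, &bmi,
                  DIB_RGB_COLORS, SRCCOPY);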
 
nietodCommented:
>>  Hence an extra copy of 14MB.
I see.  And I guess that goes on frequently enough that it really isn't fair to consider it a temporary additional memory usage.

Your scheme sounds pretty reasonable.  You might consider using memory mapping for storing the 36 14 MB buffers.  However, as you have a lot of memory available for this and don't want to "share it", you might do best just storing the data directly in RAM.  (If you use file mapping, you definitely will be swapping out unused portions; if you just use memory, you only _might_ be swapping out unused portions.)
 
KangaRooCommented:
DDA boils down to removing pixels/rows (when scaling down).  It is very similar to the Bresenham line-drawing algorithm.  DDA can easily be transformed to do a nicer (linear) job at almost the same cost, much like Wu anti-aliased line drawing.
Stretching (zooming) a screenshot will tell you a lot about the method being used.  In general, digitized photographs stay acceptable with DDA, while vector graphics deteriorate rapidly.

If DDA is unacceptable, you cannot use StretchBlt and you'll have to zoom yourself.  In that case you'll have the original image and a scaled-down copy.
A scaled version of the entire image, rather than only the visible part, would have my preference.
 
TaurusAuthor Commented:
Thanks to all of you for the comments!  It was a coin toss choosing between nietod's and dwaynen's comments as the answer.
