STL Allocator Question

I am reading some strings from a file into a multiset.  The file contains approx 41,500 strings.  I can read them into a multiset using the std::copy function very efficiently (it takes about 200 ms).

The problem is, when I try to destroy the multiset it takes about 6 seconds to deallocate all the little string objects that were created.

I'm wondering if I can fix this delay by creating a custom STL allocator that basically doesn't do anything until all of the items it controls have been deleted, at which time it blows away the entire heap it is using.  In other words, the idea is to give each collection its own allocator object that creates and owns its own private heap for the contents of that collection.  Then, when the collection is destroyed, the heap destroys all the little objects it manages with a single delete/free/::FreeMemory call.

Also, it's worth noting that the string objects in the collection would themselves need to allocate from the same allocator, so that their contained memory is controlled by the "outer" allocator of the collection and the actual string contents (not just the string objects) are all destroyed at the same time.

Ideally, each collection would have its own FastAllocator object so that its data would be distinct from any other collection of the same type.  I would like to declare the collection something like this (the allocator is the third template parameter of multiset, after the comparator): multiset< string, less<string>, FastAllocator<string> > dictionary;

Of course this assumes that the contained objects have no meaningful state to close gracefully (such as file handles, etc.).  While that is not a safe assumption in every programming situation, it's fairly common (at least for me) to have large numbers of small data objects that could be dumped en masse this way.
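For what it's worth, here is a minimal sketch of the kind of allocator described above: a per-collection arena whose deallocate() is a no-op, with everything released in one pass when the arena is destroyed.  The Arena/FastAllocator names are my own, and this uses the C++11 minimal-allocator interface rather than anything VC6/VC7-specific.  Note the caveat raised above: the strings' own character buffers still come from the default allocator unless basic_string is also parameterized on FastAllocator.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <set>
#include <string>
#include <vector>

// A per-collection arena: individual deallocations are no-ops, and every
// block is released in one pass when the arena itself is destroyed.
class Arena {
public:
    explicit Arena(std::size_t blockSize = 1 << 16)
        : blockSize_(blockSize), offset_(0) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;
    ~Arena() { for (void* b : blocks_) std::free(b); }

    void* allocate(std::size_t bytes, std::size_t align) {
        offset_ = (offset_ + align - 1) & ~(align - 1);   // align within block
        if (blocks_.empty() || offset_ + bytes > blockSize_) {
            blocks_.push_back(std::malloc(std::max(blockSize_, bytes)));
            offset_ = 0;
        }
        void* p = static_cast<char*>(blocks_.back()) + offset_;
        offset_ += bytes;
        return p;
    }

private:
    std::size_t blockSize_, offset_;
    std::vector<void*> blocks_;   // each block malloc'd, all freed in ~Arena
};

// Minimal C++11 allocator facade over the arena; allocator_traits supplies
// the rest (rebinding to the node type, construct/destroy, etc.).
template <typename T>
struct FastAllocator {
    using value_type = T;
    Arena* arena;
    explicit FastAllocator(Arena* a) : arena(a) {}
    template <typename U>
    FastAllocator(const FastAllocator<U>& other) : arena(other.arena) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) {}   // no-op: the arena frees everything
};
template <typename T, typename U>
bool operator==(const FastAllocator<T>& a, const FastAllocator<U>& b) { return a.arena == b.arena; }
template <typename T, typename U>
bool operator!=(const FastAllocator<T>& a, const FastAllocator<U>& b) { return !(a == b); }
```

Usage would look like `multiset< string, less<string>, FastAllocator<string> > dictionary( less<string>(), FastAllocator<string>(&arena) );` — the arena must outlive the container, and must not be shared with containers you expect to free memory eagerly.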

Any experiences or code samples would be appreciated.


1. No, this is not a class project.  
2. Yes, I know I could do this another, faster way - e.g. use qsort with text pointers, rather than string objects.  
However, this has become an issue of curiosity now.
3. The code below was compiled & timed using MSVC 7 with unmanaged code.

The code follows (error checking, etc. has been removed for clarity):

      #include <fstream>
      #include <iostream>
      #include <iterator>
      #include <set>
      #include <string>
      using namespace std;

      typedef istream_iterator< string > string_input;
      typedef ostream_iterator< string > string_output;
      typedef multiset< string > MyColl;

      MyColl* pDictionary = new MyColl;
      MyColl& dictionary = *pDictionary;

      ifstream ifs( path.c_str() );  // path is a parameter to the function

      cout << "\nreading file\n" << flush;
      // Timing this statement: it takes ~200 ms for my input file (release mode).
      copy( string_input( ifs ), string_input(), inserter( dictionary, dictionary.begin() ) );

      cout << "\nfile size: " << (unsigned int) dictionary.size() << endl;

      // This delete takes over 6 seconds in release mode!  (Interestingly it
      // takes only 2 seconds in "debug" mode, while the copy also takes 2 seconds.)
      delete pDictionary;
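A portable repro sketch of the same measurement (my own helper name, synthetic words standing in for the input file) that times the fill and the destroy phases separately:

```cpp
#include <chrono>
#include <set>
#include <string>
#include <utility>

// Hypothetical repro harness: returns {fill ms, destroy ms} for a multiset
// filled with n small synthetic strings.
std::pair<double, double> time_fill_and_destroy(int n) {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;
    auto* dict = new std::multiset<std::string>;
    auto t0 = clock::now();
    for (int i = 0; i < n; ++i)
        dict->insert("word" + std::to_string(i % 1000));  // many small strings
    auto t1 = clock::now();
    delete dict;                     // the step that was slow under VC7
    auto t2 = clock::now();
    return { ms(t1 - t0).count(), ms(t2 - t1).count() };
}
```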
jkrConnect With a Mentor Commented:
Arg, C&P strikes again - that should have read ("A Custom Block Allocator for Speeding Up VC++ STL")
millsoftAuthor Commented:
P.S. My real concern here is STL collections & allocator behavior, not sorting.  Sorting is just the example I'm using.  Therefore, no points for alternative sorting solutions that don't address the underlying STL collection/allocator issues.  (Thanks!)
Jaime OlivaresSoftware ArchitectCommented:
I know you want a specific answer; I don't like things like changing standard library functionality.
But anyway, here is my suggestion:
If you are reading a specific or predictable amount of data, a single buffer is the most efficient way to do this (if you will read everything, and it will all fit in available RAM).
You can even build a little index in memory to point to every element of the "big bag".
Just a suggestion.  Hope this helps.
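The "single buffer plus index" idea above might be sketched like this (hypothetical helper name; assumes newline-separated words already read into one string that fits in RAM — one buffer, one index, one free at the end):

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

// Build an index of pointers into one contiguous buffer of newline-separated
// words, terminating each word in place, then sort the pointers.
std::vector<const char*> parse_and_sort(std::string& buffer) {
    std::vector<const char*> index;
    char* p = &buffer[0];
    char* end = p + buffer.size();
    while (p < end) {
        index.push_back(p);                   // word starts here
        char* nl = static_cast<char*>(std::memchr(p, '\n', end - p));
        if (!nl) break;                       // last word had no newline
        *nl = '\0';                           // terminate word in place
        p = nl + 1;
    }
    std::sort(index.begin(), index.end(),
              [](const char* a, const char* b) { return std::strcmp(a, b) < 0; });
    return index;
}
```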

millsoftAuthor Commented:
Hi Jaime,

>I don't like things like changing standard library functionality.
Huh?  The STL is all about generic programming.  That's why it makes custom allocators an option.

>a single buffer is the most efficient way to do this
Yes, I know.  Please see my original message concerning this (note #2).  


grg99Connect With a Mentor Commented:
Sounds like there's something fishy going on.

It could be that the higher level allocation and freeing strategies are fighting with the heap allocation and freeing code.  For example, if the deleting happens in reverse order of allocation, the heap freeing code might be forced to search the whole free list in order to coalesce blocks.  If it happened in the other order, it might happen much faster.

I'd write a few more benchmark tests to narrow down the problem.
A few simple tests allocating blocks and freeing them in various patterns may be informative.
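A sketch of the kind of pattern test suggested here (illustrative name and block size): free the same set of small malloc'd blocks in allocation order vs reverse order and compare the times.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Micro-benchmark: returns milliseconds spent freeing n small blocks,
// either in allocation order or in reverse allocation order.
double ms_to_free(std::size_t n, bool reverse) {
    std::vector<void*> blocks(n);
    for (std::size_t i = 0; i < n; ++i)
        blocks[i] = std::malloc(32);          // lots of little allocations
    if (reverse)
        std::reverse(blocks.begin(), blocks.end());
    auto t0 = std::chrono::steady_clock::now();
    for (void* p : blocks)
        std::free(p);                         // the phase under suspicion
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Comparing `ms_to_free(n, false)` against `ms_to_free(n, true)` for large n would show whether the heap's coalescing is sensitive to free order.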
jkrCommented:
The idea itself is good - however, an allocator is not the easiest object to design.  Not as far as providing the actual functionality is concerned, but the efficiency of the implementation.  Basically, the idea is to allocate a big pool that can be freed easily, but keeping that a contiguous area can be expensive if the limits are reached.  So, depending on the parameters you have to fill in, I'd suggest providing an allocator (maybe as a singleton) that preallocates N*sizeof(typical_object) and then successively returns chunks of that memory, and when the preallocated size is exhausted, allocates the same again.  For a sample, see e.g. ("")
millsoftAuthor Commented:
Hi jkr,

Ok, things get weirder now.  I tried the block allocator you suggested, and it doesn't compile in VS2003.  Therefore, I rebuilt my project using VC6.

In that case, the block allocator does compile, but doesn't speed things up much, since the entire process takes only about 200 ms INCLUDING DELETING THE CONTAINER.
Recall that on VS2003, it takes over 6 seconds just to delete the container.

Conclusion: One of several possible things is going on:

1. VC6's STL implementation doesn't actually clean up the strings that are contained in the collection. (i.e. has a Memory Leak)
2. VC7 is doing something decidedly different (and inefficient) with memory allocation and handling.  I am, to the best of my knowledge, not doing anything that would have memory allocation get routed through the .NET runtime.  (VC6 has a small-block allocator, which is, IMHO, very fast.  Does VC7 lack one?)
3. ?

millsoftAuthor Commented:
Ok, I did a little more checking on the small block allocator in VS7.  It turns out it is OFF BY DEFAULT if the application is running on Win2k or later.  (I'm using XP).

Adding the call

      _set_sbh_threshold(1016);  // 1016 bytes is the pre-Win2k default threshold mentioned below

to my program increased the speed in VS7 to 130 ms total.

millsoftAuthor Commented:
I don't know why MSFT thought they were doing us a favor by turning off the SBH in VS7, but there you have the explanation.

>>1. VC6's STL implementation doesn't actually clean up the strings that are contained in the collection

No, it does (otherwise I'd be in *deep* trouble :o)

>>2. VC7 is doing something decidedly different (and in-efficient) with memory allocation and handling

That *might* be possible, though I cannot verify that now...
millsoftAuthor Commented:
Here is another pooled allocator that may work on VS2003 - haven't tested it however.
millsoftAuthor Commented:
I asked MSFT about this in the VC newsgroup, here is the reply:
The small-block allocator was not ported to Win64 (ia64 and amd64); the Low-Fragmentation Heap was instead introduced as an operating-system feature, and it should have the same features as the SBH, with better flexibility.  BTW, you should be able to use the __MSVCRT_HEAP_SELECT environment variable to accomplish the same "switch".

That said, there was a thread about 2 weeks ago in this forum about alternate memory managers, where this very same topic was discussed in detail.

This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at

"brad" <bradl at x_remove_me_x dot costbook dot com> wrote in message
> Dear Microsoft:
> In the RTL, there is a function: _set_sbh_threshold
> _set_sbh_threshold sets the current threshold value for the small-block
> heap. The default threshold size is 1016 bytes for Windows 95, 98, ME and
> Windows NT, and zero for Windows 2000 and later platforms. By default, the
> small-block heap is not used on Windows 2000 and later platforms, though
> _set_sbh_threshold can be called with a nonzero value to enable the
> small-block heap in those instances.
> When I took a program that ran fine on VC6, and ported it to VC7, the
> performance was horrible.  It turned out that a collection of small strings
> in an STL collection was being deleted.  On VC6 it took about 30 ms, and on
> VC7, it took over 6000 ms.
> By enabling the sbh handler with this function, the VC7 version performed as
> well as, or better than, the VC6 code.
> My question is WHY is the small-block heap not used on Win2k and later??  Is
> there some alternative coding scheme I should be using to get good
> performance?
> Thanks,
> Brad
Now, that's interesting - will keep that in mind when it comes to 64 bits...
Question has a verified solution.
