Solved

STL Allocator Question

Posted on 2004-08-11
1,383 Views
Last Modified: 2013-12-14

I am reading some strings from a file into a multiset.  The file contains approx 41,500 strings.  I can read them into a multiset using the std::copy function very efficiently (it takes about 200 ms).

The problem is, when I try to destroy the multiset it takes about 6 seconds to deallocate all the little string objects that were created.

I'm wondering if I can fix this delay by creating a custom STL allocator that does essentially nothing until all of the items it controls have been deleted, at which point it blows away the entire heap it is using.  In other words, the idea is to give each collection its own allocator object that creates and owns its own private heap for the contents of that collection.  Then, when the collection is destroyed, have the heap destroy all the little objects it manages with a single delete/free/::FreeMemory call.

Also, the string objects in the collection would need to allocate from the same allocator, so that their contained memory is controlled by the collection's "outer" allocator and the actual string contents (and not just the string objects) are all destroyed at the same time.

Ideally, each collection would have its own FastAllocator object so that its data would be distinct from any other collection of the same type.  I would like to declare the collection something like this (note the allocator is the third template parameter of multiset, after the comparator): multiset< string, less<string>, FastAllocator<string> > dictionary;

Of course this assumes that the contained objects have no meaningful state to close gracefully (such as file handles, etc).  While that is not a safe assumption in every programming situation, it's fairly common (at least for me) to have large numbers of small data objects that could be dumped en-masse this way.

Any experiences or code samples would be appreciated.

Thanks,
Brad


Notes:
1. No, this is not a class project.  
2. Yes, I know I could do this another, faster way - e.g. use qsort with text pointers, rather than string objects.  
However, this has become an issue of curiosity now.
3. The code below was compiled & timed using MSVC 7 with unmanaged code.

The code follows & error checking, etc. has been removed for clarity:
========
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
using namespace std;

typedef istream_iterator< string > string_input;
typedef ostream_iterator< string > string_output;
typedef multiset< string > MyColl;

MyColl* pDictionary = new MyColl;
MyColl& dictionary = *pDictionary;

ifstream ifs( path.c_str() );  // path is a parameter to the function

cout << "\nreading file\n" << flush;
// timing this statement: it takes ~200 ms for my input file (release mode)
copy( string_input( ifs ), string_input(), inserter( dictionary, dictionary.begin() ) );

cout << "\nfile size: " << (unsigned int) dictionary.size() << endl;

// this delete takes over 6 seconds in release mode!!  (Interestingly, it only
// takes 2 seconds in "debug" mode, while the copy also takes 2 seconds.)
delete pDictionary;
      
============
      
Question by:millsoft
13 Comments
 
LVL 5

Author Comment

by:millsoft
P.S. My real concern here is STL collections & allocator behavior, not sorting.  Sorting is just the example I'm using.  Therefore, no points for alternative sorting solutions that don't address the underlying STL collection/allocator issues.  (Thanks!)
 
LVL 55

Expert Comment

by:Jaime Olivares
I know you want a specific answer, and I don't like changing standard library functionality.
But anyway, here is my suggestion:
If you are reading a specific or predictable amount of data, a single buffer is the most efficient way to do this (provided you will read everything and it all fits in available RAM).
You can even build a little in-memory index pointing to every element of the "big bag".
Just a suggestion. Hope this helps.
 
LVL 5

Author Comment

by:millsoft
Hi Jaime,

>I don't like changing standard library functionality.
Huh?  The STL is all about generic programming.  That's why it makes custom allocators an option.

>a single buffer is the most efficient way to do this
Yes, I know.  Please see my original message concerning this (note #2).  

Thanks,
Brad


 
LVL 22

Assisted Solution

by:grg99
grg99 earned 125 total points
Sounds like there's something fishy going on.

It could be that the higher level allocation and freeing strategies are fighting with the heap allocation and freeing code.  For example, if the deleting happens in reverse order of allocation, the heap freeing code might be forced to search the whole free list in order to coalesce blocks.  If it happened in the other order, it might happen much faster.

I'd write a few more benchmark tests to narrow down the problem.
A few simple tests allocating blocks and freeing them in various patterns may be informative.
 
LVL 86

Expert Comment

by:jkr
The idea itself is good - however, an allocator is not the easiest object to design. Not as far as providing the actual functionality is concerned, but the efficiency of the implementation. Basically, the idea is to allocate a big pool that can be freed easily, but keeping that a contiguous area can be expensive if the limits are reached. So, depending on the parameters you have to fill in, I'd suggest providing an allocator (maybe as a singleton) that preallocates N*sizeof(typical_object) and then successively returns chunks of that memory, and when the preallocated size is exhausted, allocates the same amount again. For a sample, see e.g. http://www.codeproject.com/vcpp/stl/blockallocator.asp
 
LVL 86

Accepted Solution

by:
jkr earned 125 total points
Arg, C&P strikes again - that should have read

 http://www.codeproject.com/vcpp/stl/blockallocator.asp ("A Custom Block Allocator for Speeding Up VC++ STL")
 
LVL 5

Author Comment

by:millsoft
Hi jkr,

Ok, things get weirder now.  I tried the block allocator you suggested, and it doesn't compile in VS2003.  Therefore, I rebuilt my project using VC6.

In that case, the block allocator does compile, but doesn't speed things up much, since the entire process takes only about 200 ms INCLUDING DELETING THE CONTAINER.
Recall that on VS2003, it takes over 6 seconds just to delete the container.

Conclusion: One of several possible things is going on:

1. VC6's STL implementation doesn't actually clean up the strings that are contained in the collection (i.e., it has a memory leak),
or
2. VC7 is doing something decidedly different (and inefficient) with memory allocation and handling.  To the best of my knowledge, I am not doing anything that would route memory allocation through the .NET runtime.  (VC6 has a small-block allocator, which is, IMHO, very fast.  Does VC7 lack one?)
or
3. ?


 
LVL 5

Author Comment

by:millsoft
Ok, I did a little more checking on the small block allocator in VS7.  It turns out it is OFF BY DEFAULT if the application is running on Win2k or later.  (I'm using XP).

Adding the call:

    _set_sbh_threshold(1000);

to my program increased the speed in VS7 to 130ms total.

 
LVL 5

Author Comment

by:millsoft
I don't know why MSFT thought they were doing us a favor by turning off the SBH in VS7, but there you have the explanation.

 
LVL 86

Expert Comment

by:jkr
>>1. VC6's STL implementation doesn't actually clean up the strings that are contained in the collection

No, it does (otherwise I'd be in *deep* trouble :o)

>>2. VC7 is doing something decidedly different (and in-efficient) with memory allocation and handling

That *might* be possible, though I cannot verify that now...
 
LVL 5

Author Comment

by:millsoft
Here is another pooled allocator that may work on VS2003 - haven't tested it however.

http://www.sjbrown.co.uk/pooled_alloc.html
 
LVL 5

Author Comment

by:millsoft
I asked MSFT about this in the VC newsgroup, here is the reply:
========================================================================================
The small-block allocator was not ported to Win64 (ia64 and amd64); the Low Fragmentation Heap was instead introduced as an operating system feature, and it should have the same features as the SBH, with better flexibility.
BTW, you should be able to use the __MSVCRT_HEAP_SELECT
environment variable to accomplish the same "switch".

That said, there was a thread about 2 weeks ago in this forum
about alternate memory managers,
where this very same topic was discussed in detail.

--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm


"brad" <bradl at x_remove_me_x dot costbook dot com> wrote in message
news:OtEu1nIgEHA.3700@TK2MSFTNGP12.phx.gbl...
> Dear Microsoft:
>
> In the RTL, there is a function: _set_sbh_threshold
> _set_sbh_threshold sets the current threshold value for the small-block
> heap. The default threshold size is 1016 bytes for Windows 95, 98, ME and
> Windows NT, and zero for Windows 2000 and later platforms. By default, the
> small-block heap is not used on Windows 2000 and later platforms, though
> _set_sbh_threshold can be called with a nonzero value to enable the
> small-block heap in those instances.
>
> When I took a program that ran fine on VC6, and ported it to VC7, the
> performance was horrible.  It turned out that a collection of small
> objects in an STL collection were being deleted.  On VC6 it took about
> 30 ms, and on VC7, it took over 6000 ms.
>
> By enabling the sbh handler with this function the VC7 version performed
> as well as, or better than the VC6 code.
>
> My question is WHY is the small-block heap not used on Win2k and later??
> Is there some alternative coding scheme I should be using to get good
> performance?
>
> Thanks,
>
> Brad
 
LVL 86

Expert Comment

by:jkr
Now, that's interesting - will keep that in mind when it comes to 64 bits...
