STL Allocator Question
Posted on 2004-08-11
I am reading some strings from a file into a multiset. The file contains approximately 41,500 strings. I can read them into the multiset very efficiently using std::copy (it takes about 200 ms).
The problem is, when I try to destroy the multiset it takes about 6 seconds to deallocate all the little string objects that were created.
I'm wondering if I can fix this delay by creating a custom STL allocator that does nothing on individual deallocations and then, once all of the items it controls have been deleted, throws away the entire heap it is using. In other words, the idea is to give each collection its own allocator object that creates and owns a private heap for the contents of that collection. Then, when the collection is destroyed, the heap releases all the little objects it manages with a single delete/free/::FreeMemory call.
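A minimal sketch of that idea, assuming a C++11 or later compiler (C++11 added allocator_traits, so a custom allocator only needs value_type, allocate, deallocate, a converting constructor and equality; MSVC 7 predates this). Arena, ArenaAllocator and sorted_via_arena are hypothetical names. The arena hands out memory from large blocks, deallocate is a no-op, and every block is freed when the arena is destroyed, so the arena must outlive any container using it. On its own this only covers the container's nodes: a plain std::string still allocates its text through std::allocator<char>.

```cpp
#include <cstddef>
#include <functional>
#include <new>
#include <set>
#include <vector>

// Hypothetical Arena: carves allocations out of large blocks and frees
// all of them at once when the Arena itself is destroyed.
class Arena {
public:
    explicit Arena(std::size_t block_size = 64 * 1024) : block_size_(block_size) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;
    ~Arena() {
        for (char* block : blocks_) ::operator delete(block);  // one free per block, not per object
    }

    void* allocate(std::size_t bytes, std::size_t align) {
        std::size_t offset = (used_ + align - 1) / align * align;  // round up to alignment
        if (blocks_.empty() || offset + bytes > capacity_) {
            // Start a new block; the unused tail of the previous one is abandoned.
            std::size_t size = bytes > block_size_ ? bytes : block_size_;
            blocks_.push_back(static_cast<char*>(::operator new(size)));
            capacity_ = size;
            offset = 0;
        }
        used_ = offset + bytes;
        return blocks_.back() + offset;
    }

private:
    std::size_t block_size_;
    std::vector<char*> blocks_;
    std::size_t capacity_ = 0;
    std::size_t used_ = 0;
};

// Minimal C++11 allocator that forwards to an Arena; deallocate() does nothing.
template <class T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;

    explicit ArenaAllocator(Arena& a) noexcept : arena(&a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {}  // memory is reclaimed by ~Arena
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return a.arena == b.arena; }
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return !(a == b); }

// Hypothetical demo: sort values through a multiset whose nodes live in a private arena.
std::vector<int> sorted_via_arena(const std::vector<int>& input) {
    Arena arena;  // declared first, so it is destroyed after the multiset
    ArenaAllocator<int> alloc(arena);
    std::multiset<int, std::less<int>, ArenaAllocator<int>> values(alloc);
    values.insert(input.begin(), input.end());
    return std::vector<int>(values.begin(), values.end());
}
```

Getting the string text into the same arena would additionally require a string type parameterized on the allocator (basic_string<char, char_traits<char>, ArenaAllocator<char>>) and a way for the container to hand its allocator to each element.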
Also, note that the string objects in the collection I have in mind would need to allocate from the same allocator, so that their internal buffers are controlled by the collection's "outer" allocator and the actual string contents (not just the string objects) are all freed at the same time.
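Ideally, each collection would have its own FastAllocator object so that its data would be distinct from any other collection of the same type. I would like to declare the collection something like this: multiset< string, less< string >, FastAllocator< string > > dictionary; (the allocator is multiset's third template parameter; the second one is the comparison function object).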
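On a C++17 compiler the standard library already offers this design. std::pmr::monotonic_buffer_resource never frees individual allocations and releases its memory when it is destroyed, and std::pmr containers construct their std::pmr::string elements with the container's allocator, so the string text lands in the same arena as the tree nodes. A sketch, assuming C++17 (well beyond MSVC 7); LoadResult and load_with_arena are hypothetical names. Each call creates its own resource object, which gives the per-collection separation described above.

```cpp
#include <memory_resource>
#include <set>
#include <string>
#include <vector>

// Result of the hypothetical load_with_arena() helper below.
struct LoadResult {
    std::vector<std::string> sorted;  // the words, in multiset order
    bool all_from_arena = true;       // did every stored string use the collection's arena?
};

// Loads words into a pmr::multiset that owns a private arena. Because
// polymorphic_allocator performs uses-allocator construction, each
// pmr::string element is built with the same arena as the tree nodes.
LoadResult load_with_arena(const std::vector<std::string>& input) {
    std::pmr::monotonic_buffer_resource arena;  // this collection's private heap
    std::pmr::multiset<std::pmr::string> dictionary(&arena);

    for (const std::string& w : input)
        dictionary.emplace(w.data(), w.size());  // text is copied into arena memory

    LoadResult result;
    for (const std::pmr::string& s : dictionary) {
        if (s.get_allocator().resource() != &arena) result.all_from_arena = false;
        result.sorted.emplace_back(s.data(), s.size());
    }
    return result;
}  // ~dictionary runs the string destructors, whose deallocate calls are no-ops;
   // ~arena then releases all of its blocks back to the upstream resource.
```

Destroying the multiset still visits every node to run the string destructors; what goes away is the per-node call into the general-purpose heap. Whether that removes the delay measured above would have to be timed.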
Of course, this assumes that the contained objects have no meaningful state to close gracefully (such as file handles). While that is not a safe assumption in every programming situation, it's fairly common (at least for me) to have large numbers of small data objects that could be dumped en masse this way.
Any experiences or code samples would be appreciated.
1. No, this is not a class project.
2. Yes, I know I could do this another, faster way - e.g. use qsort with text pointers, rather than string objects.
However, at this point it has become a matter of curiosity.
3. The code below was compiled & timed using MSVC 7 with unmanaged code.
The code follows; error checking etc. has been removed for clarity:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
using namespace std;
void LoadDictionary( const string& path ) // function name is hypothetical; path is a parameter to the function
{
    typedef istream_iterator< string > string_input;
    typedef ostream_iterator< string > string_output;
    typedef multiset< string > MyColl;
    MyColl* pDictionary = new MyColl;
    MyColl& dictionary = *pDictionary;
    ifstream ifs( path.c_str() );
    cout << "\nreading file\n" << flush;
    copy( string_input( ifs ), string_input(), inserter( dictionary, dictionary.begin() ) ); // timing this statement: ~200 ms for my input file (release mode)
    cout << "\nfile size: " << (unsigned int) dictionary.size() << endl;
    delete pDictionary; // this delete takes over 6 seconds in release mode! (Interestingly, only 2 seconds in debug mode, where the copy also takes 2 seconds.)
}
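For comparison, the qsort-with-text-pointers alternative mentioned in point 2 might look like the sketch below: the whole file lives in one buffer, words are split in place, and only an array of pointers is sorted, so cleanup is two deallocations no matter how many words there are. This is a different technique from a custom allocator; sort_words_in_place and compare_words are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// qsort callback: each element is a const char* pointing into the text buffer.
static int compare_words(const void* a, const void* b) {
    const char* lhs = *static_cast<const char* const*>(a);
    const char* rhs = *static_cast<const char* const*>(b);
    return std::strcmp(lhs, rhs);
}

// Hypothetical helper: splits `text` in place on whitespace (overwriting the
// separators with '\0') and returns pointers to the words, sorted by qsort.
// The returned pointers stay valid only while `text` is alive and unmodified.
std::vector<const char*> sort_words_in_place(std::vector<char>& text) {
    text.push_back('\0');  // terminate the last word; done before taking any pointers
    const std::size_t n = text.size() - 1;
    std::vector<const char*> words;
    std::size_t i = 0;
    while (i < n) {
        while (i < n && std::isspace(static_cast<unsigned char>(text[i])))
            text[i++] = '\0';  // separator becomes a terminator
        if (i == n) break;
        words.push_back(&text[i]);  // start of a word
        while (i < n && !std::isspace(static_cast<unsigned char>(text[i])))
            ++i;
    }
    if (!words.empty())
        std::qsort(words.data(), words.size(), sizeof(const char*), compare_words);
    return words;
}
```

To load a file, the buffer can be filled with std::vector<char> text((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>()); which needs <fstream> and <iterator>.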