Building a memory allocator

My application makes many memory allocations: a lot of C++ objects are created and destroyed in a short period of time.

The application also uses a few threads under the same process, and a few processes (each process has a few threads).

I would like to override the new and delete operators of my objects in order to make memory allocation more efficient.

I saw an example on MSDN of how to do it with the Heap functions.

My questions:
1. Should I use the idea of this sample, or should I write my own memory allocator (which the new "new" and "delete" operators will use)?

2. Are there any recommended ways to write a memory allocator?

3. Are there any available sources on the internet?

4. What are the advantages and disadvantages of each way?

5. I guess I'm not aware of all the issues concerning my problem, and I will be glad to hear about them.

Please state comments only.
I will divide more than 200 points for this question between those who give me good comments.

livniAsked:

nietodCommented:
>> I would like to overwrite the new and delete
>> operators of my objects, in order to make
>> memory allocation more efficient.
What do you plan to do in order to make it more efficient?
Just rewriting your own new and delete isn't enough; you need to write one that is more efficient than the default, and that isn't very easy to do.

>> should i use the idea of this sample
Writing a set of heap management functions is not easy. Even a basic set that doesn't do any optimization is hard enough, and the Windows functions are full of features to make the heap faster and smaller (as well as other fancy features). It would be VERY hard to write a better set.

>> 2. are there any recommended ways to write
>> a memory allocator ?
Many books on data structures will tell you how to write a heap.

>> 4. What are the advantages and disadvantages of each way
I can't see any advantage to writing your own heap unless you are very experienced and willing to spend a lot of time on it. But I can't see an advantage to writing your own new and delete either. There are ways to make overloaded new and delete faster, but I suspect they won't do you any good (like creating a separate heap for each type of class: the heap would be optimized for objects of only that class's size, which can reduce some of the heap's overhead and complexity. The disadvantage is that you need a heap for each class, and that can mean hundreds of heaps).
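The per-class approach described above can be sketched in portable C++ by overloading a class's operator new/delete to recycle same-sized blocks through a private free list. The Widget class below is a hypothetical example, not code from this thread:

```cpp
#include <cstddef>
#include <new>

// Hypothetical example class: its operator new/delete recycle
// same-sized blocks through a private free list instead of hitting
// the global heap on every allocation.
class Widget {
public:
    double value = 0.0;

    static void* operator new(std::size_t size) {
        // Unusual sizes (e.g. a larger derived class) fall through
        // to the global allocator.
        if (size != sizeof(Widget))
            return ::operator new(size);
        if (freeList) {                      // pop a recycled block
            FreeNode* node = freeList;
            freeList = node->next;
            return node;
        }
        return ::operator new(size);         // list empty: allocate fresh
    }

    static void operator delete(void* p, std::size_t size) {
        if (!p) return;
        if (size != sizeof(Widget)) {
            ::operator delete(p);
            return;
        }
        // Push the block onto the free list instead of releasing it.
        FreeNode* node = static_cast<FreeNode*>(p);
        node->next = freeList;
        freeList = node;
    }

private:
    struct FreeNode { FreeNode* next; };
    static_assert(sizeof(double) >= sizeof(FreeNode*),
                  "blocks must be big enough to hold a free-list link");
    static FreeNode* freeList;
};

Widget::FreeNode* Widget::freeList = nullptr;
```

As noted above, the price is one such free list per class, and this version is not thread-safe; each thread would need its own list or a lock.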
nietodCommented:
>> My application is making many memory
>> allocations. A lot of C++ objects are created
>> an destroyed in a short period of time.
Before you go to any such extreme as writing your own heaps, or even just overloading new and delete, you should determine whether there is even a need. Try running your code under a profiler and see if it spends a significant amount of time in new and delete. If it doesn't, you don't need to do this at all. If it does, you can get an idea of the best possible savings from writing your own (although it's unlikely to be much better than the default).
NT_ProgrammerCommented:
Each process has its own address space, so its allocations will not interfere with another process's space.

new/delete, malloc/free, etc. use the default heap. The heap is a reserved block of virtual memory within your process space. You can specify how much memory to reserve and pre-commit with the /HEAP option on the VC++ linker.

Reserved memory means that the memory table has reserved that range of virtual addresses; the memory does not actually exist yet. Committed memory is memory that has been reserved and also "created": the range is now a valid range of addresses.

In order to use the heap it must first be locked, so no other thread will be allocating the same section of memory. So if a thread gets swapped off the CPU in the middle of an allocation and another thread runs and tries to allocate, it will hit the lock and get swapped off. It would have been more efficient to let the first thread finish its allocation and unlock the heap before swapping it off the CPU.

If you use only the default heap, every allocation will go through this narrow bottleneck. There are several ways to avoid bottlenecking on this one heap.

One of the most common is to have a "Pool" class.  I have one that is templated.

Basically the pool has two functions "Get" and "Put".  This is how it is used:

// global pool object:
CPool< MyClass > g_MyClassPool;

// in code:

MyClass* pMyclass = g_MyClassPool.Get( );

// use pMyclass

// done with it:

g_MyClassPool.Put( pMyclass );
pMyclass = NULL;

If you call Get( ) and there aren't any objects already in the pool, it allocates a new one for you and passes it out. When you call Put( ), it puts the object back on the list; it does not free the heap space.

This avoids _every_ allocation bottlenecking at the same place; instead, all the MyClass "allocations" (pooling operations) bottleneck at their own place. You should notice an improvement in processing times from this small step. And once your app has gone through its heaviest activity, all the memory it needs is allocated: no more new/malloc calls for the pooled objects.

If you fear running out of memory you can use that /HEAP linker switch to increase the amount of memory in the default heap.

Another way to get away from one bottleneck is to have one heap per frequently used object type. You do basically the same pooling mechanism, but each object type resides in its own heap. This cuts down on fragmentation more than anything else, and I doubt you will see much improvement over pooling.
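A minimal sketch of the Get/Put pool described above, using the CPool name and interface from the usage example (the implementation here is one simple guess at how such a class could work, not NT_Programmer's actual code):

```cpp
#include <vector>

// Simple object pool: Get() hands out a recycled object if one is
// available, otherwise allocates a new one; Put() returns an object
// to the pool instead of deleting it.  Not thread-safe as written;
// a real version would guard the list with a lock or live in TLS.
template <typename T>
class CPool {
public:
    ~CPool() { CleanPool(); }

    T* Get() {
        if (!free_.empty()) {
            T* p = free_.back();     // reuse a pooled object
            free_.pop_back();
            return p;
        }
        return new T();              // pool empty: allocate fresh
    }

    void Put(T* p) {
        if (p) free_.push_back(p);   // keep the object for reuse
    }

    // Actually release everything currently held by the pool.
    void CleanPool() {
        for (T* p : free_) delete p;
        free_.clear();
    }

private:
    std::vector<T*> free_;
};
```

Note that Put( ) does not reset the object's state; the next caller of Get( ) sees whatever the previous user left behind, so callers must reinitialize pooled objects themselves.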

NT_ProgrammerCommented:
BTW, I can help you with that pool class if you want.  I already have one that works great.
mikeblasCommented:
> In order to use the heap it must first be locked, so no
 > other thread will be "allocating" the same section of memory.  

This problem has been remedied in Windows NT 4 with Service Pack 6, and the fix was carried forward in Windows 2000.

 > A lot of C++ objects are created an destroyed in a short period of time.

Why? If you can remedy that requirement, then everything is solved: you'll have an inherently better algorithm, and you won't need to fool around with your memory manager.

If you have lots of objects, and they're all of a fixed size, one of the more efficient algorithms you can choose is a fixed-size allocator. General-purpose allocators, like those built into Windows, are built on the assumption that you're going to allocate blocks of [possibly wildly!] varying sizes.

Optimizing out that assumption helps immensely. You can find a decent fixed-size allocator built into MFC; see the CFixedAlloc class. It isn't documented, but you get all the source code.
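The general fixed-size technique (equal-sized blocks carved out of larger chunks and chained through a free list) might be sketched as follows; this is an illustration of the idea, not CFixedAlloc's actual source:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Fixed-size allocator: all blocks have the same size, so Alloc/Free
// are just free-list pops and pushes.  Chunks are released only when
// the allocator itself is destroyed.
class FixedAlloc {
public:
    explicit FixedAlloc(std::size_t blockSize, std::size_t blocksPerChunk = 64)
        : blockSize_(blockSize < sizeof(Node) ? sizeof(Node) : blockSize),
          blocksPerChunk_(blocksPerChunk) {}

    ~FixedAlloc() {
        for (void* chunk : chunks_) std::free(chunk);
    }

    void* Alloc() {
        if (!freeList_) grow();        // no free blocks: carve a new chunk
        Node* node = freeList_;
        freeList_ = node->next;
        return node;
    }

    void Free(void* p) {
        Node* node = static_cast<Node*>(p);
        node->next = freeList_;        // push back onto the free list
        freeList_ = node;
    }

private:
    struct Node { Node* next; };

    void grow() {
        char* chunk = static_cast<char*>(std::malloc(blockSize_ * blocksPerChunk_));
        if (!chunk) throw std::bad_alloc();
        chunks_.push_back(chunk);
        // Thread every block in the new chunk onto the free list.
        for (std::size_t i = 0; i < blocksPerChunk_; ++i) {
            Node* node = reinterpret_cast<Node*>(chunk + i * blockSize_);
            node->next = freeList_;
            freeList_ = node;
        }
    }

    std::size_t blockSize_;
    std::size_t blocksPerChunk_;
    Node* freeList_ = nullptr;
    std::vector<void*> chunks_;
};
```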

..B ekiM
nietodCommented:
>> This problem has been remedied in Windows NT 4
>> with Service Patch 6, and the fix was carried
>> forward in Windows 2000.
Mike, does the VC RTL use the Windows heap functions? It doesn't appear to, in the debug version at least.
livniAuthor Commented:
To sum it up to this point:

1. Rest assured that if I could change my algorithm of creating/destroying objects, I would have done it already.

2. In my application/algorithm a fixed-size allocator is not a solution, because we're using lots of classes, each with a different size, plus pointers to variable-size strings.

3. My problem is the fact that each thread locks the process heap before allocating. If I could have a heap per thread, the problem would be solved. I have written something that actually lets me have a heap per thread, and I'm going to test it in a few days (if anyone wants the source code, I'll be glad to give it to you).

4. If the locking problem was solved in Service Pack 6, I don't need my code; I just need to install SP6 and see if it indeed solves the problem (because right now I'm running SP5).

So I need to know what exactly Microsoft fixed in SP6 (what's been changed in the allocation algorithm).
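livni's code isn't shown in the thread; on Windows, a heap-per-thread scheme would typically combine HeapCreate() with thread-local storage. A portable sketch of the same principle, using C++11 thread_local so each thread recycles blocks through its own lock-free list (the Block type and its 64-byte payload are illustrative, not livni's design):

```cpp
#include <cstddef>

// Toy heap-per-thread: each thread keeps its own free list of
// fixed-size blocks, so GetBlock/PutBlock never take a lock.
// Freeing a block on a different thread than the one that allocated
// it would migrate blocks between lists here; a real implementation
// has to handle that case deliberately.
struct Block {
    Block* next = nullptr;
    unsigned char data[64];   // illustrative payload size
};

namespace {
    thread_local Block* tlsFreeList = nullptr;  // one list per thread
}

Block* GetBlock() {
    if (tlsFreeList) {                // reuse from this thread's list
        Block* b = tlsFreeList;
        tlsFreeList = b->next;
        return b;
    }
    return new Block();               // list empty: allocate fresh
}

void PutBlock(Block* b) {
    b->next = tlsFreeList;            // return to this thread's list
    tlsFreeList = b;
}
```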

NT_ProgrammerCommented:
> This problem has been remedied in
> Windows NT 4 with Service Patch 6,
> and the fix was carried forward in
> Windows 2000.

How did they change the heap allocation to still use one heap, but without thread collisions?

I can see it if each thread gets its own "section" of the heap, or maybe even its own heap. But what if one thread allocates and another deallocates? How would the system pass the dealloc to the correct thread without slowing everything down again?
NT_ProgrammerCommented:
I still think that pooling is the all-around best answer. Think about it: if SP6 doesn't pan out, with pooling you have two steps:

1.  Put heavily used objects in pools.  This greatly reduces thread collision.

2.  If that's not enough, put the "heavy hit" pools in TLS. Now there will be no collision on those objects. And no worries: since you are sharing the same heap across all the threads, it won't matter that when you call CleanPool( ) the pool deletes all its objects; it puts them back in the same heap it got them from. Good, safe cleanup.

And if SP6 does pan out, it didn't hurt. What if, for other reasons, you can't "enforce" SP6 on the boxes that will run your app? By coding smarter yourself, instead of leaving it in the hands of a dubious OS practice, you make your performance foolproof.

If you need help writing your pool class, email me NT.Programmer@home.com.  I'll be happy to help.
livniAuthor Commented:
mikeblas - Can you give me the specific fix number ?

NT-Programmer -
I agree with you completely.
Writing my own memory allocator, one built for my application, will probably serve me best.

That's why I have written that piece of code: it causes each C++ class that I want to allocate itself from a heap which belongs to a thread (and not to the process).

A few classes may share the same heap, or have their own heaps.

It's kinda cool actually; very simple.
If you'd like, I could email it to you.
NT_ProgrammerCommented:
Sure, you can email it to me. I'd be happy to look it over. NT.Programmer@home.com

You going to give me the points?  :D
livniAuthor Commented:
I'll give you the points after getting your opinion about my memory handler, and after you find out what that SP6 fix is, because I haven't been able to find it...
nietodCommented:
livni, about how much of your program's time is spent in allocation or deallocation?  
livniAuthor Commented:
nietod - A LOT!
really lot...
nietodCommented:
Have you profiled it? You really need to profile it before and after the change. You run two risks. The most common is optimizing a non-bottleneck. It would be unusual for a typical C++ program to spend more than a few percent of its time in allocation (mine allocates a lot and spends about 1/2% of its time in new, and less in delete). So even if your program is allocation-intensive, it might still only be spending 10% of its time in new and delete; if so, that would probably not be worth optimizing. (It's possible that it is spending a lot of time in new/delete, but unless you profile it you don't know.) The second risk is that your optimization might not speed things up, and might even slow things down. This happens A LOT, which is why you must profile after your optimization. Sometimes the results are shocking. Even if the optimization produces marginal improvements, like cutting the time in new from 20% to 18%, it might not be worth adding the complexity.

And if you think you can get a good estimate of the time spent on some aspect of the program without a profiler: you can't. Even assembly language programmers like myself are notoriously bad at this. (I had a great quote on this from Scott Meyers, but can't find it.) Every few months I profile my program and I am always shocked at what I see. There are always one or two bottlenecks I would never have guessed, ones that easily take 20% of the time, and then I find that all the huge, slow procedures I was wary about usually take a tiny fraction of a percent of the time.

Profile before you do anything.
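As a crude first measurement before reaching for a real profiler, one can replace the global operator new with a counting version to see how much allocation traffic a run generates. This is a sketch: it counts calls, not time spent, so it hints at allocation pressure but does not replace profiling:

```cpp
#include <atomic>
#include <cstdlib>
#include <new>

// Crude allocation counter: replacing the global operator new lets a
// program report how many heap allocations it performed during a run.
std::atomic<long> g_allocCount{0};

void* operator new(std::size_t size) {
    ++g_allocCount;                   // tally every allocation
    if (size == 0) size = 1;          // malloc(0) may return null
    if (void* p = std::malloc(size))
        return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept {
    std::free(p);                     // must pair with the malloc above
}
```

Dumping g_allocCount at program exit (or at interesting phase boundaries) shows whether allocation volume is even high enough to be worth attacking.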
mikeblasCommented:

 livni> mikeblas - Can you give me the specific fix number ?

The fix might've been in SP5, but I think it finally made it in with SP6.

The fix was made at the request of the ASP and IIS teams, whose scalability suffered significantly because of the thread/heap contention issue. The team didn't want to rewrite OLE, so the fix needed to be integrated into the operating system.

I don't know what you mean by "fix number". Do you mean a specific KB article? There may or may not be a public KB article that discusses the fix; not all fixes are documented. You can start with Q241211 and look at the list of fixes it contains; it enumerates the fixes in SP6a, and has links to articles that list the fixes in older SPs.

 > How did they change the heap allocation to still use one heap but not have thread collisions?

Unfortunately, I wasn't involved in the code review. My understanding is that each memory block is given a tag that indicates which thread pool allocated it. Whenever that block is touched, only that thread pool's anchors are locked. Even if another thread tries to free the memory, the heap manager knows what to lock, because the tag on the block indicates the owning pool.

But, it's quite possible my understanding doesn't match reality. The ATL team noticed the negative scalability caused by the original design back when we were trying to submit code for a magazine benchmark request. The ASP team already knew about it and was working to escalate the issue into the NT war room at the time, so we let them continue to handle it.

..B ekiM
nietodCommented:
But does the VC RTL even use the API heap procedures, or does it have its own heap algorithms?  In the debug version at least, it seems to have its own heap allocation/deallocation code.
mikeblasCommented:
> But does the VC RTL even use the API heap procedures,

The VC++ 6.0 runtime libraries call HeapAlloc() for memory larger than the small-block threshold. For small blocks, the runtime suballocates from blocks which originally came from HeapAlloc(). The small-block threshold is settable, but it's 1016 bytes by default.

See the documentation for the _get_sbh_threshold() and _set_sbh_threshold() functions. (Unfortunately, the defaults have changed since the documentation was written; the docs weren't carefully updated.)

..B ekiM
nietodCommented:
Wow. Good info, and links to some more good info. Good stuff to know, thanks.
mikeblasCommented:
Sure. It always seems like I end up giving more information to other experts than I do to the questioners, but I guess that's okay.

..B ekiM
livniAuthor Commented:
mikeblas - what does this mean:
"For small blocks, the runtime suballocates blocks which originally came from HeapAlloc()" - if they came from HeapAlloc, it means they are in use, doesn't it?

And about that fix: does it give each thread its own heap? How has the process's heap mutex problem been solved?
nietodCommented:
What happens is that when you request a large allocation, the RTL allocates it directly with HeapAlloc(). But if you request a small allocation, it allocates it from a heap (or heaps?) used for small allocations. The memory used for this "small allocation heap" was in turn allocated with HeapAlloc().

>> If they came from HeapAlloc it
>> means they are in use ?
Yes, they are in use: as a heap. So you get a heap inside a block of a heap.

>>  It always seems like I end up giving more
>> information to other experts than I do to the
>> questioner
That is par for the course.
mikeblasCommented:
livni> "For small blocks, the runtime suballocates blocks which originally
 livni> came from HeapAlloc()" - If they came from HeapAlloc it means they
 livni> are in use ? isn't it ?

The operating system figures they're in use, but the runtime library doesn't.

When the CRTLs initialize, they get a block of memory from HeapAlloc() and don't use it for anything yet.

If you ask the CRTL for a block of memory larger than the small-block threshold (say, two kilobytes), the runtimes turn around and ask HeapAlloc(). In a release build, the pointer you get back is the pointer that HeapAlloc() returned. In a debug build, the runtimes actually ask for a little more than 2K of memory, add in some debugging information (e.g., the block type and the line number and file name from where the request was made), and return an adjusted pointer.

If you ask the CRTL for a block smaller than the small-block threshold, the runtimes walk their own data structures in the block that was returned from HeapAlloc() back when the runtimes started, and attempt to satisfy the request from a small portion of that block. If that block is exhausted, another block is allocated and linked to the first.

If memory that was in the small-block pool is freed, it's not returned to the operating system; it's marked free in that preallocated pool, and the runtime can try to use it again from there.
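The scheme described above can be sketched in portable terms: requests above a threshold pass straight through to the underlying allocator (malloc here stands in for HeapAlloc()), while small requests are carved from a preallocated arena and recycled through a free list. The slot size and arena layout below are illustrative; the real VC6 small-block heap is more elaborate:

```cpp
#include <cstddef>
#include <cstdlib>

// Threshold dispatcher: big requests go straight to malloc; small ones
// are bump-allocated from a preallocated arena and recycled through a
// free list, never returned to the OS.  Toy version: small blocks that
// overflow into malloc when the arena fills are recycled too, and are
// only released when the process exits.
class SmallBlockHeap {
public:
    static const std::size_t kThreshold = 1016;  // VC6's documented default
    static const std::size_t kSlot = 1024;       // rounded-up small-block slot
    static const std::size_t kSlots = 64;        // arena capacity

    SmallBlockHeap()
        : arena_(static_cast<char*>(std::malloc(kSlot * kSlots))), used_(0) {}
    ~SmallBlockHeap() { std::free(arena_); }

    void* Alloc(std::size_t size) {
        if (size > kThreshold)
            return std::malloc(size);            // large: straight through
        if (freeList_) {                         // recycled small slot
            void* p = freeList_;
            freeList_ = *static_cast<void**>(p);
            return p;
        }
        if (used_ + kSlot > kSlot * kSlots)
            return std::malloc(size);            // arena full: fall back
        void* p = arena_ + used_;                // bump-allocate from arena
        used_ += kSlot;
        return p;
    }

    void Free(void* p, std::size_t size) {
        if (size > kThreshold) { std::free(p); return; }
        // Small blocks go back on the free list, not to the OS.
        *static_cast<void**>(p) = freeList_;
        freeList_ = p;
    }

private:
    char* arena_;
    std::size_t used_;
    void* freeList_ = nullptr;
};
```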

..B ekiM
mikeblasCommented:
nietod> That is par for the course.

To fight a cliche with a cliche: it's not a two-way street. Asking other experts side questions is something I endeavour not to do.

..B ekiM
nietodCommented:
I wasn't trying to ask a side question to cheat the point system. You were suggesting that changes to HeapAlloc() would help a C++ program's allocations, which did not appear correct to me, so I was trying to confirm it. (I would not go so far as to say I doubted you!)

And I do think it is a two-way street. I think I can remember a question or two where you made mistakes concerning some of the more remote rules of the C++ standard. I can't imagine how someone could fail to learn something on EE.
mikeblasCommented:
> so I was trying to confirm it.  

It's a fact. Some versions of VC++ don't do that, of course; VC++ 2.x always suballocated from the Windows heap, for example. But it's been true since VC++ 5.0, at least.

If you still don't believe me, it should only take you a few minutes to trace into the malloc() or operator new() implementation and see the code yourself.

 > I think I can remember a question or two

A couple of questions out of a thousand or so? That does not a trend make.

..B ekiM
nietodCommented:
>> If you still don't believe me,
I did believe you. That was the problem: I have traced into the RTL allocation procedures and remember seeing a lot of "manual" allocation code, which didn't fit with what you had said. That was why I was confused. What I had probably been debugging into was the sub-allocation procedures. (Although it would almost certainly have been the debug versions, and therefore the debug heap procedures; I don't know if those follow this same scheme.)

I don't think I made a trend of asking for information either.
mikeblasCommented:
Both the debug and release code end up calling _heap_alloc_base(). As I explained, the debug code does a little bit of calculation and verification to make sure that it can store the per-block data it wants for diagnostics.

..B ekiM
livniAuthor Commented:
I have written my own memory manager (heap-per-thread), and it solved the performance problem incredibly well!

I would like to give mikeblas the points so please post an answer.

hsk0Commented:
Run, don't walk, to http://www.microquill.com and get a copy of SmartHeap: a high-performance C/C++ memory manager that can be used as a simple drop-in for malloc/free/new/delete, or you can use their API. The debug stuff is also useful, though we found it's not as valuable as the fixed-size memory allocator and "pools".

The NT SP / W2K / VC6 memory manager has improved, but SmartHeap is still a better choice if you think your perf problems lie in C++ heap usage. MS has improved, but SmartHeap still edges out the perf even as a drop-in, let alone if you code to their APIs (fixed-size blocks and pools).

Just another happy user,

    - Howard
mikeblasCommented:
I can't post an answer because hsk0 has locked the question.

BTW, there's a sample that shows an MT-enabled heap in the SDK. It's called MPHEAP.

..B ekiM
livniAuthor Commented:
Sorry, hsk0.
I'm gonna have to accept mikeblas's answer.
mikeblasCommented:
Thank you.

..B ekiM
hsk0Commented:
That's cool. Whatever works for you.

BTW, nietod has the unequivocal best answer: profile, profile, profile. Without hard evidence of where the bottleneck(s) really are, even experts are wasting breath.
livniAuthor Commented:
hsk0
I created a heap-per-thread manager which completely solved my problems, so I didn't have to profile. I knew the problem was the locking of my process's heap, which was locked by many threads performing many new/delete operations.

Anyway -
If I happen to check this memory manager in the future, you will see a question named "points for hsk0..."

xyzzerCommented:
I think this was quite an interesting discussion, and I would like to invite you to a similar subject:

http://www.experts-exchange.com/Programming/Programming_Platforms/Win_Prog/Q_20536762.html
Which dynamic memory allocation method to use when?

--Filip