Internal malloc behaviour of MS C++ compilers with char/tchar strings

Hi Folks,

I've been fine-tuning a C++ program for years. It does (or did) lots of mallocs, and now it's pretty good in the sense that it's not spending much time on mallocs compared with the other "real" processing it has to do. I've literally spent many months optimising this program, and performance and memory optimisation are absolutely critical.

I really would like to understand how the compiler/runtime will treat predeclared variables.

Let's take a simple function which uses a string as a work area:

    TCHAR workarea[1024];

Now, if I declare that inside the function, and the function does not recurse, is C++ going to use the same memory buffer on every call, or is it going to do a dynamic malloc on every function call, either of the 1024 bytes or of the block size needed for all variables declared in that function? I'm thinking that if it's mallocing and freeing a block on every function call, it might be slowing things down.

If there's no recursion, the workarea could be declared outside of the function, and then just memset it to 0 on each call, and there's only one memory space used throughout the execution of the program. This is of course horrible programming, but if it gives me performance gains and memory optimisation, I'm up for it (and I can use static I suppose?)

Any advice and discussion around this topic would be much appreciated.. thanks




Asked by plq
 
evilrix, Senior Software Engineer (Avast), commented:
>> I was just wondering internally if the memory for that variable is allocated dynamically
In the example you give it is a stack based variable... it is fixed at compile time. It is not dynamic and malloc plays no part in its creation.

>> so I wanted to check out how the internals worked.
How malloc works is platform dependent and is not defined by the C++ standard; you should check your compiler documentation. Generally, though, the memory is allocated from the heap, which is a linked-list type structure of free memory fragments. When you call malloc, the free list is walked and an appropriate memory fragment is chosen based upon the allocation strategy used by the OS (to minimise heap fragmentation).

>> I just did a test on the program and the same variable only changed its address once in about 1000 calls
As it's stack based, it probably shouldn't change at all (unless your OS is implementing a security mechanism that prevents fixed-position stack offsets, to try to avoid exploits from buffer overruns).

>> I would have been more concerned if it changed every time
I wouldn't; some OSs (Linux for example) can purposefully change the offset for stack based variables every time the process runs, to minimise the security risk of someone using a potential buffer overrun as an exploit.

>> So it is pretty efficient.
Depends what you mean by efficient. It's 1KB (or 2KB if TCHAR is wide) of stack allocation that may not be used. Also, it's a potential buffer overrun.

>> Any advice and discussion around this topic would be much appreciated
Why not just use a vector?
http://www.cplusplus.com/reference/stl/vector/
 
evilrix, Senior Software Engineer (Avast), commented:
The example you give is a stack based variable, so malloc is never called. It is not a dynamic variable; it is fixed at compile time.

BTW: If this is C++ code why are you using malloc instead of new/delete?
 
evilrix, Senior Software Engineer (Avast), commented:
You know malloc and free are just functions that YOU as the programmer have to call if you want to allocate memory dynamically, right?

With the example you give you could use placement new for that kind of optimization.
http://www.devx.com/tips/Tip/12582
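
For readers who have not met placement new, here is a minimal sketch of the idea. The `Record` type and the function name are made up purely for illustration; nothing here comes from the original program. The object is constructed inside storage that already exists (a stack buffer in this case), so no heap allocation is involved.

```cpp
#include <cassert>
#include <new>      // placement new

// Hypothetical record type, used only for illustration.
struct Record {
    int id;
    char name[32];
    explicit Record(int i) : id(i) { name[0] = '\0'; }
};

// Constructs a Record in a pre-existing stack buffer, uses it, destroys it.
// Placement new only runs the constructor; it does not allocate memory.
int use_record_in_buffer(int id) {
    alignas(Record) unsigned char storage[sizeof(Record)];

    Record* r = new (storage) Record(id);   // construct in place
    assert(static_cast<void*>(r) == static_cast<void*>(storage));

    int result = r->id * 2;
    r->~Record();                           // destroy explicitly; never call delete here
    return result;
}
```

The same pattern works with a buffer that is allocated once and reused for many objects, which is where it tends to pay off.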
 
plq (author) commented:
Well yes it does need a rewrite. But I'm stuck with 10k lines of code that aren't perfect, I don't want to risk stability by going through changing it all.

I was just wondering internally if the memory for that variable is allocated dynamically, just because we all know malloc and free can kill performance if not used properly, so I wanted to check out how the internals worked.

Interestingly, I just did a test on the program and the same variable only changed its address once in about 1000 calls. I don't know why it changed once, but I would have been more concerned if it changed every time. So it is pretty efficient. I guess if we were recursing then it would have to change every time.
 
plq (author) commented:
thanks. I think I have my answer now. I have started using vectors, I didn't know about them until someone mentioned it in a recent ee thread, but rewriting the whole thing isn't reasonable as you can imagine. We're pretty good with overrun buffer controls, by using the _s cruntime functions
 
evilrix, Senior Software Engineer (Avast), commented:
>>  by using the _s cruntime functions

Heh. These are nowhere near as safe as Microsoft would have you believe. For example, they don't prevent underruns, buffer overlap or transposed parameter issues.
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=260

Be careful with relying on non-standard functions. The standard ones aren't perfect but at least their behaviour is well understood and consistent across all compilers!
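
As a rough illustration of a bounded copy built only on the standard library, here is a sketch using `std::snprintf`. The helper name `bounded_copy` is hypothetical. Like the _s functions, it is only as safe as the size you pass in: it cannot detect a wrong `dst_size` or transposed arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: copies src into dst without writing more than
// dst_size bytes (terminator included).
// Returns true only if the whole string fitted, i.e. nothing was truncated.
bool bounded_copy(char* dst, std::size_t dst_size, const char* src) {
    if (dst == nullptr || src == nullptr || dst_size == 0) {
        return false;
    }
    // snprintf returns the length the full output would have had.
    int needed = std::snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < dst_size;
}
```

Checking the return value is the important part; a truncated string that is silently accepted can be just as much of a bug as an overrun.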
 
plq (author) commented:
thank you
 
evilrix, Senior Software Engineer (Avast), commented:
>> thank you

You're very welcome.

If you need any more guidance regarding this please don't hesitate to post back here. The question may be closed but I am happy to still assist you here on this specific matter.
 
pgnatyuk commented:
evilrix has already answered your question (http:/#a30913352).

I just have my two cents to add :)
If you call a function 1000 times, TCHAR workarea[1024]; is for sure better than
TCHAR* workarea = new TCHAR[1024] and delete [] workarea;
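
To make the comparison concrete, here is a small sketch of the two variants side by side. It uses plain `char` instead of TCHAR so that it compiles anywhere, and the function names are invented for the example. Both do the same work; the second one simply pays for an allocation and a deallocation on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t kWorkSize = 1024;

// Stack version: the buffer is part of the function's stack frame,
// so "allocating" it costs essentially nothing.
std::size_t length_via_stack(const char* text) {
    assert(text != nullptr);
    char workarea[kWorkSize];
    std::strncpy(workarea, text, kWorkSize - 1);
    workarea[kWorkSize - 1] = '\0';          // strncpy may not terminate
    return std::strlen(workarea);
}

// Heap version: one new[] and one delete[] on every single call.
std::size_t length_via_heap(const char* text) {
    assert(text != nullptr);
    char* workarea = new char[kWorkSize];
    std::strncpy(workarea, text, kWorkSize - 1);
    workarea[kWorkSize - 1] = '\0';
    std::size_t n = std::strlen(workarea);
    delete[] workarea;
    return n;
}
```

Timing both in a loop on your own build is the only way to see how much the difference matters for your program.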

We are in C++, so we can talk about a class with an instance variable: a memory buffer (an array) that is allocated before these 1000 calls and released immediately after.
If you have a small function that you call 1000 times, maybe it is possible to make this function inline, add 'const' to your pointers, optimize the loops, etc.
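
A minimal sketch of the class idea, assuming a made-up `Formatter` class and plain `char` rather than TCHAR: the buffer is allocated once when the object is created, reused by every call, and released when the object goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical worker class that owns its work area for its whole lifetime.
class Formatter {
public:
    explicit Formatter(std::size_t capacity) : workarea_(capacity, '\0') {}

    // Reuses the member buffer; this function itself never allocates.
    std::size_t process(const char* text) {
        assert(text != nullptr && !workarea_.empty());
        std::strncpy(workarea_.data(), text, workarea_.size() - 1);
        workarea_[workarea_.size() - 1] = '\0';
        return std::strlen(workarea_.data());
    }

    const char* buffer() const { return workarea_.data(); }

private:
    std::vector<char> workarea_;  // allocated in the constructor, freed in the destructor
};
```

Usage would look like creating one `Formatter f(1024);` before the loop and calling `f.process(...)` inside it.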

Sorry for the pessimistic mood, but even if you fix 100 such functions, it will not dramatically improve the application's performance or its memory fragmentation. This way you will gain maybe 3-5% compared to the initial version, and you risk introducing new, unexpected bugs. The real gains will come from introducing new algorithms at the application level and, in the worst case, using assembler at the low level.



A small idea that may interest you is to replace TCHAR and all the tchar functions with WCHAR; the project should be in Unicode (of course, only if it is for Windows only).

 
phoffric commented:
>> I just did a test on the program and the same variable only changed its address once in about 1000 calls. I don't know why it changed once.
     If (1) A calls F most of the time (and F has the auto variable array, TCHAR workarea[1024]), but sometimes (2) A calls B, and then B calls F, then in these two cases the workarea begins on different portions of the stack, so its address will be different. I'm not talking about security risk issues. Just plain vanilla compiler.
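
The same reasoning covers the recursion case mentioned earlier in the thread: every call that is still active needs its own copy of the array, so each nested frame's workarea sits at a different place on the stack. A small sketch, with an invented function name and plain `char` instead of TCHAR:

```cpp
#include <cassert>

// Returns true if, at every level of recursion, this call's workarea has a
// different address from its caller's workarea.
bool frames_are_distinct(int depth, const char* callers_workarea = nullptr) {
    char workarea[1024];
    workarea[0] = static_cast<char>(depth);   // touch the buffer so it is really used

    bool ok = (callers_workarea == nullptr) || (workarea != callers_workarea);
    if (depth > 0) {
        ok = frames_are_distinct(depth - 1, workarea) && ok;
    }
    // Read the buffer after the recursive call so it stays alive across it.
    return ok && workarea[0] == static_cast<char>(depth);
}
```

Two objects that are alive at the same time cannot share an address, which is why this holds regardless of compiler. Without recursion, a call made from the same place at the same stack depth will normally land on the same address again.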
 
phoffric commented:
As far as performance goes, is it necessary to memset workarea every time? Or does the function fill up the workarea in such a way that you know what the structure is and how much good data there is?

If you have to memset, maybe you could do instead:
    TCHAR workarea[1024] = {0};
Maybe that is a slight improvement over memset. Time it and see.
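
Here is a sketch of the options, using plain `char` and invented function names. The first two zero the whole buffer; the third only terminates the string, which may be all that is needed if the function always writes a properly terminated string into the buffer before reading it (an assumption you would have to check against the real code).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t kSize = 1024;

// Zeroes the whole buffer at the point of declaration.
bool zeroed_by_initializer() {
    char workarea[kSize] = {0};
    for (std::size_t i = 0; i < kSize; ++i) {
        if (workarea[i] != 0) return false;
    }
    return true;
}

// Zeroes the whole buffer with an explicit memset.
bool zeroed_by_memset() {
    char workarea[kSize];
    std::memset(workarea, 0, sizeof workarea);
    for (std::size_t i = 0; i < kSize; ++i) {
        if (workarea[i] != 0) return false;
    }
    return true;
}

// Writes a single byte: enough to make the buffer an empty C string.
std::size_t empty_string_only() {
    char workarea[kSize];
    workarea[0] = '\0';
    return std::strlen(workarea);
}
```

Compilers often turn the first two into the same instructions, so any difference between them is likely to be small; skipping the full clear is where the saving would come from.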
 
plq (author) commented:
Yes that might be why. It is called from 6 distinct points in the program. I chose TCHAR many years ago as we need char and wchar support which I could control through configuration manager, so we had a build for each. TCHAR resolves to WCHAR of course when _UNICODE is defined, so it won't be any different in the exe.
 
evilrix, Senior Software Engineer (Avast), commented:
>> If you call a function 1000 times, TCHAR workarea[1024]; is for sure better than
Even better is to use a vector as the scratch area and pass it into the function by reference.
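
A rough sketch of what that could look like, with a made-up function name and `char` standing in for TCHAR. The caller owns the vector and hands it in by reference, so it is allocated once and only grows if a call ever needs more room than it already has.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical function: copies text into a scratch area owned by the caller.
// Returns the length of the copied string.
std::size_t copy_into_scratch(const char* text, std::vector<char>& scratch) {
    assert(text != nullptr);
    const std::size_t needed = std::strlen(text) + 1;   // include the terminator
    if (scratch.size() < needed) {
        scratch.resize(needed);                         // grows only when too small
    }
    std::memcpy(scratch.data(), text, needed);
    return needed - 1;
}
```

Typical use is to create `std::vector<char> scratch(1024);` once outside the loop and pass it to every call. Unlike a fixed array, it cannot be overrun by an unexpectedly long input, because the function grows it when required.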

>>  maybe it is possible to make this function inline
Be very careful about explicit use of inline, as it can actually make the compiler generate sub-optimal code! Generally, the compiler's optimizer will always do a better job than you ever can.
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=438

>> some OSs (Linux for example) can purposefully change the offset for stack based variables
I couldn't remember what the technique was called... had to search Google for it :)
http://en.wikipedia.org/wiki/Address_space_layout_randomization
 
pgnatyuk commented:
>> Generally, the compiler's optimizer will always do a better job than you ever can.
That's absolutely true. :)
Question has a verified solution.