Link to home
Start Free TrialLog in
Avatar of Gezna
Gezna

asked on

Crashing at malloc() after many iterations

I have written a C program than iterates hundreds of thousands of times.  There might be several million calls to malloc(), realloc() and free() after so many iterations.

There is no memory leak.  The program does NOT crash at realloc().  It only crashes at malloc().

The program runs perfectly for many hundrends of thousands of iterations, with every function getting called several times each iteration.  Then, for no reason I can discern, it crashes at some random malloc().

I've ony seen the crash occur near the beginning of a function call.  Probably this is because most, if not all, occurences of malloc() occur near the start of a function.
The crash does not occur at the same malloc() every time.  If I use the same random seed, the crash occurs at the same malloc(), and after exactly the same number of iterations.  However, if I adjust the random seed, the crash will occur at a probably different malloc() after a different number of iterations.

Any ideas on what causes this?  
Avatar of van_dy
van_dy

what system you are
running your program on?
If it is on some unix
system, then most probably
you are hitting the resource
limit. RLIMIT_DATA is the resource
on unix systems that determines
the maximum size of data segments
for a process. this data segment
includes the bss, initialised data
and heap memory(from where malloc
allocates space). it would be better
if you could post your code so that
we can decide if it is really this matter
or some possible bug.

hope this helps,
van_dy
Avatar of Gezna

ASKER

I'm running on Windows XP Professional and I'm using Borland C++ Builder X.

The code has many lines so posting them here probably is no good.

Here is the beginning of a function where the latest crash occured:

char * treeToString(struct node *t, char **str)
   {
   int i,len,tempD=0;
   struct node *aNode;
   char *tmp;
   char line[100];
   int max = 100;
   if(D) printf("treeToString...BEGIN\n");
   tmp = (char*)malloc(1);


The crash occurs at the last line.  If I run the program again with a different random seed, some other malloc in some other function at some different iteration will crash instead of this one.

This crash occurs after about 750,000 iterations of the main program.  This function here is probably called 5,000,000 times.  

I see no increase in the amount of memory beining used while this program is running, and I have 512 MB free with 1GB total RAM on my desktop.
Are you checking the returned value from malloc()?

Are you checking that you're requesting a reasonable sized chunk?

-----------------

I'd do this:

#define SomeReasonableLimit   50000

void * MyMalloc( int Len )
{
    void *;
    if( Len <= 0 || Len >= SomeReasonableLimit ) { fprintf( stderr, "bad param to malloc: %d\n", Len ); Len = SomeReasonableLimit; };
     p = malloc( Len );
     if( p == NULL ) fprintf( stderr, "NULL returned by malloc( %d )  \n", Len );

   return(p);
}

#define malloc  MyMalloc

Hi,

Your problem is interesting, but can you provide more information -

* What's the signal/exception (Segment Violation, Out of memory...)
* What platform are you on?
* Does it coredump? Did you try stepping through the core file to see if you could find anything
* Each iteration allocates memory and frees up ALL the memory allocated. Correct?
* Within each iteration, I presume malloc is called once at the beginning, multiple reallocs subsequently, and finally a free. Correct?
* Does your program's execution path change depending on the random seed? Would the seed affect the number and size of malloc/realloc/free calls?
* How have you verified that there is no memory leak?

If all else fails, you can try this:
Write a test program that mimics your program, but with simple logic - starting with malloc, realloc, free. Run this within a loop and see if that crashes as well. Start fleshing this out incrementally with more and more of your actual program's logic and test at each increment to see what triggers off the crash.
Regards
are you freeing the memory that you acquire
in each of your iterations? I dont know about
xp, but it must impose some resource limits on
processes as well. does the seed affect the
way in which your functions are called?
Crashes within malloc() are usually the result of memory arena corruption
(writing beyond the bounds of the allocated block).  You can usually isolate
these problems using debugging memory allocation library like Bounds Checker,
Purify, or DMalloc:  http://dmalloc.com

One question I have from the code fragment posted:  Why are you using
malloc() to allocate a single byte?  malloc(), calloc(), realloc(), and free() are
resource intensive routines.  If you are indeed allocating, reallocating and
freeing small pieces of memory hundreds of thousands of times, you may
find that your program is spending a large percentage of its time in the
memory allocation libraries rather than your program logic.

Avatar of Kent Olsen

> There is no memory leak.  The program does NOT crash at realloc().  It only crashes at malloc().

Interesting symptom, but not definitive.

Many implementations of memory managers do NOT reassign the block to a new address unless the size of the reservation changes.  Every malloc() is satisfied with a block that has a size of (hypothetically) 16*N.  The low-order bytes are the header, which is transparent to the application.  The remaining bytes are the user area.  *malloc(1)* actually reserves 32 bytes.  16 for the header and 16 for the application.  *realloc (malloc(1), 2)* may not move the buffer generated with malloc(1) because the user area of 16 bytes is within the same reservation.

But malloc() will always return a buffer when there is sufficient memory to satisfy the request.

I suspect that this combination of factors is why malloc() always generates the fault.


How many buffers can you have assigned at any given point in time?  And how much memory do yoy believe that this totals?


Kent
Avatar of Gezna

ASKER

Here's some more info:

* I always check to make sure malloc() doesn't return NULL.

* "Stopped at exception c0000005: access violation at 0x004048ee: write of address 0x00000008" is the exception.

* I do get something called the CPU view.  I see assembly language commands.  If I scroll up I get to a header which is probably the assembly language function name:  _internal_malloc.  Before that is _internal_free, a few other function names, and then there is "malloc(unsigned int):" which I think corresponds to the last line in the code sample above.  By the way, I'm sure that the last line above is the culprit because of printf statements.  The next two main function headers that appear in cpu view before "malloc(unsigned int)" are "operator new[]" and "std::bad_alloc::what:"

*each iteration does free up the vast majority of the allocated memory using free().  However, some small portion of the memory is reallocated.

*Every iteration executes in the same manner.  The purpose of the main program is to execute the sub functions many times to make sure they are robust.  The random seed doesn't affect much.  So the main program is a loop that goes like this: Randomly create a string of some random length that represents a First Order Logic Term->convert that string into a tree->convert that tree into a directed acyclic graph->free memory and repeat.

*I've verified no memory leak by observing the task manager.  The memory use is flat.  I built the main program incrementally and each time I made sure there was no memory leak.  I could work backwards and incrementally remove functions and see if this problem goes away, however I'm hoping this is an obvious C bug.  I'm new to C.  I am a Java programmer.

*random seed does not affect memory usage.
C++ Builder.  :)  A personal favorite!

Which version of the Builder are you using?  There are quite a few bugs in the basic V5.0, but the update to 5.5 fixes most of them.  6.0 is pretty reliable.  I don't have any feedback on BuilderX.

How big is your source program?  Is it small enough to effectively post?

Are you running a single/multi-threaded model?

Kent
Avatar of Gezna

ASKER

Okay I found out something important.

When I comment out the code that freed up the memory used by the directed acyclic graph (DAG) the problem goes away.  So I think I'm probably not freeing my memory up properly.  Probably I'm freeing a pointer that I later am trying to realloc.

When you free() the pointer, set it to NULL.

That won't solve the logic problem, but when you go to realloc() the pointer, it will assign a valid block.  You'll probably have a memory leak, but at least you'll run.


Kent
Another one that is really tough to find is when your stack overflows into your heap. You then get exactly the same kind of randomness for crashes asd you are seeing. Try using /STACK:5000 (or some other large value) in your linker parameters and see if the problem goes away.

Paul
Ok, next possibility is you may be overflowing one of your allocations.  Really easy to do, and the problem won't be detected until malloc() or free() try to manipulate the next block.


Try adding a "+ 10" to your malloc() parameter and see if the problem goes away.


You should be able to compile your program with a run-time selection to "test stack frame".  If the stack overflows, the program will abort with an appropriate message instead of corrupting the heap.  The program is just as dead, but you'll have an accurate diagnostic.


Kent
ASKER CERTIFIED SOLUTION
Avatar of Peter-Darton
Peter-Darton

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Peter's ideas are very good-- to summarize and extend them a bit, here's what a really good heap-debugging tool would do:

** allocate some space before and after each block

** make the size of that extra space user-controllable, up to several K just in case you have
a really wild pointer or struct acess that dosnt zap the immediately following bytes, but goes farther astray

** fill that extra space with some known pattern, or random pattern that is checksummed.

** also keep track of each block, where it was allocated from in the code, how many times you
tried to free() it.

** on each call to malloc or free() or even more often, scan the list of allocated and free blocks for consistency.

** On CPU's that can do it (x86), allocate each block in a hardware-protected segment of its own,
so it will have strict limit checks done by the hardware.

** Sime-related-- on each function entry, add code to set all the local variables to some really bad value that can't be a good address, all ones is a good value.   This finds unset pointer usage really quick.

------------
A long time ago I did a heap-checker like this for the X86 in rewal and protected mode.  It found dozens of errors in a program that was thought to be very stable!

grg99's right - the example I gave was fairly limited in scope.  Fortunately, there are now numerous heap-debugging tools available, some of which are good.

I would, unreservedly, recommend Purify - it's £"&%ing marvellous - it impressed me when I first used it around 1995, and I've yet to see anything do a better job.  It is, however, not cheap, so only viable if you do C/C++ as a full time job.  If you _do_ do this as a day-job, it is most cost effective tho', especially when integrated into one's build/test system.

Back to the problem at hand:  See if you can get a trial version and try it out for a few weeks - should be more than long enough to solve this issue.  (Assuming you're running on a supported architecture - if not, look at the GNU toolset)

There are other such tools "out there" that are aimed at the same issue, some of which do a good job (so I'm told).  I've not tried any myself (I've always used Purify), but this is not an unusual issue and hence lots of people have hit this before, and many have written tools to help sort it out, a number of which are freeware or GPL.

Writing a full heap-checker is a project in its own right, and making one that uses the processor's own memory management unit is distinctly non-trivial: grg99 - I'm impressed.
Avatar of Gezna

ASKER

Thanks for the input.  I'll try and get a trial version of Purify. BTW I'm a Master's student in Computer Science so I can't by it.  

 
I've always described Purify as God's gift to programmers.  It works extremely well (especially on Solaris).
It is expensive, 'tho.  I don't think you can even buy it alone anymore (it comes in some bundle of apps
from IBM now).   There are other debugging libraries that do a reasonable job for less bucks.  I already
mentioned dmalloc (above), but there is also BoundsChecker, MallocDebugger, Dr Watson.  etc.

Avatar of Gezna

ASKER

I'm trying Peter's custom my_malloc function.  The idea is to allocate 3 extra blocks large enough to store a "long" data type (size of 4).  Two of the extra blocks precede the actual originally desired requested memory, and one block goes after.  The first extra block stores the total amout of memory allocated (12 + whatever was requested) and the block immediately preceding and the block immediately following the requested block is just some arbitrary magic number that we write in to verify that we've never written outside our requested memory block.  The magic number you chose was 0x123456789.

Am I right so far?

void *my_malloc(size_t s)
{
  size_t actual_size = s|(sizeof(long)-1) + sizeof(long)*3;
  long *space = (long *)malloc(actual_size);
  if( !space ) return NULL;
  int index = actual_size/sizeof(long);
  space[0] = actual_size;
  space[index] = space[1] = 0x12345678;
  return &space[2];
}

The first statement confuses me.  Would this make more sense:
size_t actual_size = s+(sizeof(long)-1) + sizeof(long)*3;
?

On my machine when I printf the sizeof(long) I get 4, so I could also say this:
size_t actual_size = s+3 + sizeof(long)*3;


Avatar of Gezna

ASKER

Also, I think index should be defined as:

index = actual_size/sizeof(long)-1;
Otherwise, space[index] is out of bounds.
Avatar of Gezna

ASKER

I tried my formula and it isn't running.  It  throws an application defined exception.
Avatar of Gezna

ASKER

Okay I figured out why it threw an appliction defined exception: I misplaced #define my_malloc malloc
Avatar of Gezna

ASKER

Here's my_realloc.  Please verify that it makes sense:

void *my_realloc(void *p, size_t s)
{
 int index;
 size_t actual_size = s+(sizeof(long)-1) + sizeof(long)*3;
 long *space = (long *)realloc(p,actual_size);
 if( !space ) return NULL;
 index = actual_size/sizeof(long)-1;
 space[0] = actual_size;
 space[index] = space[1] = 0x12345678;
 return &space[2];
}
Your analysis of what I had in mind is spot-on.

The reason the "actual_size" formula says "s|(sizeof(long)-1)" is well motivated, but wrong - I screwed up.
What it _should_ say is
size_t actual_size = ((s-1)|(sizeof(long)-1)) + 1 + sizeof(long)*3;
I think I've got the maths right now...

The purpose behind it is to take "s", and ensure that it is rounded up to a nice multiple of sizeof(long), then add the additional space requirements of sizeof(long)*3.
If you're always allocating memory in nice multiple of sizeof(long) then this isn't necessary, but in general it will be, as a lot of architectures really don't like you treating unaligned pointers as anything other than char*.


Your my_malloc method looks good, except it doesn't have any checking in there - the idea is that we check that our "magic number" at the start and end hasn't been corrupted, and if it has then we throw a tantrum.
You should probably add anti-NULL protection as well - some reallocs allow you to pass in a NULL pointer, in which case they work just like malloc, so the first thing should be "if p is NULL, return my_malloc(s)"
The second thing should be to adjust p backwards by two sizeof(long) - remember that the last thing we do in the my_malloc is not to return the actual pointer malloc returned, but to return the pointer a bit forward of that.
Once we've recovered the original value of "space" we had in the my_malloc method, we can then check that our first magic number is intact, i.e. that space[1] is 0x12345678.
If that's ok, we go on to work out what the index of our last magic number is (that's where space[0] comes in - we stored the length we allocated in my_malloc there, so we can work out what value index should be) and check that.
If that's ok _then_ we go and do the actual call to realloc etc.

Note: In the realloc method, it's probably a good idea to set space[index] to 0 just before you call realloc, and then once you've realloced you'll need to recalculate what index should be now, and reinstate your magic number in its new location.
Avatar of Gezna

ASKER

Okay got I'll do that.  What does the "|" operator do?  I've only seen it used as an "or" operator.
| is a bitwise OR
|| is a logical (boolean) OR

You know that, as far as C is concerned, any non-zero value is considered to be logically "true", and any zero value is "false", e.g.
int c = 3;
if( c )
  printf("true\n");
else
  printf("false\n");
will print "true", whereas replacing "c = 4" with "c = 0" makes it say "false"

So, if we complicate matters a little using a logical OR, we could do
int c = 3;
int d = 0;
if( c || d )
  printf("true\n");
else
  printf("false\n");
that'll print "true" as long as either c or d is not 0.

We can assign the results of these logical conditions to int variables, e.g.
int c = 3;
int d = 0;
int e = c || d;
if( e )
  printf("true\n");
else
  printf("false\n");
That'll do exactly as the previous example did.

The fun comes when we start assigning the results of these logical conditions to int variables, e.g.
int c = 3;
int d = 5;
int e = c || d;
printf("e is %d\n",e);
Now, as I understand it, the result of this is implementation-specific, but e will NOT be zero unless both c and d are.
(e may well turn out to be 1 in most implementations)

However, that's all fine & dandy, but it tells you nothing about the | operator.
This one is simple - it works on each binary digit (bit) in the numbers on which it operates.
int c = 3;
int d = 5;
int e = c | d
printf("e is %d\n",e);
This is not implementation specific, this will print 7.
c, in binary, is 00000011
d, in binary, is 00000101
e, therefore is the result of ORing those two values together on each bit, hence
e, in binary, is 00000111
which is 7.


Now, the reason for all that fiddling is because we know that "long" will be a power of 2, and if you subtract 1 from a power of 2 you get a binary number of the form 00000....1111.... - lots of zeros, followed by one or more ones.
If you bitwise-OR any other number with that, it will end up with all the bits set at the bottom end (the least-signficant end).
If you then add one to the result, you'll have rounded up to a nice multiple of that power-of-two we started with.

It's a faster way of doing modulo maths (the % operator) that just takes advantage of the fact that most CPUs can do logical OR operations and increment and decrement operations a bit faster than they can do generic maths, but it only works on computer-friendly numbers.
An alternative would be something like
size_t actual_size = ((s-1+(sizeof(long)))/(sizeof(long)))*(sizeof(long)) + sizeof(long)*3;
Explanation: Take s, add one less than "number", divide by it (rounding down), multiply by it - now it is a nice multiple of "number" rounded up.  Then add 3x "number".
This is just to ensure that, if you ask for 5 bytes, the memory (assuming sizeof(long) is 4), ends up as follows
[0,1,2,3] = (long)5
[4,5,6,7] = (long)0x12345678
[8,9,10,11,12] = the 5 bytes originally asked for
[13,14,15] = additional padding required to ensure the next long is long-aligned
[16,17,18,19] = (long)0x12345678

There are a million and one ways of rounding numbers up - I just used the one I'm more familiar with (and still got it wrong 1st time around!).  Feel free to replace that code with a version you'll understand in 3 weeks time (hint: comments are useful...)
Avatar of Gezna

ASKER

Thank you very much Peter.  I'm starting to like C now thanks to your help.  However, something is not right about these new memory functions.  For some reason I get an exception about reading at memory location 0xfffffff8.  It seems as if "space" has this value on line 5 of the function my_free.  Line 5 is index = space[0]/sizeof(long)-1;

I print out the "malloc," "realloc," and "free" each time the respective function is called and I count that the program calls malloc 6 times, realloc 2 times, malloc once, realloc twice, malloc once, realloc once, free once, and then crahes on the second call to free on line 5.

I've pasted my memory functions.  If everything looks okay here, then I have a problem elsewhere in my code.

void * my_malloc(size_t s)
{
 size_t actual_size;
 long *space;
 int index;
 printf("malloc\n");
 actual_size = ((s+sizeof(long)-1)/sizeof(long))*sizeof(long) + sizeof(long)*3;//trying to round up to nearest multiple of sizeof(long) and then add 3 sizeof(long).
 space = (long*)malloc(actual_size);
 index = actual_size/sizeof(long)-1;
 if( space == NULL ) {fprintf( stderr, "NULL returned by malloc( %d )  \n", s );input();exit(1);}
 space[0] = actual_size;
 space[index] = space[1] = magicNumber;
 return &space[2];
}

void my_free(void *memory)
{

 long *space = (long*)memory;
 int index;
 printf("free\n");
 space-=2;
 index = space[0]/sizeof(long)-1;
 if( space[1]!=magicNumber || space[index]!=magicNumber )
  {
    printf("Memory error in free().  Probably wrote outside of memory block!\n");
    printf("Preceding block value: %d\nFollowing Block value: %d\n",space[1],space[index]);
    input();//pauses program until user hits any key
    abort();
  }
 free( space );
}
void *my_realloc(void *p, size_t s)
{
 int index;
 size_t actual_size = ((s+sizeof(long)-1)/sizeof(long))*sizeof(long) + sizeof(long)*3;//trying to round up to nearest multiple of sizeof(long); then add 3 sizeof(long).
 long *space = (long*) p;
 printf("realloc\n");
 space -= 2;
 //space points to the beginning of the real block.
 //let's do some error checking.
 index = space[0]/sizeof(long)-1;
 //both space[1] and space[index] should have magic numbers.
 if( space[1]!=magicNumber || space[index]!=magicNumber )
  {
    printf("Memory error in realloc().  Probably wrote outside of memory block!\n");
    printf("Preceding block value: %d\nFollowing Block value: %d\n",space[1],space[index]);
    input();
    abort();
  }
  //if we get here, then the magicNumbers are intact.
  space = (long *)realloc(space,actual_size);
  if( !space ) {fprintf( stderr, "NULL returned by realloc( %d )  \n", s );input();exit(1);}
  index = actual_size/sizeof(long)-1;
  space[0] = actual_size;
  space[index] = space[1] = magicNumber;
  return &space[2];
}
Oh dear "starting to like C" - it's a slippery slope you know - soon there will be no hope for you ;-)

Ok, the my_malloc looks ok.  First (long) is the rounded-up memory size, second and last are the magic numbers, the rest is the memory we return.

The my_free looks fine as well - good use of abort() by the way, very appropriate in this circumstance.  you might want to fprintf to stderr instead of stdout, but I think we're splitting hairs here.

The my_realloc could probably do with over-writing space[index] with something other than the magic number just before the call to realloc.  This won't sort out the problem in hand, but when using "magic numbers" it's important not to leave them lying around when they're not required, otherwise you increase the chance of memory corruption not being detected (not by much, but...)

All in all, those methods look fine to me, except the comments.
First, this is C, not C++.  In C, comments are of the form /* comment */
A lot of C compilers do not understand the C++ style "to end of line" // comment
If you're writing C, use C comments and don't use C++ // comments.
(But this is not going to be the cause of your problem here)
Also, when writing C, you have to do all your variable declarations between an opening { and the first real line of code.  C++ is much friendlier and allows variable declarations whereever you like, but C isn't friendly like that.
I'm guessing that you're using a C++ compiler, not a pure ANSI-C compiler.
(But this isn't going to be the cause of your problem either!)


I think the clue here is in the address "memory location 0xfffffff8".
If you were to add two lots of sizeof(long) to that number, you would get zero, yes?
I think you're calling my_free(NULL) somewhere in your code.
my_free is doing the following:
 long *space = (long*)NULL;
 space-=2;
 // now space is (0 - sizeof(long)*2) = 0xfffffff8

Now, some implementations of free() allow you to pass in a NULL pointer and simply do nothing.  In C++, it is officially legal to call delete on a NULL, hence is ok on all compilers.
My personal opinion is that, in C, if you free(NULL) you're asking for trouble -  in code I write, I always do
  if( pointerThatMayNeedFreeing ) free(pointerThatMayNeedFreeing);
instead of simply
  free(pointerThatMayNeedFreeing);
however I note that, on both Solaris 2 (@work) and Linux (@home), free() and realloc() are defined as explicitly permitting a NULL pointer to be provided.

So, I'd guess that I'm a bit behind the times and free() and realloc() are both allowed to be given NULL pointers, so we better ensure our my_free() and my_realloc() can do likewise.

You'll need to add a line into my_free that says
  if( !memory ) return;
before it does anything significant, e.g. just after it's printf'ed "free".

Similarly, in my_realloc, you want to have a line that goes
  if( !p ) return my_malloc(s);


Do that and our wrapper functions should behave just like the official malloc, free and realloc.
Note: Are you using calloc() in your code?  If so, you'll need a wrapper around that as well - just make it call my_malloc.


Note: If you keep this code for over a week, I'd advise you put the "round up to nearest multiple" code into a function of its own, e.g.
size_t round_up_to_multiple(size_t size, size_t multiple)
{
  return ((size+multiple-1)/multiple)*multiple;
}
 size_t actual_size = round_up_to_multiple(s,sizeof(long)) + sizeof(long)*3;

Rule of thumb: If something is used once, consider putting it in a function of its own.  If something is used twice it damn well should be in its own function.  _Never_ have duplicated code.
You might like to apply that same logic to the "is this memory still intact" check as well ;-)
Another possible enhancement: add another pointer-sized field where you keep  the address of the previously allocated block.  That way you'll have a linked-list of all the allocated blocks which gives you the ability (carefully) to walk the list at any time so you can: (1) Display the number of allocated blocks and their sizes.  (2) Check them all for good magic numbers.  Comes in VERY handy sometimes.

Another suggestion: Instead of all the subscripting to get to the magic fields, define a pair of structs for the header and trailer info, something like:

typedef   int32    Magic;
#define  MagicNumber   0x12345678

typedef struct {
        Magic  HdrPw1;   /* to validate header base */
        void * Prev;         /* points to previous malloc()'ed block */
         uint32  BlockSize; /* Tells us where the end of block is */
         void * AllocatedAt; /* tells us where this was malloced() from in the code */
        Magic  HdrPw2;   /*  validates end of header block */
   } HdrRec;

typedef struct {
    Magic TrailerPw1;  /* validates trailer */
    void * PtrToBegin;  /* ptr to block start */
    Magic  TrailerPw2; /* just to be sure */
}  TrlRec;

--------------------------------

This makes the code a LOT clearer, as you're using symbolic names instead of integer offsets.

Just my 2 cents...

The linked-list stuff, agreed.
This would then allow _any_ code to call upon the "have we corrupted our memory yet" checking function at any time.  Would be nice.
Structures, yes, it does make the code clearer (using long arrays is a quick & dirty solution, but it _was_ intended as a quick fix just to diagnose a single bug, rather than as a generic programming tool).

As a generic programming tool this definately the way to go.
I just hope the existing wrappers let Gezna find the (original) bug before we have to go that far!


I'm not sure how one would go about using the "AllocatedAt" field tho' - I've often wanted to do just this in C code, but I've never found a way to walk back up the stack - I've had to settle for using the __FILE__ and __LINE__ macros, and storing a char * (__FILE__) and integer (__LINE__) which have to be passed in by the caller.

Whilst we're on the subject of generic extentions to this idea, one thing I'd do if I needed this long-term is to let the caller specify an "object type" string, so that each module that allocated/deallocated memory could say "this is for an XXX object" etc, e.g. my_malloc(sizeof(struct mystructure), "MyStructure main storage", __FILE__, __LINE__) - it'd be easier on the brain than just filenames and line numbers, especially if a single file uses malloc for more than one purpose.
Oh yes, and put the __FILE__ and __LINE__ into a #define macro so the client code doesn't need to remember to do it...
And perhaps the ability to have different magic numbers used by different sections of the code, thus allowing us to detect when one bit of code gets memcpy'ed over another...


However, if we introduce too much complexity into these malloc wrappers, we're in danger of taking more time getting this "tool" working than we would take hunting for the bug using other methods (it's all highly instructive stuff tho' ;-)
Yep, the perfect is the enemy of the good-enough.

But if you have a BIG program you may eventually need all the bells and whistles.  I stared out with a simple 0x12345678 at each end of the block and ended up with all of the above.

It IS hard to get the calling address, you can do it sometimes by indexing a few dwords past your last local variable, but that is of course very compiler-dependent and compile-flags dependent.

And it is nice to add a text message tag so you know what kind of block each one is.

BTW  doesnt realloc() have to copy the payload to the new block?

Another fribble:  have free() write 0xFFFF into the freed block, just to catch cases where you use the pointer after it is freed.

Yet Another (hazardous) feature:  have your malloc return the pointer into a parameter, not as the function result, so malloc() can save away the address of that variable, so free() can set it to NULL!    Hazardous of course as you may malloc() into a local variable that may be out of scope when you do the free().  But if you know this is a restriction, you can avoid doing so.






In every implementation of realloc() I've met, the data gets copied by realloc(), so all you have to do is ensure that your end marker is at the "new" end of the data instead of where it used to end.

Very good point about free() tho' - I'd completely overlooked that.
A good idea, in the my_free() method would be to put
  memset(space,0xFF,space[0]);
just before the call to free() - that way no data will survive being freed.

Might be a good idea to do likewise for memory that's just been allocated (maybe set that to 0xEE instead so we can tell the difference when we next explode).  I know some malloc()s zero memory before returning it, but I wouldn't advise relying on it.  Doing that should help guard against using uninitialised memory.
Avatar of Gezna

ASKER

Peter, you were right on the free(NULL).  I found a little redundency where I freed something twice.  The first time I set the pointer to NULL.  So that's one thing fixed, but I doubt it was the cause of my original problem, though.

If these memory routines are correct then I'm definitely writing outside the allocated block.  I have to track down where/why this is happening.  In free() the preceding block is -1, and the following block is 20 rather than the magicNumber.
Good stuff - this is, after all, progress.

I think the next stage is to put the "is this block of memory a valid, uncorrupted, block of memory with intact magic numbers" check into a separate function that your main code can call.  e.g.

int my_memory_is_intact( void *memory )
{
  long *space = (long*)memory;
  int index;
  space-=2;
  index = space[0]/sizeof(long)-1;
  if( space[1]!=magicNumber || space[index]!=magicNumber )
    return 0;
  else
    return 1;
}
#define CHECK_MEMORY_BLOCK(p) if( p & !my_memory_is_intact(p) ) { fprintf(stderr,"Memory corruption found %s:%u",__FILE__,__LINE__); fflush(stderr); abort(); }


Then you call this method (using this macro) from your main program call at the entry and exit of each of your functions on all data the function uses.  In the case of allocated memory being used to store pointers to other allocated memory, check the outer "layer" first, then use it to determine the inner pointers and then check those etc.
i.e. each function should check any data it uses for "intactness" both before it does anything, and just before it returns.

Ok, so this will slow your program down a lot, but it might well then tell you where the problem is

But only if the function that's at fault is only corrupting it's "own" memory.  If the function that's causing all these problems is causing damage elsewhere then we'll need to take grg99's advice and go the whole-hog on this wrapper concept - linked lists & everything.
Avatar of Gezna

ASKER

Sounds good to me.  I deeply appreciate your help.

I'm going to award the points now, and open another question when (maybe if?) I need further help on this.  I think you've put enough useful info out there to earn those points.  If it were $500, maybe I'd keep stringing you along a bit more!  Keep a look out for me and my next question which will be for 500 points in the C programming section.  It may not get posted until tomorrow, or if I'm lucky and solve this, it may not get posted ever!  Let's hope!