Solved

Crashing at malloc() after many iterations

Posted on 2004-11-01
1,331 Views
Last Modified: 2013-12-14
I have written a C program than iterates hundreds of thousands of times.  There might be several million calls to malloc(), realloc() and free() after so many iterations.

There is no memory leak.  The program does NOT crash at realloc().  It only crashes at malloc().

The program runs perfectly for many hundrends of thousands of iterations, with every function getting called several times each iteration.  Then, for no reason I can discern, it crashes at some random malloc().

I've ony seen the crash occur near the beginning of a function call.  Probably this is because most, if not all, occurences of malloc() occur near the start of a function.
The crash does not occur at the same malloc() every time.  If I use the same random seed, the crash occurs at the same malloc(), and after exactly the same number of iterations.  However, if I adjust the random seed, the crash will occur at a probably different malloc() after a different number of iterations.

Any ideas on what causes this?  
0
Question by:Gezna
    36 Comments
     
    LVL 5

    Expert Comment

    by:van_dy
    what system you are
    running your program on?
    If it is on some unix
    system, then most probably
    you are hitting the resource
    limit. RLIMIT_DATA is the resource
    on unix systems that determines
    the maximum size of data segments
    for a process. this data segment
    includes the bss, initialised data
    and heap memory(from where malloc
    allocates space). it would be better
    if you could post your code so that
    we can decide if it is really this matter
    or some possible bug.

    hope this helps,
    van_dy
    0
     

    Author Comment

    by:Gezna
    I'm running on Windows XP Professional and I'm using Borland C++ Builder X.

    The code has many lines so posting them here probably is no good.

    Here is the beginning of a function where the latest crash occured:

    char * treeToString(struct node *t, char **str)
       {
       int i,len,tempD=0;
       struct node *aNode;
       char *tmp;
       char line[100];
       int max = 100;
       if(D) printf("treeToString...BEGIN\n");
       tmp = (char*)malloc(1);


    The crash occurs at the last line.  If I run the program again with a different random seed, some other malloc in some other function at some different iteration will crash instead of this one.

    This crash occurs after about 750,000 iterations of the main program.  This function here is probably called 5,000,000 times.  

    I see no increase in the amount of memory beining used while this program is running, and I have 512 MB free with 1GB total RAM on my desktop.
    0
     
    LVL 22

    Expert Comment

    by:grg99
    Are you checking the returned value from malloc()?

    Are you checking that you're requesting a reasonable sized chunk?

    -----------------

    I'd do this:

    #define SomeReasonableLimit   50000

    void * MyMalloc( int Len )
    {
        void *;
        if( Len <= 0 || Len >= SomeReasonableLimit ) { fprintf( stderr, "bad param to malloc: %d\n", Len ); Len = SomeReasonableLimit; };
         p = malloc( Len );
         if( p == NULL ) fprintf( stderr, "NULL returned by malloc( %d )  \n", Len );

       return(p);
    }

    #define malloc  MyMalloc

    0
     
    LVL 7

    Expert Comment

    by:ravs120499
    Hi,

    Your problem is interesting, but can you provide more information -

    * What's the signal/exception (Segment Violation, Out of memory...)
    * What platform are you on?
    * Does it coredump? Did you try stepping through the core file to see if you could find anything
    * Each iteration allocates memory and frees up ALL the memory allocated. Correct?
    * Within each iteration, I presume malloc is called once at the beginning, multiple reallocs subsequently, and finally a free. Correct?
    * Does your program's execution path change depending on the random seed? Would the seed affect the number and size of malloc/realloc/free calls?
    * How have you verified that there is no memory leak?

    If all else fails, you can try this:
    Write a test program that mimics your program, but with simple logic - starting with malloc, realloc, free. Run this within a loop and see if that crashes as well. Start fleshing this out incrementally with more and more of your actual program's logic and test at each increment to see what triggers off the crash.
    Regards
    0
     
    LVL 5

    Expert Comment

    by:van_dy
    are you freeing the memory that you acquire
    in each of your iterations? I dont know about
    xp, but it must impose some resource limits on
    processes as well. does the seed affect the
    way in which your functions are called?
    0
     
    LVL 23

    Expert Comment

    by:brettmjohnson
    Crashes within malloc() are usually the result of memory arena corruption
    (writing beyond the bounds of the allocated block).  You can usually isolate
    these problems using debugging memory allocation library like Bounds Checker,
    Purify, or DMalloc:  http://dmalloc.com

    One question I have from the code fragment posted:  Why are you using
    malloc() to allocate a single byte?  malloc(), calloc(), realloc(), and free() are
    resource intensive routines.  If you are indeed allocating, reallocating and
    freeing small pieces of memory hundreds of thousands of times, you may
    find that your program is spending a large percentage of its time in the
    memory allocation libraries rather than your program logic.

    0
     
    LVL 45

    Expert Comment

    by:Kdo

    > There is no memory leak.  The program does NOT crash at realloc().  It only crashes at malloc().

    Interesting symptom, but not definitive.

    Many implementations of memory managers do NOT reassign the block to a new address unless the size of the reservation changes.  Every malloc() is satisfied with a block that has a size of (hypothetically) 16*N.  The low-order bytes are the header, which is transparent to the application.  The remaining bytes are the user area.  *malloc(1)* actually reserves 32 bytes.  16 for the header and 16 for the application.  *realloc (malloc(1), 2)* may not move the buffer generated with malloc(1) because the user area of 16 bytes is within the same reservation.

    But malloc() will always return a buffer when there is sufficient memory to satisfy the request.

    I suspect that this combination of factors is why malloc() always generates the fault.


    How many buffers can you have assigned at any given point in time?  And how much memory do yoy believe that this totals?


    Kent
    0
     

    Author Comment

    by:Gezna
    Here's some more info:

    * I always check to make sure malloc() doesn't return NULL.

    * "Stopped at exception c0000005: access violation at 0x004048ee: write of address 0x00000008" is the exception.

    * I do get something called the CPU view.  I see assembly language commands.  If I scroll up I get to a header which is probably the assembly language function name:  _internal_malloc.  Before that is _internal_free, a few other function names, and then there is "malloc(unsigned int):" which I think corresponds to the last line in the code sample above.  By the way, I'm sure that the last line above is the culprit because of printf statements.  The next two main function headers that appear in cpu view before "malloc(unsigned int)" are "operator new[]" and "std::bad_alloc::what:"

    *each iteration does free up the vast majority of the allocated memory using free().  However, some small portion of the memory is reallocated.

    *Every iteration executes in the same manner.  The purpose of the main program is to execute the sub functions many times to make sure they are robust.  The random seed doesn't affect much.  So the main program is a loop that goes like this: Randomly create a string of some random length that represents a First Order Logic Term->convert that string into a tree->convert that tree into a directed acyclic graph->free memory and repeat.

    *I've verified no memory leak by observing the task manager.  The memory use is flat.  I built the main program incrementally and each time I made sure there was no memory leak.  I could work backwards and incrementally remove functions and see if this problem goes away, however I'm hoping this is an obvious C bug.  I'm new to C.  I am a Java programmer.

    *random seed does not affect memory usage.
    0
     
    LVL 45

    Expert Comment

    by:Kdo
    C++ Builder.  :)  A personal favorite!

    Which version of the Builder are you using?  There are quite a few bugs in the basic V5.0, but the update to 5.5 fixes most of them.  6.0 is pretty reliable.  I don't have any feedback on BuilderX.

    How big is your source program?  Is it small enough to effectively post?

    Are you running a single/multi-threaded model?

    Kent
    0
     

    Author Comment

    by:Gezna
    Okay I found out something important.

    When I comment out the code that freed up the memory used by the directed acyclic graph (DAG) the problem goes away.  So I think I'm probably not freeing my memory up properly.  Probably I'm freeing a pointer that I later am trying to realloc.
    0
     
    LVL 45

    Expert Comment

    by:Kdo

    When you free() the pointer, set it to NULL.

    That won't solve the logic problem, but when you go to realloc() the pointer, it will assign a valid block.  You'll probably have a memory leak, but at least you'll run.


    Kent
    0
     
    LVL 16

    Expert Comment

    by:PaulCaswell
    Another one that is really tough to find is when your stack overflows into your heap. You then get exactly the same kind of randomness for crashes asd you are seeing. Try using /STACK:5000 (or some other large value) in your linker parameters and see if the problem goes away.

    Paul
    0
     
    LVL 22

    Expert Comment

    by:grg99
    Ok, next possibility is you may be overflowing one of your allocations.  Really easy to do, and the problem won't be detected until malloc() or free() try to manipulate the next block.


    Try adding a "+ 10" to your malloc() parameter and see if the problem goes away.

    0
     
    LVL 45

    Expert Comment

    by:Kdo

    You should be able to compile your program with a run-time selection to "test stack frame".  If the stack overflows, the program will abort with an appropriate message instead of corrupting the heap.  The program is just as dead, but you'll have an accurate diagnostic.


    Kent
    0
     
    LVL 2

    Accepted Solution

    by:
    Another technique you can use: "magic numbers"

    The concept is that you write your own method such as
    void *my_malloc(size_t s)
    {
      size_t actual_size = s|(sizeof(long)-1) + sizeof(long)*3;
      long *space = (long *)malloc(actual_size);
      if( !space ) return NULL;
      int index = actual_size/sizeof(long);
      space[0] = actual_size;
      space[index] = space[1] = 0x12345678;
      return &space[2];
    }
    as you can see, we're allocating extra memory beyond what was requested, space to store 0x12345678 either side of the memory we're returning, plus we remember how much we allocated.
    When you come to free things, you use your own free method
    void my_free(void *memory)
    {
      long *space = (long*)memory;
      space-=2;
      int index = space[0]/sizeof(long);
      if( space[1]!=0x12345678 || space[index]!=0x12345678 )
        abort();
      free( space );
    }
    You do similar things with realloc.

    Then you change your code so that you use these new methods isntead of malloc and free directly.

    This way, if your code is accidentally writing over the end of your memory, it will corrupt our 0x12345678 numbers (aka "magic numbers") at either end, and we can detect this when we free the memory.
    You could go one better and keep a linked-list of all the memory you've allocated and, before you allocate any more, go through and check that you've not corrupted any of those "magic numbers" every time you allocate memory or free memory, but if you do it'd be a good idea to also be able to record a string "tag" against each block of memory so that when your corruption-check shouts "corruption found!", it can tell you what it was being used for.

    Alternatively, use Purify - it's expensive, but it'd tell you exactly where your problem was right away (Purify implements all these checks - in effect, you get an ArrayBoundsWrite or ArrayBoundsRead exception, which is what you're missing from Java right now)

    Disclaimer: The method above are untried and untested and for illustrative purposes only - I've not even tried to compile them, let alone run them.  but you probably get the idea of what I'm suggesting.
    0
     
    LVL 22

    Expert Comment

    by:grg99
    Peter's ideas are very good-- to summarize and extend them a bit, here's what a really good heap-debugging tool would do:

    ** allocate some space before and after each block

    ** make the size of that extra space user-controllable, up to several K just in case you have
    a really wild pointer or struct acess that dosnt zap the immediately following bytes, but goes farther astray

    ** fill that extra space with some known pattern, or random pattern that is checksummed.

    ** also keep track of each block, where it was allocated from in the code, how many times you
    tried to free() it.

    ** on each call to malloc or free() or even more often, scan the list of allocated and free blocks for consistency.

    ** On CPU's that can do it (x86), allocate each block in a hardware-protected segment of its own,
    so it will have strict limit checks done by the hardware.

    ** Sime-related-- on each function entry, add code to set all the local variables to some really bad value that can't be a good address, all ones is a good value.   This finds unset pointer usage really quick.

    ------------
    A long time ago I did a heap-checker like this for the X86 in rewal and protected mode.  It found dozens of errors in a program that was thought to be very stable!

    0
     
    LVL 2

    Expert Comment

    by:Peter-Darton
    grg99's right - the example I gave was fairly limited in scope.  Fortunately, there are now numerous heap-debugging tools available, some of which are good.

    I would, unreservedly, recommend Purify - it's £"&%ing marvellous - it impressed me when I first used it around 1995, and I've yet to see anything do a better job.  It is, however, not cheap, so only viable if you do C/C++ as a full time job.  If you _do_ do this as a day-job, it is most cost effective tho', especially when integrated into one's build/test system.

    Back to the problem at hand:  See if you can get a trial version and try it out for a few weeks - should be more than long enough to solve this issue.  (Assuming you're running on a supported architecture - if not, look at the GNU toolset)

    There are other such tools "out there" that are aimed at the same issue, some of which do a good job (so I'm told).  I've not tried any myself (I've always used Purify), but this is not an unusual issue and hence lots of people have hit this before, and many have written tools to help sort it out, a number of which are freeware or GPL.

    Writing a full heap-checker is a project in its own right, and making one that uses the processor's own memory management unit is distinctly non-trivial: grg99 - I'm impressed.
    0
     

    Author Comment

    by:Gezna
    Thanks for the input.  I'll try and get a trial version of Purify. BTW I'm a Master's student in Computer Science so I can't by it.  

     
    0
     
    LVL 23

    Expert Comment

    by:brettmjohnson
    I've always described Purify as God's gift to programmers.  It works extremely well (especially on Solaris).
    It is expensive, 'tho.  I don't think you can even buy it alone anymore (it comes in some bundle of apps
    from IBM now).   There are other debugging libraries that do a reasonable job for less bucks.  I already
    mentioned dmalloc (above), but there is also BoundsChecker, MallocDebugger, Dr Watson.  etc.

    0
     

    Author Comment

    by:Gezna
    I'm trying Peter's custom my_malloc function.  The idea is to allocate 3 extra blocks large enough to store a "long" data type (size of 4).  Two of the extra blocks precede the actual originally desired requested memory, and one block goes after.  The first extra block stores the total amout of memory allocated (12 + whatever was requested) and the block immediately preceding and the block immediately following the requested block is just some arbitrary magic number that we write in to verify that we've never written outside our requested memory block.  The magic number you chose was 0x123456789.

    Am I right so far?

    void *my_malloc(size_t s)
    {
      size_t actual_size = s|(sizeof(long)-1) + sizeof(long)*3;
      long *space = (long *)malloc(actual_size);
      if( !space ) return NULL;
      int index = actual_size/sizeof(long);
      space[0] = actual_size;
      space[index] = space[1] = 0x12345678;
      return &space[2];
    }

    The first statement confuses me.  Would this make more sense:
    size_t actual_size = s+(sizeof(long)-1) + sizeof(long)*3;
    ?

    On my machine when I printf the sizeof(long) I get 4, so I could also say this:
    size_t actual_size = s+3 + sizeof(long)*3;


    0
     

    Author Comment

    by:Gezna
    Also, I think index should be defined as:

    index = actual_size/sizeof(long)-1;
    Otherwise, space[index] is out of bounds.
    0
     

    Author Comment

    by:Gezna
    I tried my formula and it isn't running.  It  throws an application defined exception.
    0
     

    Author Comment

    by:Gezna
    Okay I figured out why it threw an appliction defined exception: I misplaced #define my_malloc malloc
    0
     

    Author Comment

    by:Gezna
    Here's my_realloc.  Please verify that it makes sense:

    void *my_realloc(void *p, size_t s)
    {
     int index;
     size_t actual_size = s+(sizeof(long)-1) + sizeof(long)*3;
     long *space = (long *)realloc(p,actual_size);
     if( !space ) return NULL;
     index = actual_size/sizeof(long)-1;
     space[0] = actual_size;
     space[index] = space[1] = 0x12345678;
     return &space[2];
    }
    0
     
    LVL 2

    Expert Comment

    by:Peter-Darton
    Your analysis of what I had in mind is spot-on.

    The reason the "actual_size" formula says "s|(sizeof(long)-1)" is well motivated, but wrong - I screwed up.
    What it _should_ say is
    size_t actual_size = ((s-1)|(sizeof(long)-1)) + 1 + sizeof(long)*3;
    I think I've got the maths right now...

    The purpose behind it is to take "s", and ensure that it is rounded up to a nice multiple of sizeof(long), then add the additional space requirements of sizeof(long)*3.
    If you're always allocating memory in nice multiple of sizeof(long) then this isn't necessary, but in general it will be, as a lot of architectures really don't like you treating unaligned pointers as anything other than char*.


    Your my_malloc method looks good, except it doesn't have any checking in there - the idea is that we check that our "magic number" at the start and end hasn't been corrupted, and if it has then we throw a tantrum.
    You should probably add anti-NULL protection as well - some reallocs allow you to pass in a NULL pointer, in which case they work just like malloc, so the first thing should be "if p is NULL, return my_malloc(s)"
    The second thing should be to adjust p backwards by two sizeof(long) - remember that the last thing we do in the my_malloc is not to return the actual pointer malloc returned, but to return the pointer a bit forward of that.
    Once we've recovered the original value of "space" we had in the my_malloc method, we can then check that our first magic number is intact, i.e. that space[1] is 0x12345678.
    If that's ok, we go on to work out what the index of our last magic number is (that's where space[0] comes in - we stored the length we allocated in my_malloc there, so we can work out what value index should be) and check that.
    If that's ok _then_ we go and do the actual call to realloc etc.

    Note: In the realloc method, it's probably a good idea to set space[index] to 0 just before you call realloc, and then once you've realloced you'll need to recalculate what index should be now, and reinstate your magic number in its new location.
    0
     

    Author Comment

    by:Gezna
    Okay got I'll do that.  What does the "|" operator do?  I've only seen it used as an "or" operator.
    0
     
    LVL 2

    Expert Comment

    by:Peter-Darton
    | is a bitwise OR
    || is a logical (boolean) OR

    You know that, as far as C is concerned, any non-zero value is considered to be logically "true", and any zero value is "false", e.g.
    int c = 3;
    if( c )
      printf("true\n");
    else
      printf("false\n");
    will print "true", whereas replacing "c = 4" with "c = 0" makes it say "false"

    So, if we complicate matters a little using a logical OR, we could do
    int c = 3;
    int d = 0;
    if( c || d )
      printf("true\n");
    else
      printf("false\n");
    that'll print "true" as long as either c or d is not 0.

    We can assign the results of these logical conditions to int variables, e.g.
    int c = 3;
    int d = 0;
    int e = c || d;
    if( e )
      printf("true\n");
    else
      printf("false\n");
    That'll do exactly as the previous example did.

    The fun comes when we start assigning the results of these logical conditions to int variables, e.g.
    int c = 3;
    int d = 5;
    int e = c || d;
    printf("e is %d\n",e);
    Now, as I understand it, the result of this is implementation-specific, but e will NOT be zero unless both c and d are.
    (e may well turn out to be 1 in most implementations)

    However, that's all fine & dandy, but it tells you nothing about the | operator.
    This one is simple - it works on each binary digit (bit) in the numbers on which it operates.
    int c = 3;
    int d = 5;
    int e = c | d
    printf("e is %d\n",e);
    This is not implementation specific, this will print 7.
    c, in binary, is 00000011
    d, in binary, is 00000101
    e, therefore is the result of ORing those two values together on each bit, hence
    e, in binary, is 00000111
    which is 7.


    Now, the reason for all that fiddling is because we know that "long" will be a power of 2, and if you subtract 1 from a power of 2 you get a binary number of the form 00000....1111.... - lots of zeros, followed by one or more ones.
    If you bitwise-OR any other number with that, it will end up with all the bits set at the bottom end (the least-signficant end).
    If you then add one to the result, you'll have rounded up to a nice multiple of that power-of-two we started with.

    It's a faster way of doing modulo maths (the % operator) that just takes advantage of the fact that most CPUs can do logical OR operations and increment and decrement operations a bit faster than they can do generic maths, but it only works on computer-friendly numbers.
    An alternative would be something like
    size_t actual_size = ((s-1+(sizeof(long)))/(sizeof(long)))*(sizeof(long)) + sizeof(long)*3;
    Explanation: Take s, add one less than "number", divide by it (rounding down), multiply by it - now it is a nice multiple of "number" rounded up.  Then add 3x "number".
    This is just to ensure that, if you ask for 5 bytes, the memory (assuming sizeof(long) is 4), ends up as follows
    [0,1,2,3] = (long)5
    [4,5,6,7] = (long)0x12345678
    [8,9,10,11,12] = the 5 bytes originally asked for
    [13,14,15] = additional padding required to ensure the next long is long-aligned
    [16,17,18,19] = (long)0x12345678

    There are a million and one ways of rounding numbers up - I just used the one I'm more familiar with (and still got it wrong 1st time around!).  Feel free to replace that code with a version you'll understand in 3 weeks time (hint: comments are useful...)
    0
     

    Author Comment

    by:Gezna
    Thank you very much Peter.  I'm starting to like C now thanks to your help.  However, something is not right about these new memory functions.  For some reason I get an exception about reading at memory location 0xfffffff8.  It seems as if "space" has this value on line 5 of the function my_free.  Line 5 is index = space[0]/sizeof(long)-1;

    I print out the "malloc," "realloc," and "free" each time the respective function is called and I count that the program calls malloc 6 times, realloc 2 times, malloc once, realloc twice, malloc once, realloc once, free once, and then crahes on the second call to free on line 5.

    I've pasted my memory functions.  If everything looks okay here, then I have a problem elsewhere in my code.

    void * my_malloc(size_t s)
    {
     size_t actual_size;
     long *space;
     int index;
     printf("malloc\n");
     actual_size = ((s+sizeof(long)-1)/sizeof(long))*sizeof(long) + sizeof(long)*3;//trying to round up to nearest multiple of sizeof(long) and then add 3 sizeof(long).
     space = (long*)malloc(actual_size);
     index = actual_size/sizeof(long)-1;
     if( space == NULL ) {fprintf( stderr, "NULL returned by malloc( %d )  \n", s );input();exit(1);}
     space[0] = actual_size;
     space[index] = space[1] = magicNumber;
     return &space[2];
    }

    void my_free(void *memory)
    {

     long *space = (long*)memory;
     int index;
     printf("free\n");
     space-=2;
     index = space[0]/sizeof(long)-1;
     if( space[1]!=magicNumber || space[index]!=magicNumber )
      {
        printf("Memory error in free().  Probably wrote outside of memory block!\n");
        printf("Preceding block value: %d\nFollowing Block value: %d\n",space[1],space[index]);
        input();//pauses program until user hits any key
        abort();
      }
     free( space );
    }
    void *my_realloc(void *p, size_t s)
    {
     int index;
     size_t actual_size = ((s+sizeof(long)-1)/sizeof(long))*sizeof(long) + sizeof(long)*3;//trying to round up to nearest multiple of sizeof(long); then add 3 sizeof(long).
     long *space = (long*) p;
     printf("realloc\n");
     space -= 2;
     //space points to the beginning of the real block.
     //let's do some error checking.
     index = space[0]/sizeof(long)-1;
     //both space[1] and space[index] should have magic numbers.
     if( space[1]!=magicNumber || space[index]!=magicNumber )
      {
        printf("Memory error in realloc().  Probably wrote outside of memory block!\n");
        printf("Preceding block value: %d\nFollowing Block value: %d\n",space[1],space[index]);
        input();
        abort();
      }
      //if we get here, then the magicNumbers are intact.
      space = (long *)realloc(space,actual_size);
      if( !space ) {fprintf( stderr, "NULL returned by realloc( %d )  \n", s );input();exit(1);}
      index = actual_size/sizeof(long)-1;
      space[0] = actual_size;
      space[index] = space[1] = magicNumber;
      return &space[2];
    }
    0
     
    LVL 2

    Expert Comment

    by:Peter-Darton
    Oh dear "starting to like C" - it's a slippery slope you know - soon there will be no hope for you ;-)

    Ok, the my_malloc looks ok.  First (long) is the rounded-up memory size, second and last are the magic numbers, the rest is the memory we return.

    The my_free looks fine as well - good use of abort() by the way, very appropriate in this circumstance.  you might want to fprintf to stderr instead of stdout, but I think we're splitting hairs here.

    The my_realloc could probably do with over-writing space[index] with something other than the magic number just before the call to realloc.  This won't sort out the problem in hand, but when using "magic numbers" it's important not to leave them lying around when they're not required, otherwise you increase the chance of memory corruption not being detected (not by much, but...)

    All in all, those methods look fine to me, except the comments.
    First, this is C, not C++.  In C, comments are of the form /* comment */
    A lot of C compilers do not understand the C++ style "to end of line" // comment
    If you're writing C, use C comments and don't use C++ // comments.
    (But this is not going to be the cause of your problem here)
    Also, when writing C, you have to do all your variable declarations between an opening { and the first real line of code.  C++ is much friendlier and allows variable declarations whereever you like, but C isn't friendly like that.
    I'm guessing that you're using a C++ compiler, not a pure ANSI-C compiler.
    (But this isn't going to be the cause of your problem either!)


    I think the clue here is in the address "memory location 0xfffffff8".
    If you were to add two lots of sizeof(long) to that number, you would get zero, yes?
    I think you're calling my_free(NULL) somewhere in your code.
    my_free is doing the following:
     long *space = (long*)NULL;
     space-=2;
     // now space is (0 - sizeof(long)*2) = 0xfffffff8

    Now, some implementations of free() allow you to pass in a NULL pointer and simply do nothing.  In C++, it is officially legal to call delete on a NULL, hence is ok on all compilers.
    My personal opinion is that, in C, if you free(NULL) you're asking for trouble -  in code I write, I always do
      if( pointerThatMayNeedFreeing ) free(pointerThatMayNeedFreeing);
    instead of simply
      free(pointerThatMayNeedFreeing);
    however I note that, on both Solaris 2 (@work) and Linux (@home), free() and realloc() are defined as explicitly permitting a NULL pointer to be provided.

    So, I'd guess that I'm a bit behind the times and free() and realloc() are both allowed to be given NULL pointers, so we better ensure our my_free() and my_realloc() can do likewise.

    You'll need to add a line into my_free that says
      if( !memory ) return;
    before it does anything significant, e.g. just after it's printf'ed "free".

    Similarly, in my_realloc, you want to have a line that goes
      if( !p ) return my_malloc(s);


    Do that and our wrapper functions should behave just like the official malloc, free and realloc.
    Note: Are you using calloc() in your code?  If so, you'll need a wrapper around that as well - just make it call my_malloc.


    Note: If you keep this code for over a week, I'd advise you put the "round up to nearest multiple" code into a function of its own, e.g.
    size_t round_up_to_multiple(size_t size, size_t multiple)
    {
      return ((size+multiple-1)/multiple)*multiple;
    }
     size_t actual_size = round_up_to_multiple(s,sizeof(long)) + sizeof(long)*3;

    Rule of thumb: If something is used once, consider putting it in a function of its own.  If something is used twice it damn well should be in its own function.  _Never_ have duplicated code.
    You might like to apply that same logic to the "is this memory still intact" check as well ;-)
    0
     
    LVL 22

    Expert Comment

    by:grg99
    Another possible enhancement: add another pointer-sized field where you keep  the address of the previously allocated block.  That way you'll have a linked-list of all the allocated blocks which gives you the ability (carefully) to walk the list at any time so you can: (1) Display the number of allocated blocks and their sizes.  (2) Check them all for good magic numbers.  Comes in VERY handy sometimes.

    Another suggestion: Instead of all the subscripting to get to the magic fields, define a pair of structs for the header and trailer info, something like:

    typedef   int32    Magic;
    #define  MagicNumber   0x12345678

    typedef struct {
            Magic  HdrPw1;   /* to validate header base */
            void * Prev;         /* points to previous malloc()'ed block */
             uint32  BlockSize; /* Tells us where the end of block is */
             void * AllocatedAt; /* tells us where this was malloced() from in the code */
            Magic  HdrPw2;   /*  validates end of header block */
       } HdrRec;

    typedef struct {
        Magic TrailerPw1;  /* validates trailer */
        void * PtrToBegin;  /* ptr to block start */
        Magic  TrailerPw2; /* just to be sure */
    }  TrlRec;

    --------------------------------

    This makes the code a LOT clearer, as you're using symbolic names instead of integer offsets.

    Just my 2 cents...

    0
     
    LVL 2

    Expert Comment

    by:Peter-Darton
    The linked-list stuff, agreed.
    This would then allow _any_ code to call upon the "have we corrupted our memory yet" checking function at any time.  Would be nice.
    Structures, yes, it does make the code clearer (using long arrays is a quick & dirty solution, but it _was_ intended as a quick fix just to diagnose a single bug, rather than as a generic programming tool).

    As a generic programming tool this definately the way to go.
    I just hope the existing wrappers let Gezna find the (original) bug before we have to go that far!


    I'm not sure how one would go about using the "AllocatedAt" field tho' - I've often wanted to do just this in C code, but I've never found a way to walk back up the stack - I've had to settle for using the __FILE__ and __LINE__ macros, and storing a char * (__FILE__) and integer (__LINE__) which have to be passed in by the caller.

    Whilst we're on the subject of generic extentions to this idea, one thing I'd do if I needed this long-term is to let the caller specify an "object type" string, so that each module that allocated/deallocated memory could say "this is for an XXX object" etc, e.g. my_malloc(sizeof(struct mystructure), "MyStructure main storage", __FILE__, __LINE__) - it'd be easier on the brain than just filenames and line numbers, especially if a single file uses malloc for more than one purpose.
    Oh yes, and put the __FILE__ and __LINE__ into a #define macro so the client code doesn't need to remember to do it...
    And perhaps the ability to have different magic numbers used by different sections of the code, thus allowing us to detect when one bit of code gets memcpy'ed over another...


    However, if we introduce too much complexity into these malloc wrappers, we're in danger of taking more time getting this "tool" working than we would take hunting for the bug using other methods (it's all highly instructive stuff tho' ;-)
    0
     
    LVL 22

    Expert Comment

    by:grg99
    Yep, the perfect is the enemy of the good-enough.

    But if you have a BIG program you may eventually need all the bells and whistles.  I stared out with a simple 0x12345678 at each end of the block and ended up with all of the above.

    It IS hard to get the calling address, you can do it sometimes by indexing a few dwords past your last local variable, but that is of course very compiler-dependent and compile-flags dependent.

    And it is nice to add a text message tag so you know what kind of block each one is.

    BTW  doesnt realloc() have to copy the payload to the new block?

    Another fribble:  have free() write 0xFFFF into the freed block, just to catch cases where you use the pointer after it is freed.

    Yet Another (hazardous) feature:  have your malloc return the pointer into a parameter, not as the function result, so malloc() can save away the address of that variable, so free() can set it to NULL!    Hazardous of course as you may malloc() into a local variable that may be out of scope when you do the free().  But if you know this is a restriction, you can avoid doing so.






    0
     
    LVL 2

    Expert Comment

    by:Peter-Darton
    In every implementation of realloc() I've met, the data gets copied by realloc(), so all you have to do is ensure that your end marker is at the "new" end of the data instead of where it used to end.

    Very good point about free() tho' - I'd completely overlooked that.
    A good idea, in the my_free() method would be to put
      memset(space,0xFF,space[0]);
    just before the call to free() - that way no data will survive being freed.

    Might be a good idea to do likewise for memory that's just been allocated (maybe set that to 0xEE instead so we can tell the difference when we next explode).  I know some malloc()s zero memory before returning it, but I wouldn't advise relying on it.  Doing that should help guard against using uninitialised memory.
    0
     

    Author Comment

    by:Gezna
    Peter, you were right on the free(NULL).  I found a little redundency where I freed something twice.  The first time I set the pointer to NULL.  So that's one thing fixed, but I doubt it was the cause of my original problem, though.

    If these memory routines are correct then I'm definitely writing outside the allocated block.  I have to track down where/why this is happening.  In free() the preceding block is -1, and the following block is 20 rather than the magicNumber.
    0
     
    LVL 2

    Expert Comment

    by:Peter-Darton
    Good stuff - this is, after all, progress.

    I think the next stage is to put the "is this block of memory a valid, uncorrupted, block of memory with intact magic numbers" check into a separate function that your main code can call.  e.g.

    int my_memory_is_intact( void *memory )
    {
      long *space = (long*)memory;
      int index;
      space-=2;
      index = space[0]/sizeof(long)-1;
      if( space[1]!=magicNumber || space[index]!=magicNumber )
        return 0;
      else
        return 1;
    }
    #define CHECK_MEMORY_BLOCK(p) if( p & !my_memory_is_intact(p) ) { fprintf(stderr,"Memory corruption found %s:%u",__FILE__,__LINE__); fflush(stderr); abort(); }


    Then you call this method (using this macro) from your main program call at the entry and exit of each of your functions on all data the function uses.  In the case of allocated memory being used to store pointers to other allocated memory, check the outer "layer" first, then use it to determine the inner pointers and then check those etc.
    i.e. each function should check any data it uses for "intactness" both before it does anything, and just before it returns.

    Ok, so this will slow your program down a lot, but it might well then tell you where the problem is

    But only if the function that's at fault is only corrupting it's "own" memory.  If the function that's causing all these problems is causing damage elsewhere then we'll need to take grg99's advice and go the whole-hog on this wrapper concept - linked lists & everything.
    0
     

    Author Comment

    by:Gezna
    Sounds good to me.  I deeply appreciate your help.

    I'm going to award the points now, and open another question when (maybe if?) I need further help on this.  I think you've put enough useful info out there to earn those points.  If it were $500, maybe I'd keep stringing you along a bit more!  Keep a look out for me and my next question which will be for 500 points in the C programming section.  It may not get posted until tomorrow, or if I'm lucky and solve this, it may not get posted ever!  Let's hope!
    0

    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    How to run any project with ease

    Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
    - Combine task lists, docs, spreadsheets, and chat in one
    - View and edit from mobile/offline
    - Cut down on emails

    In our object-oriented world the class is a minimal unit, a brick for constructing our applications. It is an abstraction and we know well how to use it. In well-designed software we are not usually interested in knowing how objects look in memory. …
    Preface I don't like visual development tools that are supposed to write a program for me. Even if it is Xcode and I can use Interface Builder. Yes, it is a perfect tool and has helped me a lot, mainly, in the beginning, when my programs were small…
    The goal of this video is to provide viewers with basic examples to understand recursion in the C programming language.
    The goal of this video is to provide viewers with basic examples to understand opening and reading files in the C programming language.

    846 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    10 Experts available now in Live!

    Get 1:1 Help Now