nigel5

delete blocking at a semaphore...

I am compiling a multithreaded app under Solaris 2.5.1 (fully patched) and running it on Solaris 7.

The app locks up, along with all of its other threads, at zero CPU usage.

I allocate a char array, then use it, then free it.

Is there a known problem with compiling on 2.5.1 and running on Solaris 7? If not, how do I fix this bug? It is not readily repeatable.

The following is a GDB attach...

Symbols already loaded for /usr/lib/libthread.so.1
0xdf6c25d1 in _lwp_sema_wait () from /usr/lib/libc.so.1
(gdb) bt
#0  0xdf6c25d1 in _lwp_sema_wait () from /usr/lib/libc.so.1
#1  0xdf61cc61 in _park () from /usr/lib/libthread.so.1
#2  0xdf61c8d7 in _swtch () from /usr/lib/libthread.so.1
#3  0xdf61df52 in _mutex_adaptive_lock () from /usr/lib/libthread.so.1
#4  0xdf61dd6c in mutex_lock () from /usr/lib/libthread.so.1
#5  0xdf6ce8a5 in free () from /usr/lib/libc.so.1
#6  0xdfaf259c in __builtin_delete () from /gems/lib/liblogging.so.1
#7  0xdfaf25e7 in __builtin_vec_delete () from /gems/lib/liblogging.so.1
#8  0x819639c in CallShell::activate (this=0x82a98f8) at CallShell.cpp:160


nietod

Is it possible that you've corrupted the heap? Look for cases where you might write to memory past the ends of an allocated block of memory (not necessarily this block).

Is the code short enough to post?

ASKER

I have a CppString class that has been unit tested to death, so there is nothing in there. I do use heap allocations, but the only place I allocate an array of chars is in the function outlined.

This code has not changed in ages, and it has been called lots, so this is a very recent bug.

The code (fragmented)..

bool CallShell::activate()
{
   vector<CppString> args = getArgs();  // returns the command line to call
   unsigned num = args.size() + 1;      // plus one for the NULL
   char** sys = new char*[num];
   int stat_avail = 0;                  // set when command completed
   int status = 1;                      // exit status of the command

   // fill system command line
   for(int i = 0; i < num; i++)
   {
      // CppString has a char* operator, passing back its reference
      sys[i] = args[i];
   }
   sys[num] = NULL;

   my_bexec(sys[0], sys, &stat_avail, &status);
   while(stat_avail == 0)
   {
      // sleep for a 1/10 second waiting for child
      my_usleep(10000);
   }

   // no need to delete sys[0-num] as destructor of vector deletes its
   // elements, this deletes the buffers stored in this array.
   // delete will call destructors on each element, but this is a char*
   // for which there is no destructor.
   delete [] sys;

   // return exit status.
   return status == 0;
}

ASKER

Oh, the stack trace provided in the original question occurs in the line...

   delete [] sys;
You are corrupting the heap.

char** sys = new char*[num];
    *   *   *
sys[num] = NULL;

The allocation creates num items, indexed from 0 to num-1, but then you set the "numth" one to NULL.

You need to allocate num+1 items.
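
One way to make this kind of off-by-one harder to write is to let a container track the size. The following is only a sketch, not the poster's code: it assumes std::string and std::vector in place of CppString and new[], and make_argv is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds a NULL-terminated, argv-style array whose entries point into `args`.
// The pointers stay valid only while `args` is alive and unmodified.
std::vector<char*> make_argv(std::vector<std::string>& args)
{
    std::vector<char*> argv;
    argv.reserve(args.size() + 1);          // N arguments plus one terminator
    for (std::size_t i = 0; i < args.size(); ++i)
        argv.push_back(&args[i][0]);        // valid indices are 0 .. size()-1
    argv.push_back(nullptr);                // terminator goes in slot size()
    assert(argv.size() == args.size() + 1);
    return argv;
}
```

The result's data() can be passed wherever a char** is expected, and there is no delete[] to forget or to trip over.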

ASKER

Oops, sorry, that was a typo. Here is the actual chunk of code (cut and pasted this time)...

l is the list of args
ssd is the secure shell directory
call() returns the command

      int num = l.size() + 1; // plus 1 for sys command
      char** sys = new char* [num + 1]; // plus 1 for NULL terminator
      CppString c = ssd + "/" + call();
      sys[0] = c;

      for(int i = 1; i < num; i++)
      {
         if(super_debug) cout << "adding arg to call args : " << l[i-1] << endl;
         sys[i] = l[i-1];
      }
      sys[num] = NULL;
What is my_bexec?

ASKER

my_bexec is a wrapper round vfork, it has a whole management thing around it, waiting for SIGCHLD and all that.

Signals are wrapped in the way outlined by the POSIX standard.
Could it be messing up memory?  What does it do with sys?

We're not psychics, so we can't find the bug without seeing the code and knowing a lot more about your program.  You'll either have to provide more info or work on debugging it yourself.

You might try commenting out other portions of the program to see if the bug can be made to go away, then adding parts back again to see if it reappears, and in this way home in on the error.

ASKER

The bug is very *VERY* intermittent; in fact, this is the first time it has happened. The reason I am asking is that new and delete have kernel-dependent functionality, and I was wondering whether there is a known incompatibility between compiling on an older version of Solaris and running on a newer one. new and delete should never fail except through corruption (which should be picked up by a tool such as Purify or Insure++) or because of some other inconsistency in the OS.

There is currently a memory leak under investigation, involving a C interface, but the fragment where this falls over has been checked and is clean; there are no frees in this section. Why would a mutex stay locked if a free deletes memory allocated with new?

my_bexec does nothing with the passed char** arg except run a vfork and then a bexec to start another process. The status and stat_avail are stored in a list. When the child process exits, a SIGCHLD is delivered, the exit status value is then put into status, and stat_avail is set to 1.
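
For comparison, here is a sketch of the same start-and-wait step with the exit status collected in the waiting thread rather than in a SIGCHLD handler. POSIX does not list malloc or free as async-signal-safe, so keeping the handler away from list updates removes one possible source of a stuck allocator lock; whether that is what happened in this program has not been established. The sketch uses fork and execv in place of vfork and bexec, and run_and_wait is a hypothetical name, not part of the poster's code.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for my_bexec plus the polling loop: starts
// argv[0] (a full path) and returns the child's exit status, or -1 on error.
int run_and_wait(char* const argv[])
{
    assert(argv != nullptr && argv[0] != nullptr);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);   // replaces the child image on success
        _exit(127);             // reached only if execv failed
    }

    int status = 0;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);  // non-blocking poll
        if (r == pid)
            break;                                 // child has been reaped
        if (r < 0 && errno != EINTR)
            return -1;
        usleep(10000);                             // 10 ms between polls
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the parent polls waitpid itself, no signal handler has to write to shared data structures while another thread may be inside new or delete.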
ASKER CERTIFIED SOLUTION
nietod


ASKER

Thanks.