Solved

delete blocking at a semaphore...

Posted on 2000-02-16
Last Modified: 2010-04-02
I am compiling a multithreaded app under Solaris 2.5.1 (fully patched) and running it on Solaris 7.

The app locks up, including all other threads, with zero CPU usage.

I allocate a char array, then use it, then I free it.

Is there a known problem with compiling on 2.5.1 and running on Solaris 7? If not, how do I fix this bug? It is not readily repeatable.

The following is a GDB attach...

Symbols already loaded for /usr/lib/libthread.so.1
0xdf6c25d1 in _lwp_sema_wait () from /usr/lib/libc.so.1
(gdb) bt
#0  0xdf6c25d1 in _lwp_sema_wait () from /usr/lib/libc.so.1
#1  0xdf61cc61 in _park () from /usr/lib/libthread.so.1
#2  0xdf61c8d7 in _swtch () from /usr/lib/libthread.so.1
#3  0xdf61df52 in _mutex_adaptive_lock () from /usr/lib/libthread.so.1
#4  0xdf61dd6c in mutex_lock () from /usr/lib/libthread.so.1
#5  0xdf6ce8a5 in free () from /usr/lib/libc.so.1
#6  0xdfaf259c in __builtin_delete () from /gems/lib/liblogging.so.1
#7  0xdfaf25e7 in __builtin_vec_delete () from /gems/lib/liblogging.so.1
#8  0x819639c in CallShell::activate (this=0x82a98f8) at CallShell.cpp:160


Question by:nigel5
 
LVL 22

Expert Comment

by:nietod
ID: 2526645
Is it possible that you've corrupted the heap?  Look for cases where you might write to memory past the end of an allocated block of memory (not necessarily this block).

Is the code short enough to post?
 

Author Comment

by:nigel5
ID: 2527937
I have a CppString class that has been unit tested to death, so there is nothing in there. I do use heap allocations, but the only place I allocate an array of chars is in the function outlined.

This code has not changed in ages, and it has been called lots, so this is a very recent bug.

The code (fragmented)..

bool CallShell::activate()
{
   vector<CppString> args = getArgs();  // returns the command line to call
   unsigned num = args.size() + 1;      // plus one for the NULL
   char** sys = new char*[num];
   int stat_avail = 0;                  // set when command completed
   int status = 1;                      // exit status of the command

   // fill system command line
   for(int i = 0; i < num; i++)
   {
      // CppString has a char* operator, passing back its reference
      sys[i] = args[i];
   }
   sys[num] = NULL;

   my_bexec(sys[0], sys, &stat_avail, &stat);
   while(stat_avail == 0)
   {
      // sleep for a 1/10 second waiting for child
      my_usleep(10000);
   }

   // no need to delete sys[0-num] as destructor of vector deletes its
   // elements, this deletes the buffers stored in this array.
   // delete will call destructors on each element, but this is a char*
   // for which there is no destructor.
   delete [] sys;

   // return exit status.
   return status == 0;
}
 

Author Comment

by:nigel5
ID: 2527943
Oh, the stack trace provided in the original question occurs in the line...

   delete [] sys;
 
LVL 22

Expert Comment

by:nietod
ID: 2528105
You are corrupting the heap.

char** sys = new char*[num];
    *   *   *
sys[num] = NULL;

The allocation creates num items, indexed from 0 to num-1, but then you set the item at index num to NULL, which is one past the end of the array.

You need to allocate num+1 items.
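
To make the off-by-one concrete, here is a minimal sketch of building a NULL-terminated argument array with one extra slot reserved for the terminator. The helper name make_argv is hypothetical, and std::string stands in for the poster's CppString class, which is not shown in the thread; it also assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: build a NULL-terminated argv-style array whose
// entries point into `args`. The pointers are valid only while `args`
// is alive and unmodified.
std::vector<char*> make_argv(std::vector<std::string>& args)
{
    std::vector<char*> argv;
    argv.reserve(args.size() + 1);       // one extra slot for the terminator
    for (std::size_t i = 0; i < args.size(); ++i)
        argv.push_back(&args[i][0]);     // valid indices are 0 .. size()-1
    argv.push_back(nullptr);             // terminator gets its own slot
    assert(argv.size() == args.size() + 1);
    return argv;
}
```

Letting a vector own the array also removes the need for a matching delete [], so the array cannot be freed twice or leaked on an early return.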
 

Author Comment

by:nigel5
ID: 2528143
Oops, sorry, typo, the chunk of code (cut and pasted this time)...

l is the list of args
ssd is the secure shell directory
call() returns the command

      int num = l.size() + 1; // plus 1 for sys command
      char** sys = new char* [num + 1]; // plus 1 for NULL terminator
      CppString c = ssd + "/" + call();
      sys[0] = c;

      for(int i = 1; i < num; i++)
      {
         if(super_debug) cout << "adding arg to call args : " << l[i-1] << endl;
         sys[i] = l[i-1];
      }
      sys[num] = NULL;
 
LVL 22

Expert Comment

by:nietod
ID: 2528171
What is my_bexec?
 

Author Comment

by:nigel5
ID: 2530718
my_bexec is a wrapper around vfork; it has a whole management layer around it, waiting for SIGCHLD and so on.

Signals are wrapped in the way outlined by the POSIX standard.
 
LVL 22

Expert Comment

by:nietod
ID: 2530808
Could it be messing up memory?  What does it do with sys?

We're not psychics, so we can't find the bug without seeing the code and knowing a lot more about your program.  You'll either have to provide more info or work on debugging it yourself.

You might try commenting out other portions of the program and see if the bug can be made to go away, then add parts back again and see if it reappears, and in this way home in on the error.
 

Author Comment

by:nigel5
ID: 2531886
The bug is very *VERY* intermittent; in fact, this is the first time it has happened. The reason I am asking is that new and delete have kernel-dependent functionality, and I was wondering whether there is a known incompatibility between compiling on a lower version of Solaris and running on a different one. new and delete should never fail except through corruption (which should be picked up by a profiler, i.e. Purify/Insure++) or some other inconsistency in the OS.

There is currently a memory leak under investigation, involving a C interface, but the fragment where this falls over has been checked and is clean; there are no frees in this section. Why would a mutex stay locked if a free deletes memory allocated with new?

my_bexec does nothing with the passed char** arg but run a vfork and then a bexec to start another process. The status and stat_avail are stored in a list. When the child process exits, it sends a SIGCHLD; the exit status value is then put into status, and stat_avail is set to 1.
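
For comparison, here is a rough sketch of the same launch-and-wait step written without a signal handler or a polling loop. It is not the poster's my_bexec, whose source is not shown; the name run_and_wait is hypothetical, and it uses fork and execv rather than vfork and bexec. Because nothing runs in signal context, no handler can end up touching the heap while another thread holds the allocator lock.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not my_bexec): run a command and block until it
// exits. Returns the child's exit code, or -1 if the child could not be
// started or did not exit normally.
int run_and_wait(const char* path, char* const argv[])
{
    assert(path != nullptr && argv != nullptr);
    pid_t pid = fork();
    if (pid < 0)
        return -1;                       // fork failed
    if (pid == 0) {
        execv(path, argv);               // returns only on failure
        _exit(127);                      // leave the child without cleanup
    }
    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);    // retry if interrupted by a signal
    } while (r < 0 && errno == EINTR);
    if (r < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called with "/bin/sh" and the arguments "sh", "-c", "exit 3", this should return 3 on a typical POSIX system.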
 
LVL 22

Accepted Solution

by:nietod (earned 200 total points)
ID: 2531950
>> I was wondering if there was a known
>> incompatibility between compiling on a
>> lower version of Solaris, and running on
>> a different one
That would be EXTREMELY unlikely.  Programs are designed fundamentally to be forward compatible because you never know what version of the OS they will run under.  And if a mistake had been made in this design, it would almost certainly have been detected as soon as the OS was released, so it is unlikely that you would stumble upon it.
 

Author Comment

by:nigel5
ID: 2531964
Thanks.