nigel5
asked on
delete blocking at a semaphore...
I am compiling a multithreaded app under Solaris 2.5.1 (fully patched) and running it on Solaris 7.
The app locks up, including all other threads with zero CPU usage.
I allocate a char array, then use it, then I free it.
Is there a known problem with compiling on 2.5.1 and running on Solaris 7? If not, how do I fix this bug? It is not readily repeatable.
The following is a GDB attach...
Symbols already loaded for /usr/lib/libthread.so.1
0xdf6c25d1 in _lwp_sema_wait () from /usr/lib/libc.so.1
(gdb) bt
#0 0xdf6c25d1 in _lwp_sema_wait () from /usr/lib/libc.so.1
#1 0xdf61cc61 in _park () from /usr/lib/libthread.so.1
#2 0xdf61c8d7 in _swtch () from /usr/lib/libthread.so.1
#3 0xdf61df52 in _mutex_adaptive_lock () from /usr/lib/libthread.so.1
#4 0xdf61dd6c in mutex_lock () from /usr/lib/libthread.so.1
#5 0xdf6ce8a5 in free () from /usr/lib/libc.so.1
#6 0xdfaf259c in __builtin_delete () from /gems/lib/liblogging.so.1
#7 0xdfaf25e7 in __builtin_vec_delete () from /gems/lib/liblogging.so.1
#8 0x819639c in CallShell::activate (this=0x82a98f8) at CallShell.cpp:160
ASKER
I have a CppString class that has been unit tested to death, so there is nothing in there. I do use heap allocations, but the only place I allocate an array of chars is in the function outlined.
This code has not changed in ages, and it has been called lots, so this is a very recent bug.
The code (fragmented)..
bool CallShell::activate()
{
vector<CppString> args = getArgs(); // returns the command line to call
unsigned num = args.size() + 1; // plus one for the NULL
char** sys = new char*[num];
int stat_avail = 0; // set when command completed
int status = 1; // exit status of the command
// fill system command line
for(int i = 0; i < num; i++)
{
// CppString has a char* operator, passing back its reference
sys[i] = args[i];
}
sys[num] = NULL;
my_bexec(sys[0], sys, &stat_avail, &stat);
while(stat_avail == 0)
{
// sleep for a 1/10 second waiting for child
my_usleep(10000);
}
// no need to delete sys[0-num] as destructor of vector deletes its
// elements, this deletes the buffers stored in this array.
// delete will call destructors on each element, but this is a char*
// for which there is no destructor.
delete [] sys;
// return exit status.
return status == 0;
}
ASKER
Oh, the stack trace provided in the original question occurs in the line...
delete [] sys;
You are corrupting the heap.
char** sys = new char*[num];
* * *
sys[num] = NULL;
The allocation creates num items, indexed from 0 to num-1, but then you set the "num-th" one to NULL, which writes one element past the end of the array.
You need to allocate num+1 items.
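To illustrate the sizing rule, here is a minimal sketch of building a NULL-terminated argument array. make_argv is a hypothetical helper and std::string stands in for the poster's CppString; neither comes from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns a NULL-terminated
// array whose entries point into the strings held by `args`.
// The caller releases the array with delete[] and must keep `args` alive
// and unmodified while the array is in use.
char** make_argv(std::vector<std::string>& args)
{
    const std::size_t num = args.size();
    char** argv = new char*[num + 1];   // num arguments plus one terminator slot
    for (std::size_t i = 0; i < num; ++i) {
        argv[i] = &args[i][0];          // valid indices here are 0 .. num-1
    }
    argv[num] = nullptr;                // in bounds only because of the +1 above
    return argv;
}
```

Holding the pointers in a std::vector<char*> and passing its data to the exec call would avoid the manual new[]/delete[] pair altogether.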
ASKER
Oops, sorry, typo, the chunk of code (cut and pasted this time)...
l is the list of args
ssd is the secure shell directory
call() returns the command
int num = l.size() + 1; // plus 1 for sys command
char** sys = new char* [num + 1]; // plus 1 for NULL terminator
CppString c = ssd + "/" + call();
sys[0] = c;
for(int i = 1; i < num; i++)
{
if(super_debug) cout << "adding arg to call args : " << l[i-1] << endl;
sys[i] = l[i-1];
}
sys[num] = NULL;
What is my_bexec?
ASKER
my_bexec is a wrapper around vfork; it has a whole management layer around it, waiting for SIGCHLD and all that.
Signals are wrapped in the way outlined by the POSIX standard.
Could it be messing up memory? What does it do with sys?
We're not psychics, so we can't find the bug without seeing the code and knowing a lot more about your program. You'll either have to provide more info or work on debugging it yourself.
You might try commenting out other portions of the program to see if the bug can be made to go away, then add parts back again and see if it reappears, and in this way home in on the error.
ASKER
The bug is very *VERY* intermittent; in fact this is the first time it has happened. The reason I am asking is that new and delete have kernel-dependent functionality, and I was wondering whether there is a known incompatibility between compiling on a lower version of Solaris and running on a different one. new and delete should never fail except through corruption (which should be picked up by a profiler, i.e. Purify/Insure++) or some other inconsistency in the OS.
There is currently a memory leak under investigation, in code using a C interface, but the fragment where this falls over has been checked and is clean; there are no frees in this section. Why would a mutex stay locked if a free deletes memory allocated with new?
my_bexec does nothing with the passed char** arg but run a vfork and then a bexec to start another process. The status and stat_avail are stored in a list. When the child process exits, it sends a SIGCHLD; the exit status value is then put into status, and stat_avail is set to 1.
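For readers following along, the launch-and-wait mechanism described here can be sketched roughly as below. This is not the poster's my_bexec: run_and_wait, on_sigchld, g_stat_avail and g_status are hypothetical names, fork and execv are used in place of vfork and bexec, and a single pair of flags stands in for the list. The handler is limited to waitpid and flag writes because malloc and free are not async-signal-safe; a handler that touches the heap while another thread is inside free can block on the allocator's mutex.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-ins for the stat_avail / status pair in the thread.
static volatile sig_atomic_t g_stat_avail = 0;
static volatile sig_atomic_t g_status = 1;

// SIGCHLD handler: only async-signal-safe work (waitpid and flag writes).
extern "C" void on_sigchld(int)
{
    const int saved_errno = errno;      // waitpid may overwrite errno
    int raw = 0;
    if (waitpid(-1, &raw, WNOHANG) > 0) {
        g_status = WIFEXITED(raw) ? WEXITSTATUS(raw) : -1;
        g_stat_avail = 1;
    }
    errno = saved_errno;
}

// Starts `path` with `argv` in a child process, polls until the handler
// reports completion, and returns the child's exit status (-1 on failure).
int run_and_wait(const char* path, char* const argv[])
{
    g_stat_avail = 0;

    struct sigaction sa = {};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, nullptr) != 0) return -1;

    const pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);                     // reached only if execv fails
    }
    while (!g_stat_avail) {
        usleep(10000);                  // poll every 10,000 microseconds
    }
    return g_status;
}
```

The sketch handles one child at a time; a real wrapper that tracks several children would need a table allocated before any signal can arrive, so that the handler never has to allocate.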
ASKER CERTIFIED SOLUTION
ASKER
Thanks.
Is the code short enough to post?