avi_india
asked on
Really vague question about a Linux core dump
Hi All
I am on Red Hat Linux Advanced Server. My program receives and responds to multiple clients using sockets. The program has been built to handle concurrent connection requests.
(This program is a huge 500K lines of code and works on Linux, HP, Solaris and Windows.)
Now,
As long as only one client is sending requests (consider it load testing with one user), the program works fine. As soon as the load is increased to two simultaneous clients, the program crashes!
(Now you'll say: so? Solve it.)
Of course I am trying to solve it, but:
1. The backtrace of the core file is not consistent. I have at least 20 different backtraces.
2. If the code is built in debug mode, the crash does not happen.
3. All the backtraces have at least one function in common, say funca(). As soon as I start adding printfs to get a better idea of the crash location, the crash does not happen.
So the question is: how do I solve this issue?
I am on Red Hat Linux Advanced Server 2.1. My code has been compiled using gcc 2.96.
Any kind of help/pointers would be appreciated.
Thanks
-Ajay
In my experience, you are likely experiencing one of these three problems:
1) You have a multithreaded program with unprotected (non-threadsafe)
access to one or more shared data items. This could even be the result
of using non-threadsafe library calls (like strtok). Adding printf statements
injects console I/O into the flow of control. The I/O triggers a thread context
switch, possibly rearranging the context switches such that you avoid concurrent
access.
2) You are corrupting the stack, probably in funca() or one of the functions it
calls. Again, adding printf() statements changes how the stack is used.
3) There is a compiler optimization problem. If running in "debug mode" means
you compiled with -g -O0, rebuild with -g and your normal optimization level.
This will give you symbols in the offending code (although line numbers might
not match up). Then see if the debugger traps.
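To make point 1 concrete, here is a minimal sketch in C of the two usual fixes: the reentrant strtok_r() in place of strtok(), and a mutex around shared data. The names count_tokens(), get_total_tokens(), stats_lock and total_tokens are hypothetical, made up for illustration; nothing here is taken from the asker's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <string.h>

/* Hypothetical shared state, standing in for whatever data the
 * request-handling threads really share. */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static long total_tokens = 0;

/* Count whitespace-separated tokens in buf (buf is modified in place).
 * strtok_r() keeps its position in the caller-supplied 'save' pointer,
 * so two threads can tokenize at the same time. strtok() keeps that
 * position in hidden static storage, which concurrent callers overwrite. */
int count_tokens(char *buf)
{
    char *save = NULL;
    char *tok;
    int n = 0;

    for (tok = strtok_r(buf, " \t\r\n", &save); tok != NULL;
         tok = strtok_r(NULL, " \t\r\n", &save))
        n++;

    /* The shared counter is only touched while the lock is held. */
    pthread_mutex_lock(&stats_lock);
    total_tokens += n;
    pthread_mutex_unlock(&stats_lock);
    return n;
}

/* Reads take the same lock, so a reader never sees a half-updated value. */
long get_total_tokens(void)
{
    long v;

    pthread_mutex_lock(&stats_lock);
    v = total_tokens;
    pthread_mutex_unlock(&stats_lock);
    return v;
}
```

Each thread would call count_tokens() on its own private buffer. If the crash disappears after changes of this kind around funca(), that points at a race rather than at the compiler.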
Take a look at Valgrind (http://valgrind.kde.org/) to resolve any stack (and in general memory) related problems. The advantage of Valgrind is that you don't have to modify your code, so you can run it on the app that crashes.
Also, Helgrind (part of the Valgrind suite of tools) allows you to find potential race conditions in multithreaded apps.
> If code is in debug mode, crash does not happen.
In 99.9xx% of cases this is a problem with allocated memory: a pointer problem, a memory leak, whatever ..
I'd go with Valgrind, Mpatrol, Electric Fence ...
http://www.cbmamiga.demon.co.uk/mpatrol/
http://perens.com/FreeeSoftware
http://valgrind.kde.org/
ASKER
Thanks guys. I am trying to use Valgrind. Will update the status as soon as I find something.
(The only problem I can foresee is that my code also uses SmartHeap. Valgrind might clash with this.)
ASKER
Nothing helped here. The bug has been marked as deferred to the next release of the product.
ASKER CERTIFIED SOLUTION
Are you doing some sort of signalling? Try installing a signal handler for all catchable signals and print the signal number and the originator. You will have to use the sigaction interface for this, not signal.
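A rough sketch of that suggestion in C, assuming a POSIX system. The names install_signal_reporter(), report_signal(), last_signo and last_sender are hypothetical, and the list covers only a handful of common signals rather than every catchable one. SA_SIGINFO is what makes the sender's pid available, which the plain signal() interface cannot provide.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Last signal seen and the pid that sent it (0 if not sent by a process).
 * Storing a pid in sig_atomic_t assumes it fits in an int, as on Linux. */
volatile sig_atomic_t last_signo = 0;
volatile sig_atomic_t last_sender = 0;

/* Only async-signal-safe calls are used in the handler: write(), not printf(). */
static void emit(const char *s, size_t len)
{
    if (write(STDERR_FILENO, s, len) < 0) { /* nothing safe to do here */ }
}
#define EMIT_LIT(s) emit((s), sizeof(s) - 1)

static void emit_num(long v)
{
    char buf[24];
    size_t i = sizeof buf;
    unsigned long u = (v < 0) ? 0UL - (unsigned long)v : (unsigned long)v;

    do { buf[--i] = (char)('0' + u % 10); u /= 10; } while (u != 0);
    if (v < 0) buf[--i] = '-';
    emit(buf + i, sizeof buf - i);
}

static void report_signal(int signo, siginfo_t *info, void *ctx)
{
    (void)ctx;
    last_signo = signo;
    /* POSIX: si_code <= 0 means a process sent the signal (kill, raise,
     * sigqueue), and only then is si_pid meaningful. */
    last_sender = (info != NULL && info->si_code <= 0)
                      ? (sig_atomic_t)info->si_pid : 0;
    EMIT_LIT("caught signal ");
    emit_num(signo);
    EMIT_LIT(" from pid ");
    emit_num(last_sender);
    EMIT_LIT("\n");
}

/* Install the reporter; returns how many handlers were installed. */
int install_signal_reporter(void)
{
    static const int signals[] = { SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGABRT,
                                   SIGPIPE, SIGTERM, SIGINT, SIGHUP,
                                   SIGUSR1, SIGUSR2 };
    struct sigaction sa;
    size_t i;
    int installed = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = report_signal;
    /* SA_RESETHAND restores the default action after the first delivery,
     * so a real fault still terminates the process and dumps core. */
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof signals / sizeof signals[0]; i++)
        if (sigaction(signals[i], &sa, NULL) == 0)
            installed++;
    return installed;
}
```

Calling install_signal_reporter() early in main() would show whether the process is being killed by something like SIGPIPE from a client that disconnected, which is a common surprise in socket servers, instead of by a memory fault.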