Solved

Trapping delphi app hang

Posted on 2009-07-06
Last Modified: 2013-12-04
I have a multithreaded Windows forms application written in Delphi 5 that hangs randomly under load.

I want to identify the last procedure called (whether on the main form or in one of the threads) before the hang while running under the IDE debugger, but have been unable to do so: the app hangs, but nothing is reported in the IDE. The IDE then reports its own error, requiring a restart.

I have tried EurekaLog too, with no success. Ultimately I have to kill the app using Task Manager, with nothing reported.

Help. Any advice?
Question by:brenlex
13 Comments
 
LVL 36

Expert Comment

by:Geert Gruwez
What does the app do?
How many threads are there?
Is there interaction between the threads?
Is data exchanged between threads? Is access to that data locked (critical section)?

It's a bit of a needle in a haystack.

Otherwise, add logging to record what happens.
 
LVL 21

Expert Comment

by:developmentguru
I have approached this problem a number of different ways.

One way is with logging. Since you are dealing with multiple threads, you would need to either log the time of each entry (the processor cycle count would be a good way) or make sure the logging is serialized (for example, by creating a separate service for the logging). You then put a log entry at the entry and exit of each and every method on the form and in the thread(s). This can be a time-consuming exercise. If you look at your log and see that the last method was entered but never left, then you know where to look.
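The serialized entry/exit logging described above could be sketched roughly as follows. This is only a minimal illustration, not code from the application in question: the unit name ThreadLog, the procedure LogLine and the log file location are all made up for the example. It stamps each line with the time, a tick count and the calling thread's ID, and it closes the file after every write so that entries already written survive if the process later has to be killed.

```pascal
unit ThreadLog;  { hypothetical unit name, for illustration only }

interface

{ Appends one timestamped line to the log; safe to call from any thread. }
procedure LogLine(const Msg: string);

implementation

uses
  Windows, SysUtils;

var
  LogLock: TRTLCriticalSection;  { serializes all writes to the log file }
  LogFileName: string;           { assumption: log sits next to the exe }

procedure LogLine(const Msg: string);
var
  F: TextFile;
begin
  EnterCriticalSection(LogLock);
  try
    AssignFile(F, LogFileName);
    if FileExists(LogFileName) then
      Append(F)
    else
      Rewrite(F);
    try
      { wall-clock time, millisecond tick count, thread ID, message }
      WriteLn(F, Format('%s tick=%u thread=%d %s',
        [FormatDateTime('yyyy-mm-dd hh:nn:ss', Now),
         Integer(GetTickCount), Integer(GetCurrentThreadId), Msg]));
    finally
      { closing after each write pushes the line to disk before a hang }
      CloseFile(F);
    end;
  finally
    LeaveCriticalSection(LogLock);
  end;
end;

initialization
  LogFileName := ChangeFileExt(ParamStr(0), '.log');
  InitializeCriticalSection(LogLock);

finalization
  DeleteCriticalSection(LogLock);

end.
```

A method would then call LogLine('Enter TMyThread.Execute') as its first statement and LogLine('Leave TMyThread.Execute') in a finally block. Opening and closing the file on every call is slow, which may itself change the timing of a threading bug, so treat it as a diagnostic aid only.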

Another approach, which I used when a random access violation occurred on exit of the program, I called "binary deconstruction". You remove a good-sized portion of the forms in the application (about half), compile (making sure to remove references as needed), and run. If you remove half the forms but leave the form that is "causing" the issue, and it no longer happens, then you know that it was an issue between that form and another. If you remove half the forms and the issue continues, then you have removed those forms from consideration as the cause of the problem. If you can get it down to the last form, then you continue to remove half the code until the error no longer occurs. This process makes the application point out to you where the problem is, but it is also time-consuming.

  From personal experience it sounds like you are stuck in a loop that is eating up memory until you get an out of memory error.  You should start to look at the code that was changed before the error started.  Hopefully this only started happening recently.

  Experiment a bit and see what more you can learn.  Let me know if you need more.
 
LVL 36

Expert Comment

by:Geert Gruwez
One way of adding logging is to use a profiler like ProDelphi:
http://www.torry.net/pages.php?id=1525

You don't need to do much: just install it, and it puts start and end code in every procedure/function and even gives the time spent in each.
 
LVL 4

Expert Comment

by:JonasMalmsten
Needle in a haystack was my thought exactly... I don't have a solution either, but maybe a few useful tricks.

If by "requiring a restart" you mean you need to restart the computer, try killing Delphi32.exe using Task Manager instead of killing your application. This will kill your application as well, and you probably will not need to restart the computer.

What is the error message you get from Delphi? I assume you have looked at the stack trace (Ctrl+Alt+S) at this point. If one of the threads is using a lot of CPU, you can probably catch what it is doing by breaking the program using F2 from the IDE. The debugger will pause the program immediately and display the CPU window (which for the most part is not very useful). From here, close the CPU window and press F7; program execution will resume and the debugger will break again on the next line of Pascal/Delphi code that is executed. By randomly doing this you can get a good idea of where the program is spending a lot of time. You can of course also use a profiler, but I usually try this first because most profilers will mess around with the code more than I like.
 

Author Comment

by:brenlex
The problem I have with tools like ProDelphi and EurekaLog is that they do not dump their findings to file unless the app is closed down in a controlled fashion, which I cannot achieve as the app is locking up.

When attempting to debug from the IDE and the hang occurs, I get inconsistent behaviour from the IDE with regard to being able to debug (sometimes an access violation in comctl32.dll, sometimes non-responsive, requiring an IDE restart).

I decided to sidestep this inconsistency and opted to log to file on each processing thread -- and have found something very strange. If I have "optimisation" enabled in my compiler options, then the app truly just hangs (all threads stop): game over. However, with "optimisation" disabled, my threads continue to process in the background (polling DB tables for the queue), except the app gives the appearance that it has hung (the form is unresponsive), but ONLY for a period of time. In my main processing thread, I write to a log file on each iteration of my Execute method to let me know it is still going. On occasion, however, it stops writing altogether for a random amount of time (sometimes up to 5 minutes), and then suddenly comes alive again!

Is there something I should be aware of with regard to my threads when optimisation is ON?

How can a thread stop processing for a random amount of time when they are never implicitly suspended?
 
LVL 21

Expert Comment

by:developmentguru
In your logging check your memory availability.  If Windows believes it is running low on memory it will pause to swap memory content out to disk, freeing up memory.  This process is not only slow, but the juggling of memory to disk afterwards can also be very time consuming.  Memory may not be the issue but it is a good resource to target first.  Have you been able to determine which method it stops in during these lapses?
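One way to capture memory availability alongside each log entry is the Win32 GlobalMemoryStatus call, which the Delphi Windows unit exposes. The small console program below is only a sketch; the program name MemCheck and the helper MemoryStatusText are made up for the example, and in the real application the returned text would be appended to the existing log line rather than printed.

```pascal
program MemCheck;  { hypothetical name, for illustration only }
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

{ Returns a one-line summary of system memory, suitable for a log entry. }
function MemoryStatusText: string;
var
  Status: TMemoryStatus;
begin
  Status.dwLength := SizeOf(Status);
  GlobalMemoryStatus(Status);
  { dwMemoryLoad is a percentage; the other values are bytes, shown in KB }
  Result := Format('load=%d%% availPhys=%dK availPageFile=%dK availVirtual=%dK',
    [Integer(Status.dwMemoryLoad),
     Integer(Status.dwAvailPhys div 1024),
     Integer(Status.dwAvailPageFile div 1024),
     Integer(Status.dwAvailVirtual div 1024)]);
end;

begin
  WriteLn(MemoryStatusText);
end.
```

If available physical memory falls steadily in the log entries leading up to a stall, that would support the swapping theory; if it stays flat, memory can probably be ruled out.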

An important thing to keep in mind is this: You are seeing multiple issues.  For now I would continue to run the program without optimization until the other problems have been addressed.  Only when the rest of it is running like you believe it should would I go back and try to re-enable optimization and see what the effects are.
 
LVL 4

Accepted Solution

by:JonasMalmsten
JonasMalmsten earned 200 total points
To my knowledge, optimization should not affect threads other than that they may execute slightly faster, which I suppose can change things a bit when multithreading if there is a problem with concurrency or synchronization. Another thing optimization does is remove variables that are never used, so if, for example, there is a bug writing data to memory where it shouldn't, optimization may change things (think of working with pointers or arrays where there is no bounds checking).

I think what you said about "form is unresponsive) but ONLY for a period of time" could be a key. This means your main thread is busy (the main thread is easier to debug). By main thread here I mean the one thread that is processing the Windows message queue in your main form. What method are you using to synchronize your threads, that is, to make sure no two threads are writing to the same log file at the same time, etc.? If there is only one thread writing to the log file, how do you fetch the information from the worker threads in order to write the log file, and how do you make sure the worker thread is not updating that same information as you are retrieving it?

When you use, for example, the Synchronize method, the thread is implicitly suspended while the main thread is executing the method you want to synchronize. If a call to Synchronize takes a long time, then your main form will appear to freeze. If, for example, two threads call Synchronize at the same time and in the synchronized method you are waiting for the other thread to do something, you will have a deadlock (your main form can be frozen for a very long time).

You are using threads to pull data from tables (I assume this is from some database). Which components are you using to access the tables, and are they thread safe? For example, the component library I use for accessing Oracle databases has a boolean property called "ThreadSafe" that needs to be set to True (default = False). The property is on the TOracleSession component (corresponding to the TDataBase), which is usually shared by several threads. This may vary depending on which set of components you are using.

If it is not a huge amount of code, it would be easier to help you if we had a few snippets to look at, especially any snippets dealing with synchronization between threads.
 

Author Comment

by:brenlex
Correction -- the background threads only continue to write to their dedicated log files when I run the app from the IDE debugger. The main processing thread still appears to pause for an indeterminate period of time before jumping back into life. If I run the app as a standalone exe, all threads stop reporting to files and the app has officially hung.

Unfortunately there is just too much code to post, but I believe I have found the culprit, though I am not sure WHY it is occurring...

Essentially any of my threads can call a globally declared MyDBService object. This unit contains methods to pull specific info back from the MySQL DB. ALL of MyDBService's methods contain critical sections (see code snippet).

fMyTable represents an instance of a class which encapsulates access to a particular table, and is created/destroyed in the TMyDBService constructor/destructor respectively. When I put an exception handler into the MyTable class's GetValByMyCode, I discovered that I randomly experience "Access violation at address 10011F04 in module 'libmySQL.DLL'. Write of address FFFFFF04" -- from there on in, the app starts to disintegrate.

Methods within the aforementioned global instance of MyClass are hit from all threads, and valid values are always passed in (prmCode).

Are there other rules I should be aware of when implementing critical sections?

function TMyDBService.ValFromMyCode(prmCode: string): string;
begin
  EnterCriticalSection(fLockSection);
  try
    Result := fMyTable.GetValByMyCode(prmCode);
  finally
    LeaveCriticalSection(fLockSection);
  end; {try..finally}
end;
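On the question of other rules for critical sections, two that are easy to miss are that the lock must be initialized before any thread can reach a method that enters it, and that it must not be deleted while a thread might still be using it. The sketch below shows one way the lifecycle could look. It is not the poster's actual code: it assumes fLockSection is a TRTLCriticalSection field and that one single lock is shared by every method of the service, neither of which the thread confirms, and the unit name is made up.

```pascal
unit MyDBServiceSketch;  { hypothetical unit name, for illustration only }

interface

uses
  Windows;

type
  TMyDBService = class
  private
    { assumption: one lock shared by ALL methods that touch the connection }
    fLockSection: TRTLCriticalSection;
  public
    constructor Create;
    destructor Destroy; override;
  end;

implementation

constructor TMyDBService.Create;
begin
  inherited Create;
  { must happen before any worker thread can call a locked method }
  InitializeCriticalSection(fLockSection);
  { fMyTable and the other table wrappers would be created here }
end;

destructor TMyDBService.Destroy;
begin
  { only destroy the service after every worker thread has finished; }
  { fMyTable and the other table wrappers would be freed here }
  DeleteCriticalSection(fLockSection);
  inherited Destroy;
end;

end.
```

A lock like this only protects calls that go through the service and take the same critical section. If different methods used different locks, or if any code reached the underlying database connection without going through the service, two threads could still be inside the client library at once.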

 
LVL 21

Assisted Solution

by:developmentguru
developmentguru earned 200 total points
I was just checking on the status of the question and had another thought.  You should include a timestamp on each line of your log.  This could show you the spot where it is taking all this time.  If you want to do this approach, be prepared to wait a very long time.

Another thought regards the use of the synchronize method of TThread.  When your threads update the display are they always using a Synchronize(MyUpdateMethod) approach?  The behavior you are seeing may indicate a thread trying to update the display without running through Synchronize.
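For reference, the Synchronize(MyUpdateMethod) pattern mentioned above usually has the shape sketched below. The class name TWorkerThread, the field fStatus and the caption update are illustrative only and are not taken from the application under discussion.

```pascal
unit WorkerSketch;  { hypothetical unit name, for illustration only }

interface

uses
  Classes;

type
  TWorkerThread = class(TThread)
  private
    fStatus: string;           { data handed over to the main thread }
    procedure MyUpdateMethod;  { runs in the main thread via Synchronize }
  protected
    procedure Execute; override;
  end;

implementation

uses
  Windows, SysUtils, Forms;

procedure TWorkerThread.MyUpdateMethod;
begin
  { Keep this short and never wait on another thread here: the worker }
  { is blocked and the main thread is busy for as long as this runs.  }
  if Application.MainForm <> nil then
    Application.MainForm.Caption := fStatus;
end;

procedure TWorkerThread.Execute;
begin
  while not Terminated do
  begin
    { do the slow work and prepare the text in the worker thread... }
    fStatus := 'Last poll at ' + TimeToStr(Now);
    { ...then touch the VCL only through Synchronize }
    Synchronize(MyUpdateMethod);
    Sleep(1000);
  end;
end;

end.
```

Any direct access to a form or control from Execute, outside the synchronized method, would be the kind of unsynchronized display update this comment is asking about.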
 
LVL 36

Expert Comment

by:Geert Gruwez
Do your threads all use this database object at the same time?
And is prmCode always protected against being written to as well?
 

Author Comment

by:brenlex
developmentguru -- yes, the Synchronize method is used in all instances of display update.

Geert -- yes, the threads will call methods contained in this global DB object at the same time, but probably not the same individual method. I thought the use of a critical section was appropriate in itself, unless there is something else I need to consider? How do I protect prmCode? I thought each call to [ValFromMyCode] would create its own instance of prmCode on the stack anyway... no?
 
LVL 36

Assisted Solution

by:Geert Gruwez
Geert Gruwez earned 100 total points
Yes, that's right; that doesn't seem to be the problem.
Hmm, I can't really come up with anything more.
We had this too; it was a SendMessage sent at closure.
Sometimes it didn't work, and it took a while to find too.
 

Author Closing Comment

by:brenlex
Put down to a fault in third-party components.