Trapping a Delphi app hang

I have a multithreaded Winform app written in Delphi 5 which randomly hangs under load.

I want to narrow down my search to the last procedure called (whether on the main form or in one of the threads) before the hang, whilst running through the IDE debugger, but have been unable to do so -- the app hangs, but nothing is reported in the IDE. The IDE then reports its own error, requiring a restart.

I have tried EurekaLog too, with no success. Ultimately I have to kill the app using Task Manager, with nothing reported.

Help. Any advice?

Geert G (Oracle DBA) commented:
What does the app do?
How many threads are there?
Is there interaction between the threads?
Is data exchanged between threads? Is access to that data locked (critical section)?

It's a bit of a needle in a haystack.

Or add logging to see what happens.

  I have approached this problem a number of different ways.

  One way is with logging.  Since you are dealing with multiple threads, you would need to either log the time (the processor cycle count would be a good way) or make sure the logging is serialized (for example by creating a separate service for the logging).  You then put a log entry at the entry and exit of each and every method on the form and in the thread(s).  This can be a time-consuming exercise.  If you look at your log and see that the last method was entered but never left, then you know where to look.
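
  The entry/exit logging described above could be sketched roughly as follows. This is a minimal console sketch, not code from the application in question: the names LogLine, LogLock, LogFileName and DoWork are made up for illustration. It assumes a single Win32 critical section is enough to serialize writers, and it reopens and closes the file for every line so that each entry has been handed to the operating system before a hang or a Task Manager kill.

```pascal
program HangLogDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

var
  LogLock: TRTLCriticalSection;  // hypothetical: serializes writers from all threads
  LogFileName: string;           // hypothetical: one shared log file

// Appends one timestamped line, then closes the file again so the entry
// is not left sitting in a buffer inside the process if the app hangs.
procedure LogLine(const Msg: string);
var
  F: TextFile;
begin
  EnterCriticalSection(LogLock);
  try
    AssignFile(F, LogFileName);
    if FileExists(LogFileName) then
      Append(F)
    else
      Rewrite(F);
    try
      WriteLn(F, Format('%s thread=%d %s',
        [FormatDateTime('hh:nn:ss.zzz', Now), GetCurrentThreadId, Msg]));
    finally
      CloseFile(F);
    end;
  finally
    LeaveCriticalSection(LogLock);
  end;
end;

// Example of an instrumented method: an "enter" line without a matching
// "leave" line in the log points at the method that never returned.
procedure DoWork;
begin
  LogLine('enter DoWork');
  try
    Sleep(10);  // stands in for the real method body
  finally
    LogLine('leave DoWork');
  end;
end;

begin
  LogFileName := ChangeFileExt(ParamStr(0), '.log');
  InitializeCriticalSection(LogLock);
  try
    DoWork;
  finally
    DeleteCriticalSection(LogLock);
  end;
end.
```

  Opening and closing the file per line is slow, which matters under load, but it trades speed for the guarantee that the last line written before the hang is actually in the file.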

  Another approach is one I used when a random access violation would occur on exit of the program.  I called it "binary deconstruction".  You remove a good-sized portion of the forms in the application (about half), compile (removing references as needed), and run.  If you remove half the forms but leave the form that is "causing" the issue, and the issue no longer happens, then you know it was an issue between that form and one of the removed ones.  If you remove half the forms and the issue continues, then you have eliminated those forms from consideration.  Once you get down to the last form, you continue by removing half the code at a time until the error no longer occurs.  This process makes the application point out to you where the problem is, but it is also time-consuming.

  From personal experience it sounds like you are stuck in a loop that is eating up memory until you get an out of memory error.  You should start to look at the code that was changed before the error started.  Hopefully this only started happening recently.

  Experiment a bit and see what more you can learn.  Let me know if you need more.
Geert G (Oracle DBA) commented:
One way of adding logging is to use a profiler such as ProDelphi.

You don't need to do much: just install it, and it puts start and end code in every procedure and function,
and even reports the time spent in each.

Needle in a haystack was my thought exactly... I don't have a solution either, but maybe a few useful tricks.

If by "requiring a restart" you mean you need to restart the computer, try killing Delphi32.exe using Task Manager instead of killing your application. This will kill your application as well, and you probably will not need to restart the computer.

What is the error message you get from Delphi? I assume you have looked at the StackTrace (ctrl+alt+s) at this point. If one of the threads is using a lot of CPU you can probably catch what it is doing by breaking the program using F2 from the IDE. The debugger will pause the program immediately and display the CPU window (which for the most part is not very useful). From here, close the CPU window and press F7; program execution will resume and the debugger will break again on the next line of Pascal/Delphi code that is executed. By doing this at random moments you can get a good idea of where the program is spending a lot of time. You can of course use a profiler too, but I usually try this first because most profilers mess around with the code more than I like.
brenlex (author) commented:
The problem I have with tools like ProDelphi and EurekaLog is that they do not dump their findings to file unless the app is closed down in a controlled fashion, which I cannot achieve as the app is locking up.

When attempting to debug from the IDE and the hang occurs, I get inconsistent behaviour from the IDE with regards to being able to debug (sometimes an access violation in comctl32.dll, sometimes non-responsive - requiring IDE restart).

I decided to negate this inconsistency and opted to log to file on each processing thread -- and have found something very strange... if I have "optimisation" enabled in my compiler options, then the app truly just hangs (all threads stop) ... game over.  However, with "optimisation" disabled, my threads continue to process in the background (polling db tables for the queue), except the app gives the appearance that it has hung (form is unresponsive) but ONLY for a period of time.  In my main processing thread, I report to a log file with each iteration of my Execute method, to let me know it is still going -- on occasion, however, it stops writing altogether for a random amount of time (sometimes up to 5 mins), and then suddenly comes alive again !?!

Is there something I should be aware of with regard to my threads when optimisation is ON?

How can a thread stop processing for a random amount of time when they are never implicitly suspended?
In your logging check your memory availability.  If Windows believes it is running low on memory it will pause to swap memory content out to disk, freeing up memory.  This process is not only slow, but the juggling of memory to disk afterwards can also be very time consuming.  Memory may not be the issue but it is a good resource to target first.  Have you been able to determine which method it stops in during these lapses?
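
One way to record memory availability alongside each log entry is the Win32 GlobalMemoryStatus call, which the Windows unit declares. The sketch below is only an illustration: MemorySummary is a made-up helper name, and the choice of fields to report is an assumption about what would be useful here. Note that GlobalMemoryStatus is the older API and is documented as unreliable on machines with more than 4 GB of memory.

```pascal
program MemLogDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: returns a one-line summary of system memory
// that can be appended to a log entry.
function MemorySummary: string;
var
  Status: TMemoryStatus;
begin
  Status.dwLength := SizeOf(Status);
  GlobalMemoryStatus(Status);
  // dwMemoryLoad is a percentage; the other two values are in bytes.
  Result := Format('load=%d%% availPhys=%dKB availPageFile=%dKB',
    [Status.dwMemoryLoad,
     Status.dwAvailPhys div 1024,
     Status.dwAvailPageFile div 1024]);
end;

begin
  WriteLn(MemorySummary);
end.
```

If available physical memory drops steadily in the log entries leading up to a pause, that would support the swapping theory; if it stays flat, memory is probably not the cause.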

An important thing to keep in mind is this: You are seeing multiple issues.  For now I would continue to run the program without optimization until the other problems have been addressed.  Only when the rest of it is running like you believe it should would I go back and try to re-enable optimization and see what the effects are.
To my knowledge optimization should not affect threads other than that they may execute slightly faster, which I suppose can change things a bit when multithreading and there is a problem with concurrency or synchronization. Another thing optimization does is remove variables that are never used so for example if there is a bug writing data to memory where it shouldn't write data, optimization may change things (think working with pointers or arrays where there is no bounds checking).

I think what you said about "form is unresponsive) but ONLY for a period of time" could be a key. This means your main thread is busy (the main thread is easier to debug). By main thread I mean the one thread that processes the Windows message queue for your main form. What method are you using to synchronize your threads, i.e. to make sure no two threads write to the same logfile at the same time, etc.? If only one thread writes to the logfile, how do you fetch the information from the worker threads, and how do you make sure a worker thread is not updating that same information while you are retrieving it?

When you use, for example, the Synchronize method, the calling thread is implicitly suspended while the main thread executes the method you want to synchronize. If a call to Synchronize takes a long time, your main form will appear to freeze. If, for example, two threads call Synchronize at the same time and inside the synchronized method you wait for the other thread to do something, you will have a deadlock (your main form can be frozen for a very long time).

You are using threads to pull data from tables (I assume this is from some database). Which components are you using to access the tables, and are they thread safe? For example, the component library I use for accessing Oracle databases has a boolean property called "ThreadSafe" that needs to be set to True (default is False). The property is on the TOracleSession component (corresponding to TDatabase), which is usually shared by several threads. This may vary depending on which set of components you are using.

If it is not a huge amount of code, it would be easier to help you if we had a few snippets to look at, especially any snippets dealing with synchronization between threads.

brenlex (author) commented:
Correction -- the background threads only continue to write to their dedicated log files when I run the app from the IDE debugger.  The main processing thread still appears to pause for an indeterminate period of time before jumping back into life.  If I run the app as a standalone exe, all threads stop reporting to files and the app has officially hung.

Unfortunately there is just too much code to post, but I believe I have found the culprit, though I am not sure WHY it is occurring...

Essentially any of my threads can call a globally declared MyDBService object.  This unit contains methods to pull specific info back from the MySQL DB.  ALL of MyDBService's methods contain critical sections (see code snippet).

fMyTable represents an instance of a class which encapsulates access to a particular table, and is created/destroyed in the TMyDBService constructor/destructor respectively.  When I put an exception handler into the MyTable class's GetValByMyCode, I discovered that I randomly get "Access violation at address 10011F04 in module 'libmySQL.DLL'. Write of address FFFFFF04" -- from there on in, the app starts to disintegrate.

Methods within the aforementioned global instance of MyClass are hit from all threads, and valid values are always passed in (prmCode).

Are there other rules I should be aware of when implementing critical sections?

function TMyDBService.ValFromMyCode(prmCode: string): string;
    Result := fMyTable.GetValByMyCode(prmCode);
  end; {try..finally}
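
The snippet above appears to have lost several lines when it was posted. A common shape for such a method, shown here only as a sketch and not as the asker's actual code, is an Enter / try / finally / Leave block around the table call. In the example below the field name fLock, the use of TCriticalSection from SyncObjs, and the stub TMyTable are all assumptions made so the program is self-contained; the real GetValByMyCode queries MySQL.

```pascal
program CritSectDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, SyncObjs;

type
  // Stand-in for the table wrapper; the real one talks to libmySQL.
  TMyTable = class
  public
    function GetValByMyCode(const prmCode: string): string;
  end;

  TMyDBService = class
  private
    fLock: TCriticalSection;  // hypothetical name: one lock for the whole service
    fMyTable: TMyTable;
  public
    constructor Create;
    destructor Destroy; override;
    function ValFromMyCode(const prmCode: string): string;
  end;

function TMyTable.GetValByMyCode(const prmCode: string): string;
begin
  Result := 'value-for-' + prmCode;  // placeholder for the database lookup
end;

constructor TMyDBService.Create;
begin
  inherited Create;
  fLock := TCriticalSection.Create;
  fMyTable := TMyTable.Create;
end;

destructor TMyDBService.Destroy;
begin
  fMyTable.Free;
  fLock.Free;
  inherited Destroy;
end;

function TMyDBService.ValFromMyCode(const prmCode: string): string;
begin
  fLock.Enter;
  try
    Result := fMyTable.GetValByMyCode(prmCode);
  finally
    fLock.Leave;  // released even if the lookup raises an exception
  end; {try..finally}
end;

var
  Service: TMyDBService;
begin
  Service := TMyDBService.Create;
  try
    WriteLn(Service.ValFromMyCode('A1'));
  finally
    Service.Free;
  end;
end.
```

One design point worth checking against the real code: a critical section only serializes callers that enter the same lock object. If each method of the service guards itself with a different critical section, two threads can still be inside the underlying connection at the same time through two different methods.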

I was just checking on the status of the question and had another thought.  You should include a timestamp on each line of your log.  This could show you the spot where it is taking all this time.  If you want to do this approach, be prepared to wait a very long time.

Another thought regards the use of the synchronize method of TThread.  When your threads update the display are they always using a Synchronize(MyUpdateMethod) approach?  The behavior you are seeing may indicate a thread trying to update the display without running through Synchronize.
Geert G (Oracle DBA) commented:
Do your threads all use this database object at the same time?
And is this prmCode always protected against being written to as well?
brenlex (author) commented:
developmentguru -- yes, the Synchronize method is used in all instances of display update.

Geert -- yes, the threads will call methods contained in this global db object at the same time, but probably not the same individual method.  I thought the use of a critical section was appropriate in itself, unless there is something else I need to consider?  How do I protect prmCode? I thought each call to [ValFromMyCode] would create its own instance of prmCode on the stack.
Geert G (Oracle DBA) commented:
Yes, that's right; that doesn't seem to be the problem.
Hmmm, I can't really come up with anything more.
We had this too; it was a SendMessage sent at closure.
Sometimes it didn't work, and it took a while to find, too.
brenlex (author) commented:
Put down to a fault in third-party components.