I have written an application that crawls HTML documents and applies predefined regular expressions to them (I can't change the regular expressions). The application crawls the documents, applies these patterns, and stores the parsed results in a database.
The problem is that when a regular expression is applied to some documents, it gets stuck and the whole application goes into a Not Responding state. The application is a Windows service, and we don't monitor it all the time.
To solve this, I moved the regular-expression matching code into a separate thread. Another thread monitors this regex thread, and if it is stuck for more than 30 seconds, the monitor aborts it.
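A minimal sketch of the worker-plus-watchdog idea, written in Java only because the question has no code of its own; java.util.regex is a backtracking engine like .NET's. Java has no Thread.Abort and its Matcher never checks for interruption, so this sketch assumes a hypothetical InterruptibleCharSequence wrapper whose charAt() throws once the worker has been interrupted. The caller waits with Future.get(timeout) and cancels the task if it runs too long. The names RegexWatchdog, InterruptibleCharSequence and findAllWithTimeout are illustrative, not library APIs. If the service can target .NET Framework 4.5 or later, the Regex constructor that takes a match-timeout TimeSpan (which throws RegexMatchTimeoutException) provides a built-in timeout without changing the patterns themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexWatchdog {

    /** Hypothetical wrapper: charAt() throws once the reading thread has been interrupted. */
    static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleCharSequence(CharSequence inner) {
            this.inner = inner;
        }

        @Override
        public char charAt(int index) {
            // java.util.regex reads characters continuously while backtracking,
            // so this is where a cancelled match notices it should stop.
            if (Thread.currentThread().isInterrupted()) {
                throw new RuntimeException("regex match cancelled");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /**
     * Runs the pattern on a worker thread and returns every match, or null
     * if matching did not finish within timeoutMillis.
     */
    static List<String> findAllWithTimeout(String regex, String input, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        Pattern pattern = Pattern.compile(regex);
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<List<String>> future = worker.submit(() -> {
                Matcher m = pattern.matcher(new InterruptibleCharSequence(input));
                List<String> matches = new ArrayList<>();
                while (m.find()) {
                    matches.add(m.group());   // all matching happens on the worker
                }
                return matches;               // a finished list, not a lazy Matcher
            });
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true);          // interrupts the worker; charAt() then throws
                return null;
            }
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // 30-second budget, mirroring the watchdog limit described above.
        System.out.println(findAllWithTimeout("a+", "caaab aa", 30_000));
    }
}
```

Because the worker hands back a completed List through the Future, the caller never needs a global or a by-reference object to collect results, and a timeout simply yields null instead of a half-evaluated result.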
This approach sometimes works, but most of the time the application still goes into the Not Responding state. After analyzing the code, I found that once the regex thread returns, the MatchCollection object that was passed into it shows this message in the debugger: "Function evaluation disabled because a previous function evaluation timed out...."
Whenever we try to access any method or property of the MatchCol object, the application goes into the Not Responding state.
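The symptom is consistent with lazy evaluation: .NET's Regex.Matches populates the MatchCollection on demand, and reading members such as Count forces it to be filled. So the regex thread can return almost immediately, and the expensive matching then runs on whichever thread first touches the collection. Java's Matcher is lazy in the same way, and this sketch uses it to show where the work actually happens. A hypothetical ThreadRecordingSequence records which threads read the input; when the worker returns the raw Matcher, only the caller reads it, and when the worker copies the matches into a List first, only the worker does. The class and method names are illustrative. In .NET terms, the analogous step would be reading Count (or copying the matches into a list) inside the regex thread before handing results back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyMatchDemo {

    /** Hypothetical helper: a CharSequence that remembers which threads read its characters. */
    static final class ThreadRecordingSequence implements CharSequence {
        private final String text;
        final Set<String> readers = ConcurrentHashMap.newKeySet();

        ThreadRecordingSequence(String text) {
            this.text = text;
        }

        @Override
        public char charAt(int index) {
            readers.add(Thread.currentThread().getName());
            return text.charAt(index);
        }

        @Override
        public int length() {
            return text.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }

        @Override
        public String toString() {
            return text;
        }
    }

    /** Returns the names of the threads that actually ran the regex over the input. */
    static Set<String> threadsThatMatched(boolean materializeInWorker) throws Exception {
        ThreadRecordingSequence input = new ThreadRecordingSequence("id=1 id=22 id=333");
        Pattern pattern = Pattern.compile("id=(\\d+)");
        ExecutorService worker =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "regex-worker"));
        try {
            Callable<Object> task = () -> {
                Matcher m = pattern.matcher(input);
                if (!materializeInWorker) {
                    return m;                 // lazy: no matching has happened yet
                }
                List<String> ids = new ArrayList<>();
                while (m.find()) {
                    ids.add(m.group(1));      // eager: all matching happens on the worker
                }
                return ids;
            };
            Object result = worker.submit(task).get();
            if (result instanceof Matcher) {
                Matcher m = (Matcher) result;
                while (m.find()) {
                    // The caller's thread is the one doing the matching now.
                }
            }
        } finally {
            worker.shutdown();
        }
        return input.readers;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker returns Matcher: " + threadsThatMatched(false));
        System.out.println("worker returns List:    " + threadsThatMatched(true));
    }
}
```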
What would be a better workaround for this? What is the proper way to get the MatchCollection object back from the regex thread? Should this object be defined as a global, should it be passed by reference, or should it be returned from the thread?
What is the best way to handle regular expressions in threads?
I look forward to a solution to this problem.