I have run into a tough thread-handling issue and need help!
I am writing an HTML parser for Nutch, which is a multi-threaded environment. That is, when a web page is identified as having the content type "text/html", it is sent to my HTML parser for processing. A large number of web pages will therefore pass through the parser, on many threads at once.
In the HTML parser, I need to read and write several Berkeley databases (I decided not to use MySQL due to performance issues) in order to do some customized parsing. My current approach is:
1. create a Berkeley database environment at the very beginning;
2. create a few database instances after that (since I have to use multiple databases, I create one instance for each of them);
3. use the instances to read / write the databases as needed, and close the instances after the operations are finished. Since some database operations depend on others, some operations are nested inside others. For example, a write to table A in database 1 is triggered when table B in database 2 holds some specific values;
4. repeat 2-3 till all the operations are finished;
5. close the environment.
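To make the sequence concrete, here is a minimal single-threaded sketch of steps 1 to 5. It deliberately uses stand-in classes (FakeEnvironment, FakeDatabase) rather than the real Berkeley DB API, and the database names, keys and values are made up for illustration. The stand-ins only enforce the same open/close rules that the error messages further down refer to.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-in classes only: these are NOT the Berkeley DB API. They mimic the
// open/close rules so that the sequence of steps can be run as written.
class LifecycleSketch {

    static class FakeEnvironment {
        private boolean open = true;
        private final Set<FakeDatabase> openDatabases = new HashSet<>();

        FakeDatabase openDatabase(String name) {
            if (!open) {
                throw new IllegalStateException("environment is closed");
            }
            FakeDatabase db = new FakeDatabase(this, name);
            openDatabases.add(db);
            return db;
        }

        void close() {
            if (!openDatabases.isEmpty()) {
                throw new IllegalStateException("some database is still open");
            }
            open = false;
        }
    }

    static class FakeDatabase {
        private final FakeEnvironment env;
        private final String name;
        private final Map<String, String> rows = new HashMap<>();
        private boolean open = true;

        FakeDatabase(FakeEnvironment env, String name) {
            this.env = env;
            this.name = name;
        }

        String get(String key) { checkOpen(); return rows.get(key); }

        void put(String key, String value) { checkOpen(); rows.put(key, value); }

        void close() {
            open = false;
            env.openDatabases.remove(this);
        }

        private void checkOpen() {
            if (!env.open) throw new IllegalStateException("environment is closed");
            if (!open) throw new IllegalStateException(name + " is closed");
        }
    }

    // Runs steps 1-5 once and returns how many dependent writes happened.
    static int runOnce(String[] pageUrls) {
        FakeEnvironment env = new FakeEnvironment();              // step 1
        int written = 0;
        for (String url : pageUrls) {                             // step 4: repeat 2-3
            FakeDatabase db1 = env.openDatabase("database1");     // step 2
            FakeDatabase db2 = env.openDatabase("database2");
            db2.put(url, "flagged");                              // made-up seed data
            if ("flagged".equals(db2.get(url))) {                 // step 3: dependent write
                db1.put(url, "derived");
                written++;
            }
            db1.close();                                          // step 3: close instances
            db2.close();
        }
        env.close();                                              // step 5
        return written;
    }

    public static void main(String[] args) {
        System.out.println(runOnce(new String[] {"page-a", "page-b"}));
    }
}
```

Run on a single thread like this, the sequence completes without any exception, which is why the approach looked workable at first.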
I thought the above approach would work, but it does not. Some of the errors I see are:
A. "Attempt to use database while environment is closed." This error occurs throughout steps 2, 3 and 4;
B. "Attempt to read / write database while database is closed." This error occurs in step 3 in particular;
C. "Attempt to close environment while some database is still open." This error occurs in step 5.
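For illustration, the sketch below shows one way an error of type A or B could arise. It assumes, and this is only an assumption, that a handle ends up shared between parser threads: one thread finishes and closes it while another thread still needs it. SharedHandle is a stand-in class, not part of the Berkeley DB API, and the latch exists only to force the unlucky interleaving every time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Stand-in for a handle shared by several parser threads (hypothetical).
class SharedHandleRace {

    static class SharedHandle {
        private volatile boolean open = true;

        void use() {
            if (!open) {
                throw new IllegalStateException("Attempt to use handle while it is closed");
            }
        }

        void close() { open = false; }
    }

    // Returns the error message seen by the second thread, or null if none.
    static String reproduce() {
        SharedHandle shared = new SharedHandle();
        CountDownLatch firstClosed = new CountDownLatch(1);
        AtomicReference<String> error = new AtomicReference<>();

        Thread first = new Thread(() -> {
            shared.use();
            shared.close();            // the first thread believes it is done
            firstClosed.countDown();
        });

        Thread second = new Thread(() -> {
            try {
                firstClosed.await();   // wait until the first thread has closed
                shared.use();          // the second thread still needs the handle
            } catch (IllegalStateException e) {
                error.set(e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return error.get();
    }

    public static void main(String[] args) {
        System.out.println(reproduce());
    }
}
```

In the real parser the interleaving would not be forced by a latch, so failures of this kind would show up only some of the time.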
Please help me design a better strategy to solve this concurrency issue. Many thanks in advance.