• Status: Solved

Advice on Java threads

Hi,

I have encountered a tough thread handling issue, and need help!

I am programming an HTML parser in Nutch, which is a multi-threaded environment. When a web page is identified as having the content type "text/html", it is sent to my HTML parser for processing. Many web pages will therefore be processed this way, concurrently, by multiple threads.

In the HTML parser, I need to read from and write to several Berkeley DB databases (I decided not to use MySQL due to performance issues) in order to do customized parsing. What I am currently doing is:

1. create a Berkeley database environment at the very beginning;

2. create a few database instances next (since I have to use multiple databases, I create one instance for each of them);

3. use the instances to read / write the databases accordingly, and close the instances after the operations are finished. Since some database operations depend on others, some operations are nested. For example, a write to table A in database 1 is triggered when table B in database 2 holds some specific values;

4. repeat 2-3 till all the operations are finished;

5. close the environment.

I thought the above approach would work; however, it does not. Some of the errors I get are:

A. "Attempt to use database while environment is closed." This error applies to 2, 3 and 4 all the way;

B. "Attempt to read / write database while database is closed." This error applies to 3 in particular;

C. "Attempt to close environment while some database is still open." This error applies to 5.

Please help me design a better strategy to solve this concurrency issue. Many thanks in advance.

Asked by: wsyy

1 Solution

objects Commented:
Sounds like you need to synchronise access to the database so that only one thread at a time is accessing it.
The other option would be to have only a single thread updating the database. In that scenario you would use a queue to store all the updates, and the database thread would pull updates from that queue.

Opening and closing the database for each operation may also be excessive; keeping the connection open may be a simpler approach.
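
A minimal sketch of the keep-it-open idea, in case it helps. SharedStore is a hypothetical stand-in for the Berkeley DB environment and its database handles (it is not a Berkeley DB class), and OpenOnceDemo is an illustrative name. The store is opened once, shared by every worker thread, and closed only after all workers have finished, which is the ordering that errors A to C suggest is being violated.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the environment plus its open databases.
class SharedStore implements AutoCloseable {
    private volatile boolean open = true;
    private final AtomicInteger writes = new AtomicInteger();

    void write(String key) {
        // Mirrors the "attempt to use database while closed" failure mode.
        if (!open) throw new IllegalStateException("store is closed");
        writes.incrementAndGet();
    }

    int writeCount() { return writes.get(); }

    @Override
    public void close() { open = false; }
}

class OpenOnceDemo {
    // Opens the store once, lets nThreads workers share it, and closes it
    // only after every worker has finished. Returns the number of writes.
    static int run(int nThreads, int writesPerThread) {
        SharedStore store = new SharedStore();          // steps 1 and 2, done once
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < writesPerThread; i++) {
                    store.write("page-" + id + "-" + i); // step 3, no open/close per write
                }
            });
        }
        pool.shutdown();                                 // accept no new work
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int total = store.writeCount();
        store.close();                                   // step 5, after all workers are done
        return total;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100)); // prints 400
    }
}
```

In a Nutch plugin the worker threads are not created by the parser itself, so the open and close calls would have to hang off whatever start-up and shutdown hooks are available; that part is not shown here.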

Surprised you had performance issues with MySQL; we've used it on some very high-throughput applications without any problems.

wsyy (Author) Commented:
objects, could you be more specific about the two solutions?

For synchronizing database access, could you provide an example? More importantly, do you mean that I should synchronize 1 and 2 mentioned above?

For the other option, you mentioned a single thread and a queue. How can I implement it in an already multi-thread environment? Please provide more details about where (say, step 2 or 3) and how to implement it.

Thanks!

objects Commented:
> do you mean that I should synchronize 1 and 2 mentioned above?

No, 2 and 3:

public synchronized void updateDatabase(StuffToAddToDatabase data)
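
To make that a bit more concrete, here is a rough sketch rather than real Berkeley DB code. Two plain HashMaps stand in for "table B in database 2" and "table A in database 1", and the class names, the threshold rule and the URLs are all made up for illustration. The point is that the read, the check and the dependent write happen inside one synchronized method, so no other thread can get in between them.

```java
import java.util.HashMap;
import java.util.Map;

class SynchronizedUpdater {
    // Plain HashMaps are not thread safe; the synchronized methods protect them.
    private final Map<String, Integer> visits = new HashMap<>();   // stand-in for table B, database 2
    private final Map<String, Boolean> flagged = new HashMap<>();  // stand-in for table A, database 1
    private final int threshold;

    SynchronizedUpdater(int threshold) { this.threshold = threshold; }

    // Only one thread at a time can run any synchronized method of this object.
    public synchronized void updateDatabase(String url) {
        int count = visits.merge(url, 1, Integer::sum);   // write to "database 2"
        if (count == threshold) {                          // dependent write to "database 1"
            flagged.put(url, Boolean.TRUE);
        }
    }

    public synchronized int visitTotal() {
        int sum = 0;
        for (int c : visits.values()) sum += c;
        return sum;
    }

    public synchronized int flaggedCount() { return flagged.size(); }
}

class SynchronizedDemo {
    // Every thread records the same urlsPerThread URLs once, so each URL
    // reaches the threshold (nThreads) exactly once.
    static int[] run(int nThreads, int urlsPerThread) {
        SynchronizedUpdater updater = new SynchronizedUpdater(nThreads);
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < urlsPerThread; i++) {
                    updater.updateDatabase("http://example.com/page" + i);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { updater.visitTotal(), updater.flaggedCount() };
    }

    public static void main(String[] args) {
        int[] result = run(4, 50);
        System.out.println(result[0] + " visits, " + result[1] + " flagged"); // prints "200 visits, 50 flagged"
    }
}
```

Note that synchronized locks on the object, so this only helps if all threads call the method on the same instance. If each thread creates its own parser or updater object, they do not exclude each other.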

> How can I implement it in an already multi-thread environment?

When a thread has an update, it adds it to a (thread-safe) queue.

The database update thread pulls updates off the queue and applies them to the database.
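
Something along these lines, as a sketch only. It uses java.util.concurrent.LinkedBlockingQueue as the thread-safe queue; the class name UpdateQueueDemo, the STOP marker and the string "updates" are placeholders, and the list stands in for the database. Many threads add updates, and one writer thread is the only one that ever touches the "database".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class UpdateQueueDemo {
    // Marker object telling the writer to stop; compared by identity on purpose.
    private static final String STOP = new String("STOP");

    static List<String> run(int nProducers, int updatesPerProducer) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> applied = new ArrayList<>();   // stand-in for the database, writer thread only

        // The single database update thread: takes updates off the queue one by one.
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String update = queue.take();   // blocks until an update is available
                    if (update == STOP) break;
                    applied.add(update);            // real code would write to the database here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // The parser threads: they only enqueue, they never touch the database.
        Thread[] producers = new Thread[nProducers];
        for (int p = 0; p < nProducers; p++) {
            final int id = p;
            producers[p] = new Thread(() -> {
                for (int i = 0; i < updatesPerProducer; i++) {
                    queue.add("url-" + id + "-" + i);
                }
            });
            producers[p].start();
        }

        try {
            for (Thread producer : producers) producer.join();
            queue.add(STOP);                        // all updates are queued, let the writer drain and exit
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return applied;
    }

    public static void main(String[] args) {
        System.out.println(run(3, 20).size()); // prints 60
    }
}
```

With this layout the database handles can be opened by the writer thread when it starts and closed when it exits, so no other thread can see them in a closed state.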

wsyy (Author) Commented:
objects, I attached the code of one database operation below. I use Berkeley DB's recommended way to handle the concurrent issue. If I use "synchronized" in front of the method, do I still need the following code?

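import com.sleepycat.bind.tuple.StringBinding;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.LockConflictException;
import com.sleepycat.je.Transaction;

// env, primaryDB, dataBinding, status and max_trial_num are fields of the enclosing class.
public void insertRecord(String url, UrlValue dv){
        DatabaseEntry keyEntry = new DatabaseEntry();
        DatabaseEntry dataEntry = new DatabaseEntry();

        StringBinding.stringToEntry(url.toLowerCase(), keyEntry);
        dataBinding.objectToEntry(dv, dataEntry);

        int retry_count = 1;
        while(retry_count<max_trial_num){
              Transaction txn = null;
              try{
                    // Begin a new transaction on every attempt; an aborted one cannot be reused.
                    txn = env.beginTransaction(null, null);
                    status = primaryDB.put(txn, keyEntry, dataEntry);
                    // Commit once the put has returned, then leave the retry loop.
                    txn.commit();
                    break;
              }catch (LockConflictException le) {
                    // Lock conflict: roll back and retry until max_trial_num is reached.
                    try {
                          if (txn != null) txn.abort();
                          retry_count++;
                    }catch(DatabaseException ae){
                          break;
                    }
              }catch (DatabaseException e) {
                    // Any other database error: roll back and give up.
                    try {
                          if (txn != null) txn.abort();
                    } catch (DatabaseException ae) {
                          // nothing more can be done if the abort itself fails
                    }
                    break;
              }
        }
}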

As to the thread safe queue, could you provide an example?


objects Commented:
> I use Berkeley DB's recommended way to handle the concurrent issue.

The errors you are getting would suggest the code is not thread-safe.

> As to the thread safe queue, could you provide an example?

http://pguides.net/java/concurrency-producer-consumer