Tolgar

How to re-write this code with httpclient and/or httpURLconnection instead of httpunit?

Hi,
I have the following code, which uses the com.meterware.httpunit package.

How can I rewrite this code using HttpClient and/or HttpURLConnection instead of the HttpUnit package?

Note: This code parses the 2nd table in the web page and generates something like:

text1 --> texta
text2 --> textb
text3 --> textc

The source of this page looks like this:
    <tr>
    <td><a href="somepath/check.html#check_del_dir">del_dir</a>
    <td><a href="report.check.log.html#del_dir"><font color=green>pass</font></a>
    <tr>
    <td><a href="somepath/check.html#check_type">type</a>
    <td><a href="report.check.log.html#type"><font color=red>fail</font></a>


    try {
        HttpUnitOptions.setScriptingEnabled(false);
        WebConversation wc = new WebConversation();
        // Check if the URL is valid
        if (checkReportPath.startsWith("http://") && checkReportPath.contains("/check/report.html")) {
            WebResponse wr = wc.getResponse(checkReportPath);
            // getTables() returns an array, so the index here is zero-based
            WebTable table = wr.getTables()[2];
            String[][] cells = table.asText();

            for (String[] row : cells) {
                // true only for the first cell of each row
                boolean column = true;
                for (String cell : row) {
                    if (column) {
                        checkActivity.append(cell).append(" --> ");
                        column = false;
                    } else {
                        checkActivity.append(cell);
                    }
                }
                checkActivity.append("\n");
            }
        }
    } catch (Exception e) {
        // getResponse() and getTables() can throw IOException or SAXException
        e.printStackTrace();
    }


for_yan

I recently did just that.
This is how it was (I have since commented it out):

    // HttpUnitOptions.setScriptingEnabled(true);
    // HttpUnitOptions.setExceptionsThrownOnScriptError(false);
    // WebConversation wc = new WebConversation();
    // String urlString = "http://your_target_address.com";
    // WebRequest req = new GetMethodWebRequest(urlString);
    // response = wc.getResponse(req);
    // String result = response.getText();


This is how it is now without HttpUnit:

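    // Needs java.io.BufferedReader, java.io.InputStreamReader,
    // java.net.URL and java.net.URLConnection
    String result = "";
    URL my_source = new URL("http://your_target_address.com");
    // First posted as yahoo.openConnection(), a leftover variable name
    URLConnection yc = my_source.openConnection();
    BufferedReader in = new BufferedReader(
            new InputStreamReader(yc.getInputStream()));
    String inputLine;

    // Read the response line by line and collect it in result
    while ((inputLine = in.readLine()) != null) {
        // System.out.println(inputLine);
        result += inputLine + System.getProperty("line.separator");
    }
    in.close();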


I guess your main concern is how to parse the tables.
HttpUnit does provide some conveniences there, that is true,
but I recently ran into OutOfMemory errors with HttpUnit
after it had been running for some time,
and I had to switch to standard Java classes.

There are a number of HTML parsers around; perhaps they can help to manipulate
the output and get the table data. There was recently a question with a list
of such parsers. Let me try to find it.
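
Since the question also mentions HttpURLConnection, here is a minimal sketch of the same fetch using that class. It is only an illustration, not code from the original project: the class name PageFetcher, the method names, the 10 second timeouts and the UTF-8 charset are all assumptions, so adjust them to your page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
class PageFetcher {

    // Reads everything from the reader and ends every line with '\n'.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Fetches the page body with a plain GET; throws if the status is not 200.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // assumed timeout, in milliseconds
        conn.setReadTimeout(10_000);    // assumed timeout, in milliseconds
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            // Assumes the page is UTF-8 encoded.
            return readAll(new InputStreamReader(
                    conn.getInputStream(), StandardCharsets.UTF_8));
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling PageFetcher.fetch(checkReportPath) would then return the page source as one String, which you can hand to whatever parsing step you choose. A StringBuilder is used instead of += so that large pages do not create a new String for every line.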
Tolgar

ASKER

Sorry, I couldn't understand why you used yahoo.openConnection in this example.

Thanks,
Tolgar

ASKER

Yes, I need to parse the second table. If you can find a better parser, that would be great.

Thanks,
I'm sorry:
yahoo.openConnection()
should of course be changed to
my_source.openConnection()
in this context (I changed it in one place but not in the other).

I'm still looking for the question that had links to many parsers; I should find it.



For some reason I cannot find that question (I can never find anything using EE search, I don't know why),
but here are some links to discussions of different parsers:

http://stackoverflow.com/questions/3152138/what-are-the-pros-and-cons-of-the-leading-java-html-parsers

http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/

Why do you want to move away from HttpUnit? Are you experiencing something similar
to the OutOfMemory issue I had?

Tolgar

ASKER

No. It is just a design decision. The reason is that it may burden the code with unnecessary stuff. It looks like it was written for unit testing.


OK, then you probably don't want to use any parser and would rather
parse it yourself somehow.
I actually thought the parsing part of HttpUnit was fine, and I would have been happy with it, but I had
this OutOfMemory issue.
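
If you do go the parse-it-yourself route, something like the rough sketch below might be enough for a page shaped like the sample in the question. It is not a general HTML parser: it assumes tables are not nested, that every row starts with a <tr> tag and every cell with a <td> tag (closing tags are optional, as in the sample), and it does not decode entities such as &nbsp;. The class and method names (CheckReportParser, tableAt, toReport) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named here only for illustration.
class CheckReportParser {

    // Returns the markup inside the table at the given zero-based index,
    // or "" if there is no such table. Assumes tables are not nested.
    static String tableAt(String html, int index) {
        String[] chunks = html.split("(?i)<table\\b[^>]*>");
        // chunks[0] is the text before the first <table>,
        // so table i lives in chunks[i + 1].
        if (index < 0 || index + 1 >= chunks.length) {
            return "";
        }
        String chunk = chunks[index + 1];
        int end = chunk.toLowerCase().indexOf("</table");
        return end >= 0 ? chunk.substring(0, end) : chunk;
    }

    // Turns the rows of one table into lines like "del_dir --> pass".
    static String toReport(String tableHtml) {
        StringBuilder out = new StringBuilder();
        for (String row : tableHtml.split("(?i)<tr\\b[^>]*>")) {
            String[] parts = row.split("(?i)<td\\b[^>]*>");
            List<String> cells = new ArrayList<>();
            // parts[0] is whatever came before the first <td>, so skip it.
            for (int i = 1; i < parts.length; i++) {
                // Drop the remaining tags (<a>, <font>, ...) and keep the text.
                cells.add(parts[i].replaceAll("<[^>]*>", "").trim());
            }
            if (!cells.isEmpty()) {
                out.append(String.join(" --> ", cells)).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "<table>\n"
                + "<tr>\n"
                + "<td><a href=\"somepath/check.html#check_del_dir\">del_dir</a>\n"
                + "<td><a href=\"report.check.log.html#del_dir\"><font color=green>pass</font></a>\n"
                + "<tr>\n"
                + "<td><a href=\"somepath/check.html#check_type\">type</a>\n"
                + "<td><a href=\"report.check.log.html#type\"><font color=red>fail</font></a>\n"
                + "</table>";
        // Prints "del_dir --> pass" and "type --> fail" on separate lines.
        System.out.print(toReport(tableAt(sample, 0)));
    }
}
```

Note that this joins every pair of neighbouring cells with " --> ", while the HttpUnit loop in the question only puts the arrow after the first cell; for two-column rows the result is the same. For the real page you would pass the fetched source and the index of the table you want.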
CEHJ
Why do you want to avoid HttpUnit?
Tolgar

ASKER

It was just a design decision.

I thought it might be too heavy just for parsing a page's source.

Thanks,
SOLUTION
Mick Barry