Need help with HttpUnit (SourceForge)

Hi,

I am using HttpUnit (from sourceforge.net) to extract links from web pages.

I am getting an exception for "http://javaworld.pricegrabber.com":

  at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
       at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
       at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:1

The lines above repeat many times.

Can anyone help me?

My code is:

import com.meterware.httpunit.*;

public class Test
{
    public static void main(String[] args) throws Exception
    {
        if (args.length != 1)
        {
            System.out.println("Usage: java Test URL");
            System.exit(0);
        }

        try
        {
            WebConversation wc = new WebConversation();
            WebResponse resp = wc.getResponse(args[0]);

            // If the page has a META REFRESH tag, follow it by hand,
            // reusing the same conversation so its cookies are kept.
            WebRequest refreshRequest = resp.getRefreshRequest();
            if (refreshRequest != null)
                resp = wc.getResponse(refreshRequest.getURL().toString());

            // Print every absolute (http...) link on the final page.
            WebLink[] links = resp.getLinks();
            for (int i = 0; i < links.length; i++)
            {
                String link = links[i].getRequest().getURL().toString();
                if (link.startsWith("http"))
                    System.out.println(link);
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
Asked by sumantedla

aozarov commented:
This site has a nasty 302 redirect mechanism (it hops many times to detect spiders).
If you want, you can disable auto-redirect by adding:

WebConversation wc = new WebConversation();
wc.getClientProperties().setAutoRedirect(false); // don't follow HTTP 302 (page moved) automatically
// Uncomment this if you want:
//wc.getClientProperties().setAutoRefresh(true); // follow META REFRESH tags automatically

If you do want the auto-redirect to take place, tell me and I will look into HttpUnit to see what breaks in that case.

ucool commented:
I ran it with "http://www.google.com" instead of
"http://javaworld.pricegrabber.com" and it works. I guess the other site has some restriction, since nothing comes back from it.

Peter Kwan (Analyst Programmer) commented:
Hi, sumantedla,

Can you please post the whole stack trace? The three lines above don't tell you or us anything about what happened.

sumantedla (Author) commented:
What I want is this:

whatever the URL is, the program has to fetch the links from that web page, even if there is a server-side redirect or a client-side redirect (using a META REFRESH tag).

So I don't think I can compromise on

wc.getClientProperties().setAutoRedirect(false); // Auto redirect in case of HTTP 302 (page moved)

I want redirects followed in all possible cases.

The stack trace is the same as above; it repeats many times:

at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
       at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
       at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:1
       at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
       at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
       at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:1
       at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
       at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
       at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:1
       at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
       at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
       at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:1

Thanks.

aozarov commented:
>> So I dont think I can compromise on
That's OK, I am just about to fix it ;-)

sumantedla (Author) commented:
I think I can improve my code above by setting these properties to true, like this:

WebConversation wc = new WebConversation();
wc.getClientProperties().setAutoRedirect(true); // Auto redirect in case of HTTP 302 (page moved)
wc.getClientProperties().setAutoRefresh(true); // Auto Refresh based on META REFRESH TAG
WebResponse   resp = wc.getResponse(args[0]);

   WebLink w[] = resp.getLinks();
   for(int i = 0; i < w.length;i++)
   {
         if(w[i].getRequest().getURL().toString().startsWith("http"))
         System.out.println(w[i].getRequest().getURL());
   }
Am I right?

aozarov commented:
autoRedirect is already true by default, so wc.getClientProperties().setAutoRedirect(true) changes nothing (otherwise you would not have the problem in the first place).
But you are correct about the auto-refresh, and about skipping the logic that applies it manually.

aozarov commented:
sumantedla, I need to go for 3-4 hours.
Meanwhile, I changed the RedirectWebRequest class inside WebClient.java:

class RedirectWebRequest extends WebRequest {


    RedirectWebRequest( WebResponse response ) throws MalformedURLException {
        super( null, getRedirectURL(response), response.getFrame(), response.getFrameName() );
        if (response.getReferer() != null) setHeaderField( "Referer", response.getReferer() );
    }


    // Probably a better place to fix it is inside WebRequest (should use only _urlString if value starts with "http://")
    private static String getRedirectURL(WebResponse response) throws MalformedURLException
    {
          String original = response.getURL().toString();
          String location = response.getHeaderField( "Location" );
          if (location == null || (location = location.trim()).length() == 0)
                throw new MalformedURLException("Missing Location header");
          
          String newURL = null;
          if (location.toLowerCase().startsWith("http://"))
            newURL = location;
          else
            newURL = new java.net.URL(original, location).toString();

          if (newURL.equals(original))
                throw new MalformedURLException("Location header is the same as original URL");

          return newURL;
    }

    /**
     * Returns the HTTP method defined for this request.
     **/
    public String getMethod() {
        return "GET";
    }
}

I also changed the updateWindow method in WebWindow.java (line 292) to print each redirect:
} else if (shouldFollowRedirect( response )) {
            delay( HttpUnitOptions.getRedirectDelay() );
          System.out.println("302 redirect -> URL:" + response.getURL() + " Location:" + response.getHeaderField( "Location" ));
            return getResponse( new RedirectWebRequest( response ) );
        }

The problem was that it kept redirecting to the same page.
If you run your program with the above changes, you will see that the site keeps changing the Location header (strange :-8).
It seems that the site behaves differently depending on the client (user agent); for example, with Wget it keeps redirecting until it decides the client is a spider.
You can use ClientProperties to change the user agent (wc.getClientProperties().setUserAgent(java.lang.String userAgent)) and see if it helps.
When I come back I am going to sniff the network traffic (using Ethereal) to see exactly how this site behaves with IE vs. HttpUnit.
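Both symptoms described above (the Location header pointing back at a page already visited, or changing on every hop) are what a hop limit plus a visited set guards against. A minimal sketch of that guard, not HttpUnit code: the in-memory map of URL to Location is a hypothetical stand-in for real 302 responses, so the logic can run offline. A site that issues a fresh Location on every hop is caught by the hop limit rather than by the visited set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RedirectGuard {
    // Follows the chain in 'redirects' (URL -> Location) starting at 'start'.
    // Returns the first URL that does not redirect; fails on a loop or too many hops.
    static String follow(Map<String, String> redirects, String start, int maxHops) {
        Set<String> visited = new HashSet<>();
        String current = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            if (!visited.add(current))                 // seen before: the server is looping
                throw new IllegalStateException("Redirect loop at " + current);
            String next = redirects.get(current);
            if (next == null)
                return current;                        // no Location: final page reached
            current = next;
        }
        throw new IllegalStateException("More than " + maxHops + " redirects");
    }

    public static void main(String[] args) {
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://site.example/", "http://site.example/check1");
        redirects.put("http://site.example/check1", "http://site.example/home");
        System.out.println(follow(redirects, "http://site.example/", 10)); // http://site.example/home

        redirects.put("http://site.example/home", "http://site.example/"); // now it loops
        try {
            follow(redirects, "http://site.example/", 10);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Redirect loop at http://site.example/
        }
    }
}
```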

sumantedla (Author) commented:
OK,

I will try both options.

Thanks.

sumantedla (Author) commented:


>>newURL = new java.net.URL(original, location).toString();

The URL class has no constructor matching the above (there is no URL(String, String)), so I changed the code accordingly by making "original" of type URL.
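For reference, the constructor that does exist is URL(URL context, String spec): it resolves a relative spec against the context and keeps an absolute one as-is, so wrapping the original string in a URL is the right fix. A minimal sketch of just that resolution step, assuming the Location value is passed in as a plain string (RedirectResolver and resolve are hypothetical names):

```java
import java.net.MalformedURLException;
import java.net.URL;

class RedirectResolver {
    // Resolves a Location header value against the URL that returned the 302.
    // Relative values are resolved against 'original'; absolute ones are kept.
    static String resolve(String original, String location) throws MalformedURLException {
        if (location == null || location.trim().isEmpty())
            throw new MalformedURLException("Missing Location header");
        return new URL(new URL(original), location.trim()).toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        String page = "http://example.com/a/b.html";
        System.out.println(resolve(page, "/c.html"));               // http://example.com/c.html
        System.out.println(resolve(page, "d.html"));                // http://example.com/a/d.html
        System.out.println(resolve(page, "https://other.example/")); // https://other.example/
    }
}
```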


I tried a few user agents; one of them is Googlebot/2.1 (+http://www.google.com/bot.html).

For that one I got:
java.lang.RuntimeException: Error loading included script: java.net.ConnectException: Connection refused: connect
        at com.meterware.httpunit.ParsedHTML.getScript(ParsedHTML.java:341)
        at com.meterware.httpunit.ParsedHTML.interpretScriptElement(ParsedHTML.java:319)
        at com.meterware.httpunit.ParsedHTML.access$700(ParsedHTML.java:37)
        at com.meterware.httpunit.ParsedHTML$ScriptFactory.recordElement(ParsedHTML.java:489)
        at com.meterware.httpunit.ParsedHTML$2.processElement(ParsedHTML.java:702)
        at com.meterware.httpunit.NodeUtils$PreOrderTraversal.perform(NodeUtils.java:195)
        at com.meterware.httpunit.ParsedHTML.loadElements(ParsedHTML.java:718)
        at com.meterware.httpunit.ParsedHTML.getLinks(ParsedHTML.java:118)
        at com.meterware.httpunit.WebResponse.getLinks(WebResponse.java:405)
        at HtmlUtil.getOutLinks(HtmlUtil.java:60)
        at HtmlUtil.main(HtmlUtil.java:176)

I am getting the same exception for some other URLs as well, like:
http://www.javaworld.com/news-reviews/index.shtml
http://www.javaworld.com/feedback

and for
http://www.computerweekly.com/Article138254.htm?src=rssNews
I got java.lang.RuntimeException: Error loading included script: java.net.UnknownHostException: jscript

Can we solve this problem? :-/

sumantedla (Author) commented:
I tried another user agent, Google's:

gsa-crawler (Enterprise; GID-01422; jplastiras@google.com)

Then it worked. :)

But for the remaining URLs mentioned above, I still get the same errors:

java.lang.RuntimeException: Error loading included script: java.net.UnknownHostException: jscript
java.lang.RuntimeException: Error loading included script: java.net.ConnectException: Connection refused: connect



sumantedla (Author) commented:
Sorry, I tried it with the wrong URL.

It works for http://store.sun.com

but not for http://javaworld.pricegrabber.com

The exception is:
java.lang.RuntimeException: Error loading included script: java.net.ConnectException: Connection refused: connect
        at com.meterware.httpunit.ParsedHTML.getScript(ParsedHTML.java:341)
        at com.meterware.httpunit.ParsedHTML.interpretScriptElement(ParsedHTML.java:319)
        at com.meterware.httpunit.ParsedHTML.access$700(ParsedHTML.java:37)
        at com.meterware.httpunit.ParsedHTML$ScriptFactory.recordElement(ParsedHTML.java:489)
        at com.meterware.httpunit.ParsedHTML$2.processElement(ParsedHTML.java:702)
        at com.meterware.httpunit.NodeUtils$PreOrderTraversal.perform(NodeUtils.java:195)
        at com.meterware.httpunit.ParsedHTML.loadElements(ParsedHTML.java:718)
        at com.meterware.httpunit.ParsedHTML.getLinks(ParsedHTML.java:118)
        at com.meterware.httpunit.WebResponse.getLinks(WebResponse.java:405)
        at HtmlUtil.getOutLinks(HtmlUtil.java:60)
        at HtmlUtil.main(HtmlUtil.java:176)

aozarov commented:
OK, I found the problem and I have a fix for it.
There is no need for my previous change. To revert it, replace
super( null, getRedirectURL(response), response.getFrame(), response.getFrameName() );
with the original line:
super( response.getURL(),  response.getHeaderField( "Location" ), response.getFrame(), response.getFrameName() );

The problem is that this page does two things (maybe to protect itself from hackers, DoS attacks, or spiders):
1. It hops over several URLs (with a 302 return code), with different cookie settings at each hop, and checks those values.
2. It contains a broken link to one of its .js files (and that script is supposed to be invoked automatically).

To fix the problems, add these lines to HtmlUtil (your class) after the line "WebConversation wc = new WebConversation();":
com.meterware.httpunit.cookies.CookieProperties.setPathMatchingStrict(false);
HttpUnitOptions.setScriptingEnabled(false);

The first line fixes the cookie-hopping problem, and the second prevents the onLoad invocation of the broken script.
There is still a problem (similar to the one you mentioned above) when you call getLinks(), which currently
loads all the .js files and then invokes them (although the engine does nothing with them when scripting is disabled).
There is no point in loading them when scripting is disabled, since they are not going to be processed.
Loading them causes the problem (because that link is broken), so I changed the code not to load them when scripting is disabled.
The change is in ParsedHTML.java (line 1131).
Add this line: if (HttpUnitOptions.isScriptingEnabled())
before: parsedHTML.interpretScriptElement( element );

If that fixes your problem and the rest seems to work fine, I will report a bug for it.

aozarov commented:
Also, there is no need to change the user agent (at least not for this offending site).

sumantedla (Author) commented:
Yep, I will try and let you know.

sumantedla (Author) commented:
I made the changes and it is still not working.

But when I changed the user agent to "gsa-crawler (Enterprise; GID-01422; jplastiras@google.com)", it worked for all the URLs mentioned above.

Then I tried http://www.csoonline.com/, and again it is not working. Should we take another approach?

My requirement is to fetch links from around 5000 pages, so I don't mind if the program is unable to get the links from a few pages. But the program shouldn't block on any particular URL.

My program hangs on http://www.csoonline.com/, and I think I might face this sort of problem with some other URLs too.

Is there any way for my program to skip a URL when it is blocked for more than, say, 10 seconds? I don't even mind an exception, but the program should run to completion (5000 pages) without any big delays.

It currently takes approximately 4 minutes for 250 URLs, so 5000 URLs (20 times as many) would take about 80 minutes.

Is there any way around this?


sumantedla (Author) commented:


In addition, I am also getting exceptions like
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
and
java.io.EOFException
        at java.util.zip.GZIPInputStream.readUByte(Unknown Source)
        at java.util.zip.GZIPInputStream.readUShort(Unknown Source)
        at java.util.zip.GZIPInputStream.readHeader(Unknown Source)
        at java.util.zip.GZIPInputStream.<init>(Unknown Source)
        at java.util.zip.GZIPInputStream.<init>(Unknown Source)
        at com.meterware.httpunit.WebResponse.defineRawInputStream(WebResponse.java:810)
        at com.meterware.httpunit.HttpWebResponse.<init>(HttpWebResponse.java:60)
        at com.meterware.httpunit.HttpWebResponse.<init>(HttpWebResponse.java:67)
        at com.meterware.httpunit.WebConversation.newResponse(WebConversation.java:76)
        at com.meterware.httpunit.WebWindow.getResource(WebWindow.java:165)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:128)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:145)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:145)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:145)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:102)
        at com.meterware.httpunit.WebClient.getResponse(WebClient.java:87)
        at HtmlUtil.getOutLinks(HtmlUtil.java:34)
        at BaseSet.main(BaseSet.java:112)

aozarov commented:
>I made the changes and still it is not working.
Are you sure you applied all the changes?
It works fine for me (at least for both http://store.sun.com and http://javaworld.pricegrabber.com).

> Then I tried with http://www.csoonline.com/ ,  and again not working. Shall we go in other way ??
What are the symptoms for this one?

> My program is hanging for http://www.csoonline.com/. I think for some other urls also , I might face these sort of problems.
> Approxiamately it is taking 4 minutes for 250 urls. So for 5000 urls it might take one hour.
Make sure you applied the above changes correctly (when I get home I will try it against http://www.csoonline.com/).
For scalability, try creating several threads to process the sites concurrently.
You can also have a "WatchDog" thread that stops a thread if it runs too long.

> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
Run your program with -Xmx<bigger memory size>, e.g.
java -Xmx1024m HtmlUtil
to run it with 1 GB of heap (assuming you have that much available).
Adding memory should also help your performance.
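To check that the -Xmx option actually took effect, the JVM can report its own heap limit through Runtime.maxMemory(). A small sketch (HeapCheck is a hypothetical class name); the reported value is typically close to, and sometimes slightly below, the -Xmx figure:

```java
class HeapCheck {
    // Maximum heap the JVM will try to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Run it as java -Xmx1024m HeapCheck and compare the printed figure with the option you passed.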

aozarov commented:
A bit more about such an architecture.
You can have a thread pool (a set of threads, but not too many of them; 50 sounds good) and a queue of tasks.
You add the requests for the links to the task queue.
Each thread runs forever and keeps taking tasks from that queue.
The thread processes each request and extracts all the links from it. It can either put the links in a "results space", or even post them back to the queue
if you want something similar to a recursive spider.
Basically, the thread logic can be:
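// takeURLFromTheQueue(), processURL() and watchDog are placeholders for your own code
while (true)
{
    try
    {
        String url = takeURLFromTheQueue(); // wait until a URL is available
        watchDog.notifyStart(this);         // the WatchDog starts timing this thread
        processURL(url);                    // extract the links (as you do now) and add them to the output
        watchDog.notifyEnd(this);           // finished in time: the WatchDog stops timing
    }
    catch (Exception ex)
    {
        // print it out, but don't terminate the thread, so it can process other requests
        ex.printStackTrace();
    }
}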

Once the WatchDog thread kills a thread that was too slow, it should create another one in its place.
Tell me if you need more info about the architecture.
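Since Java 5, java.util.concurrent provides the task queue directly. A minimal sketch of the worker loop above, with the WatchDog left out: fetchLinks is a hypothetical stand-in for HtmlUtil.getOutLinks, and a "poison pill" marker (one per worker, queued after the real URLs) tells each thread to exit once the queue is drained, so the threads do not in fact run forever.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class CrawlPool {
    // Sentinel placed after the real URLs; a worker that takes it exits.
    private static final String POISON = "__STOP__";

    // Hypothetical stand-in for HtmlUtil.getOutLinks(url).
    static Set<String> fetchLinks(String url) {
        return new HashSet<>(Arrays.asList(url + "/a", url + "/b"));
    }

    static Map<String, Set<String>> crawl(List<String> urls, int workers) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(urls);  // the task queue
        for (int i = 0; i < workers; i++)
            queue.add(POISON);                                          // one pill per worker
        Map<String, Set<String>> results = new ConcurrentHashMap<>();   // the "results space"

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                while (true) {
                    String url;
                    try {
                        url = queue.take();          // waits until a task is available
                    } catch (InterruptedException e) {
                        return;                      // asked to shut down
                    }
                    if (url.equals(POISON))
                        return;                      // no more work for this thread
                    try {
                        results.put(url, fetchLinks(url));
                    } catch (RuntimeException ex) {
                        // log it but keep the thread alive for the next URL
                        System.err.println(url + ": " + ex);
                    }
                }
            });
            threads.add(t);
            t.start();                               // start all workers first...
        }
        for (Thread t : threads)
            t.join();                                // ...then wait for every one
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = Arrays.asList("http://a.example", "http://b.example", "http://c.example");
        Map<String, Set<String>> results = crawl(urls, 2);
        System.out.println(results.size() + " pages processed"); // 3 pages processed
    }
}
```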


sumantedla (Author) commented:
For http://www.csoonline.com/, the exceptions are below; they repeat many times.

at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
 at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
 at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:141)
 at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
 at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
 at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:141)
 at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
 at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
 at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:141)

sumantedla (Author) commented:

Yep, it was working for both http://store.sun.com and http://javaworld.pricegrabber.com


sumantedla (Author) commented:
At what times will you be monitoring EE? Just so I can synchronize my clock with your hours. :)

aozarov commented:
I am on EST. I read and reply while at work, but I do the serious stuff from home (I get there at about 18:00 EST).

aozarov commented:
It does work for me with http://www.csoonline.com/.
Let's compare:
1. I don't set the user agent.
2. Revert (in RedirectWebRequest)
super( null, getRedirectURL(response), response.getFrame(), response.getFrameName() );
to
super( response.getURL(),  response.getHeaderField( "Location" ), response.getFrame(), response.getFrameName() );
3. Add to your main class (after "WebConversation wc = new WebConversation();"):
com.meterware.httpunit.cookies.CookieProperties.setPathMatchingStrict(false);
HttpUnitOptions.setScriptingEnabled(false);
4. Add the following line:
if (HttpUnitOptions.isScriptingEnabled())
before parsedHTML.interpretScriptElement( element ); in ParsedHTML.java (line 1131),
so that statement is not executed when scripting is disabled.

If all of that matches and you still get the error, send me HtmlUtil and I will check it here (I am using a test class based on your earlier file).
I will be home in about 2 hours.


sumantedla (Author) commented:
Thanks, I am on CST. I am not that good at multithreading. Is there any resource that covers multithreading in depth?

sumantedla (Author) commented:
>> before parsedHTML.interpretScriptElement( element ); [ParsedHTML.java (line 1131).]

In my copy of ParsedHTML, the line number is 489, not 1131.

I changed it in the following class:

    static class ScriptFactory extends HTMLElementFactory {

        HTMLElement toHTMLElement( NodeUtils.PreOrderTraversal pot, ParsedHTML parsedHTML, Element element ) {
            return null;
        }

        void recordElement( NodeUtils.PreOrderTraversal pot, Element element, ParsedHTML parsedHTML ) {
            // only load and interpret <script> elements when scripting is enabled
            if (HttpUnitOptions.isScriptingEnabled())
                parsedHTML.interpretScriptElement( element );
        }
    }

aozarov commented:
>> if (HttpUnitOptions.isScriptingEnabled())
>>                      parsedHTML.interpretScriptElement( element );

Right, this is the needed change.
I think I am using the latest HttpUnit version; which version are you using?
And do you get this exception even with this line?

sumantedla (Author) commented:
I am using 1.6.

I am still getting the exception after changing the line.

I checked my previous output files and, surprisingly, there was no problem with http://www.csoonline.com/ then.
I don't know why it's not working now.

sumantedla (Author) commented:
I made a change and now it works.

Previously:

      WebConversation wc = new WebConversation();
      com.meterware.httpunit.cookies.CookieProperties.setPathMatchingStrict(false);
      HttpUnitOptions.setScriptingEnabled(false);

      // this line is the main difference
      wc.getClientProperties().setAutoRefresh(true); // Auto Refresh based on META REFRESH TAG
      WebResponse response = wc.getResponse(url);

      WebLink[] links = response.getLinks();


Now:

      WebConversation wc = new WebConversation();
      com.meterware.httpunit.cookies.CookieProperties.setPathMatchingStrict(false);
      HttpUnitOptions.setScriptingEnabled(false);
      WebResponse response = wc.getResponse(url);

      // follow a META REFRESH by hand, reusing the same conversation (and its cookies)
      WebRequest refreshRequest = response.getRefreshRequest();
      if (refreshRequest != null)
      {
            response = wc.getResponse(refreshRequest.getURL().toString());
      }
      // Grab links.
      WebLink[] links = response.getLinks();

sumantedla (Author) commented:
I have the 5000 URLs in a HashMap, and I need to process them with multiple threads. After each URL is processed, the results should be written to some other data structure, say an array of 5000 Vectors, holding the outgoing links of each URL.

How should the WatchDog be implemented?

Which data structures fit this?

Thanks

aozarov commented:
>> I  have the 5000 urls in a hashmap....
For so much data you might want to consider saving the intermediate output to disk.

>> how the WatchDog should be implemented??
The simplest approach is to use java.util.Timer. Before it starts processing a request, each worker thread schedules a TimerTask that will stop it after x seconds.
Once the request is completed, the worker cancels the TimerTask (so it will not be stopped).
If for some reason the worker takes too long, the TimerTask kicks in and stops the worker thread (the same task should also create a new worker thread). See: http://javaalmanac.com/egs/java.util/ScheduleLater.html
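A minimal sketch of the Timer-based WatchDog, with one deliberate substitution: instead of stopping the worker and creating a replacement thread, the TimerTask interrupts the worker so it can abandon the current URL and carry on. This assumes the page fetch reacts to Thread.interrupt(); blocking reads on classic java.net sockets generally do not, so a real crawler would still want socket-level timeouts. slowFetch is a hypothetical stand-in for the HttpUnit call.

```java
import java.util.Timer;
import java.util.TimerTask;

class WatchDogDemo {
    // One shared Timer for all workers (each Timer owns a thread); daemon so it won't block JVM exit.
    static final Timer TIMER = new Timer(true);

    // TimerTask that interrupts the worker unless the worker has already finished.
    static final class WatchDog extends TimerTask {
        private final Thread worker;
        private boolean finished;   // guarded by this
        private boolean fired;      // guarded by this

        WatchDog(Thread worker) { this.worker = worker; }

        @Override
        public synchronized void run() {
            if (!finished) {
                fired = true;
                worker.interrupt();   // ask the worker to abandon the current URL
            }
        }

        // Marks the job as done; returns true if it finished before the watchdog fired.
        synchronized boolean finish() {
            finished = true;
            return !fired;
        }
    }

    // Runs job on the calling thread with a timeout; returns true if it finished in time.
    static boolean runWithWatchDog(Runnable job, long timeoutMs) {
        WatchDog dog = new WatchDog(Thread.currentThread());
        TIMER.schedule(dog, timeoutMs);
        boolean inTime;
        try {
            job.run();
        } finally {
            inTime = dog.finish();   // from here on the watchdog can no longer interrupt us
            dog.cancel();            // remove it from the timer queue
            Thread.interrupted();    // clear an interrupt the job did not consume
        }
        return inTime;
    }

    // Hypothetical stand-in for a page fetch taking 'ms' milliseconds; gives up if interrupted.
    static Runnable slowFetch(long ms) {
        return () -> {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                // watchdog fired: abandon this URL
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(runWithWatchDog(slowFetch(50), 2000));   // finishes in time
        System.out.println(runWithWatchDog(slowFetch(10000), 200)); // cut off by the watchdog
    }
}
```

The finished/fired pair, guarded by the task's own lock, ensures a watchdog that fires just as the job completes cannot leave a stray interrupt behind for the next URL.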


>> What are the data structures that fits for this??
A list (LinkedList or similar) to store the tasks (URLs to retrieve), and a HashMap to store the results. Each worker thread removes
a URL from the list (so that no other thread processes the same URL; note that access to the list must be synchronized), then processes it
and saves the result in the result map (key = URL of the site, value = list of links).
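For saving the intermediate output to disk, one option is to append each finished page to a file as soon as it is done, so nothing is lost if a long run dies halfway. A minimal sketch, assuming a simple tab-separated format (page URL, then one outgoing link per line) that is not from the thread; ResultWriter is a hypothetical name:

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Collection;

class ResultWriter {
    private final Writer out;

    ResultWriter(File file) throws IOException {
        // 'true' opens the file in append mode, so a restarted run keeps earlier results
        out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file, true), StandardCharsets.UTF_8));
    }

    // One line per (page, link) pair, tab-separated. Synchronized because
    // several worker threads may finish pages at the same time.
    synchronized void write(String page, Collection<String> links) throws IOException {
        for (String link : links)
            out.write(page + "\t" + link + "\n");
        out.flush();   // push this page's results to disk before moving on
    }

    synchronized void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("links", ".tsv");
        ResultWriter writer = new ResultWriter(file);
        writer.write("http://a.example/", Arrays.asList("http://b.example/", "http://c.example/"));
        writer.close();
        for (String line : Files.readAllLines(file.toPath(), StandardCharsets.UTF_8))
            System.out.println(line);
    }
}
```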

sumantedla (Author) commented:
Is this basic design sufficient? I created 50 threads, and urlPool holds the 5000 URLs; urlPool is a static member.

Will this work, or does it need any modifications?

public static void main(String args[])
{
for(int j = 0; j < 50; j++)
{
      ThreadUtil bg = new ThreadUtil();   // creating 50 threads
      bg.start();
      bg.join();   // i dont know whether this is good or not
}// end of for
}


public void run()
{
      try
      {
            while (!(urlPool.isEmpty()))  // urlPool is static member
            {
                  int numberOfMillisecondsInTheFuture = 10000; // 10 sec
                  Date timeToRun = new Date(System.currentTimeMillis() + numberOfMillisecondsInTheFuture);
                  Timer timer = new Timer();
                  timer.schedule(new TimerTask() {
                        public void run() {
                              // kill this thread and create a new thread
                              // ignore the url
                        }
                  }, timeToRun);

                  String url = (String)urlPoolDup.remove(0);
                  HashMap links = HtmlUtil.getOutLinks(url);
                  //Update the result space with links
            }
      }
      catch(Exception e)
      {

      }
}

aozarov commented:
>>   bg.join();   // i dont know whether this is good or not
No, it is not, at least not here: it waits at this point until the thread has completed, so in effect only one thread runs at a time.
You can save the threads in a list and then, after starting them all, iterate over the list again and join each one.

>> Timer timer = new Timer();
Make that static so all threads share the same one (each Timer instance creates its own thread).

>> HashMap links = HtmlUtil.getOutLinks(url);
Save the TimerTask and call cancel() on it after the above line (otherwise it will terminate your thread).

>> String url = (String)urlPoolDup.remove(0);
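What if the pool is empty at that point? Check it to avoid runtime exceptions, and do the check and the remove together under one lock, since another thread may empty the pool in between.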

aozarov commented:
sumantedla, which time zone are you in? I was expecting some results... ;-)

sumantedla (Author) commented:
I am in CST.

I am not comfortable with threads, so it is taking me some time to learn and implement this. HttpUnit is working pretty well now, even though some exceptions still come up.

Now my concern is performance. So I am working on threads and will let you know the result by tomorrow, I mean the 10th, without asking more questions (I will try my best not to ask more questions :) ).

aozarov commented:
>> I am not comfotable with threads
http://java.sun.com/docs/books/tutorial/essential/threads/

>> I will try my best for not asking more questions :)  
Don't worry I am fine with that ;-)

>> Now my concern is the performance
Increasing the memory and/or dumping temporary results to a file may help a lot.

sumantedla (Author) commented:
Thanks, that's a good tutorial.
Some more questions for you.
This is my code now.
Some of my static declarations in class BaseSet are:
      static LinkedList baseList = new LinkedList();  // contains the 5000 urls
      static List list;  
      static  Timer timer = new Timer();
      static Hashtable graph = new Hashtable(5000);  // result space


list = Collections.synchronizedList(baseList);   // baseList is a LinkedList with all urls
                                                 // list is static List list;  in class BaseSet
LinkedList threadPool = new LinkedList();
for (int i = 0; i < 50; i++)
{
      Thread t = new ThreadUtil();
      threadPool.add(t);
      t.start();
      // where and when to call join()
}

class ThreadUtil extends Thread
{
      public void run()
      {
            while (!(BaseSet.list.isEmpty()))
            {
                  int numberOfMillisecondsInTheFuture = 10000; // 10 sec
                  Date timeToRun = new Date(System.currentTimeMillis() + numberOfMillisecondsInTheFuture);
                  TimerTask task = new TimerTask() {
                        public void run()
                        {
                              // How to kill the Thread here.
                              //Thread.currentThread().destroy();
                              // Does this work?? I dont think so.
                        }};
                  BaseSet.timer.schedule(task, timeToRun);
                  String url;
                  HashMap links;
                  if (!(BaseSet.list.isEmpty()))
                  {
                        url = (String)(BaseSet.list.remove(0));
                        links = HtmlUtil.getOutLinks(url);
                        task.cancel();
                        BaseSet.graph.put(url, links);
                        // graph is a hashtable declared in BaseSet as static Hashtable graph = new Hashtable(5000);
                  }
            }
      }
};


(1) How to kill the thread and create a new one and update the threadPool??
(2) When to call join on these threads??
(3) Are the datastructures appropriate??
(4) Did I miss any synchronization problems??

If I missed some info, let me know.

aozarov commented:
>> static Hashtable graph = new Hashtable(5000);  // result space
For 5000 items, make it (int) (5000 * 1.5), since Hashtable has a load factor (0.75 by default) and grows its table once it is about 75% full.
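The arithmetic behind that advice: a Hashtable rehashes once its entry count reaches capacity × load factor, and the default load factor is 0.75. Holding 5000 entries without any rehash therefore needs a capacity of at least 5000 / 0.75 ≈ 6667, which 5000 × 1.5 = 7500 comfortably covers. A small sketch of the calculation (capacityFor is a hypothetical helper):

```java
import java.util.Hashtable;

class Capacity {
    // Smallest initial capacity that holds 'entries' items without a rehash,
    // given the table's load factor (Hashtable's default is 0.75).
    static int capacityFor(int entries, float loadFactor) {
        return (int) Math.ceil(entries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int cap = capacityFor(5000, 0.75f);
        System.out.println(cap); // 6667
        Hashtable<String, Object> graph = new Hashtable<>(cap); // sized for 5000 pages
    }
}
```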

(1) How to kill the thread and create a new one and update the threadPool??
public void run()
{
          // How to kill the Thread here.
          //Thread.currentThread().destroy();
          // Does this work?? I dont think so.
          // No, that is no good; do the one below instead:
          ThreadUtil.this.stop(); // though deprecated, it is what you need here
          ThreadUtil newOneInstead = new ThreadUtil();
          threadPool.add(newOneInstead); // Make threadPool also synchronized and expose it to this method (e.g. as you do for BaseSet.list)
}};

(2) When to call join on these threads??

LinkedList threadPool = Collections.synchronizedList(new LinkedList());
for(int i = 0; i < 50; i++)
{    
     Thread t = new ThreadUtil();
     threadPool.add(t);
     t.start();
}

while (!threadPool.isEmpty())
{
      ThreadUtil thread = (ThreadUtil) threadPool.remove(0);
      thread.join();
}

(3) Are the datastructures appropriate??
(4) Did I miss any synchronization problems??
See my changes (I haven't compiled them, though):

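public String getURLTask()
{
      // check and remove under one lock, so two threads can't both see
      // the last remaining URL and then race on remove(0)
      synchronized (BaseSet.list)
      {
            if (BaseSet.list.isEmpty())
                  return null;

            return (String) BaseSet.list.remove(0);
      }
}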

public void run()
{
     String urlTask;
     while((urlTask = getURLTask()) != null)
     {     int numberOfMillisecondsInTheFuture = 10000; // 10 sec
          Date timeToRun = new Date(System.currentTimeMillis()+numberOfMillisecondsInTheFuture);
          TimerTask task = new TimerTask(){
          public void run()
          {
                   ThreadUtil.this.stop(); // though deprecated it is what you need here.
                   ThreadUtil newOneInstead = new ThreadUtil();
                   threadPool.add(newOneInstead); // Make threadPool also synchronized and expose it to this method (e.g as you do for BaseSet.list)
          }};          
         BaseSet.timer.schedule(task, timeToRun);
         try
         {
                 HashMap links = HtmlUtil.getOutLinks(urlTask); // Why HashMap? Is List not good (what is the key what is the value)?
                 BaseSet.graph.put(urlTask,links);
         }
         catch (Exception ex)
         {
             // Log or do whatever when getOutLinks throws an exception
         }
         finally
         {
                task.cancel();
         }
     }
}
}
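Thread.stop() is deprecated because it can kill a thread in the middle of updating shared data (for example halfway through a put into the result map). With Java 5 or later, a common alternative is to let an ExecutorService own the threads and wait on each task's Future with a timeout; on timeout, cancel(true) interrupts the worker and the loop simply moves on, with no replacement threads to create by hand. A minimal sketch, assuming fetch is a hypothetical stand-in for HtmlUtil.getOutLinks. Two caveats: each get() waits up to timeoutMs from the moment it is called, not from when the task started; and a worker blocked in a socket read may keep running after cancel, so socket-level timeouts are still worth setting.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutCrawl {
    // Hypothetical stand-in for HtmlUtil.getOutLinks: URLs containing "slow" hang.
    static Set<String> fetch(String url) throws InterruptedException {
        if (url.contains("slow"))
            Thread.sleep(60_000);
        return new HashSet<>(Collections.singletonList(url + "/link"));
    }

    static Map<String, Set<String>> crawl(List<String> urls, int threads, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, Future<Set<String>>> pending = new LinkedHashMap<>();
        for (String url : urls)
            pending.put(url, pool.submit(() -> fetch(url)));   // queued; pool threads pick them up

        Map<String, Set<String>> results = new HashMap<>();
        for (Map.Entry<String, Future<Set<String>>> e : pending.entrySet()) {
            try {
                results.put(e.getKey(), e.getValue().get(timeoutMs, TimeUnit.MILLISECONDS));
            } catch (TimeoutException ex) {
                e.getValue().cancel(true);                    // interrupt the worker, skip this URL
            } catch (ExecutionException ex) {
                System.err.println(e.getKey() + ": " + ex.getCause()); // fetch threw
            }
        }
        pool.shutdownNow();                                   // interrupt anything still running
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = Arrays.asList("http://fast.example", "http://slow.example");
        System.out.println(crawl(urls, 2, 500).keySet());    // [http://fast.example]
    }
}
```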

sumantedla (Author) commented:


The method getOutLinks returns a HashMap whose keys are the URLs and whose values are the number of occurrences of each URL in the web page (the value is not important).

I used a HashMap to filter out duplicate links on a page. Is there a more efficient way to do this?
aozarovCommented:
>> The reason I used a HashMap is to filter out the duplicate links in a web page. Is there a more efficient way to do this?
Put it in a HashSet.
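A minimal sketch of that suggestion, assuming the page's links are already available as a list of URL strings (`UniqueLinks` and `collectUniqueLinks` are hypothetical names standing in for part of `getOutLinks`). `Set.add` simply ignores a URL that is already present, so no dummy count value is needed. `LinkedHashSet` is used here to keep page order; a plain `HashSet` behaves the same apart from ordering.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueLinks {
    // Hypothetical helper: keeps only the first occurrence of each http(s) URL.
    // Set.add returns false for a duplicate, so repeats are dropped automatically.
    static Set<String> collectUniqueLinks(List<String> rawLinks) {
        Set<String> unique = new LinkedHashSet<>();
        for (String url : rawLinks) {
            if (url.startsWith("http")) { // same filter as the original Test class
                unique.add(url);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "http://a.example/", "http://b.example/", "http://a.example/", "mailto:x@example.com");
        System.out.println(collectUniqueLinks(raw)); // [http://a.example/, http://b.example/]
    }
}
```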
sumantedlaAuthor Commented:
I will change it to HashSet.
I created a synchronized threadList and added threads to it.

list = Collections.synchronizedList(baseList);

LinkedList threadPool = new LinkedList();
for(int i = 0; i < 50; i++)
{      
      Thread t = new ThreadUtil();
      threadPool.add(t);
      t.start();
}
threadList = Collections.synchronizedList(threadPool); // threadList is static
while (!threadPool.isEmpty())
{      ThreadUtil thread = (ThreadUtil) threadPool.remove(0);    
                 // do we need to delete them here?? Any specific reason    
      thread.join();
}

class ThreadUtil extends Thread
{
      public void run()
      {   String urlTask;
           while((urlTask = getURLTask()) != null)
       {      int numberOfMillisecondsInTheFuture = 10000; // 10 sec
            Date timeToRun = new Date(System.currentTimeMillis()+numberOfMillisecondsInTheFuture);
            TimerTask task = new TimerTask(){
            public void run()
            {      ThreadUtil.this.stop();
                    ThreadUtil newOneInstead = new ThreadUtil();
                     BaseSet.threadList.add(newOneInstead);        
            // here we are adding new thread to list , but not starting the thread.
           // should i say , newOneInstead.start() and newOneInstead.join() here itself???
            }};
                                 BaseSet.timer.schedule(task, timeToRun);
            try
            {      HashMap links = HtmlUtil.getOutLinks(urlTask);
                  BaseSet.graph.put(urlTask,links);
            }
            catch(Exception ex)
             {      ex.printStackTrace();
            }
            finally
             {
                  task.cancel();
                                                // is this ok to put here
                                                // or do I need to put it right after HashMap links = HtmlUtil.getOutLinks(urlTask);
                                                // I am asking because BaseSet.graph.put(urlTask,links); can take a few milliseconds, and
                                                // in the meanwhile the time might elapse,
                                                // can't it?
             }
            }//      while
      }// run
}//      ThreadUtil
sumantedlaAuthor Commented:
one more change,

I made getURLTask() static in BaseSet class.


            static public String getURLTask()
            {      synchronized (BaseSet.list)
                  {      if (BaseSet.list.isEmpty())
                        return null;
                        return (String)BaseSet.list.remove(0);
                  }
            }
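On Java 5 and later, an alternative to locking the list around the check-then-remove is a `java.util.concurrent` queue, whose `poll()` removes the head atomically and returns `null` when the queue is empty. A sketch, assuming the URLs are loaded into the queue before the workers start; `UrlQueue` and `nextUrlTask` are hypothetical names, not part of the code above.

```java
import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class UrlQueue {
    // Shared work queue; safe for many worker threads without explicit locking.
    static final Queue<String> URLS = new ConcurrentLinkedQueue<>();

    // Counterpart of getURLTask(): poll() is atomic and returns null when empty,
    // so neither a separate isEmpty() check nor a synchronized block is needed.
    static String nextUrlTask() {
        return URLS.poll();
    }

    public static void main(String[] args) throws InterruptedException {
        URLS.addAll(Arrays.asList("http://a.example/", "http://b.example/", "http://c.example/"));
        Runnable worker = () -> {
            String url;
            while ((url = nextUrlTask()) != null) {
                System.out.println(Thread.currentThread().getName() + " -> " + url);
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join(); // each URL is printed exactly once, by whichever worker polled it
        t2.join();
    }
}
```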
aozarovCommented:
// here we are adding new thread to list , but not starting the thread.
           // should i say , newOneInstead.start() and newOneInstead.join() here itself???
Right, I forgot the newOneInstead.start() call (there is no need for newOneInstead.join(), though).

// I am asking because BaseSet.graph.put(urlTask,links); can take a few milliseconds, and
                                                 // in the meanwhile the time might elapse,
                                                // can't it?
Why should it take long? Isn't it just adding it to a Hashtable? The cost of that operation is negligible.
In any case, if you want to cancel the task before the put, you need to declare the HashMap outside the try block and then check that it is not null before adding it to the map.
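A minimal sketch of that option, with hypothetical stand-ins: `fetchOutLinks` replaces `HtmlUtil.getOutLinks`, and the timeout task only prints a message instead of stopping the worker. The result variable is declared before the `try`, the task is cancelled in `finally` as soon as the fetch returns or fails, and only then is the result stored. The fallback to `""` matters because `Hashtable` throws a `NullPointerException` for `null` values.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

public class CancelBeforePut {
    static final Timer TIMER = new Timer(true);                 // daemon timer
    static final Map<String, Object> GRAPH = new Hashtable<>(); // Hashtable rejects null values

    // Hypothetical stand-in for HtmlUtil.getOutLinks(url).
    static Set<String> fetchOutLinks(String url) {
        Set<String> links = new HashSet<>();
        links.add(url + "about");
        return links;
    }

    static void processUrl(String urlTask) {
        TimerTask watchdog = new TimerTask() {
            public void run() {
                System.out.println("timed out: " + urlTask); // the thread's version stops the worker here
            }
        };
        TIMER.schedule(watchdog, 10000); // 10 seconds, as in the original
        Set<String> links = null;        // declared outside the try block
        try {
            links = fetchOutLinks(urlTask);
        } catch (Exception ex) {
            System.err.println("failed: " + urlTask + " (" + ex + ")");
        } finally {
            watchdog.cancel();           // cancelled as soon as the fetch returns or fails
        }
        // The map update happens only after the watchdog has been cancelled.
        GRAPH.put(urlTask, links != null ? links : "");
    }

    public static void main(String[] args) {
        processUrl("http://example.com/");
        System.out.println(GRAPH);
    }
}
```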
sumantedlaAuthor Commented:

I executed the program. I am getting multiple exceptions of these types:

java.lang.reflect.InvocationTargetException
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at com.meterware.httpunit.parsing.NekoHTMLParser.parse(NekoHTMLParser.java:41)
        at com.meterware.httpunit.HTMLPage.parse(HTMLPage.java:255)
        at com.meterware.httpunit.WebResponse.getReceivedPage(WebResponse.java:1126)
        at com.meterware.httpunit.WebResponse.getLinks(WebResponse.java:405)
        at HtmlUtil.getOutLinks(HtmlUtil.java:67)
        at ThreadUtil.run(BaseSet.java:168)

Exception in thread "Thread-42" java.lang.NoClassDefFoundError
        at com.sun.crypto.provider.AESCipher.<init>(DashoA6275)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at java.lang.Class.newInstance0(Unknown Source)
        at java.lang.Class.newInstance(Unknown Source)
        at java.security.Provider$Service.newInstance(Unknown Source)
        at javax.crypto.Cipher.a(DashoA12275)
        at javax.crypto.Cipher.init(DashoA12275)
        at javax.crypto.Cipher.init(DashoA12275)
        at com.sun.net.ssl.internal.ssl.CipherBox.initCipher(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherBox.newCipherBox(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuite$BulkCipher.newCipher(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuite$BulkCipher.isAvailable(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuite$BulkCipher.isAvailable(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuite.isAvailable(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuiteList.buildAvailableCache(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuiteList.getSupported(Unknown Source)
        at com.sun.net.ssl.internal.ssl.SSLSocketFactoryImpl.getSupportedCipherSuites(Unknown Source)
        at com.sun.net.ssl.internal.ssl.ExportControl.checkCipherSuites(Unknown Source)
        at javax.net.ssl.SSLSocketFactory.getDefault(Unknown Source)
        at com.sun.net.ssl.HttpsURLConnection.getDefaultSSLSocketFactory(Unknown Source)
        at com.sun.net.ssl.HttpsURLConnection.<init>(Unknown Source)
        at com.sun.net.ssl.internal.www.protocol.https.HttpsURLConnectionOldImpl.<init>(Unknown Source)
        at com.sun.net.ssl.internal.www.protocol.https.Handler.openConnection(Unknown Source)
        at com.sun.net.ssl.internal.www.protocol.https.Handler.openConnection(Unknown Source)
        at java.net.URL.openConnection(Unknown Source)
        at com.meterware.httpunit.WebConversation.openConnection(WebConversation.java:111)
        at com.meterware.httpunit.WebConversation.newResponse(WebConversation.java:67)
        at com.meterware.httpunit.WebWindow.getResource(WebWindow.java:164)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:128)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:102)
        at com.meterware.httpunit.WebClient.getResponse(WebClient.java:87)
        at HtmlUtil.getOutLinks(HtmlUtil.java:39)
        at ThreadUtil.run(BaseSet.java:169)


sumantedlaAuthor Commented:
I missed some exceptions.

java.lang.reflect.InvocationTargetException
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at com.meterware.httpunit.parsing.NekoHTMLParser.parse(NekoHTMLParser.java:41)
        at com.meterware.httpunit.HTMLPage.parse(HTMLPage.java:255)
        at com.meterware.httpunit.WebResponse.getReceivedPage(WebResponse.java:1126)
        at com.meterware.httpunit.WebResponse.getLinks(WebResponse.java:405)
        at HtmlUtil.getOutLinks(HtmlUtil.java:67)
        at ThreadUtil.run(BaseSet.java:170)

java.lang.reflect.InvocationTargetException
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at com.meterware.httpunit.parsing.NekoHTMLParser.parse(NekoHTMLParser.java:41)
        at com.meterware.httpunit.HTMLPage.parse(HTMLPage.java:255)
        at com.meterware.httpunit.WebResponse.getReceivedPage(WebResponse.java:1126)
        at com.meterware.httpunit.WebResponse.getLinks(WebResponse.java:405)
        at HtmlUtil.getOutLinks(HtmlUtil.java:67)
        at ThreadUtil.run(BaseSet.java:170)

If you need more info, let me know. In case you want to see the run method, here it is:

class ThreadUtil extends Thread
{
      public void run()
      {
                 try{
           String urlTask;
            while((urlTask = BaseSet.getURLTask()) != null)
                {      int numberOfMillisecondsInTheFuture = 10000; // 10 sec
            Date timeToRun = new Date(System.currentTimeMillis()+numberOfMillisecondsInTheFuture);
            TimerTask task = new TimerTask(){
            public void run()
            {      ThreadUtil.this.stop(); // though deprecated it is what you need here.
                  ThreadUtil newOneInstead = new ThreadUtil();
                  BaseSet.threadList.add(newOneInstead);
                  newOneInstead.start();
                  }};
                  BaseSet.timer.schedule(task, timeToRun);
                  System.out.println("the url is : " + urlTask);
                  try
                  {      HashMap links = HtmlUtil.getOutLinks(urlTask);
                        if (links!=null)
                              BaseSet.graph.put(urlTask,links);
                        else
                              BaseSet.graph.put(urlTask,"");
                  }
                  catch(Exception ex)
                   {      ex.printStackTrace();
                        }
                  finally
                   {
                        task.cancel();
                   }
                  }//      while
      }catch(Exception e)
       {
            e.printStackTrace();
       }
      }// run
}//      ThreadUtil
aozarovCommented:
It looks like something related to SSL (Java's JSSE implementation) and a missing or misconfigured cipher (AESCipher).
This is a Java issue, not an HttpUnit one. I need to go now (I will be back in 3-4 hours).
Search the web for "Exception" and "com.sun.crypto.provider.AESCipher.<init>" and see what you find.
I will help you when I get back.
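One way to narrow this down is a standalone check, outside HttpUnit and the thread pool, of whether the JVM can create and initialise an AES cipher at all. This is a diagnostic sketch (the class name `AesCheck` is hypothetical). If it succeeds when run on its own with the same JRE, a permanently missing provider becomes less likely, and the failure is more likely tied to conditions during the multithreaded run.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class AesCheck {
    // Returns true if the installed JCE providers can create and initialise an AES cipher.
    static boolean aesAvailable() {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            // 128-bit all-zero key, used only to exercise Cipher.init
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(new byte[16], "AES"));
            return true;
        } catch (Exception | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, the error seen in the stack trace
            System.err.println("AES unavailable: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version  = " + System.getProperty("java.version"));
        System.out.println("AES available = " + aesAvailable());
    }
}
```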
sumantedlaAuthor Commented:
When I commented out the code in the TimerTask run method,

/*ThreadUtil.this.stop(); // though deprecated it is what you need here.
ThreadUtil newOneInstead = new ThreadUtil();
BaseSet.threadList.add(newOneInstead);
newOneInstead.start();*/

it seems to be working fine. But the program does not end after the main method finishes; it just hangs. Do I need to do some kind of cleanup?
aozarovCommented:
I don't see how commenting out the code above would fix the exception you described.
That code is there to terminate long-running requests; without it, requests with certain problems may run forever (which might be your problem).
Also, when you create the java.util.Timer, pass true as the constructor argument. It needs to be a daemon thread, because a non-daemon Timer thread
also prevents the program from terminating after main returns.
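A small sketch of the difference (class and method names are hypothetical): a `Timer` created with `new Timer()` runs its tasks on a non-daemon thread, which keeps the JVM alive after `main` returns unless the timer is cancelled, whereas `new Timer(true)` uses a daemon thread that does not block exit. The demo runs one task on each kind of timer and reports what the task's own thread says about itself.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class DaemonTimerDemo {
    // Runs one task on the given timer and reports whether the timer's thread is a daemon.
    static boolean timerThreadIsDaemon(Timer timer) {
        final boolean[] daemon = new boolean[1];
        CountDownLatch done = new CountDownLatch(1);
        timer.schedule(new TimerTask() {
            public void run() {
                daemon[0] = Thread.currentThread().isDaemon();
                done.countDown(); // countDown/await make the write visible to the caller
            }
        }, 0);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        timer.cancel(); // stop the timer thread so this demo itself can exit
        return daemon[0];
    }

    public static void main(String[] args) {
        System.out.println("new Timer()     daemon: " + timerThreadIsDaemon(new Timer()));     // false
        System.out.println("new Timer(true) daemon: " + timerThreadIsDaemon(new Timer(true))); // true
    }
}
```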
sumantedlaAuthor Commented:

I created the Timer using the constructor that accepts a boolean (true).

Now I am not getting any exceptions, but the program hangs. I uncommented the previous code:

ThreadUtil.this.stop(); // though deprecated it is what you need here.
ThreadUtil newOneInstead = new ThreadUtil();
BaseSet.threadList.add(newOneInstead);
newOneInstead.start();

sumantedlaAuthor Commented:
When I traced the program, the final println below was never reached:

                  LinkedList threadPool = new LinkedList();
                  for(int i = 0; i < 10; i++)
                  {      
                        ThreadUtil t = new ThreadUtil();
                        threadPool.add(t);
                        t.start();
                  }
                  threadList = Collections.synchronizedList(threadPool);
                  while (!threadPool.isEmpty())
                  {      ThreadUtil thread = (ThreadUtil) threadPool.remove(0);
                        thread.join();
                  }

                                               System.out.println(" Never Reaching"); // this line is not getting printed
sumantedlaAuthor Commented:
I executed the program about 25 times.

Only a few times (3, in fact) did the program terminate.
sumantedlaAuthor Commented:
I tried many things, but no luck. Do you have any idea why it is happening? :(
aozarovCommented:
If you run it from the Windows command line, press CTRL+BREAK at the point where you think the program should have terminated.
This produces a thread dump, so we can see which thread is stuck and on what. Please post that dump here.
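When CTRL+BREAK is inconvenient (for example, with output redirected to a file), Java 5 and later can produce a similar dump from inside the program via `Thread.getAllStackTraces()`. A sketch; the class name and output format are hypothetical and only approximate the JVM's own dump.

```java
import java.util.Map;

public class ThreadDump {
    // Builds a plain-text dump of all live threads: name, daemon flag, state, and stack frames.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(t.isDaemon() ? " daemon" : "")
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("        at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

A non-daemon thread other than `main` still listed after the work should be finished (like the "Timer-0" entry without the daemon flag) is the usual suspect for a JVM that refuses to exit.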
sumantedlaAuthor Commented:
This is the thread dump.

Full thread dump Java HotSpot(TM) Client VM (1.5.0_01-b08 mixed mode):

"DestroyJavaVM" prio=5 tid=0x0b176e10 nid=0x95c waiting on condition [0x00000000..0x0007fae8]

"Java2D Disposer" daemon prio=10 tid=0x0ad96300 nid=0xd14 in Object.wait() [0x0b00f000..0x0b00fbe8]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0302f328> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x0302f328> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at sun.java2d.Disposer.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"AWT-Windows" daemon prio=7 tid=0x0ad90490 nid=0xf04 runnable [0x0af8f000..0x0af8fc68]
        at sun.awt.windows.WToolkit.eventLoop(Native Method)
        at sun.awt.windows.WToolkit.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"Keep-Alive-Timer" daemon prio=9 tid=0x0ad4fe30 nid=0x97c waiting on condition [0x0af0f000..0x0af0fd68]
        at java.lang.Thread.sleep(Native Method)
        at sun.net.www.http.KeepAliveCache.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"Timer-0" prio=5 tid=0x00ab3ef0 nid=0xec in Object.wait() [0x0accf000..0x0accf9e8]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x02fcb4c0> (a java.util.TaskQueue)
        at java.lang.Object.wait(Unknown Source)
        at java.util.TimerThread.mainLoop(Unknown Source)
        - locked <0x02fcb4c0> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Unknown Source)

"Low Memory Detector" daemon prio=5 tid=0x00a91888 nid=0xb34 runnable [0x00000000..0x00000000]

"CompilerThread0" daemon prio=10 tid=0x00a90460 nid=0x890 waiting on condition [0x00000000..0x0ac0f8c0]

"Signal Dispatcher" daemon prio=10 tid=0x00a8f7e8 nid=0xaa0 waiting on condition [0x00000000..0x00000000]

"Finalizer" daemon prio=9 tid=0x00a86b58 nid=0xcf4 in Object.wait() [0x0ab8f000..0x0ab8fc68]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x02fcb650> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x02fcb650> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

"Reference Handler" daemon prio=10 tid=0x00a856c8 nid=0xf34 in Object.wait() [0x0ab4f000..0x0ab4fce8]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x02fcb6d0> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Unknown Source)
        at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
        - locked <0x02fcb6d0> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x00a81580 nid=0xad4 runnable

"VM Periodic Task Thread" prio=10 tid=0x00a92a60 nid=0xee0 waiting on condition
sumantedlaAuthor Commented:
Just in case you want to see the program:

                static List threadList;
      static LinkedList baseList = new LinkedList();
      static List list;
      static Timer timer = new Timer(true);
      static Hashtable graph = new Hashtable(7000);
      static int i = 0;


      public static void main(String[] args)
      {      try
            {      HttpUnitOptions.setScriptingEnabled(false);
                  long start = System.currentTimeMillis();
                  if(args.length != 2)
                  {   System.out.println(" Usage : java BaseSet Input-filename Output-filename");
                        System.exit(0);
                  }
                  

                  // get the baseset
                  HashMap baseSet = BaseSet.buildBaseSet(args[0]);
                                                // this contains the urls as keys. values are not important.
                  System.out.println("The total number of links in Baseset are : " + baseSet.size());

                  
                  // convert the hashmap into linkedlist, to allow the threads working on the list.
                  Set keys = baseSet.keySet();
                  Iterator iterator = keys.iterator();
                  while (iterator.hasNext())
                  {      String key = (String) iterator.next();
                        baseList.add(key);
                  }
                  list = Collections.synchronizedList(baseList);
                  
                  LinkedList threadPool = new LinkedList();
                  for(int i = 0; i < 5; i++)
                  {      
                        ThreadUtil t = new ThreadUtil();
                        threadPool.add(t);
                        t.start();
                  }
                  threadList = Collections.synchronizedList(threadPool);
                  while (!threadList.isEmpty())
                  {      /*ThreadUtil thread = (ThreadUtil) threadPool.remove(0);
                        thread.join();*/
                        ((ThreadUtil) threadList.remove(0)).join();
                        System.out.println("this is reaching : " + ++i);                  
                  }
                  
                  System.out.println("the size of graph is : "  + graph.size());

                  long end = System.currentTimeMillis();
                  long elapsed = (end - start)/1000;
                  System.out.println("the total number of seconds is : " + elapsed);
            

            }//      try
            catch(Exception e)
            {
                  e.printStackTrace();
            }
      }

            static public String getURLTask()
            {      synchronized (BaseSet.list)
                  {      if (BaseSet.list.isEmpty())
                        return null;
                        return (String)BaseSet.list.remove(0);
                  }
            }

}


class ThreadUtil extends Thread
{
      public void run()
      {      try{
                 String urlTask;
                   while((urlTask = BaseSet.getURLTask()) != null)
                   {      int numberOfMillisecondsInTheFuture = 10000; // 10 sec
                        Date timeToRun = new Date(System.currentTimeMillis()+numberOfMillisecondsInTheFuture);
                        System.out.println("the size is" + BaseSet.list.size());
                        TimerTask task = new TimerTask(){
                              public void run()
                              {      ThreadUtil.this.stop(); // though deprecated it is what you need here.
                                    ThreadUtil newOneInstead = new ThreadUtil();
                                    BaseSet.threadList.add(newOneInstead);
                                    newOneInstead.start();
                                    System.out.println(" Kill it");
                              }};
                  BaseSet.timer.schedule(task, timeToRun);
                  System.out.println("the url is : " + urlTask);
                  try
                  {      HashMap links = HtmlUtil.getOutLinks(urlTask);
                        if (links!=null)
                              BaseSet.graph.put(urlTask,links);
                        else
                              BaseSet.graph.put(urlTask,"");
                  }
                  catch(Exception ex)
                   {      ex.printStackTrace();
                        }
                  finally
                   {
                        task.cancel();
                   }
                  }//      while
      }catch(Exception e)
       {
            e.printStackTrace();
       }
      }// run
}//      ThreadUtil
aozarovCommented:
>> "Timer-0" prio=5 tid=0x00ab3ef0 nid=0xec in Object.wait() [0x0accf000..0x0accf9e8]
Are you sure you recompiled after changing it to static Timer timer = new Timer(true);? This thread does not seem to be a daemon thread.

Also, you can call timer.cancel(); after
 System.out.println("the total number of seconds is : " + elapsed);
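A minimal sketch of what `timer.cancel()` does (the class name is hypothetical): it discards any scheduled tasks and lets the timer's thread finish, so even a non-daemon timer no longer keeps the JVM alive. Afterwards the timer rejects new tasks with an `IllegalStateException`, which the demo uses as its check.

```java
import java.util.Timer;
import java.util.TimerTask;

public class CancelTimerDemo {
    // Returns true if scheduling on a cancelled timer is rejected, i.e. cancel() took effect.
    static boolean cancelledTimerRejectsTasks() {
        Timer timer = new Timer(); // non-daemon: would keep the JVM alive if left running
        timer.cancel();            // discards scheduled tasks; the timer thread then ends
        try {
            timer.schedule(new TimerTask() {
                public void run() { }
            }, 1000);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after cancel: " + cancelledTimerRejectsTasks());
    }
}
```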
sumantedlaAuthor Commented:
I don't know why, but all of a sudden it started working. I didn't make any changes.

Do you see anything wrong in the thread dump?

When I execute my program, sometimes (not always) I get exceptions like these:

(1) Exception in thread "Thread-14" java.lang.NoClassDefFoundError
        at com.sun.crypto.provider.AESCipher.<init>(DashoA6275)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)

(2)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at java.lang.Class.newInstance0(Unknown Source)
        at java.lang.Class.newInstance(Unknown Source)
        at java.security.Provider$Service.newInstance(Unknown Source)
        at javax.crypto.Cipher.a(DashoA12275)
        at javax.crypto.Cipher.init(DashoA12275)
        at javax.crypto.Cipher.init(DashoA12275)

(3)
        at com.sun.net.ssl.internal.ssl.CipherBox.initCipher(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherBox.newCipherBox(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuite$BulkCipher.newCipher(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuite$BulkCipher.isAvailable(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuite$BulkCipher.isAvailable(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuite.isAvailable(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuiteList.buildAvailableCache(Unknown Source)
        at com.sun.net.ssl.internal.ssl.CipherSuiteList.getSupported(Unknown Source)

(4)
        at com.sun.net.ssl.internal.ssl.SSLSocketFactoryImpl.getSupportedCipherSuites(Unknown Source)
        at com.sun.net.ssl.internal.ssl.ExportControl.checkCipherSuites(Unknown Source)
        at javax.net.ssl.SSLSocketFactory.getDefault(Unknown Source)
        at com.sun.net.ssl.HttpsURLConnection.getDefaultSSLSocketFactory(Unknown Source)
        at com.sun.net.ssl.HttpsURLConnection.<init>(Unknown Source)
        at com.sun.net.ssl.internal.www.protocol.https.HttpsURLConnectionOldImpl.<init>(Unknown Source)
        at com.sun.net.ssl.internal.www.protocol.https.Handler.openConnection(Unknown Source)
        at com.sun.net.ssl.internal.www.protocol.https.Handler.openConnection(Unknown Source)
        at java.net.URL.openConnection(Unknown Source)
        at com.meterware.httpunit.WebConversation.openConnection(WebConversation.java:111)

(5)
        at com.meterware.httpunit.WebConversation.newResponse(WebConversation.java:67)
        at com.meterware.httpunit.WebWindow.getResource(WebWindow.java:164)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:128)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)

(6)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at java.lang.Class.newInstance0(Unknown Source)
        at java.lang.Class.newInstance(Unknown Source)
        at java.security.Provider$Service.newInstance(Unknown Source)
        at javax.crypto.Cipher.a(DashoA12275)

(7)
        at com.sun.net.ssl.internal.www.protocol.https.Handler.openConnection(Unknown Source)
        at com.sun.net.ssl.internal.www.protocol.https.Handler.openConnection(Unknown Source)
        at java.net.URL.openConnection(Unknown Source)
        at com.meterware.httpunit.WebConversation.openConnection(WebConversation.java:111)
        at com.meterware.httpunit.WebConversation.newResponse(WebConversation.java:67)
        at com.meterware.httpunit.WebWindow.getResource(WebWindow.java:164)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:128)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)

(8)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:102)
        at com.meterware.httpunit.WebClient.getResponse(WebClient.java:87)
        at HtmlUtil.getOutLinks(HtmlUtil.java:39)
        at ThreadUtil.run(BaseSet.java:204)
sumantedlaAuthor Commented:
One more observation.

All these exceptions appear when I increase the number of threads.

Initially it was 5, and there were no exceptions. But when I made it 25, these exceptions started appearing.

aozarovCommented:
>> Do you see anything wrong in the thread dump?
No, it seems all the worker threads completed (only the Timer thread was left).

>> When I execute my program, sometimes (not always) I get exceptions like these:
(1) Exception in thread "Thread-14" java.lang.NoClassDefFoundError
        at com.sun.crypto.provider.AESCipher.<init>(DashoA6275)

You don't have to print the stack trace ;-)
catch(Exception ex)
{     ex.printStackTrace();
}

Also, you can check whether it correlates with stopping the thread by calling System.out.println(" Kill it");
before the stop.
aozarovCommented:
>> Initially it was 5,then there were no exceptions. But when I made it 25 , these exceptions are coming up.
Try increasing the memory settings. Also, you should find the optimal number of threads for your machine (it depends on your hardware).
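For experimenting with the thread count, one option (a swapped-in technique, not what the code in this thread uses) is a fixed-size `ExecutorService`, where the number of workers is a single parameter and `Future.get` with a timeout takes the place of the `Timer`/`stop()` watchdog. A sketch with a hypothetical `fetchOutLinks` standing in for `HtmlUtil.getOutLinks`. Caveat: `cancel(true)` only interrupts the worker, and a thread blocked in a socket read generally does not respond to interruption, so this does not fully replace killing a stuck request.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolCrawler {
    // Hypothetical stand-in for HtmlUtil.getOutLinks(url).
    static Set<String> fetchOutLinks(String url) {
        return new HashSet<>(Arrays.asList(url + "a", url + "b"));
    }

    // Fetches every URL on a fixed pool of `threads` workers.
    static Map<String, Set<String>> crawl(List<String> urls, int threads, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, Future<Set<String>>> pending = new LinkedHashMap<>();
        for (String url : urls) {
            pending.put(url, pool.submit(() -> fetchOutLinks(url)));
        }
        Map<String, Set<String>> graph = new LinkedHashMap<>(); // only this thread writes to it
        for (Map.Entry<String, Future<Set<String>>> e : pending.entrySet()) {
            Set<String> links = new HashSet<>();               // empty result on failure or timeout
            try {
                // Waits up to timeoutMs for each result, in submission order.
                links = e.getValue().get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException ex) {
                e.getValue().cancel(true); // interrupts the worker (see the caveat above)
            } catch (Exception ex) {
                System.err.println("failed: " + e.getKey() + " (" + ex + ")");
            }
            graph.put(e.getKey(), links);
        }
        pool.shutdownNow(); // stops the workers so the JVM can exit
        return graph;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://a.example/", "http://b.example/");
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        long start = System.currentTimeMillis();
        Map<String, Set<String>> graph = crawl(urls, threads, 10000);
        System.out.println(graph.size() + " pages in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Running it with 5, then 10, then 15 workers and comparing the elapsed times is one way to find the point where adding threads stops helping.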
sumantedlaAuthor Commented:
I tested with the "Kill it" statement.

The exceptions are not correlated with stopping the thread.

One thing I noticed is that the exceptions start appearing after the program is halfway through.

Another one is java.lang.reflect.InvocationTargetException.

I think that by the time the program reaches the halfway stage, resources are running out, which results in exceptions.
This is just a guess. Is something like this possible?

I executed it like this: java -Xmx1024m BaseSet Input.txt
aozarovCommented:
How much memory do you have? You can increase it:
java -Xmx1200m -Xms1200m BaseSet Input.txt
Also, as I suggested before, what about saving the output to a file instead of keeping it in memory?
And yes, maybe 25 is too many for your setup. Try increasing the number gradually and see where you get peak performance.
sumantedlaAuthor Commented:

Does memory mean the RAM size or something else here? The RAM size is 512 MB.

>>what about saving the output to a file instead of keeping it in memory?

Should I write the output to a file from the program, or redirect the output like this:

java -Xmx1200m -Xms1200m BaseSet Input.txt  > output.txt
sumantedlaAuthor Commented:
How do I find out the heap size?
aozarovCommented:
You mean how much memory you have?
You're on Windows, right? Check My Computer/General (look at the bottom).
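For the JVM heap itself (as opposed to physical RAM), `java.lang.Runtime` reports the limits the running JVM actually has. A minimal sketch (the class name is hypothetical); running it with the same flags, e.g. `java -Xmx312m HeapInfo`, shows what those flags produced.

```java
public class HeapInfo {
    private static final long MB = 1024L * 1024L;

    // Summarises the JVM heap as seen by java.lang.Runtime.
    static String heapSummary() {
        Runtime rt = Runtime.getRuntime();
        return "max heap   : " + rt.maxMemory() / MB + " MB\n"   // upper limit, roughly what -Xmx set
             + "total heap : " + rt.totalMemory() / MB + " MB\n" // currently reserved; starts near -Xms
             + "free heap  : " + rt.freeMemory() / MB + " MB";   // unused part of the total
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
    }
}
```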
sumantedlaAuthor Commented:
Yes.

The processor speed is 1 GHz, and it has 512 MB of RAM.
sumantedlaAuthor Commented:
So by how much can I increase the memory?
aozarovCommented:
>> The processor speed is 1 GHz, and it has 512 MB of RAM.
You have a crXpy computer. I think it is time for a change ;-)
Your maximum physical memory is only 512 MB, so don't set the heap higher than that.
You should run with -Xmx312m or something like that (the OS needs some memory too).
And with such a weak computer you will probably not be able to run many threads, so saving the output to disk sounds more attractive.
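A minimal sketch of saving results to disk as each page finishes, instead of keeping them all in the shared `Hashtable`, assuming a simple one-line-per-page text format (the class name and format are hypothetical). This differs from `> output.txt`, which only captures whatever the program prints to standard output; writing from the program keeps the file limited to graph data and lets each page's links be discarded from memory once written.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class GraphFileWriter implements AutoCloseable {
    private final PrintWriter out;

    GraphFileWriter(Path file) throws IOException {
        out = new PrintWriter(Files.newBufferedWriter(file, StandardCharsets.UTF_8));
    }

    // Writes one line per page: the URL, a tab, then its out-links separated by spaces.
    // synchronized so lines from different worker threads never interleave.
    synchronized void writePage(String url, Set<String> links) {
        out.println(url + "\t" + String.join(" ", links));
    }

    // Flushes and closes the file. PrintWriter hides I/O errors; checkError() reports them.
    @Override
    public synchronized void close() {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "graph.txt");
        try (GraphFileWriter writer = new GraphFileWriter(file)) {
            writer.writePage("http://a.example/",
                    new TreeSet<>(Arrays.asList("http://b.example/", "http://c.example/")));
        }
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```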
sumantedlaAuthor Commented:
I will try all the possible variations and let you know.

Thanks.
sumantedlaAuthor Commented:
Hi,

This is all working fine now. Many thanks to you. Without your help I couldn't have done this. If any problem arises again, I will post here. Once again, thanks for helping me so much.

One last question: how did you become so proficient in Java? I want to know because I want to emulate you.

Thanks Again.
aozarovCommented:
>If any problem arises again, I will post here.
Please do. :-)

>How did you become so proficient in Java?
Age :-(
I have been using Java since it started.

Take care.
sumantedlaAuthor Commented:
Quite funny....:))

Thanks.
sumantedlaAuthor Commented:
I am sorry for asking a question again.
When I executed my program, it halted again.

The thread dump is:

Full thread dump Java HotSpot(TM) Client VM (1.5.0_01-b08 mixed mode):

"Thread-51" daemon prio=5 tid=0x1aaffa58 nid=0x260 runnable [0x1afef000..0x1afef9e8]
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        - locked <0x04d9ef00> (a java.io.BufferedInputStream)
        at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
        at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
        - locked <0x04d99e40> (a sun.net.www.protocol.http.HttpURLConnection)
        at sun.net.www.protocol.http.HttpURLConnection.getHeaderFieldKey(Unknown Source)
        at com.meterware.httpunit.HttpWebResponse.loadHeaders(HttpWebResponse.java:216)
        at com.meterware.httpunit.HttpWebResponse.readHeaders(HttpWebResponse.java:198)
        at com.meterware.httpunit.HttpWebResponse.<init>(HttpWebResponse.java:56)
        at com.meterware.httpunit.HttpWebResponse.<init>(HttpWebResponse.java:67)
        at com.meterware.httpunit.WebConversation.newResponse(WebConversation.java:76)
        at com.meterware.httpunit.WebWindow.getResource(WebWindow.java:164)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:128)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:102)
        at com.meterware.httpunit.WebClient.getResponse(WebClient.java:87)
        at HtmlUtil.getOutLinks(HtmlUtil.java:37)
        at ThreadUtil.run(BaseSet.java:212)

"Keep-Alive-Timer" daemon prio=9 tid=0x1a697bc8 nid=0x954 waiting on condition [0x1b0af000..0x1b0af9e8]
        at java.lang.Thread.sleep(Native Method)
        at sun.net.www.http.KeepAliveCache.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"Java2D Disposer" daemon prio=10 tid=0x1a655988 nid=0x6ec in Object.wait() [0x1a8ef000..0x1a8efbe8]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0434fee8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x0434fee8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at sun.java2d.Disposer.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"AWT-Windows" daemon prio=7 tid=0x1a650330 nid=0x998 runnable [0x1a85f000..0x1a85fc68]
        at sun.awt.windows.WToolkit.eventLoop(Native Method)
        at sun.awt.windows.WToolkit.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"Timer-0" daemon prio=5 tid=0x00ab3f10 nid=0x828 in Object.wait() [0x1a58f000..0x1a58f9e8]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x042eb1a8> (a java.util.TaskQueue)
        at java.lang.Object.wait(Unknown Source)
        at java.util.TimerThread.mainLoop(Unknown Source)
        - locked <0x042eb1a8> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Unknown Source)

"Low Memory Detector" daemon prio=5 tid=0x00a918c0 nid=0x4f4 runnable [0x00000000..0x00000000]

"CompilerThread0" daemon prio=10 tid=0x00a90498 nid=0x8b0 waiting on condition [0x00000000..0x1a4cf8c0]

"Signal Dispatcher" daemon prio=10 tid=0x00a8f820 nid=0x80c waiting on condition [0x00000000..0x00000000]

"Finalizer" daemon prio=9 tid=0x00a86b90 nid=0x674 in Object.wait() [0x1a44f000..0x1a44fc68]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x042eb338> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x042eb338> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

"Reference Handler" daemon prio=10 tid=0x00a85700 nid=0xa38 in Object.wait() [0x1a40f000..0x1a40fce8]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x042eb3b8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Unknown Source)
        at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
        - locked <0x042eb3b8> (a java.lang.ref.Reference$Lock)

"main" prio=5 tid=0x00036258 nid=0xd5c in Object.wait() [0x0007f000..0x0007fc38]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x048c4bd8> (a ThreadUtil)
        at java.lang.Thread.join(Unknown Source)
        - locked <0x048c4bd8> (a ThreadUtil)
        at java.lang.Thread.join(Unknown Source)
        at BaseSet.main(BaseSet.java:129)

"VM Thread" prio=10 tid=0x00a815b8 nid=0xd24 runnable

"VM Periodic Task Thread" prio=10 tid=0x00a92a98 nid=0xcd0 waiting on condition


Do I have to kill all threads explicitly? Is that the problem?
aozarovCommented:
>> When I executed my program, it again halted.
That is because thread "Thread-51" (daemon prio=5 tid=0x1aaffa58 nid=0x260 runnable) is blocked reading from a socket: "at java.net.SocketInputStream.socketRead0(Native Method)".
That is strange, as I thought you were setting the socket timeout via "sun.net.client.defaultReadTimeout". Isn't that the case?
Also, regardless of the socket timeout settings, I don't understand why the thread wasn't stopped by the TimerTask. Do you see any " Kill it" messages?
Could you post the final program here so I can take another look at it?

>> Do I have to kill all threads explicitly?? Is that the problem??
No, that is not the problem: the threads will terminate naturally once BaseSet.getURLTask() returns null.
sumantedlaAuthor Commented:
I set the timeout options in HtmlUtil.java, where the links are actually extracted:
                  System.setProperty("sun.net.client.defaultConnectTimeout", "9000");
                  System.setProperty("sun.net.client.defaultReadTimeout", "9000");

The following is BaseSet.java:

import java.util.*;
import com.meterware.httpunit.HttpUnitOptions;

public class BaseSet
{
      static List threadList;
      static LinkedList baseList = new LinkedList();
      static List list;
      static Timer timer = new Timer(true);
      static Hashtable graph = new Hashtable(7000);
      static int i = 0;
      static HashMap baseSet;

      public static void main(String[] args)
      {      try
            {      HttpUnitOptions.setScriptingEnabled(false);
                  //Thread.currentThread().setPriority(10);
                  long start = System.currentTimeMillis();
                  if(args.length != 2)
                  {   System.out.println(" Usage : java BaseSet Input-filename Output-filename");
                        System.exit(0);
                  }
                  

                  // get the baseset
                  baseSet = BaseSet.buildBaseSet(args[0]); // this is a hashmap with string objects(urls)
                  System.out.println("The total number of links in Baseset are : " + baseSet.size());

                  // to eliminate the duplicates in the baseSet
                  // two urls are considered duplicates when they point to same webpage
                  baseSet = HashUtil.eliminateDuplicates(baseSet);
                  System.out.println("The total number of links in Baseset after elimination are : " + baseSet.size());
                  
                  // convert the hashmap into linkedlist, to allow the threads working on the list.
                  Set keys = baseSet.keySet();
                  Iterator iterator = keys.iterator();
                  while (iterator.hasNext())
                  {      String key = (String) iterator.next();
                        baseList.add(key);
                  }
                  list = Collections.synchronizedList(baseList);
                  
                  LinkedList threadPool = new LinkedList();
                  for(int i = 0; i < 15; i++)
                  {      
                        ThreadUtil t = new ThreadUtil();
                        threadPool.add(t);
                        t.start();
                  }
                  threadList = Collections.synchronizedList(threadPool);
                  while (!threadList.isEmpty())
                  {      /*ThreadUtil thread = (ThreadUtil) threadPool.remove(0);
                        thread.join();*/
                        ((ThreadUtil) threadList.remove(0)).join();
                        //System.out.println("this is reaching : " + ++i);                  
                  }
                  
                  System.out.println("the size of graph is : "  + graph.size());

                  long end = System.currentTimeMillis();
                  long elapsed = (end - start)/1000;
                  System.out.println("the total number of seconds is : " + elapsed);
                  timer.cancel();
                  System.out.println("The count is : " + count);

            }//      try
            catch(Exception e)
            {
                  e.printStackTrace();
            }
            finally
            {
      
            }
      }

            static public String getURLTask()
            {      synchronized (BaseSet.list)
                  {      if (BaseSet.list.isEmpty())
                        return null;
                        return (String)BaseSet.list.remove(0);
                  }
            }
static int count = 0;
}


class ThreadUtil extends Thread
{
      public void run()
      {      try{
                 String urlTask;
                   while((urlTask = BaseSet.getURLTask()) != null)
                   {      int numberOfMillisecondsInTheFuture = 10000; // 10 sec
                        Date timeToRun = new Date(System.currentTimeMillis()+numberOfMillisecondsInTheFuture);
                        System.out.println("the size is" + BaseSet.list.size());
                        TimerTask task = new TimerTask(){
                              public void run()
                              {      //System.out.println(" Kill it");
                                    ThreadUtil.this.stop(); // though deprecated it is what you need here.
                                    ThreadUtil newOneInstead = new ThreadUtil();
                                    BaseSet.threadList.add(newOneInstead);
                                    newOneInstead.start();
                        
                              }};
                  BaseSet.timer.schedule(task, timeToRun);
//                  System.out.println("the url is : " + urlTask);
                  try
                  {      HashMap links = HtmlUtil.getOutLinks(urlTask);
                        if (links!=null)
                        {      
                              BaseSet.graph.put(urlTask,HashUtil.intersection(links,BaseSet.baseSet));
                              BaseSet.count++;
                        }
/*                        else
                        {
                              BaseSet.graph.put(urlTask,"");
                              BaseSet.count++;
                        }
*/                  }
                  catch(Exception ex)
                   {      ex.printStackTrace();
                        }
                  finally
                   {
                        task.cancel();
                   }
                  }//      while
      }catch(Exception e)
       {
            e.printStackTrace();
       }
      }// run
}//      ThreadUtil
sumantedlaAuthor Commented:
>>Do you see any " Kill it" messages?


Yes, I can see the " Kill it" messages when I run my program.
aozarovCommented:
I suggest setting these properties only once, e.g. in your main method or similar startup code:
System.setProperty("sun.net.client.defaultConnectTimeout", "9000");
System.setProperty("sun.net.client.defaultReadTimeout", "9000");

>HashUtil.eliminateDuplicates(baseSet);
Do you do anything special in this function?
A Map never has duplicate keys (if two keys are equal, the later put() overwrites the earlier entry's value).

>Set keys = baseSet.keySet();
> Iterator iterator = keys.iterator();
> ...
> list = Collections.synchronizedList(baseList);

This can be replaced with a one-liner: list = Collections.synchronizedList(new LinkedList(baseSet.keySet()));

> LinkedList threadPool = new LinkedList();
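Create the list as synchronized up front (so there is no need to wrap it later). It has to be declared as a List, because synchronizedList() does not return a LinkedList: List threadPool = Collections.synchronizedList(new LinkedList());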
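The two list suggestions above can be sketched together as follows, using generics rather than the raw types in the thread; the class and method names are hypothetical. The point carried over from BaseSet.getURLTask() is that the isEmpty()/remove(0) pair must run under the list's own lock, because each call on a synchronized list is only atomic on its own.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class TaskListDemo {
    // The one-liner: a synchronized work queue built from the map's keys.
    // The declared type is List, since synchronizedList() does not return a LinkedList.
    static List<String> buildTaskList(Map<String, ?> baseSet) {
        return Collections.synchronizedList(new LinkedList<String>(baseSet.keySet()));
    }

    // Mirrors BaseSet.getURLTask(): check-then-remove under the list's own lock,
    // so two worker threads cannot both pass isEmpty() for the last element.
    static String takeTask(List<String> list) {
        synchronized (list) {
            return list.isEmpty() ? null : list.remove(0);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> baseSet = new LinkedHashMap<>();  // hypothetical sample URLs
        baseSet.put("http://a.example/", Boolean.TRUE);
        baseSet.put("http://b.example/", Boolean.TRUE);

        List<String> tasks = buildTaskList(baseSet);
        String url;
        while ((url = takeTask(tasks)) != null) {
            System.out.println("took " + url);
        }
    }
}
```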

Change the order of the TimerTask run() logic, so that threadList cannot become empty before the replacement thread has been added:
      ThreadUtil newOneInstead = new ThreadUtil();
      BaseSet.threadList.add(newOneInstead);
      newOneInstead.start();
      ThreadUtil.this.stop(); // though deprecated, it is what you need here.
      System.out.println(" Kill it -> " + ThreadUtil.this.getName());

Try adding more debugging statements (e.g. when a thread takes a task), as I can't see why the TimerTask would not kick in and kill
a long-running task.
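As a hedged alternative (a different technique from the TimerTask + Thread.stop() watchdog used in this thread), each URL can be submitted to an ExecutorService and awaited through a Future with a time limit. Thread.stop() is deprecated and on JDK 20 and later throws UnsupportedOperationException, so this sketch uses Future.cancel(true), which only interrupts the worker. A thread blocked in a plain socket read does not react to interrupts, so the read timeout is still what actually frees it. runWithTimeout is a hypothetical helper name.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutTask {
    // Hypothetical helper: runs task on the pool and waits at most timeoutMillis.
    // Returns the result, or null if the task timed out (it is then cancelled).
    static <T> T runWithTimeout(ExecutorService pool, Callable<T> task, long timeoutMillis) {
        Future<T> f = pool.submit(task);
        try {
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a thread stuck in a plain socket read ignores
            // this, so keep sun.net.client.defaultReadTimeout set as well.
            f.cancel(true);
            return null;
        } catch (InterruptedException e) {
            f.cancel(true);
            Thread.currentThread().interrupt();  // preserve the caller's interrupt status
            return null;
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());  // the task itself threw
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // In the crawler the task body would be the per-URL work, e.g. HtmlUtil.getOutLinks(url).
            String fast = runWithTimeout(pool, () -> "links", 1000);
            String slow = runWithTimeout(pool, () -> {
                Thread.sleep(5000);
                return "late";
            }, 100);
            System.out.println(fast + " / " + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The design trade-off: the coordinating code never blocks longer than the limit, but a worker stuck in native I/O may linger until its socket times out, which is why the thread pool should have some headroom.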
sumantedlaAuthor Commented:
>HashUtil.eliminateDuplicates(baseSet);
>Do you do anything special in this function?
>Map will never have duplicates keys (if two keys are identical the last one wil override the other).

I did this to eliminate duplicate URLs such as

http://google.com
http://google.com/

The only difference is the trailing slash. What is the best way to handle this?

I will try the suggested changes and let you know.

Thanks.

aozarovCommented:
> The slash is the difference. What is the best way to do this??
You can extract the host part from java.net.URL:
bsh % new URL("http://google.com/").getHost();
<google.com>
bsh % new URL("http://google.com").getHost();
<google.com>
sumantedlaAuthor Commented:
I made the changes and it works: the program now terminates properly. In fact, I also ran the program without any of the changes, and it terminated properly then too. The problem is something strange.

>bsh % new URL("http://google.com/").getHost();
Two or more different pages can have the same host, right? What I did was construct URL objects and add them to a Set, which eliminates the duplicates.
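On the trailing-slash question: java.net.URL.equals() compares protocol, host, port, file and fragment, and the file parts of "http://google.com" and "http://google.com/" are "" and "/", so those two URL objects are not equal. Its javadoc also notes that comparing hosts may require name resolution, which is slow across thousands of URLs. A hedged alternative is to reduce each URL to a canonical string and put the strings in a Set. The canonical() helper is hypothetical, assumes absolute URLs, and only covers a few common variations (scheme and host case, default ports, an empty path, fragments).

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class UrlKey {
    // Hypothetical helper: canonical string key for an absolute URL. Lower-cases the
    // scheme and host, drops default ports and the #fragment, and maps an empty path to "/".
    static String canonical(String url) {
        URI u = URI.create(url.trim()).normalize();  // normalize() also removes "." and ".." segments
        if (u.getScheme() == null) {
            throw new IllegalArgumentException("not an absolute URL: " + url);
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost() != null ? u.getHost() : u.getRawAuthority();
        host = (host == null) ? "" : host.toLowerCase(Locale.ROOT);
        int port = u.getPort();
        if ((scheme.equals("http") && port == 80) || (scheme.equals("https") && port == 443)) {
            port = -1;  // default port: leave it out of the key
        }
        String path = u.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";  // "http://google.com" and "http://google.com/" name the same page
        }
        String query = u.getRawQuery();
        // The fragment is not sent to the server, so it is not part of the key.
        return scheme + "://" + host + (port == -1 ? "" : ":" + port) + path
                + (query == null ? "" : "?" + query);
    }

    public static void main(String[] args) {
        Set<String> seen = new LinkedHashSet<>();
        String[] urls = {"http://google.com", "http://google.com/", "HTTP://Google.com:80/", "http://google.com/a#top"};
        for (String s : urls) {
            seen.add(canonical(s));
        }
        System.out.println(seen);  // [http://google.com/, http://google.com/a]
    }
}
```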


I posted several different exceptions in my previous messages. Are you sure I am getting those exceptions because of low memory? I started with 1700 URLs and was able to retrieve links from only 1550 pages. For the remaining pages I got the exceptions posted above (plus some timeout exceptions and others).

aozarovCommented:
I believe so, though the error description does not suggest it.
I assume you can access the links that caused the problem if you put them first in the list (or shrink the list), right? Did you try it?
sumantedlaAuthor Commented:
It took me a while to figure out that the exceptions were due to socket timeouts.
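To count socket timeouts separately from other failures, one option is to look for java.net.SocketTimeoutException anywhere in the exception's cause chain, since a library may wrap the original exception (whether HttpUnit does so is not checked here). isSocketTimeout is a hypothetical helper; the identity set only guards against a cyclic cause chain.

```java
import java.io.EOFException;
import java.net.SocketTimeoutException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class TimeoutCheck {
    // Hypothetical helper: true if t, or anything in its cause chain,
    // is a java.net.SocketTimeoutException.
    static boolean isSocketTimeout(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable c = t; c != null && seen.add(c); c = c.getCause()) {
            if (c instanceof SocketTimeoutException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Exception wrapped = new RuntimeException("request failed",
                new SocketTimeoutException("Read timed out"));
        System.out.println(isSocketTimeout(wrapped));            // true
        System.out.println(isSocketTimeout(new EOFException())); // false
    }
}
```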

One last thing,

java.io.EOFException: Unexpected end of ZLIB input stream   for URL : http://blog.kir.com/archives/2004_06.asp

Is this also related to the socket timeout?

Thanks a lot. Without your help I couldn't have done all this.

Once again, thank you.
aozarovCommented:
>> Is this also related to the sockettimeout??
I don't think so, but I will test it when I get home.
In the meantime, can you try it with a different JVM (a 1.5 JVM)? There are suggestions that this is a Java bug: http://lists.canoo.com/pipermail/webtest/2003q3/000951.html
sumantedlaAuthor Commented:
I was working on JVM 1.5.  
sumantedlaAuthor Commented:
I mean, I am working on the latest JDK, i.e. 1.5.

sumantedlaAuthor Commented:
One more exception:

For Url : http://fletcher.tufts.edu/inter_resources/dhpgenresources.html
java.util.NoSuchElementException
aozarovCommented:
Can you provide the full stack trace for the last one?
sumantedlaAuthor Commented:
java.util.NoSuchElementException
For Url : http://fletcher.tufts.edu/inter_resources/dhpgenresources.html
java.util.NoSuchElementException
        at java.util.StringTokenizer.nextToken(Unknown Source)
        at com.meterware.httpunit.HttpUnitUtils.parseContentTypeHeader(HttpUnitUtils.java:51)
        at com.meterware.httpunit.WebResponse.readContentTypeHeader(WebResponse.java:1107)
        at com.meterware.httpunit.WebResponse.getCharacterSet(WebResponse.java:219)
        at com.meterware.httpunit.WebResponse.loadResponseText(WebResponse.java:930)
        at com.meterware.httpunit.HttpWebResponse.<init>(HttpWebResponse.java:61)
        at com.meterware.httpunit.HttpWebResponse.<init>(HttpWebResponse.java:67)
        at com.meterware.httpunit.WebConversation.newResponse(WebConversation.java:76)
        at com.meterware.httpunit.WebWindow.getResource(WebWindow.java:164)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:128)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:102)
        at com.meterware.httpunit.WebClient.getResponse(WebClient.java:87)
        at HtmlUtil.getOutLinks(HtmlUtil.java:37)
        at ThreadUtil.run(BaseSet.java:233)
aozarovCommented:
Sorry for the very late response.
As for http://blog.kir.com/archives/2004_06.asp, I still think it's a JVM problem, but you can avoid it by not accepting gzipped responses: call HttpUnitOptions.setAcceptGzip(false); before creating the WebConversation.
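For what it's worth, the same "Unexpected end of ZLIB input stream" message can be produced without any JVM bug: the JDK's InflaterInputStream (which GZIPInputStream extends) throws an EOFException with that message when the underlying stream ends before the compressed data is complete, for example on a cut-off response. The sketch below demonstrates this with java.util.zip alone; it does not show that this is what happened with that URL, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TruncatedGzip {
    // Gzip-compresses data, as a server does for "Content-Encoding: gzip".
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompresses body to the end; returns "ok" or a description of the failure.
    static String readAll(byte[] body) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            byte[] buf = new byte[1024];
            while (in.read(buf) != -1) {
                // discard the data; only a clean end of stream matters here
            }
            return "ok";
        } catch (EOFException e) {
            return "EOFException: " + e.getMessage();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[4096];
        new Random(42).nextBytes(data);  // random bytes barely compress, so the body stays large
        byte[] body = gzip(data);
        byte[] cutOff = Arrays.copyOf(body, body.length / 2);  // simulate a truncated response
        System.out.println(readAll(body));
        System.out.println(readAll(cutOff));
    }
}
```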
As for http://fletcher.tufts.edu/inter_resources/dhpgenresources.html, the response contains an odd (invalid) Content-Type meta tag: <META content="" http-equiv=Content-Type>. With an empty value there is no token to read, so StringTokenizer.nextToken() throws the NoSuchElementException seen in the stack trace.
If you want, you can make HttpUnit more resilient to such invalid values by changing the parseContentTypeHeader method in HttpUnitUtils to:
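public static String[] parseContentTypeHeader( String header ) {
    System.out.println("This is the header '" + header + "'");  // debug output, remove when done
    // Default result: text/html with no charset, used when the header is empty or malformed
    String[] result = new String[] { "text/html", null };
    StringTokenizer st = new StringTokenizer( header, ";=" );
    if (st.hasMoreTokens()) {
        result[0] = st.nextToken();  // the media type; no longer assumed to be present
    }

    // The remaining tokens are parameter name/value pairs, e.g. charset and its value
    while (st.hasMoreTokens()) {
        String parameter = st.nextToken();
        if (st.hasMoreTokens()) {
            String value = stripQuotes( st.nextToken() );
            if (parameter.trim().equalsIgnoreCase( "charset" )) result[1] = value;
        }
    }
    return result;
}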
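To check the patched logic without rebuilding HttpUnit, it can be copied into a standalone class. The stripQuotes method here is a stand-in of my own, because HttpUnit's helper of that name is internal; its behavior (trim, then remove one pair of surrounding double quotes) is an assumption. The debug println is left out.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

public class ContentTypeDemo {
    // Stand-in for HttpUnit's internal stripQuotes (assumed behavior).
    static String stripQuotes(String value) {
        String v = value.trim();
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            v = v.substring(1, v.length() - 1);
        }
        return v;
    }

    // Same logic as the patched parseContentTypeHeader, minus the debug output.
    static String[] parseContentTypeHeader(String header) {
        String[] result = new String[] { "text/html", null };
        StringTokenizer st = new StringTokenizer(header, ";=");
        if (st.hasMoreTokens()) {
            result[0] = st.nextToken();
        }
        while (st.hasMoreTokens()) {
            String parameter = st.nextToken();
            if (st.hasMoreTokens()) {
                String value = stripQuotes(st.nextToken());
                if (parameter.trim().equalsIgnoreCase("charset")) result[1] = value;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The empty content="" value that made the original method throw
        System.out.println(Arrays.toString(parseContentTypeHeader("")));
        System.out.println(Arrays.toString(parseContentTypeHeader("text/html; charset=\"UTF-8\"")));
    }
}
```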

sumantedlaAuthor Commented:
Thanks. I will try them and let you know.
sumantedlaAuthor Commented:
Hi,

A simple one. For exceptions like
com.meterware.httpunit.NotHTMLException: The content type of the response is 'text/html charset': it must be 'text/html' in order to be recognized as HTML

how do I handle them? This:

            catch(com.meterware.httpunit.NotHTMLException nothtml)
            {
                  return null;
            }

is not working. The compiler says "com.meterware.httpunit.NotHTMLException is not public in com.meterware.httpunit;"

How can I overcome this? Until now I was catching com.meterware.httpunit.HttpException.
aozarovCommented:
You can make the class public (by adding public before "class NotHTMLException..." in the NotHTMLException.java file),
or catch RuntimeException, call re.getMessage() (where re is the caught exception) and check whether the message
starts with "The content type of the response is", e.g. if (re.getMessage() != null && re.getMessage().startsWith("The content type of the response is")); if so, it is a NotHTMLException.
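Here is a sketch of the second option, which avoids modifying HttpUnit. As the answer above implies, it assumes NotHTMLException is a RuntimeException. In addition to the message prefix, it compares the exception's runtime class name. That check is my own addition, not part of the answer, and the class name string comes from the error text in the question. isNotHtml is a hypothetical helper, and it guards against getMessage() returning null.

```java
public class NotHtmlCheck {
    // Class name and message prefix taken from the exception text quoted in the question.
    static final String NOT_HTML_CLASS = "com.meterware.httpunit.NotHTMLException";
    static final String NOT_HTML_PREFIX = "The content type of the response is";

    // Hypothetical helper: true if re looks like HttpUnit's NotHTMLException.
    static boolean isNotHtml(RuntimeException re) {
        if (re.getClass().getName().equals(NOT_HTML_CLASS)) {
            return true;
        }
        String msg = re.getMessage();  // may be null, so check before startsWith
        return msg != null && msg.startsWith(NOT_HTML_PREFIX);
    }

    public static void main(String[] args) {
        RuntimeException sample = new RuntimeException("The content type of the response is "
                + "'text/html charset': it must be 'text/html' in order to be recognized as HTML");
        System.out.println(isNotHtml(sample));
        System.out.println(isNotHtml(new IllegalStateException("something else")));
    }
}
```

In the crawler, the catch block would then be along the lines of catch (RuntimeException re) { if (NotHtmlCheck.isNotHtml(re)) return null; throw re; }, so that unrelated runtime errors are not silently swallowed.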
sumantedlaAuthor Commented:
Yes, it worked. I made it public.

Sorry for asking so many things here. I think the last exception I am facing is

com.meterware.httpunit.AuthorizationRequiredException: Basic authentication required: realm="Password Protected Area"

for URL: http://wnba.womensbasketballonline.com/

What could be the reason? When I access it using IE, it works fine.

Once again, thanks for helping me out, and sorry for asking so many questions.
aozarovCommented:
That is because this page's response contains a [WWW-Authenticate: Basic realm="Password Protected Area"] header,
which is wrong, as it indicates that the page requires authentication. The response also contains the requested page itself, which is why
IE shows it. You can add wc.setAuthorization("anonymous", "anonymous@anonymous.com"); after "WebConversation wc = new WebConversation();" to provide dummy credentials.
The returned page also contains a link with "news:....", which breaks the w[i].getRequest().getURL() calls.
To handle that, you can replace the call with w[i].getURLString() and check whether it starts with "http" (or any other known protocol).
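Below is a sketch of that link filtering, written against plain strings so it runs without HttpUnit. It assumes, without verifying it here, that getURLString() returns the raw href, which may be relative. In that case a plain startsWith("http") check would also drop relative links. This version therefore keeps http/https links and relative links, resolves the relative ones against the page URL, and skips news:, mailto: and other schemes. LinkFilter, schemeOf and httpLinks are hypothetical names.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LinkFilter {
    // Returns the lower-cased scheme of an href ("http", "news", ...), or null if the
    // href is relative. RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    static String schemeOf(String href) {
        int colon = href.indexOf(':');
        if (colon <= 0) {
            return null;
        }
        for (int i = 0; i < colon; i++) {
            char ch = href.charAt(i);
            boolean letter = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
            boolean other = (ch >= '0' && ch <= '9') || ch == '+' || ch == '-' || ch == '.';
            if (!(letter || (i > 0 && other))) {
                return null;  // e.g. "dir/page:1.html" is a relative path, not a scheme
            }
        }
        return href.substring(0, colon).toLowerCase(Locale.ROOT);
    }

    // Keeps http/https links and relative links (resolved against pageUrl);
    // skips other schemes, empty hrefs and hrefs that fail to parse.
    static List<String> httpLinks(String pageUrl, List<String> hrefs) {
        URI base = URI.create(pageUrl);
        if (base.getRawPath() == null || base.getRawPath().isEmpty()) {
            base = base.resolve("/");  // resolving against an empty path would glue the href onto the host
        }
        List<String> out = new ArrayList<>();
        for (String raw : hrefs) {
            String href = raw.trim();
            String scheme = schemeOf(href);
            if (href.isEmpty() || (scheme != null && !scheme.equals("http") && !scheme.equals("https"))) {
                continue;
            }
            try {
                out.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // malformed href (e.g. unescaped spaces): skip it
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList("news:comp.lang.java", "page.html",
                "https://example.org/x", "mailto:someone@example.com");
        System.out.println(httpLinks("http://example.com/dir/index.html", hrefs));
    }
}
```

In the link loop, you would collect the w[i].getURLString() values into a list and pass it together with the page's own URL (for example resp.getURL().toString()).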

>Once again, thanks for helping me out and sorry for asking too many questions.
NP. :-)
sumantedlaAuthor Commented:
Hi,

Sorry for the late reply. Thank you so much for all your help. I think there are no more problems with HttpUnit.

Everything is working fine. Thank you so much. In future I will ask questions under a new ID ;) as I don't want to continue using this account, which is in my real name.

Thank you very much!
sumantedlaAuthor Commented:
Oh no... my account has been renewed.

Better luck next time.
aozarovCommented:
:-)
Take care.
sumantedlaAuthor Commented:
Are you still around? I have a few more things to discuss.
gweidnerCommented:
Hi all, I am trying to test a website that has a username and password on its login page, and I would like to test a hyperlink on the second page.
My code works for Yahoo Mail but fails for the site we are testing. I am getting the following errors:


C:\test>java junit.textui.TestRunner STest
.ReferenceError: "Event" is not defined. (httpunit; line 30)
        at org.mozilla.javascript.NativeGlobal.constructError(NativeGlobal.java:597)
        at org.mozilla.javascript.NativeGlobal.constructError(NativeGlobal.java:557)
        at org.mozilla.javascript.ScriptRuntime.name(ScriptRuntime.java:1076)
        at org.mozilla.javascript.gen.c4.call(httpunit:30)
        at org.mozilla.javascript.gen.c4.exec(httpunit)
        at org.mozilla.javascript.Context.evaluateReader(Context.java:820)
        at org.mozilla.javascript.Context.evaluateString(Context.java:784)
        at com.meterware.httpunit.javascript.JavaScript$JavaScriptEngine.executeScript(JavaScript.java:132)
        at com.meterware.httpunit.scripting.ScriptableDelegate.runScript(ScriptableDelegate.java:65)
        at com.meterware.httpunit.ParsedHTML.interpretScriptElement(ParsedHTML.java:325)
        at com.meterware.httpunit.ParsedHTML.access$700(ParsedHTML.java:37)
        at com.meterware.httpunit.ParsedHTML$ScriptFactory.recordElement(ParsedHTML.java:489)
        at com.meterware.httpunit.ParsedHTML$2.processElement(ParsedHTML.java:702)
        at com.meterware.httpunit.NodeUtils$PreOrderTraversal.perform(NodeUtils.java:195)
        at com.meterware.httpunit.ParsedHTML.loadElements(ParsedHTML.java:718)
        at com.meterware.httpunit.ParsedHTML.getForms(ParsedHTML.java:106)
        at com.meterware.httpunit.WebResponse$Scriptable.load(WebResponse.java:688)
        at com.meterware.httpunit.javascript.JavaScript.load(JavaScript.java:89)
        at com.meterware.httpunit.javascript.JavaScriptEngineFactory.load(JavaScriptEngineFactory.java:58)
        at com.meterware.httpunit.RequestContext.runScripts(RequestContext.java:44)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:122)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebClient.getResponse(WebClient.java:113)
        at Solbright.testLogin(Solbright.java:35)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at junit.framework.TestCase.runTest(TestCase.java:154)
        at junit.framework.TestCase.runBare(TestCase.java:127)
        at junit.framework.TestResult$1.protect(TestResult.java:106)
        at junit.framework.TestResult.runProtected(TestResult.java:124)
        at junit.framework.TestResult.run(TestResult.java:109)
        at junit.framework.TestCase.run(TestCase.java:118)
        at junit.framework.TestSuite.runTest(TestSuite.java:208)
        at junit.framework.TestSuite.run(TestSuite.java:203)
        at junit.textui.TestRunner.doRun(TestRunner.java:116)
        at junit.textui.TestRunner.start(TestRunner.java:172)
        at junit.textui.TestRunner.main(TestRunner.java:138)
E
Time: 13.078
There was 1 error:
1) testLogin(STest)com.meterware.httpunit.ScriptException: Script 'if (parent.opener) {
    parent.opener.location = location;
    parent.opener.focus();
    close();
} else if (parent.location != location) {
    parent.location = location;
}


function actOn(e) {
    if (!document.all) {
        keypressed = String.fromCharCode(e.which);
    } else {
        e=window.event;
        keypressed = String.fromCharCode(e.keyCode);
        e.cancelBubble = true;
    }
    if (keypressed == "\r") {
        doSubmit();
    }
    return true;
}

function doSubmit() {
    if (!m_isSubmitted) {
        m_isSubmitted = true;
        document.login_form.submit();
    }
}
if (!document.all) {
    window.document.captureEvents(Event.KEYPRESS);
}
document.onkeypress = actOn;

function init() {
    if (document.login_form.userid.value == "") {
        document.login_form.userid.focus();
    } else {
        document.login_form.passwd.focus();
    }

    var height = screen.height;
    var width = screen.width;
    document.login_form.height.value = height;
    document.login_form.width.value = width;
}

var m_isSubmitted = false;' failed: ReferenceError: "Event" is not defined. (httpunit; line 30)
        at com.meterware.httpunit.javascript.JavaScript$JavaScriptEngine.handleScriptException(JavaScript.java:202)
        at com.meterware.httpunit.javascript.JavaScript$JavaScriptEngine.executeScript(JavaScript.java:136)
        at com.meterware.httpunit.scripting.ScriptableDelegate.runScript(ScriptableDelegate.java:65)
        at com.meterware.httpunit.ParsedHTML.interpretScriptElement(ParsedHTML.java:325)
        at com.meterware.httpunit.ParsedHTML.access$700(ParsedHTML.java:37)
        at com.meterware.httpunit.ParsedHTML$ScriptFactory.recordElement(ParsedHTML.java:489)
        at com.meterware.httpunit.ParsedHTML$2.processElement(ParsedHTML.java:702)
        at com.meterware.httpunit.NodeUtils$PreOrderTraversal.perform(NodeUtils.java:195)
        at com.meterware.httpunit.ParsedHTML.loadElements(ParsedHTML.java:718)
        at com.meterware.httpunit.ParsedHTML.getForms(ParsedHTML.java:106)
        at com.meterware.httpunit.WebResponse$Scriptable.load(WebResponse.java:688)
        at com.meterware.httpunit.javascript.JavaScript.load(JavaScript.java:89)
        at com.meterware.httpunit.javascript.JavaScriptEngineFactory.load(JavaScriptEngineFactory.java:58)
        at com.meterware.httpunit.RequestContext.runScripts(RequestContext.java:44)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:122)
        at com.meterware.httpunit.WebWindow.updateWindow(WebWindow.java:144)
        at com.meterware.httpunit.WebWindow.getSubframeResponse(WebWindow.java:130)
        at com.meterware.httpunit.WebWindow.getResponse(WebWindow.java:121)
        at com.meterware.httpunit.WebClient.getResponse(WebClient.java:113)
        at Solbright.testLogin(Solbright.java:35)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)

FAILURES!!!
Tests run: 1,  Failures: 0,  Errors: 1



Question has a verified solution.
