
JavaScript pagination with HttpUnit

CarlosScheidecker asked in Java:
Hello,

I am trying to crawl a paginated table whose pagination links are JavaScript links.

With HttpUnit I can click JavaScript links using the code below.

The page has a table that is paginated through JavaScript calls, and that part works well with HttpUnit. However, the page also contains other links that are not JavaScript links, and I do not want to click those.

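import com.meterware.httpunit.*;

// Global HttpUnit options: run page scripts, but don't abort on script errors
HttpUnitOptions.setScriptingEnabled(true);
HttpUnitOptions.setExceptionsThrownOnScriptError(false);
HttpUnitOptions.setJavaScriptOptimizationLevel(9);

WebConversation wc = new WebConversation();
WebRequest req = new GetMethodWebRequest("http://someurl.com.pt/pagina.jsp?OAFunc=PON_ABSTRACT_PAGE");
WebResponse resp = wc.getResponse(req);   // throws IOException, SAXException

WebLink[] links = resp.getLinks();

for (WebLink link : links) {
    System.out.println(link.getURLString() + " " + link.getText());
    // Currently clicks every link, JavaScript or not
    WebResponse respAux = link.click();
}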
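One way to skip the non-JavaScript links in the loop above is to classify each anchor from its raw `href` and `onclick` values before calling `click()`. The sketch below is a hypothetical, pure-Java helper (`JsLinkFilter.isJavaScriptLink` is not part of HttpUnit). It assumes you can read the href (for example from `getURLString()`, as in the loop) and the `onclick` attribute from the `WebLink`; some HttpUnit releases expose raw attributes through a `getAttribute(String)` method, but check the Javadoc for your version before relying on it.

```java
import java.util.Locale;

/**
 * Hypothetical helper: decides whether an anchor is a "JavaScript link"
 * from its raw href and onclick attribute values. Reading those values
 * from an HttpUnit WebLink is left to the caller (an assumption).
 */
public class JsLinkFilter {

    public static boolean isJavaScriptLink(String href, String onclick) {
        String h = href == null ? "" : href.trim();
        // href="javascript:..." runs script instead of navigating
        if (h.toLowerCase(Locale.ROOT).startsWith("javascript:")) {
            return true;
        }
        boolean hasOnclick = onclick != null && !onclick.trim().isEmpty();
        // href="#" (or empty) plus an onclick handler: the pattern described in the question
        boolean placeholderHref = h.isEmpty() || h.equals("#");
        return hasOnclick && placeholderHref;
    }

    public static void main(String[] args) {
        System.out.println(isJavaScriptLink("#", "submitAction('next')"));  // true
        System.out.println(isJavaScriptLink("javascript:goPage(2)", null)); // true
        System.out.println(isJavaScriptLink("/help.jsp", null));            // false
        System.out.println(isJavaScriptLink("#", null));                    // false
    }
}
```

Inside the crawl loop you would then call `link.click()` only when the helper returns `true`. A link with a real href and an extra `onclick` is treated as an ordinary link here; whether that is right depends on the page.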

Here are my questions:

1) How can I tell from the WebLink object whether a link is a JavaScript link rather than an ordinary link? These links have "#" as their href but carry an onclick event handler. How do I detect that on the object?

2) Each link returns the content of a frame, so clicking another link loads new content that I also need to crawl. How can I guarantee that the same content is not crawled twice? I was thinking of storing the content I have already seen in a data structure and making a recursive call that passes this data structure along, so that the same link is never crawled twice. Any ideas?
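The idea in question 2 can be sketched as a crawl with a "seen" set. Because every pagination link has the same href ("#"), the URL is a poor key, so this sketch keys on a SHA-256 fingerprint of the page content instead. It uses an explicit work queue rather than recursion, which avoids deep call stacks but is otherwise the same technique as passing the set through recursive calls. The class and method names are hypothetical; in the HttpUnit case `T` could be `WebResponse`, `contentOf` could wrap `getText()` (which throws `IOException`), and `expand` would click each JavaScript link on a page. Requires Java 9+ for `List.of`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;
import java.util.function.Function;

/** Hypothetical sketch: breadth-first crawl that never processes the same content twice. */
public class DedupCrawler {

    /** Hex SHA-256 of the page text, used as the "already crawled" key. */
    public static String fingerprint(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Returns pages in the order their content was first seen. */
    public static <T> List<T> crawl(T start,
                                    Function<T, List<T>> expand,
                                    Function<T, String> contentOf) {
        Set<String> seen = new HashSet<>();
        Deque<T> queue = new ArrayDeque<>();
        List<T> visited = new ArrayList<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            T page = queue.poll();
            // Set.add returns false if this content was already crawled
            if (!seen.add(fingerprint(contentOf.apply(page)))) continue;
            visited.add(page);
            queue.addAll(expand.apply(page));
        }
        return visited;
    }

    public static void main(String[] args) {
        // Toy in-memory "site": each page's content maps to the pages its links lead to
        Map<String, List<String>> site = new HashMap<>();
        site.put("page 1", List.of("page 2", "page 3"));
        site.put("page 2", List.of("page 1", "page 3"));
        site.put("page 3", List.of("page 1"));
        List<String> result = crawl("page 1", p -> site.getOrDefault(p, List.of()), p -> p);
        System.out.println(result); // [page 1, page 2, page 3]
    }
}
```

If the frame contains volatile markup (timestamps, session tokens), fingerprinting only the table portion is likely to work better than hashing the whole response.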
