Purdue_Pete

asked on

Bypass Validation

Hi.

I am trying to crawl a page using a web crawler. That page sits behind a validator (Struts), i.e. a button needs to be clicked in order to get to the page. Is there any way to bypass this so that the web crawler can reach the page without clicking the button?

Code:
<form name="loginForm" method="post" action="/check.do">
    <input type="hidden" name="forward" value="target_page">
    <input type="submit" name="org.apache.struts.taglib.html.CANCEL" value="Continue" onclick="bCancel=true;">
</form>

Any help is appreciated. Thanks.
ASKER CERTIFIED SOLUTION
Dan203

If you are able to run JavaScript on the page, you can simulate the button click:


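<script type="text/javascript">
// Click the element if it exists; do nothing otherwise.
function simulateClick(element) {
	if (element) {
		element.click();
	}
}

// The Struts cancel button from the form, looked up by its name attribute.
var subButton = document.getElementsByName("org.apache.struts.taglib.html.CANCEL")[0];
simulateClick(subButton);
</script>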
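
If the crawler cannot run JavaScript in the page, it may be enough to send the same POST that the button click would send. The following is only a minimal sketch under several assumptions: the form is exactly the one quoted in the question, `http://example.com` is a placeholder for the real host, `buildContinueRequest` is a hypothetical helper name, and the server does not also require a session cookie or other state.

```javascript
// Hypothetical helper: builds the POST request that clicking "Continue"
// on the quoted form would send. baseUrl is a placeholder for the real site.
function buildContinueRequest(baseUrl) {
  // Field names and values are copied from the form in the question.
  const body = new URLSearchParams({
    forward: "target_page",
    "org.apache.struts.taglib.html.CANCEL": "Continue",
  });
  return {
    url: new URL("/check.do", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: body.toString(),
    },
  };
}

const request = buildContinueRequest("http://example.com");
console.log(request.url);
console.log(request.options.body);
```

A crawler with an HTTP client could then pass `request.url` and `request.options` to something like `fetch`. Whether the target page is actually returned depends on how the site handles the request, which the thread does not say.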


This cannot be done. That is exactly why they put that input validation in: to stop you from crawling their site. It is the same idea as the warped graphics on other sites like Google, which require input before you can get past that point, specifically to stop mass crawling of their websites. This is the biggest problem on the web today: by my estimate, automatic site crawlers use 100x to 200x more bandwidth than legitimate users of a website do.
zemond

Open the page in a browser and view its source; you can then copy and paste that into the W3C validator.
Purdue_Pete

ASKER

Yes, simple JS seemed to get around Struts. I thought Struts would scrutinize these kinds of issues more closely.