
Bypass Validation

Hi.

I am trying to crawl a page with a web crawler. The page sits behind a Struts validator, i.e. a button has to be clicked to get to it. Is there any way to bypass this so the crawler can reach the page without clicking the button?

Code:
<form name="loginForm" method="post" action="/check.do">
<input type="hidden" name="forward" value="target_page">
<input type="submit" name="org.apache.struts.taglib.html.CANCEL" value="Continue" onclick="bCancel=true;">
</form>

Any help is appreciated. Thanks.

ASKER CERTIFIED SOLUTION

Dan203


MMDeveloper

If you have the ability to apply JavaScript to the page...

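<script type="text/javascript">
// Click the given element if it exists; do nothing otherwise.
function simulateClick(element) {
	if (element) {
		element.click();
	}
}

// Run after the page has loaded so the button is already in the DOM.
window.onload = function () {
	var subButton = document.getElementsByName("org.apache.struts.taglib.html.CANCEL")[0];
	simulateClick(subButton);
};
</script>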
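
If the crawler cannot run scripts inside the page, it may be able to send the request that the button click produces instead. The sketch below is only an illustration built from the form in the question: it assumes the server accepts a plain POST to /check.do with the two form fields and needs no session cookie or extra token, which is not confirmed here. BASE_URL, buildContinueRequest and fetchTargetPage are hypothetical names, and fetch requires Node 18+ or a browser.

```javascript
// Hypothetical site origin; replace with the real host.
const BASE_URL = "http://example.com";

// Build the POST that a browser would send when "Continue" is clicked.
// Field names and values are copied from the form in the question.
function buildContinueRequest(baseUrl) {
  const body = new URLSearchParams();
  body.append("forward", "target_page");
  body.append("org.apache.struts.taglib.html.CANCEL", "Continue");
  return {
    url: new URL("/check.do", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: body.toString(),
    },
  };
}

// Send the request and return the response HTML.
// Assumption: no cookies or hidden tokens are required by the server.
async function fetchTargetPage(baseUrl) {
  const { url, options } = buildContinueRequest(baseUrl);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return response.text();
}
```

Calling fetchTargetPage(BASE_URL) would return the HTML of whatever the server sends back. If the site does track a session, the crawler would have to load the form page first and pass the cookie along, which this sketch leaves out.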


scrathcyboy

Cannot do. That is exactly why they put that input validation in: to stop you crawling their site. It is the same as the warped graphics (CAPTCHAs) on other sites like Google, which require an input before you can get past that point, specifically to STOP mass crawling of their websites. This is the biggest problem on the web today: automatic site crawlers steal 100x to 200x more bandwidth than legitimate users of the website do.

zemond

Open the page in a browser and view its source; you can then copy and paste that into the W3C validator.

Purdue_Pete

ASKER

Yes, simple JS seemed to get around Struts. I thought Struts would scrutinize these kinds of issues more.