paulkramer

asked on

Python HTML Scraping

I need to pass one parameter to an HTML form and trigger the click action of its submit button, and I have decided to use Python. There are examples on the web showing how to submit a form (http://docs.python.org/3.1/howto/urllib2.html), but none explain how to handle the onclick action I'm dealing with. See the button code below:

<button onclick="AJAX_REQUEST_STATUS=1; searchButtonPressed=true; this.form.filter_page.value=1; submitSearch(document.frm, 'searchMyRecHoldings.shtml?orerFlag=public&amp;');" id="searchButton" type="button">Search</button>

I've tried the following without success:

import urllib
params = urllib.urlencode({'genYear': 2010, 'onClick': 'AJAX_REQUEST_STATUS=1;\
                           searchButtonPressed=true; this.form.filter_page.value=1; submitSearch(document.frm, ''searchMyRecHoldings.shtml?orerFlag=public&'')'})
f = urllib.urlopen("https://www.rec-registry.gov.au/getSearchPublicRecHoldings.shtml", params)
print f.read()

HonorGod

From your description, it would seem that in order to do that you would have to have the browser implemented in Python, or have a Python plugin for the browser.

Other than that, I don't believe that it is possible.
You should work through the JavaScript code yourself (parse it and carry out what it does) and pass only the form parameters to urllib.urlopen, not the JavaScript variables as you do now.

To get help with that, post all the JavaScript code used on the page, including the code of the functions it calls.
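As a rough illustration of "only form parameters", here is a minimal sketch in Python 3 (the question's code is Python 2). It assumes the form has the fields genYear (from the question) and filter_page (which the onclick sets to 1), and that the URL from the question is the right target. The thread never shows submitSearch, so the real endpoint and any extra fields may differ. The helper name build_search_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: this is the URL the asker posted to. The real target may be
# whatever submitSearch() builds, which is not shown in the thread.
SEARCH_URL = "https://www.rec-registry.gov.au/getSearchPublicRecHoldings.shtml"


def build_search_request(gen_year, page=1, url=SEARCH_URL):
    """Build (but do not send) a POST that carries only form fields."""
    fields = {
        "genYear": gen_year,  # form field named in the question
        "filter_page": page,  # the onclick sets this.form.filter_page.value=1
    }
    # No 'onClick' entry: JavaScript source is not a form parameter.
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body, method="POST")


request = build_search_request(2010)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the returned object to urllib.request.urlopen would actually send it. Whether the server answers usefully depends on what submitSearch really does.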
ASKER CERTIFIED SOLUTION
paulkramer

This solution is only available to members.
I can understand parsing the HTML using Beautiful Soup libraries.

What I don't understand is how you "...perform a click action on the submit button..."
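On the parsing half of that, a minimal sketch of pulling the onclick JavaScript out of the button markup from the question. It uses the standard library's html.parser instead of Beautiful Soup, only to stay dependency-free; the class name ButtonOnclickParser is hypothetical. It does not execute the JavaScript, it only extracts the text so the calls can be read.

```python
from html.parser import HTMLParser


class ButtonOnclickParser(HTMLParser):
    """Collect the onclick attribute of every <button>, keyed by its id."""

    def __init__(self):
        super().__init__()
        self.handlers = {}

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            attributes = dict(attrs)
            if "onclick" in attributes:
                self.handlers[attributes.get("id")] = attributes["onclick"]


# The button markup exactly as posted in the question.
BUTTON_HTML = (
    '<button onclick="AJAX_REQUEST_STATUS=1; searchButtonPressed=true; '
    "this.form.filter_page.value=1; submitSearch(document.frm, "
    "'searchMyRecHoldings.shtml?orerFlag=public&amp;');\" "
    'id="searchButton" type="button">Search</button>'
)

parser = ButtonOnclickParser()
parser.feed(BUTTON_HTML)
# The parser unescapes &amp; to & inside attribute values.
print(parser.handlers["searchButton"])
```

The extracted string shows which form field is changed (filter_page) and which function is called (submitSearch). What that function sends is still unknown without its source.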

paulkramer

ASKER

The search button contained a piece of JavaScript, which I simply navigated to (using the IEC navigate function) after selecting a value in a drop-down menu. The search button wasn't a submit button per se, and it didn't have a name, which made it difficult.
Interesting.  Thanks for sharing your solution.

Good luck & have a great day.