Need help parsing HTML into JDK 1.4.2's HTML DOM
Posted on 2004-04-21
I am writing an app that pulls information, such as forms and other relevant data, from a website. I need to grab the HTML, parse it into a tree structure (either JDK 1.4's HTMLDocument or one of my own), have the app generate a GUI from that model, gather user input, update the model, and submit the results back to the website.

I have looked into using regular expressions to parse the site, but that is turning out to be too complex. Extracting something is easy enough when I know what I'm looking for; the hard part is looping through the nested tables and mapping the inputs to Java components.

I recently learned about the org.w3c.dom.html package in JDK 1.4.2, but it does not support the full DOM Level 2 specification, which seems to be what I need. On top of that, I can't figure out how to parse the HTML into the HTML DOM, let alone how to update the model with user input and submit the results. I have no guarantee that the HTML is well formed, and the parsing needs to be fast.
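One option that avoids both regular expressions and the org.w3c.dom.html interfaces is the lenient HTML parser that already ships with Swing: `javax.swing.text.html.parser.ParserDelegator` driving an `HTMLEditorKit.ParserCallback`. It tolerates badly formed markup and reports tags as callbacks as it streams through the page, without building a full DOM. What follows is only a minimal sketch under stated assumptions: `FormScraper`, its `Form` model and its methods are hypothetical names; only `<input>` elements are collected (no `<select>` or `<textarea>`); and submission is always a UTF-8 POST. It uses generics, so it needs Java 5 or later; on 1.4.2 you would drop the type parameters and add casts.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class FormScraper {

    /** Hypothetical model: one HTML form and its named fields, in document order. */
    public static class Form {
        public String action;
        public String method;
        public final Map<String, String> fields = new LinkedHashMap<String, String>();
    }

    /** Parses (possibly malformed) HTML and returns every form it finds. */
    public static List<Form> parseForms(Reader html) throws IOException {
        final List<Form> forms = new ArrayList<Form>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private Form current; // form currently open, or null

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.FORM) {
                    current = new Form();
                    current.action = (String) a.getAttribute(HTML.Attribute.ACTION);
                    current.method = (String) a.getAttribute(HTML.Attribute.METHOD);
                    forms.add(current);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.FORM) {
                    current = null;
                }
            }

            // <input> is an empty element, so the parser reports it as a "simple" tag.
            // Assumption: every named input is recorded, including unchecked
            // checkboxes and submit buttons, which a browser would not send.
            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.INPUT && current != null) {
                    String name = (String) a.getAttribute(HTML.Attribute.NAME);
                    String value = (String) a.getAttribute(HTML.Attribute.VALUE);
                    if (name != null) {
                        current.fields.put(name, value == null ? "" : value);
                    }
                }
            }
        };
        // true = ignore any <meta> charset declaration instead of throwing ChangedCharSetException
        new ParserDelegator().parse(html, callback, true);
        return forms;
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body (UTF-8 assumed). */
    public static String encode(Form form) throws UnsupportedEncodingException {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : form.fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return body.toString();
    }

    /** POSTs the form to its action, resolved against the page URL; ignores the method attribute. */
    public static int submit(URL pageUrl, Form form) throws IOException {
        URL target = new URL(pageUrl, form.action == null ? "" : form.action);
        HttpURLConnection conn = (HttpURLConnection) target.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        byte[] body = encode(form).getBytes("UTF-8");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(body);
        } finally {
            out.close();
        }
        return conn.getResponseCode();
    }
}
```

A GUI generated from the model would write edited values back with `form.fields.put(name, newValue)` before calling `submit`. Because the callback never builds a tree, nested tables add no extra work; if you want the table layout to position your Swing components, you could track `HTML.Tag.TABLE`, `TR` and `TD` in `handleStartTag` the same way the sketch tracks `FORM`.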
Any help/examples would be greatly appreciated. Thanks.