Need help parsing HTML into JDK 1.4.2's HTML DOM

I am writing an app that culls information, such as forms, from a website. I need to grab the HTML, parse it into a tree structure such as JDK 1.4's HTMLDocument or my own, have the app generate a GUI from the model, gather user input, update the model, and submit the results back through the website. I have looked into using regular expressions to parse the site, but I am finding that too complex: not the parsing itself, if I know what I'm looking for, but looping through the nested tables and mapping the inputs to Java components. I recently learned of the org.w3c.dom.html packages in JDK 1.4.2, but they do not support the full DOM Level 2 specification, which seems to be what I am looking for. On top of that, I can't figure out how to parse the HTML into the HTML DOM, let alone how to update the model with user input and submit the results. I have no guarantee that the HTML is well formed, and the parsing must be pretty fast.

Any help/examples would be greatly appreciated. Thanks.
tigress298 Asked:
 
CEHJ Commented:
>>but it does not support the full dom2 specification which seems to be what I am looking for.  

You would probably be better off with http://www.apache.org/~andyc/neko/doc/html/
 
objects Commented:
Try HTMLEditorKit
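
For reference, a minimal sketch of what the HTMLEditorKit route could look like, using only classes that ship with the JDK (javax.swing.text.html). The class name HtmlFormModel and its methods are made up for illustration. The sketch assumes the fields of interest are <input> tags, and it uses generics for brevity, so on 1.4.2 the type parameters would have to be dropped. It loads possibly malformed HTML into an HTMLDocument and walks the element tree to collect input names and values.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper, not part of the JDK.
public class HtmlFormModel {

    /** Parses the markup into an HTMLDocument; the parser tolerates unclosed tags. */
    public static HTMLDocument load(Reader in) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Without this, a <meta> charset declaration can abort the read
        // with a ChangedCharSetException.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(in, doc, 0);
        return doc;
    }

    /** Walks the element tree and returns name -> value for every named <input>. */
    public static Map<String, String> inputs(HTMLDocument doc) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        ElementIterator it = new ElementIterator(doc);
        for (Element e = it.first(); e != null; e = it.next()) {
            AttributeSet attrs = e.getAttributes();
            // Each element stores its HTML tag under StyleConstants.NameAttribute.
            if (attrs.getAttribute(StyleConstants.NameAttribute) == HTML.Tag.INPUT) {
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                if (name != null) {
                    result.put(name.toString(), value == null ? "" : value.toString());
                }
            }
        }
        return result;
    }

    /** Convenience overload for markup that is already in memory. */
    public static Map<String, String> inputs(String html) {
        try {
            return inputs(load(new StringReader(html)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same walk could map each input element to a Swing component instead of a map entry. Whether this is fast enough for the pages in question would need measuring.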
 
tigress298 (Author) Commented:
I can't use any third-party software for this task either.
 
rama_krishna580 Commented:
Try this as well:

http://www.html2xml.nl/Services/html2xml/version1/Html2Xml.asmx?op=Url2XmlNode

It can parse and verify your HTML document and display the result. You would need to implement a client for the web service; have a look.

Best of luck.

R.K.
 
tigress298 (Author) Commented:
The web service site is really great, but since what I'm working on will eventually go into a classified arena, I can't use anything web-based or third-party. I really need to get ahold of some open source code, or use native APIs, to convert poorly formed HTML to well-formed XML, or use a native Java parser to parse potentially poorly formatted HTML directly.
 
CEHJ Commented:
>>I really need to get ahold of some open source code

I thought you couldn't use third-party APIs? What you've just described is a perfect description of what lies at the link I posted!
 
objects Commented:
> it does not support the full dom2 specification which seems to be what I am looking for.

Have you tried HTMLEditorKit? It's worth trying to see how it performs.
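
If building a full document model turns out to be too slow, one lighter option to measure is the parser underneath HTMLEditorKit (javax.swing.text.html.parser.ParserDelegator), driven through a callback so that no tree is built. A rough sketch, assuming only <input> tags matter; FormFieldScanner and scanInputs are hypothetical names, and the generics would need to be dropped on 1.4.2.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of the JDK.
public class FormFieldScanner {

    /** Streams through the markup and returns "name=value" for each <input> tag. */
    public static List<String> scanInputs(Reader html) throws IOException {
        final List<String> fields = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // <input> has no end tag, so the parser reports it as a simple tag.
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.INPUT) {
                    Object name = attrs.getAttribute(HTML.Attribute.NAME);
                    Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                    fields.add(name + "=" + value);
                }
            }
        };
        // The third argument tells the parser to ignore <meta> charset declarations.
        new ParserDelegator().parse(html, callback, true);
        return fields;
    }

    /** Convenience overload for markup that is already in memory. */
    public static List<String> scanInputs(String html) {
        try {
            return scanInputs(new StringReader(html));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The callback also has handleStartTag, handleEndTag and handleText hooks, which is where a custom tree for the nested tables could be built if HTMLDocument does not fit.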
 
CEHJ Commented:
8-)