tigress298
Need help parsing html into JDK1.4.2's HTML DOM
I am writing an app that culls information, such as forms and other pertinent data, from a website. I need to grab the HTML, parse it into a tree structure (JDK 1.4's HTMLDocument or my own), have the app generate a GUI from the model, gather user input, update the model, and submit the results back through the website. I have looked into using regular expressions to parse the site, but am finding it too complex. The parsing itself is manageable when I know what I'm looking for; the hard part is looping through the nested tables and mapping the inputs to Java components. I recently learned of the org.w3c.dom.html packages in JDK 1.4.2, but it does not support the full DOM2 specification, which seems to be what I am looking for. On top of that, I can't figure out how to parse the HTML into the HTML DOM, let alone how to update the model with user input and submit the results. I have no guarantee that the HTML is well formed, and the parsing must be pretty fast.
Any help/examples would be greatly appreciated. Thanks.
Try HTMLEditorKit
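To make the suggestion concrete, here is a minimal sketch of the streaming parser that ships with HTMLEditorKit (javax.swing.text.html, part of the JDK, so no third-party code). It is only an illustration under assumptions: the class name FormFieldScanner and the method scanInputs are made up for this example, and the code uses generics and @Override, which would have to be dropped to compile on JDK 1.4. The parser is tolerant of unclosed tags, which seems to fit the "not well formed" requirement.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects "name=value" for every <input> tag it sees.
final class FormFieldScanner {

    static List<String> scanInputs(Reader html) throws IOException {
        final List<String> fields = new ArrayList<String>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // <input> has no closing tag, so the parser reports it as a simple tag.
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.INPUT) {
                    Object name = attrs.getAttribute(HTML.Attribute.NAME);
                    Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                    fields.add(name + "=" + (value == null ? "" : value));
                }
            }
        };

        // Third argument true: ignore any charset declared inside the page.
        new ParserDelegator().parse(html, callback, true);
        return fields;
    }
}
```

Calling FormFieldScanner.scanInputs(new java.io.StringReader(pageSource)) returns the fields in document order; the same callback also has handleStartTag, handleEndTag and handleText if you need to track the nested tables.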
ASKER
I can't use any third party software for this task either.
you can check here...
http://xml.apache.org/xalan-j/usagepatterns.html
http://java.sun.com/developer/TechTips/2000/tt0627.html
http://www.rgagnon.com/javadetails/java-0408.html
best of luck...
R.K.
try this also...
http://www.html2xml.nl/Services/html2xml/version1/Html2Xml.asmx?op=Url2XmlNode
It can parse and verify your HTML document and display the result. You would call it as a web service, so have a look.
best of luck..
R.K.
ASKER
The web services site is really great, but since what I'm working on will eventually go into a classified arena, I can't use anything web-based or third-party. I really need to get ahold of some open source code, or use native APIs, to convert poorly formed HTML to well-formed XML, or use a native Java parser to parse potentially poorly formatted HTML directly.
>>. I really need to get ahold of some open source code
I thought you couldn't use 3rd-party APIs? What you've just described is a perfect description of what lies at the link I posted!
> it does not support the full dom2 specification which seems to be what I am looking for.
have you tried HTMLEditorKit? worth trying to see how it performs.
8-)
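If you want a tree model rather than callbacks, a rough sketch of loading the page into an HTMLDocument with HTMLEditorKit and pulling out the input fields might look like this. HtmlModelLoader, load and inputs are hypothetical names for the example, and the generics would need to go on JDK 1.4. It only reads the model; how well it performs on your pages and how you push user input back is something you would have to try.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper: builds the Swing HTML model and lists its <input> fields.
final class HtmlModelLoader {

    static HTMLDocument load(Reader html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoid ChangedCharSetException when the page carries a <meta> charset.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(html, doc, 0);
        return doc;
    }

    static Map<String, String> inputs(HTMLDocument doc) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        // getIterator walks every element of the given tag in document order.
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.INPUT);
                it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object name = attrs.getAttribute(HTML.Attribute.NAME);
            Object value = attrs.getAttribute(HTML.Attribute.VALUE);
            if (name != null) {
                result.put(name.toString(), value == null ? "" : value.toString());
            }
        }
        return result;
    }
}
```

The map keeps the fields in page order, which should make it easier to generate one Java component per input and later rebuild the name/value pairs for submission.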