Solved

Need help parsing html into JDK1.4.2's HTML DOM

Posted on 2004-04-21
Last Modified: 2013-11-23
I am writing an app that culls information, such as forms and other pertinent data, from a website.  I need to grab the HTML, parse it into a tree structure (JDK 1.4's HTMLDocument or my own), have the app generate a GUI from that model, gather user input, update the model, and submit the results back to the website.  I have looked into using regular expressions to parse the site, but that is proving too complex: the matching itself is manageable when I know what I'm looking for, but looping through the nested tables and mapping the inputs to Java components is not.  I learned recently of the org.w3c.dom.html package in JDK 1.4.2, but it does not support the full dom2 specification which seems to be what I am looking for.  On top of that, I can't figure out how to parse the HTML into the HTML DOM, let alone how to update the model with user input and submit the results.  I have no guarantee that the HTML is well formed, and the parsing must be fairly fast.

Any help/examples would be greatly appreciated. Thanks.
Question by:tigress298
11 Comments
 
LVL 92

Expert Comment

by:objects
Try HTMLEditorKit
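
For reference, here is a minimal sketch of the HTMLEditorKit route, assuming the goal is to load a page into an HTMLDocument and then locate its form inputs. The class name HtmlDocumentInputs, its two helper methods, and the sample page are made up for illustration. Generics are used for brevity; on JDK 1.4.2 the list would have to be a raw List.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of the JDK.
final class HtmlDocumentInputs {

    /** Parses HTML into an HTMLDocument using only JDK classes. */
    static HTMLDocument load(Reader html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoid a ChangedCharSetException when the page carries a <meta> charset tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(html, doc, 0);
        return doc;
    }

    /** Collects the name attribute of each <input> element in document order. */
    static List<String> inputNames(HTMLDocument doc) {
        List<String> names = new ArrayList<String>();
        // getIterator only supports non-block tags; INPUT is one of them.
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.INPUT);
        for (; it != null && it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object name = attrs.getAttribute(HTML.Attribute.NAME);
            if (name != null) {
                names.add(name.toString());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Sample page (an assumption for this sketch); closing tags are deliberately missing.
        String page = "<html><body><form action=\"/login\">"
                + "<input type=\"text\" name=\"user\">"
                + "<input type=\"password\" name=\"pass\">"
                + "</form>";
        System.out.println(inputNames(load(new StringReader(page))));
    }
}
```

The parser behind HTMLEditorKit is lenient about missing end tags, which matters when the HTML is not guaranteed to be well formed. The resulting HTMLDocument is a Swing text model rather than a W3C DOM, so it may or may not suit the GUI-generation step.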
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
>>but it does not support the full dom2 specification which seems to be what I am looking for.  

You would probably be better off with http://www.apache.org/~andyc/neko/doc/html/
 

Author Comment

by:tigress298
I can't use any third party software for this task either.
 
LVL 23

Expert Comment

by:rama_krishna580
Try this also:

http://www.html2xml.nl/Services/html2xml/version1/Html2Xml.asmx?op=Url2XmlNode

It can parse and verify your HTML document and display the result. You would need to implement the web service call yourself. Have a look.

Best of luck.

R.K.
 

Author Comment

by:tigress298
The web services site is really great, but since what I'm working on will eventually go into a classified arena, I can't use anything web-based or third-party.  I really need to get ahold of some open source code or use native APIs to convert poorly formed HTML to well-formed XML, or use a native Java parser to parse potentially poorly formed HTML directly.
 
LVL 86

Expert Comment

by:CEHJ
>>.  I really need to get ahold of some open source code

I thought you couldn't use third-party APIs? What you've just described is a perfect description of what lies at the link I posted!
 
LVL 92

Expert Comment

by:objects
> it does not support the full dom2 specification which seems to be what I am looking for.

Have you tried HTMLEditorKit? It's worth trying to see how it performs.
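
If building a full document turns out to be too slow, one lighter option is to drive the JDK's parser directly with a callback and keep only what you need. The sketch below is an assumption about how that might look, not a tested solution for the asker's site: FormFieldScanner and scan are invented names, and it records each named input with its initial value. Generics would need to be dropped on JDK 1.4.2.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the JDK.
final class FormFieldScanner {

    /** Maps each named <input> to its initial value ("" when none is given). */
    static Map<String, String> scan(Reader html) throws IOException {
        final Map<String, String> fields = new LinkedHashMap<String, String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // <input> has no end tag, so the parser reports it as a simple tag.
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.INPUT) {
                    return;
                }
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                if (name != null) {
                    fields.put(name.toString(), value == null ? "" : value.toString());
                }
            }
        };
        // The third argument tells the parser to ignore any charset directive.
        new ParserDelegator().parse(html, callback, true);
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Sample page (an assumption for this sketch) with unclosed table cells.
        String page = "<html><body><form action=\"/login\"><table><tr>"
                + "<td><input type=\"text\" name=\"user\" value=\"guest\">"
                + "<td><input type=\"password\" name=\"pass\">"
                + "</table></form>";
        System.out.println(scan(new StringReader(page)));
    }
}
```

The map could then back the generated GUI: each entry becomes a component, and the edited values are read back out when it is time to submit.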
 
LVL 86

Expert Comment

by:CEHJ
8-)
