Solved

Need help parsing html into JDK1.4.2's HTML DOM

Posted on 2004-04-21
11
279 Views
Last Modified: 2013-11-23
I am writing an app that culls information from a website such as forms or pertinent information.  I need to grab the html, parse it into a tree structure such as JDK1.4's HTMLDocument or my own, have the app generate a gui off the model, gather user input, update the model, and submit the results back through the website.  I have looked into using regular expressions to parse the site, but am finding it to be too complex, not the parsing part if I know what I'm looking for, but in looping through the nested tables and mapping the inputs to java components.  I learned recently of the org.w3c.dom.html packages in jdk1.4.2, but it does not support the full dom2 specification which seems to be what I am looking for.  On top of that, I can't figure out how to parse the html into the htmDOM, let alone how to update the model with user input and submit the results.  I don't have any guarantees that the html is well formed, and the parsing must be pretty fast.

Any help/examples would be greatly appreciated. Thanks.
0
Comment
Question by:tigress298
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
  • +1
11 Comments
 
LVL 92

Expert Comment

by:objects
ID: 10883425
Try HTMLEditorKit
0
 
LVL 92

Expert Comment

by:objects
ID: 10883428
0
 
LVL 92

Expert Comment

by:objects
ID: 10883429
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 10884016
>>but it does not support the full dom2 specification which seems to be what I am looking for.  

You would be probably better off with http://www.apache.org/~andyc/neko/doc/html/
0
 

Author Comment

by:tigress298
ID: 10888669
I can't use any third party software for this task either.
0
 
LVL 23

Expert Comment

by:rama_krishna580
ID: 10889487
0
 
LVL 23

Expert Comment

by:rama_krishna580
ID: 10889507
try this also...

http://www.html2xml.nl/Services/html2xml/version1/Html2Xml.asmx?op=Url2XmlNode

which can parse and verify u r html document and display...you implement the webservice..have a look..

best of luck..

R.K.
0
 

Author Comment

by:tigress298
ID: 10891323
The webservices site is really great, but as what I'm working on will eventually go into a classified arena, I can't utilize anything web-based or 3rd Party.  I really need to get ahold of some open source code or use native api's to convert poor formed html to well formed xml, or use a native java parser to parse potentially poor formatted html directly.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10891762
>>.  I really need to get ahold of some open source code

I thought you couldn't use 3rd-party apis? What you've just described is a perfect description of what lies at the link i posted!
0
 
LVL 92

Expert Comment

by:objects
ID: 10894456
> it does not support the full dom2 specification which seems to be what I am looking for.

have you tried HTMLEditorKit? worth trying to see how it performs.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10922219
8-)
0

Featured Post

PeopleSoft Has Never Been Easier

PeopleSoft Adoption Made Smooth & Simple!

On-The-Job Training Is made Intuitive & Easy With WalkMe's On-Screen Guidance Tool.  Claim Your Free WalkMe Account Now

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
null output 3 43
Running JavaFX on JDeveloper 12C 1 79
login form jsp example 2 55
What browser will run Java? 7 128
This was posted to the Netbeans forum a Feb, 2010 and I also sent it to Verisign. Who didn't help much in my struggles to get my application signed. ------------------------- Start The idea here is to target your cell phones with the correct…
Basic understanding on "OO- Object Orientation" is needed for designing a logical solution to solve a problem. Basic OOAD is a prerequisite for a coder to ensure that they follow the basic design of OO. This would help developers to understand the b…
Viewers learn how to read error messages and identify possible mistakes that could cause hours of frustration. Coding is as much about debugging your code as it is about writing it. Define Error Message: Line Numbers: Type of Error: Break Down…
This theoretical tutorial explains exceptions, reasons for exceptions, different categories of exception and exception hierarchy.

730 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question