Need help parsing HTML into JDK 1.4.2's HTML DOM

Posted on 2004-04-21
Last Modified: 2013-11-23
I am writing an app that culls information, such as forms and other pertinent content, from a website. I need to grab the HTML, parse it into a tree structure (JDK 1.4's HTMLDocument or one of my own), have the app generate a GUI from that model, gather user input, update the model, and submit the results back to the website.

I have looked into using regular expressions, but that is proving too complex: not the matching itself, when I know what I'm looking for, but looping through the nested tables and mapping the inputs to Java components. I recently learned of the org.w3c.dom.html package in JDK 1.4.2, but it does not support the full DOM Level 2 specification, which seems to be what I am looking for. On top of that, I can't figure out how to parse the HTML into the HTML DOM, let alone how to update the model with user input and submit the results. I have no guarantee that the HTML is well formed, and the parsing must be fast.

Any help/examples would be greatly appreciated. Thanks.
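For the last step, submitting the results back, here is a minimal sketch using only JDK classes. It assumes the site's form is posted with the default application/x-www-form-urlencoded encoding and accepts UTF-8: gather the field names and the user's values into a map, encode them with java.net.URLEncoder, and write the body to an HttpURLConnection aimed at the form's action URL. FormEncoder, encode and post are hypothetical names, and the code is written for a current JDK (generics, diamond operator, StringBuilder); on 1.4.2 you would use raw collections and StringBuffer.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns gathered field values into a form POST.
class FormEncoder {

    // Build an application/x-www-form-urlencoded body, keeping field order.
    static String encode(Map<String, String> fields) throws UnsupportedEncodingException {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return body.toString();
    }

    // POST the encoded body to the form's action URL and return the HTTP status code.
    static int post(URL action, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) action.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(body.getBytes("UTF-8"));
        } finally {
            out.close();
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user name", "Jane Doe");
        fields.put("comment", "a&b");
        System.out.println(encode(fields)); // user+name=Jane+Doe&comment=a%26b
    }
}
```

If the page was served in another charset, pass that charset name to URLEncoder.encode and getBytes instead of "UTF-8".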
Question by:tigress298
 
LVL 92

Expert Comment

by:objects
ID: 10883425
Try HTMLEditorKit
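A minimal sketch of what that could look like with the parser that ships with Swing: javax.swing.text.html.parser.ParserDelegator driving an HTMLEditorKit.ParserCallback. It streams through the page, infers missing end tags from its DTD, and collects the name of every form control without building a full document, which keeps it fast. FormFieldScanner and scan are hypothetical names, and the code is written for a current JDK (generics, diamond operator); on 1.4.2 drop the type parameters and @Override.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical scanner: records the name of every <input>, <select> and <textarea>.
class FormFieldScanner extends HTMLEditorKit.ParserCallback {
    private final List<String> fieldNames = new ArrayList<>();

    // <input> is an empty element, so the parser reports it as a simple tag.
    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        record(tag, attrs);
    }

    // <select> and <textarea> have content, so they arrive as start tags.
    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        record(tag, attrs);
    }

    private void record(HTML.Tag tag, MutableAttributeSet attrs) {
        if (tag == HTML.Tag.INPUT || tag == HTML.Tag.SELECT || tag == HTML.Tag.TEXTAREA) {
            Object name = attrs.getAttribute(HTML.Attribute.NAME);
            if (name != null) {
                fieldNames.add(name.toString());
            }
        }
    }

    static List<String> scan(Reader html) throws IOException {
        FormFieldScanner scanner = new FormFieldScanner();
        // true = ignore any charset declared in a <meta> tag
        new ParserDelegator().parse(html, scanner, true);
        return scanner.fieldNames;
    }

    public static void main(String[] args) throws IOException {
        // Deliberately sloppy markup: no </td>, </tr>, </option>, </body> or </html>.
        String html = "<html><body><form><table><tr><td><input type=\"text\" name=\"user\">"
            + "<td><input type=\"password\" name=\"pass\">"
            + "<tr><td><select name=\"role\"><option>a</select></table></form>";
        System.out.println(scan(new StringReader(html)));
    }
}
```

The same callback can also watch handleText and the table tags if you need the labels that sit next to each control.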
 
LVL 92

Expert Comment

by:objects
ID: 10883428
 
LVL 92

Expert Comment

by:objects
ID: 10883429
 
LVL 86

Accepted Solution

by: CEHJ (earned 500 total points)
ID: 10884016
>>but it does not support the full dom2 specification which seems to be what I am looking for.  

You would probably be better off with NekoHTML: http://www.apache.org/~andyc/neko/doc/html/
 

Author Comment

by:tigress298
ID: 10888669
I can't use any third-party software for this task either.
 
LVL 23

Expert Comment

by:rama_krishna580
ID: 10889487
 
LVL 23

Expert Comment

by:rama_krishna580
ID: 10889507
Try this also:

http://www.html2xml.nl/Services/html2xml/version1/Html2Xml.asmx?op=Url2XmlNode

It can parse and validate your HTML document and display the result; you just integrate the web service into your app. Have a look.

Best of luck.

R.K.
 

Author Comment

by:tigress298
ID: 10891323
The web services site is really great, but since what I'm working on will eventually go into a classified environment, I can't use anything web-based or third-party. I really need to get hold of some open-source code or use native APIs to convert poorly formed HTML to well-formed XML, or use a native Java parser to parse potentially malformed HTML directly.
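One way to get there with only JDK classes, sketched under assumptions: feed the page through ParserDelegator, which already fills in implied and missing end tags, replay its events into an org.w3c.dom.Document built with JAXP, and serialize that with a Transformer to get well-formed XML. HtmlToDom, parse and toXml are hypothetical names. The parser is built on an HTML 3.2 DTD, so this sketch simply drops tags it reports as unknown and any attribute it does not map to an HTML.Attribute. It is written for a current JDK (generics, ArrayDeque); on 1.4.2 you would use a java.util.Stack and raw types.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Enumeration;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical converter: replays the JDK parser's events as a W3C DOM tree.
class HtmlToDom extends HTMLEditorKit.ParserCallback {
    private final Document doc;
    private final Deque<Node> open = new ArrayDeque<>(); // stack of currently open nodes

    private HtmlToDom(Document doc) {
        this.doc = doc;
        open.push(doc);
    }

    static Document parse(Reader html) throws IOException, ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        new ParserDelegator().parse(html, new HtmlToDom(doc), true);
        return doc;
    }

    static String toXml(Document doc) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml"); // force XML even though the root is <html>
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag instanceof HTML.UnknownTag) return; // not in the parser's DTD
        Element el = element(tag, attrs);
        parent().appendChild(el);
        open.push(el);
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        // Stray end tags can arrive here flagged with ENDTAG; ignore them.
        if (tag instanceof HTML.UnknownTag || attrs.isDefined(HTML.Attribute.ENDTAG)) return;
        parent().appendChild(element(tag, attrs));
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        String name = tag.toString();
        boolean isOpen = false;
        for (Node n : open) {
            if (n.getNodeName().equals(name)) { isOpen = true; break; }
        }
        // Pop back to the matching element; end tags with no open match are ignored.
        if (isOpen) {
            Node popped;
            do { popped = open.pop(); } while (!popped.getNodeName().equals(name));
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        Node top = open.peek();
        if (top != doc) top.appendChild(doc.createTextNode(new String(data)));
    }

    // A Document may hold only one element, so anything after </html> goes under the root.
    private Node parent() {
        Node top = open.peek();
        return (top == doc && doc.getDocumentElement() != null) ? doc.getDocumentElement() : top;
    }

    // Copy known attributes; parser markers such as IMPLIED use String keys and are skipped.
    private Element element(HTML.Tag tag, MutableAttributeSet attrs) {
        Element el = doc.createElement(tag.toString());
        Enumeration<?> keys = attrs.getAttributeNames();
        while (keys.hasMoreElements()) {
            Object key = keys.nextElement();
            if (key instanceof HTML.Attribute && key != HTML.Attribute.ENDTAG) {
                el.setAttribute(key.toString(), String.valueOf(attrs.getAttribute(key)));
            }
        }
        return el;
    }

    public static void main(String[] args) throws Exception {
        String html = "<p>Name: <input name=\"user\"><br>Age: <input name=\"age\" value=\"42\">";
        System.out.println(toXml(parse(new StringReader(html))));
    }
}
```

Once you have the Document you can walk it with the standard org.w3c.dom calls (getElementsByTagName, setAttribute) to find inputs and write user values back before building the POST body.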
 
LVL 86

Expert Comment

by:CEHJ
ID: 10891762
>>.  I really need to get ahold of some open source code

I thought you couldn't use third-party APIs? What you've just described is exactly what lies at the link I posted!
 
LVL 92

Expert Comment

by:objects
ID: 10894456
> it does not support the full dom2 specification which seems to be what I am looking for.

Have you tried HTMLEditorKit? It's worth trying to see how it performs.
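A minimal sketch of that route, assuming you want Swing's own model rather than a W3C DOM: let HTMLEditorKit.read fill an HTMLDocument, then walk it with ElementIterator. For each input element, HTMLDocument stores a Swing model under StyleConstants.ModelAttribute (a javax.swing.text.Document for text and password fields), so a generated JTextField could share that model via setDocument and user edits would update it directly. HtmlDocumentInputs and inputModels are hypothetical names; the code is written for a current JDK (generics, diamond operator).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.AttributeSet;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper: load a page into HTMLDocument and collect each <input>'s Swing model.
class HtmlDocumentInputs {

    static Map<String, Object> inputModels(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Keep reading even if the page declares a charset in a <meta> tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);

        Map<String, Object> models = new LinkedHashMap<>();
        ElementIterator it = new ElementIterator(doc);
        for (Element el = it.first(); el != null; el = it.next()) {
            AttributeSet attrs = el.getAttributes();
            // HTMLDocument records each element's tag under StyleConstants.NameAttribute.
            if (attrs.getAttribute(StyleConstants.NameAttribute) == HTML.Tag.INPUT) {
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object model = attrs.getAttribute(StyleConstants.ModelAttribute);
                if (name != null && model != null) {
                    models.put(name.toString(), model);
                }
            }
        }
        return models;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><form action=\"/login\">"
            + "<input type=\"text\" name=\"user\" value=\"jane\">"
            + "<input type=\"password\" name=\"pass\">"
            + "</form></body></html>";
        for (Map.Entry<String, Object> e : inputModels(html).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().getClass().getSimpleName());
        }
    }
}
```

Reading the text back out of each model (getText(0, getLength())) after the user has edited it gives you the values to submit.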
 
LVL 86

Expert Comment

by:CEHJ
ID: 10922219
8-)