HTML to XML

Hi,
I want to convert HTML pages to XML. Is there any way to achieve this in Java?

TIA
kousisAsked:
 
BreadstickCommented:
I'm not sure how you want the data converted, or whether you're thinking about this the right way.
"XML was designed to describe data and to focus on what data is."
"HTML was designed to display data and to focus on how data looks."

http://www.w3schools.com/xml/xml_whatis.asp


Here are some tutorials on how to process XML with Java:
http://www.cafeconleche.org/books/xmljava/
http://www.javaworld.com/jw-03-2000/jw-03-xmlsax.html
http://www.bearcave.com/software/java/xml/
 
CEHJCommented:
You can convert HTML pages to XHTML (a kind of XML) using JTidy.
 
arataniCommented:
HTML is a form of XML, if you think about it, since it has opening and closing tags. Some HTML tags don't close, though, like <img> and <br>. To overcome this, there is a newer form of HTML in which everything is well-formed, i.e. XHTML.

Why would you want to do this though?

AJ
 
CEHJCommented:
>>HTML is a form of XML

No, it isn't, actually. XHTML *is*, though.
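
The difference is easy to see by handing both kinds of markup to an XML parser. The following is only a minimal sketch using the JDK's built-in parser; the class name WellFormedCheck and the sample strings are made up for illustration. An unclosed <br> makes the document ill-formed as XML, while the XHTML form <br/> parses fine.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of any library mentioned in this thread.
class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML.
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints fatal errors to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the markup, so it is not well-formed XML.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Line one<br>Line two</p></body></html>";
        String xhtml = "<html><body><p>Line one<br/>Line two</p></body></html>";
        System.out.println(isWellFormedXml(html));   // false
        System.out.println(isWellFormedXml(xhtml));  // true
    }
}
```

This only checks well-formedness. It does not validate against the XHTML DTD, and it does not repair anything, which is the job of a tool like Tidy.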
 
MogalManicCommented:
Look into Tidy (http://www.w3.org/People/Raggett/tidy/). It is a tool that cleans up HTML files. To convert HTML files to XHTML, just issue the following command:
   tidy -asxhtml file.html
The product is available in many forms (including JTidy, which is the Java version). It is not perfect, and you will still have to edit the files manually.

If you want to convert the data contained in the HTML, here is one process that might work (assuming the data is in tabular form).

  1) Load the HTML pages into Excel
  2) Remove unnecessary rows/columns
  3) Save the file as CSV
  4) Write a process to convert the CSV to XML format.
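
Step 4 could look roughly like the sketch below, which uses only the JDK's DOM and Transformer classes. It rests on assumptions that are not from this thread: the first CSV line is a header, the header names are valid XML element names, fields contain no quotes or embedded commas, and the element names "rows" and "row" are arbitrary choices. The class name CsvToXml is made up.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical converter for simple, unquoted CSV.
class CsvToXml {

    // Converts CSV lines (first line = header) into an XML string.
    static String convert(List<String> lines) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("rows");
            doc.appendChild(root);

            String[] headers = lines.get(0).split(",");
            for (String line : lines.subList(1, lines.size())) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] values = line.split(",", -1); // keep empty trailing fields
                Element row = doc.createElement("row");
                root.appendChild(row);
                for (int i = 0; i < headers.length && i < values.length; i++) {
                    // One child element per column, named after the header.
                    Element cell = doc.createElement(headers[i].trim());
                    cell.setTextContent(values[i].trim()); // escaped on output
                    row.appendChild(cell);
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("CSV to XML conversion failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert(Arrays.asList(
                "name,price", "Widget,9.99", "Gadget,24.50")));
    }
}
```

Real CSV exported from Excel can contain quoted fields and embedded commas, so a proper CSV parser would be needed in place of split(",") for anything beyond simple data.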
 
CEHJCommented:
JTidy has already been mentioned ;-)
 
kousisAuthor Commented:
My HTML pages change. Is it possible to write code to generate the XML pages?
 
arataniCommented:
You could probably run the tidy command MogalManic posted above on the fly to generate XHTML content dynamically.
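
One way to do that from Java is to launch the command-line tool as a child process. This is only a sketch, assuming a "tidy" executable is installed and on the PATH, that it writes UTF-8, and that Java 9 or later is available; the class name TidyRunner is made up. It runs the same "tidy -asxhtml file.html" command quoted earlier and captures the XHTML from standard output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the tidy command-line tool.
class TidyRunner {

    // Builds the command line: tidy -asxhtml <file>
    static List<String> command(String htmlFile) {
        return Arrays.asList("tidy", "-asxhtml", htmlFile);
    }

    // Runs tidy and returns the XHTML it writes to standard output.
    // Assumption: tidy is on the PATH and emits UTF-8.
    static String toXhtml(String htmlFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(htmlFile))
                // Discard diagnostics so an unread stderr pipe cannot block tidy.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String xhtml = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = process.waitFor();
        // tidy exits with 0 on success, 1 on warnings, 2 on errors.
        if (exit > 1) {
            throw new IOException("tidy failed with exit code " + exit);
        }
        return xhtml;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXhtml(args[0]));
    }
}
```

Starting a process per page is simple but slow if there are many pages. Calling JTidy in-process would avoid that overhead, at the cost of adding the library to the classpath.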

AJ
 
CEHJCommented:
>>is it possible to write code to generate xml pages.

Yes, but what have you got in mind?

You could also look at the Neko html parser:

http://www.apache.org/~andyc/neko/doc/html/