
html to xml

Hi,
I want to convert HTML pages to XML. Is there any way we can achieve this using Java programming?

TIA
Asked by: kousis
 
Breadstick Commented:
I'm not sure how you want the data converted, or whether you're thinking about this the right way.
"XML was designed to describe data and to focus on what data is."
"HTML was designed to display data and to focus on how data looks."

http://www.w3schools.com/xml/xml_whatis.asp


Here are some tutorials on how to process XML with Java:
http://www.cafeconleche.org/books/xmljava/
http://www.javaworld.com/jw-03-2000/jw-03-xmlsax.html
http://www.bearcave.com/software/java/xml/
 
CEHJ Commented:
You can convert HTML pages to XHTML (a kind of XML) using JTidy.
 
aratani Commented:
HTML is a form of XML, if you think about it, since it has opening and closing tags. Some HTML tags don't close, though, like <img> and <br>. To overcome this, there is a new form of HTML coming up in which everything is well-formed, i.e. XHTML.

Why would you want to do this though?

AJ
 
CEHJ Commented:
>>HTML is a form of XML

No, it isn't, actually. XHTML *is*, though.
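
A quick way to see the difference for yourself is to feed both kinds of markup to an XML parser. The following is only a minimal sketch using the JDK's built-in DOM parser; the class name WellFormedCheck and the sample strings are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: reports whether a string parses as well-formed XML.
class WellFormedCheck {

    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse errors instead of letting the default handler print them.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML style: <br> is never closed, so an XML parser rejects it.
        System.out.println(isWellFormedXml("<p>line one<br>line two</p>"));
        // XHTML style: <br/> is self-closed, so the fragment is well-formed.
        System.out.println(isWellFormedXml("<p>line one<br/>line two</p>"));
    }
}
```

The first call should report false and the second true, which is the practical reason a plain XML parser cannot be pointed at arbitrary HTML pages.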
 
MogalManic Commented:
Look into Tidy (http://www.w3.org/People/Raggett/tidy/). It is a tool that cleans up HTML files. To convert an HTML file to XHTML, just issue the following command:
   tidy -asxhtml file.html
The tool is available in many forms (including JTidy, which is the Java version). It is not perfect, and you will still have to edit the files manually.

If you want to convert the data contained in the HTML, here is one process that might work (assuming the data is in tabular form).

  1) Load the HTML pages into Excel
  2) Remove unnecessary rows/columns
  3) Save the file as CSV
  4) Write a process to convert the CSV to XML format.
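
Step 4 could be sketched roughly like this in Java, using only the JDK's DOM and Transformer classes. This is a sketch under assumptions, not a finished converter: it assumes simple CSV (no quoted fields or embedded commas), a header row whose names are valid XML element names, and the wrapper element names "rows" and "row" are just placeholders.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical converter: the first CSV line supplies the element names.
class CsvToXml {

    static String convert(String csv) {
        try {
            String[] lines = csv.trim().split("\\r?\\n");
            String[] headers = lines[0].split(",");

            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("rows"); // placeholder name
            doc.appendChild(root);

            for (int i = 1; i < lines.length; i++) {
                // Limit -1 keeps trailing empty cells.
                String[] cells = lines[i].split(",", -1);
                Element row = doc.createElement("row"); // placeholder name
                root.appendChild(row);
                for (int j = 0; j < headers.length && j < cells.length; j++) {
                    Element cell = doc.createElement(headers[j].trim());
                    // The serializer escapes &, < and > in text content.
                    cell.setTextContent(cells[j].trim());
                    row.appendChild(cell);
                }
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("CSV to XML conversion failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("name,age\nAnn,30\nBob,25"));
    }
}
```

Each data line becomes one <row> element with one child element per column. Real exports from Excel often contain quoted fields, so a proper CSV parser would be needed before using this on actual data.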
 
CEHJ Commented:
JTidy has already been mentioned ;-)
 
kousis (Author) Commented:
My HTML pages change. Is it possible to write code to generate XML pages?
 
aratani Commented:
You could probably issue the tidy command given above by MogalManic on the fly to dynamically generate XHTML content.

AJ
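
For what it's worth, running that command from Java whenever a page changes might look something like the sketch below. It assumes a command-line tidy executable is installed and on the PATH, that its output can be read as UTF-8, and Java 9 or later (for readAllBytes); the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the command-line "tidy -asxhtml file.html".
class TidyRunner {

    // Builds the same command line quoted earlier in the thread.
    static List<String> tidyCommand(String htmlPath) {
        return Arrays.asList("tidy", "-asxhtml", htmlPath);
    }

    // Runs tidy and returns whatever it writes to standard output.
    static String toXhtml(String htmlPath) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(tidyCommand(htmlPath));
        // Send tidy's diagnostics to our own stderr so the pipe cannot fill up.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        String xhtml;
        try (InputStream in = process.getInputStream()) {
            xhtml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }

        // Assumption: tidy exits with 0 for success, 1 for warnings, 2 for errors.
        int exitCode = process.waitFor();
        if (exitCode > 1) {
            throw new IOException("tidy failed with exit code " + exitCode);
        }
        return xhtml;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXhtml(args[0]));
    }
}
```

Using the JTidy library directly would avoid the external process, at the cost of an extra dependency.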
 
CEHJ Commented:
>>is it possible to write code to generate xml pages.

Yes, but what have you got in mind?

You could also look at the NekoHTML parser:

http://www.apache.org/~andyc/neko/doc/html/