Problem in JTidy

I am using JTidy to convert an HTML page to an XML document, but the header of the converted page looks like this:
<meta name="generator" content="HTML Tidy, see" />
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1" />

So I cannot parse it as a regular XML document, because it does not contain a regular XML header?
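For context, here is a small sketch of what the missing header actually changes, using only the JDK's built-in DOM parser (the class name and sample text are made up for illustration). The `<?xml ...?>` declaration is optional in XML 1.0, but without it an XML parser assumes UTF-8 (unless a byte-order mark says otherwise) and ignores the charset in `<meta http-equiv="Content-Type" ...>`. If the converted file contains raw ISO-8859-1 bytes such as é, parsing then fails. Whether that applies here depends on how JTidy wrote the non-ASCII characters, which the question does not show.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDemo {

    /** Parses raw bytes with the JDK's DOM parser; returns the root element's text, or null if parsing fails. */
    static String parseText(byte[] xmlBytes) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores the rest,
            // so the parser does not print "[Fatal Error]" to stderr.
            db.setErrorHandler(new DefaultHandler());
            Document doc = db.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null; // invalid byte sequence or malformed markup
        }
    }

    public static void main(String[] args) {
        String body = "<html><body><p>caf\u00e9</p></body></html>";

        // No declaration: the parser assumes UTF-8, and the raw 0xE9 byte is not valid UTF-8.
        byte[] withoutDecl = body.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("without declaration: " + parseText(withoutDecl)); // null

        // Same bytes, preceded by a declaration naming the real encoding.
        byte[] withDecl = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("with declaration: " + parseText(withDecl)); // café
    }
}
```

If the file is pure ASCII (for example, with non-ASCII characters written as numeric entities), both variants parse, and the missing declaration is harmless.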

I am using the following Java code to run JTidy:

import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URL;

import org.w3c.tidy.Tidy;

public class TestHTML2XML {
    private String url;
    private String outFileName;
    private String errOutFileName;

    public TestHTML2XML(String url, String outFileName, String errOutFileName) {
        this.url = url;
        this.outFileName = outFileName;
        this.errOutFileName = errOutFileName;
    }

    public void convert() {
        URL u;
        BufferedInputStream in;
        FileOutputStream out;

        Tidy tidy = new Tidy();

        // Tell Tidy to convert HTML to XML
        tidy.setXmlOut(true);

        try {
            // Set file for error messages
            tidy.setErrout(new PrintWriter(new FileWriter(errOutFileName), true));
            u = new URL(url);

            // Create input and output streams
            in = new BufferedInputStream(u.openStream());
            out = new FileOutputStream(outFileName);

            // Convert files
            tidy.parse(in, out);

            // Clean up
            in.close();
            out.close();
        } catch (IOException e) {
            System.out.println(this.toString() + e.toString());
        }
    }

    /**
     * Parameters are:
     *   URL of HTML file
     *   Filename of output file
     *   Filename of error file
     */
    public static void main(String[] args) {
        TestHTML2XML t = new TestHTML2XML(args[0], args[1], args[2]);
        t.convert();
    }
}
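If a downstream tool really does insist on the `<?xml ...?>` line, JTidy can, as far as I know, emit it itself via `tidy.setXmlPi(true)` (check the Javadoc of your JTidy version). A JDK-only alternative is to write the declaration to the output stream yourself, just before `tidy.parse(in, out)`. The sketch below assumes that approach; `XmlDeclarationWriter` and `writeXmlDeclaration` are hypothetical names, not JTidy API. The encoding you declare is also an assumption: it must match the bytes JTidy actually writes, and ISO-8859-1 here only mirrors the meta tag in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class XmlDeclarationWriter {

    /**
     * Hypothetical helper (not part of JTidy): writes an XML declaration.
     * It must be called before anything else is written, because the
     * declaration has to be the very first thing in the file.
     */
    static void writeXmlDeclaration(OutputStream out, String encoding) throws IOException {
        String decl = "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>\n";
        // The declaration is pure ASCII, so its bytes are identical in
        // ISO-8859-1, UTF-8 and other ASCII-compatible encodings.
        out.write(decl.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeXmlDeclaration(out, "ISO-8859-1");
        // Stand-in for tidy.parse(in, out), which would append the document here.
        out.write("<html><body><p>caf\u00e9</p></body></html>".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(new String(out.toByteArray(), StandardCharsets.ISO_8859_1));
    }
}
```

In `convert()` this would amount to calling `writeXmlDeclaration(out, "ISO-8859-1");` right after `out = new FileOutputStream(outFileName);`. Don't combine it with `setXmlPi(true)`, or the file ends up with two declarations, which is itself a parse error.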

Mayank S (Associate Director - Product Engineering) commented:
It will use DOM. I meant to ask whether you tried the plain, simple DOM parser without using JTidy.
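To make that suggestion concrete: the JDK ships a DOM parser behind `javax.xml.parsers`, and it accepts an HTML page only if the page already happens to be well-formed XML. The sketch below (class name hypothetical) shows a self-closed `<br/>` parsing fine while an ordinary HTML `<br>` is rejected, which is exactly the case where a cleaner such as JTidy is needed first.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns true if the JDK's DOM parser accepts the markup as well-formed XML. */
    static boolean isWellFormed(String markup) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        db.setErrorHandler(new DefaultHandler()); // rethrow fatal errors without printing to stderr
        try {
            db.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<html><body><p>one<br/>two</p></body></html>")); // true
        System.out.println(isWellFormed("<html><body><p>one<br>two</p></body></html>"));  // false: <br> is never closed
    }
}
```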
Mayank S (Associate Director - Product Engineering) commented:
>> So I cannot parse it as a regular XML document, because it does not contain a regular XML header?

It will have an <html> root node. You should be able to parse it.
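A minimal check of that claim with the JDK's built-in DOM parser (class name and sample markup are made up, shaped like the header in the question): markup with no `<?xml ...?>` line at all parses, and the root element is `html`. Because this reads from a `String`, byte encoding plays no role here; reading a file from disk is where the declared encoding starts to matter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NoDeclarationDemo {

    /** Parses markup with the JDK's DOM parser and returns the root element's name. */
    static String rootName(String markup) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Shaped like the JTidy output in the question: no <?xml ...?> line.
        String tidyOutput =
                "<html><head>"
              + "<meta name=\"generator\" content=\"HTML Tidy\" />"
              + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\" />"
              + "<title>test</title></head>"
              + "<body><p>Hello</p></body></html>";
        System.out.println(rootName(tidyOutput)); // html
    }
}
```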

Do you mean it does not contain the <?xml version....> header?

Did you try parsing it as it is?
badour_ma (Author) commented:
Yes, I tried, and it gives me an error.
Mayank S (Associate Director - Product Engineering) commented:
Which parser did you use? I guess a DOM parser will not give you that error.
badour_ma (Author) commented:
I do not know which parser I am using, because I only use the JTidy classes.
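The code above only writes the file, so the error presumably comes from whatever reads it afterwards. One way to find out both which parser is in use and exactly where it fails is to parse the JTidy output file with JAXP, then print the implementation class and the line and column from the `SAXParseException`. This is a sketch: the class name and the command-line usage (`java ParseTidyOutput out.xml`) are hypothetical. The implementation printed is typically the JDK's internal Xerces-based one, unless another parser is on the classpath or configured.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTidyOutput {

    /** Parses an XML file and returns a one-line report: the root element, or where parsing failed. */
    static String check(File file) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        db.setErrorHandler(new DefaultHandler()); // rethrow fatal errors without printing to stderr
        try {
            Document doc = db.parse(file);
            return "OK, root element <" + doc.getDocumentElement().getNodeName() + ">";
        } catch (SAXParseException e) {
            return "Error at line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (IOException e) {
            // Missing file, or (depending on the JDK) an invalid byte sequence for the assumed encoding.
            return "I/O or encoding error: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Which DOM implementation JAXP picked.
        System.out.println("Parser: " + DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println(check(new File(args[0]))); // args[0]: the output file JTidy wrote
    }
}
```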