
Problem in JTidy

Posted on 2007-03-30
Last Modified: 2013-11-19
I am using JTidy to convert an HTML page to an XML document, but the header of the converted page looks like this:
"<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1" />
"

So I cannot parse it as a regular XML document because it does not contain a regular XML header?

I am using the following Java code to run JTidy:

import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.net.URL;
import org.w3c.tidy.Tidy;

public class TestHTML2XML {
    private final String url;
    private final String outFileName;
    private final String errOutFileName;

    public TestHTML2XML(String url, String outFileName, String errOutFileName) {
        this.url = url;
        this.outFileName = outFileName;
        this.errOutFileName = errOutFileName;
    }

    public void convert() {
        Tidy tidy = new Tidy();

        // Tell Tidy to convert HTML to XML
        tidy.setXmlOut(true);

        // try-with-resources closes the error file and both streams,
        // even when the download or the conversion fails
        try (PrintWriter errOut = new PrintWriter(new FileWriter(errOutFileName), true);
             InputStream in = new BufferedInputStream(new URL(url).openStream());
             OutputStream out = new FileOutputStream(outFileName)) {

            // Set file for error messages
            tidy.setErrout(errOut);

            // Convert files
            tidy.parse(in, out);

        } catch (IOException e) {
            System.err.println("Conversion failed: " + e);
        }
    }

    public static void main(String[] args) {
        /*
         * Parameters are:
         * URL of HTML file
         * Filename of output file
         * Filename of error file
         */
        if (args.length != 3) {
            System.err.println("Usage: java TestHTML2XML <url> <outFile> <errFile>");
            return;
        }
        TestHTML2XML t = new TestHTML2XML(args[0], args[1], args[2]);
        t.convert();
    }
}

Question by:badour_ma
7 Comments
 

Expert Comment

by:Mayank S
ID: 18825702
>> So I cannot parse it as a regular XML document because it does not contain a regular XML header?

It will have an <html> root node. You should be able to parse it.

Do you mean it does not contain the <?xml version....> header?

Did you try parsing it as it is?
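
A minimal sketch of the point above, using only the JDK's DOM parser. The sample string is hypothetical: it is modeled on the header quoted in the question, with a title, a body, and closing tags added so that it is a complete document. It has no <?xml ...?> line, yet it parses, because the XML declaration is optional in XML 1.0 as long as the document is well-formed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class ParseWithoutXmlHeader {
    // Hypothetical sample modeled on the output quoted in the question.
    // Note that it does not start with an <?xml ...?> declaration.
    static final String SAMPLE =
          "<html>\n"
        + "<head>\n"
        + "<meta name=\"generator\" content=\"HTML Tidy, see www.w3.org\" />\n"
        + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\" />\n"
        + "<title>Sample</title>\n"
        + "</head>\n"
        + "<body><p>Hello</p></body>\n"
        + "</html>\n";

    /** Parses the given text with the JDK's default DOM parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(doc.getDocumentElement().getNodeName());       // prints "html"
        System.out.println(doc.getElementsByTagName("meta").getLength()); // prints "2"
    }
}
```

If a real converted page fails where this sample succeeds, the missing header is probably not the cause, and the parser's error message should point to the actual problem.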
 

Author Comment

by:badour_ma
ID: 18828375
Yes, I tried, and it gives me an error.
 

Expert Comment

by:Mayank S
ID: 18829120
Which parser did you use? I would guess that a DOM parser will not give you that error.
 

Author Comment

by:badour_ma
ID: 18835348
I do not know which parser I am using, because I only use the JTidy classes.
 

Accepted Solution

by:
Mayank S earned 2000 total points
ID: 19046648
It will use DOM. I meant to ask whether you tried the plain and simple DOM parser, without using JTidy.
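
One way to try that suggestion is sketched below, assuming the file written by JTidy (the outFileName argument in the question) is passed on the command line. The class name CheckTidyOutput and its check helper are made up for this illustration. It runs the JDK's DOM parser on the file and, if parsing fails, reports the line and column of the error, which is usually more useful than a bare "it gives me an error".

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class CheckTidyOutput {
    /** Parses the stream with the JDK's DOM parser and describes the outcome. */
    static String check(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return "OK: root element is <" + doc.getDocumentElement().getNodeName() + ">";
        } catch (SAXParseException e) {
            // Well-formedness errors carry the position of the problem
            return "Not well-formed at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            return "Could not parse: " + e;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java CheckTidyOutput <file written by JTidy>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(check(in));
        } catch (IOException e) {
            System.err.println("Could not open " + args[0] + ": " + e.getMessage());
        }
    }
}
```

The JDK's default parser may also print its own "[Fatal Error]" line to standard error; the string returned by check repeats the same position in a form that is easy to post back to the thread.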