Solved

URL stream to xhtml

Posted on 2009-07-16
Last Modified: 2012-05-07
How would I modify the below to convert a URL stream into an xhtml string, rather than an html file into an xhtml file?
import org.w3c.tidy.Tidy;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.w3c.dom.Document;

public class HTML_to_XHTML{
   public static void main(String[] args){
      try{
         FileInputStream FIS=new FileInputStream("C://test.html");
         FileOutputStream FOS=new FileOutputStream("C://testXHTML.xml");
         Tidy T=new Tidy();
         // parseDOM writes the tidied markup to FOS and returns the parsed DOM
         Document D=T.parseDOM(FIS,FOS);
      }
      catch (java.io.FileNotFoundException e){
         System.out.println(e.getMessage());
      }
   }
}

Question by:arichexe
 
LVL 86

Expert Comment

by:CEHJ
ID: 24871675
 
LVL 92

Expert Comment

by:objects
ID: 24875255
InputStream FIS=url.openStream();
StringWriter FOS=new StringWriter();
Tidy T=new Tidy();
T.parseDOM(FIS,FOS);
String xhtml = FOS.toString();
 

Author Comment

by:arichexe
ID: 24899921
I'm getting a "Tidy cannot be resolved to a type" error.
<%@ page import="java.io.*,java.net.*,java.text.*,java.util.*,javax.xml.parsers.*,javax.xml.xpath.*,org.w3c.dom.*,org.w3c.dom.*,org.xml.sax.*,org.w3c.tidy.*" %>
<%
URL url = new URL(MyUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type","text/xml");
conn.setDoOutput(true);
OutputStream os = conn.getOutputStream();
os.flush();
os.close();
 
InputStream is = conn.getInputStream();
StringWriter ox = new StringWriter();
Tidy T=new Tidy();
T.parseDOM(is,ox);
String xhtml = ox.toString();
 
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setIgnoringElementContentWhitespace(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xhtml)));
document.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//title/text()",document,XPathConstants.NODESET);
 
if (nodeList.getLength() > 0) {
  for (int i = 0; i < nodeList.getLength(); i++) {
    out.print("msg: " + nodeList.item(i).toString());
  }
}else{
  out.print("msg: not found");
}
%>

 
LVL 92

Expert Comment

by:objects
ID: 24900113
Make sure you have the JTidy jar in your web app's WEB-INF/lib directory.
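
One way to confirm that the jar is actually visible to the application is to try loading the class at runtime. This is a minimal sketch using only the JDK; `ClasspathCheck` and `isOnClasspath` are hypothetical names introduced for illustration, and `org.w3c.tidy.Tidy` is the class name taken from the imports in the question.

```java
public class ClasspathCheck {
    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Prints false when the JTidy jar is not on the classpath
        System.out.println("JTidy available: " + isOnClasspath("org.w3c.tidy.Tidy"));
    }
}
```

Inside a JSP the same check would run against the web app's own class loader, which is the one that reads WEB-INF/lib.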
 
LVL 92

Expert Comment

by:objects
ID: 24900125
You should also be able to simplify your code to the following:

InputStream is = conn.getInputStream();
StringWriter ox = new StringWriter();
Tidy T=new Tidy();
Document document = T.parseDOM(is,ox);
String xhtml = ox.toString();
document.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//title/text()",document,XPathConstants.NODESET);
 
if (nodeList.getLength() > 0) {
  for (int i = 0; i < nodeList.getLength(); i++) {
    out.print("msg: " + nodeList.item(i).toString());
  }
}else{
  out.print("msg: not found");
}
 

Author Comment

by:arichexe
ID: 24900831
Now I'm getting "The method parseDOM(InputStream, OutputStream) in the type Tidy is not applicable for the arguments (InputStream, StringWriter)."
<%@ page import="java.io.*,java.net.*,java.text.*,java.util.*,javax.xml.parsers.*,javax.xml.xpath.*,org.w3c.dom.*,org.w3c.dom.*,org.w3c.tidy.*,org.xml.sax.*" %>
<%
URL url = new URL(MyUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type","text/html");
conn.setDoOutput(true);
OutputStream os = conn.getOutputStream();
os.flush();
os.close();
 
InputStream is = conn.getInputStream();
StringWriter ox = new StringWriter();
Tidy T=new Tidy();
Document document = T.parseDOM(is,ox);
String xhtml = ox.toString();
document.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//title/text()",document,XPathConstants.NODESET);
 
if (nodeList.getLength() > 0) {
  for (int i = 0; i < nodeList.getLength(); i++) {
    out.print("msg: " + nodeList.item(i).toString());
  }
}else{
  out.print("msg: not found");
}
%>
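
The compiler error above arises because `StringWriter` is a `Writer`, not an `OutputStream`. A common way to get a String out of an API that only accepts an `OutputStream` is to hand it a `ByteArrayOutputStream` and decode the bytes afterwards. The sketch below uses only the JDK so it can run without JTidy; `CaptureOutput`, `StreamProducer`, and `capture` are hypothetical names, the UTF-8 encoding is an assumption, and the JTidy call appears only in a comment.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CaptureOutput {
    /** Hypothetical stand-in for any call that writes its result to an OutputStream. */
    interface StreamProducer {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Runs the producer against an in-memory buffer and returns what it wrote as a String. */
    static String capture(StreamProducer producer) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            producer.writeTo(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: the producer emits UTF-8 bytes
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // With JTidy on the classpath, the producer could be: out -> T.parseDOM(is, out)
        String xhtml = capture(out -> out.write("<html></html>".getBytes(StandardCharsets.UTF_8)));
        System.out.println(xhtml);
    }
}
```

If the goal is XHTML rather than cleaned-up HTML, JTidy also needs `setXHTML(true)` called on the `Tidy` instance before parsing.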

 
LVL 92

Expert Comment

by:objects
ID: 24901038
You don't actually need to create the string at all. Try this:

InputStream is = conn.getInputStream();
Tidy T=new Tidy();
Document document = T.parseDOM(is, null);
document.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//title/text()",document,XPathConstants.NODESET);
 

Author Comment

by:arichexe
ID: 24901110
Now it returns this weird string "msg: org.w3c.tidy.DOMTextImpl@9674b2d" and the last 7 chars change when I hit refresh.  No error, though.  Strange.
<%@ page import="java.io.*,java.net.*,java.text.*,java.util.*,javax.xml.parsers.*,javax.xml.xpath.*,org.w3c.dom.*,org.w3c.dom.*,org.w3c.tidy.*,org.xml.sax.*" %>
<%
URL url = new URL("http://MyUrl.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
InputStream is = conn.getInputStream();
Tidy T=new Tidy();
Document document = T.parseDOM(is,null);
document.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//title/text()",document,XPathConstants.NODESET);
 
if (nodeList.getLength() > 0) {
  for (int i = 0; i < nodeList.getLength(); i++) {
    out.print("msg: " + nodeList.item(i).toString());
  }
}else{
  out.print("msg: not found");
}
%>

 
LVL 92

Accepted Solution

by:
objects earned 2000 total points
ID: 24901121
>     out.print("msg: " + nodeList.item(i).toString());

That's because the Node interface doesn't define a useful toString(), so you get the default Object output (class name plus hash code). Try getNodeValue() instead:

    out.print("msg: " + nodeList.item(i).getNodeValue());
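
For reference, here is a self-contained sketch of the same XPath lookup using `getNodeValue()`. It uses the JDK's own XML parser on an already well-formed string instead of JTidy, so it runs without extra jars; `TitleText`, `titles`, and the sample markup are hypothetical and only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TitleText {
    /** Returns the text content of every title element in a well-formed XHTML string. */
    static List<String> titles(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//title/text()", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getNodeValue() returns the character data of a text node
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<html><head><title>Hello</title></head><body></body></html>";
        for (String title : titles(sample)) {
            System.out.println("msg: " + title); // prints "msg: Hello"
        }
    }
}
```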
 

Author Closing Comment

by:arichexe
ID: 31604332
Thanks!
