Solved

URL stream to xhtml

Posted on 2009-07-16
Last Modified: 2012-05-07
How would I modify the code below to convert a URL stream into an XHTML string, rather than an HTML file into an XHTML file?
import org.w3c.tidy.Tidy;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.w3c.dom.Document;

public class HTML_to_XHTML{
   public static void main(String[] args){
      try{
         // Read the HTML file and write the tidied markup to the output file
         FileInputStream FIS=new FileInputStream("C://test.html");
         FileOutputStream FOS=new FileOutputStream("C://testXHTML.xml");
         Tidy T=new Tidy();
         Document D=T.parseDOM(FIS,FOS);
      }
      catch (java.io.FileNotFoundException e){
         System.out.println(e.getMessage());
      }
   }
}

Question by:arichexe
10 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 24871675
 
LVL 92

Expert Comment

by:objects
ID: 24875255
InputStream FIS=url.openStream();   // java.net.URL has openStream(), not getInputStream()
StringWriter FOS=new StringWriter();
Tidy T=new Tidy();
T.parseDOM(FIS,FOS);
String xhtml = FOS.toString();
 

Author Comment

by:arichexe
ID: 24899921
I'm getting a "Tidy cannot be resolved to a type" error.
<%@ page import="java.io.*,java.net.*,java.text.*,java.util.*,javax.xml.parsers.*,javax.xml.xpath.*,org.w3c.dom.*,org.xml.sax.*,org.w3c.tidy.*" %>
<%
URL url = new URL(MyUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type","text/xml");
conn.setDoOutput(true);
OutputStream os = conn.getOutputStream();
os.flush();
os.close();

InputStream is = conn.getInputStream();
StringWriter ox = new StringWriter();
Tidy T=new Tidy();
T.parseDOM(is,ox);
String xhtml = ox.toString();

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setIgnoringElementContentWhitespace(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xhtml)));
document.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//title/text()",document,XPathConstants.NODESET);

if (nodeList.getLength() > 0) {
  for (int i = 0; i < nodeList.getLength(); i++) {
    out.print("msg: " + nodeList.item(i).toString());
  }
}else{
  out.print("msg: not found");
}
%>

 
LVL 92

Expert Comment

by:objects
ID: 24900113
Make sure you have the Tidy jar in your web app's WEB-INF/lib directory.
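A quick way to confirm the jar is actually visible is to ask the class loader for the class. This is only a minimal sketch: `ClasspathCheck` and `isOnClasspath` are made-up names for illustration, and the only name taken from this thread is `org.w3c.tidy.Tidy`.

```java
public class ClasspathCheck {
    // Hypothetical helper: returns true if the named class can be loaded
    // by the class loader of the calling code.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Prints false if the Tidy jar is missing from the classpath.
        System.out.println("Tidy available: " + isOnClasspath("org.w3c.tidy.Tidy"));
    }
}
```

If this reports false from inside the web app, the "cannot be resolved to a type" error is most likely a deployment problem rather than a code problem.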
 
LVL 92

Expert Comment

by:objects
ID: 24900125
you should also be able to simplify your code to the following

InputStream is = conn.getInputStream();
StringWriter ox = new StringWriter();
Tidy T=new Tidy();
Document document = T.parseDOM(is,ox);
String xhtml = ox.toString();
document.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//title/text()",document,XPathConstants.NODESET);
 
if (nodeList.getLength() > 0) {
  for (int i = 0; i < nodeList.getLength(); i++) {
    out.print("msg: " + nodeList.item(i).toString());
  }
}else{
  out.print("msg: not found");
}

Author Comment

by:arichexe
ID: 24900831
Now I'm getting "The method parseDOM(InputStream, OutputStream) in the type Tidy is not applicable for the arguments (InputStream, StringWriter)."
<%@ page import="java.io.*,java.net.*,java.text.*,java.util.*,javax.xml.parsers.*,javax.xml.xpath.*,org.w3c.dom.*,org.w3c.tidy.*,org.xml.sax.*" %>
<%
URL url = new URL(MyUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type","text/html");
conn.setDoOutput(true);
OutputStream os = conn.getOutputStream();
os.flush();
os.close();

InputStream is = conn.getInputStream();
StringWriter ox = new StringWriter();
Tidy T=new Tidy();
Document document = T.parseDOM(is,ox);
String xhtml = ox.toString();
document.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//title/text()",document,XPathConstants.NODESET);

if (nodeList.getLength() > 0) {
  for (int i = 0; i < nodeList.getLength(); i++) {
    out.print("msg: " + nodeList.item(i).toString());
  }
}else{
  out.print("msg: not found");
}
%>
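The compiler message says parseDOM wants an OutputStream, and StringWriter is a Writer, so the two don't fit. If the string is still wanted, one option is to collect the bytes in a ByteArrayOutputStream and decode them afterwards. A sketch using only the JDK: `writeMarkup` is a hypothetical stand-in for whatever writes to the stream (in this thread that would be the `T.parseDOM(is, buffer)` call), and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamToString {
    // Hypothetical stand-in for an API that only accepts an OutputStream.
    public static void writeMarkup(OutputStream out) throws IOException {
        out.write("<html><head><title>Hi</title></head></html>"
                .getBytes(StandardCharsets.UTF_8));
    }

    // Collects everything written to the stream and decodes it as UTF-8 (assumed).
    public static String capture() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeMarkup(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(capture());
    }
}
```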

 
LVL 92

Expert Comment

by:objects
ID: 24901038
you don't actually need to create the string at all, try this:

InputStream is = conn.getInputStream();
Tidy T=new Tidy();
Document document = T.parseDOM(is, null);
document.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//title/text()",document,XPathConstants.NODESET);
 

Author Comment

by:arichexe
ID: 24901110
Now it returns this weird string "msg: org.w3c.tidy.DOMTextImpl@9674b2d" and the last 7 chars change when I hit refresh.  No error, though.  Strange.
<%@ page import="java.io.*,java.net.*,java.text.*,java.util.*,javax.xml.parsers.*,javax.xml.xpath.*,org.w3c.dom.*,org.w3c.tidy.*,org.xml.sax.*" %>
<%
URL url = new URL("http://MyUrl.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
InputStream is = conn.getInputStream();
Tidy T=new Tidy();
Document document = T.parseDOM(is,null);
document.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//title/text()",document,XPathConstants.NODESET);

if (nodeList.getLength() > 0) {
  for (int i = 0; i < nodeList.getLength(); i++) {
    out.print("msg: " + nodeList.item(i).toString());
  }
}else{
  out.print("msg: not found");
}
%>

 
LVL 92

Accepted Solution

by:
objects earned 500 total points
ID: 24901121
>     out.print("msg: " + nodeList.item(i).toString());

That's because the node class doesn't override toString(), so you get the default Object.toString() output (class name plus hash code). Try getNodeValue() instead:

    out.print("msg: " + nodeList.item(i).getNodeValue());
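For anyone who wants to try this without Tidy, the same XPath lookup can be run against the JDK's own DOM parser. This is only a sketch under assumptions: `TitleText` and `firstTitle` are made-up names, the input is a hard-coded well-formed string rather than a URL stream, and the markup has no namespace declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TitleText {
    // Parses well-formed markup and returns the text of the first <title>, or null.
    public static String firstTitle(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//title/text()", doc, XPathConstants.NODESET);
            // getNodeValue() returns the character data of a text node.
            return nodes.getLength() > 0 ? nodes.item(0).getNodeValue() : null;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("msg: "
                + firstTitle("<html><head><title>Hello</title></head><body/></html>"));
    }
}
```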
 

Author Closing Comment

by:arichexe
ID: 31604332
Thanks!

Question has a verified solution.
