arichexe
asked on
html to xhtml
How would I convert an HTML string to XHTML before parsing it? The code below works fine unless I take out the quotes around 'white', which makes the markup non-XHTML-compliant. Would Tidy do the job? If so, how would I code it?
<%@ page import="java.io.*,java.net.*,java.text.*,java.util.*,javax.xml.parsers.*,javax.xml.xpath.*,org.w3c.dom.*,org.xml.sax.*" %>
<%
    // Sample markup; it parses only while the attribute value stays quoted
    String htm = "<html>" +
        "<body bgcolor='white'>" +
        "<head>" +
        "<title>Hello World</title>" +
        "</head>" +
        "</body>" +
        "</html>";

    // Parse the string with the standard (strict) XML parser
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(false);
    factory.setIgnoringElementContentWhitespace(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(new InputSource(new StringReader(htm)));
    document.getDocumentElement().normalize();

    // Select the text of every <title> element
    XPath xpath = XPathFactory.newInstance().newXPath();
    NodeList nodeList = (NodeList) xpath.evaluate("//title/text()", document, XPathConstants.NODESET);
    if (nodeList.getLength() > 0) {
        for (int i = 0; i < nodeList.getLength(); i++) {
            // getNodeValue() yields the text itself; toString() output is implementation-specific
            out.print(nodeList.item(i).getNodeValue());
        }
    } else {
        out.print("not found");
    }
%>
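To show why dropping the quotes breaks the page, here is a minimal standalone sketch, assuming only the JDK's built-in XML parser. The class name WellFormedCheck and the helper isWellFormed are made up for illustration. An unquoted attribute value is a well-formedness error in XML, so the parser throws instead of building a document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WellFormedCheck {

    /** Hypothetical helper: true if the markup parses as well-formed XML. */
    static boolean isWellFormed(String markup) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Fatal parse errors (such as an unquoted attribute value) land here
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Quoted attribute value: well-formed
        System.out.println(isWellFormed("<html><body bgcolor='white'></body></html>")); // true
        // Unquoted attribute value: rejected by the XML parser
        System.out.println(isWellFormed("<html><body bgcolor=white></body></html>"));   // false
    }
}
```

The parser may also print a "[Fatal Error]" line to stderr for the second call; that comes from its default error handler and does not change the result.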
ASKER
How would I modify my code to utilize Tidy?
:-)
Yes, Tidy can do that. See setXHTML(boolean) in the JTidy API docs:
http://jtidy.sourceforge.net/apidocs/org/w3c/tidy/Tidy.html#setXHTML(boolean)
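A rough sketch of how the asker's code might look with JTidy, assuming the JTidy jar is on the classpath. The class name TidyTitleSketch and the method extractTitles are hypothetical. Rather than producing an XHTML string and re-parsing it, this version lets Tidy build the DOM directly via parseDOM and then runs the same XPath query on it. JTidy's DOM is only a partial implementation, so treat the XPath step as something to verify in your own setup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

public class TidyTitleSketch {

    /** Hypothetical helper: cleans up loose HTML with JTidy and returns all title texts. */
    static List<String> extractTitles(String html) throws Exception {
        Tidy tidy = new Tidy();
        tidy.setXHTML(true);          // the option the linked API page describes
        tidy.setQuiet(true);          // suppress the summary banner
        tidy.setShowWarnings(false);  // suppress per-line warnings

        // parseDOM repairs the markup and returns a DOM; the second argument
        // (an OutputStream for the cleaned-up markup) is left null here
        Document document = tidy.parseDOM(
            new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)), null);

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            "//title/text()", document, XPathConstants.NODESET);

        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getNodeValue());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Same sample as in the question, but with the quotes around white removed
        String htm = "<html><body bgcolor=white><head>"
            + "<title>Hello World</title></head></body></html>";
        List<String> titles = extractTitles(htm);
        System.out.println(titles.isEmpty() ? "not found" : String.join(", ", titles));
    }
}
```

In the JSP, the body of extractTitles would replace the DocumentBuilder lines, with out.print in place of System.out.println.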