mos
asked on
How to remove whitespace text-nodes from XML DOM
Hello,
I have two DOM objects and because I would like to have them comparable, I need to remove any text-node that only contains whitespace characters.
What is the simplest way to remove these nodes from my DOM?
I think there has to be a routine out there for this.
I'm using Java 1.3 and Apache Xerces 2.5.
Thanks
mos
The code I posted may help you a little, but you still have to find the whitespace-only text nodes yourself.
Try setting
http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setIgnoringElementContentWhitespace(boolean)
on the parser factory
ASKER
CEHJ: That doesn't help, because I already have a finished DOM object and can't use the DocumentBuilderFactory anymore, right?
Sudhakar: Thanks for the code. This looks like the manual way, which could be a little slow and cause problems for large XML documents. Isn't there an API call for this? I think a lot of people need this functionality...
AFAIK, that is the solution. Anyhow, I'll try to give another solution if I get
Regards
Sudha
setIgnoringElementContentWhitespace() simply does not work if no DTD is specified!
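The DTD point can be checked with a small sketch. It parses the same document twice with setIgnoringElementContentWhitespace(true): once with an inline DTD and the factory in validating mode (the Javadoc says the setting relies on the content model and therefore on validating mode), and once with no DTD at all. The class name, the sample XML and the helper countRootChildren are made up for this illustration; only the JAXP calls are real API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IgnorableWhitespaceDemo {
    // Sample document: whitespace sits between the child elements of <a>.
    static final String BODY = "<a>\n  <b/>\n  <c/>\n</a>";
    // Inline DTD that declares <a> as element-only content (assumed for this demo).
    static final String DTD =
        "<!DOCTYPE a [<!ELEMENT a (b,c)><!ELEMENT b EMPTY><!ELEMENT c EMPTY>]>";

    // Parses xml with ignorable-whitespace removal requested and returns
    // the number of child nodes of the root element.
    static int countRootChildren(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            factory.setIgnoringElementContentWhitespace(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException(e.toString());
        }
    }

    public static void main(String[] args) {
        System.out.println("with DTD:    " + countRootChildren(DTD + BODY, true));
        System.out.println("without DTD: " + countRootChildren(BODY, false));
    }
}
```

With the DTD the parser knows that <a> may only contain elements, so it can drop the whitespace between them. Without a DTD it has no content model to go on and keeps every text node.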
>>That doesn't help, because I have a ready state DOM-Object
I see. Then you'll have to visit the nodes as sudhakar has mentioned
ASKER
Hi sudhakar,
I tried your code, but it doesn't work.
Reason:
You hold the child nodes with NodeList list = node.getChildNodes();
Then you remove nodes with node.getParentNode().removeChild(node);
When you come back to iterate the NodeList, one element has been removed from the list and
the index now points at the wrong node. :(
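One way around the shifting index, as a minimal sketch: walk the children through getFirstChild()/getNextSibling() and remember the next sibling before removing anything, so the live NodeList is never indexed. The class and method names below are made up for this illustration. "Whitespace only" is taken to mean that trim() leaves an empty string, and CDATA sections are left alone because they have their own node type.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceStripper {
    // Removes every text node below 'node' that contains only whitespace.
    static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the next sibling first; after removeChild the
            // removed node no longer has one.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().length() == 0) {
                node.removeChild(child);
            } else {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    // Helper for the demo: parse a string, strip it, and report how many
    // child nodes the root element has left.
    static int rootChildCountAfterStrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripWhitespaceNodes(doc);
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e.toString());
        }
    }

    public static void main(String[] args) {
        String xml = "<a>\n  <b>text</b>\n  <c> </c>\n</a>";
        System.out.println(rootChildCountAfterStrip(xml));
    }
}
```

Note that this also removes the single blank inside <c>, which may or may not be wanted when the two documents are compared. The traversal touches each node once, so it should not be a problem for large documents beyond the recursion depth of very deeply nested XML.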
You could also try to turn them both into Strings and do something like
s = s.replaceAll(">\\s+<", "><");
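A small sketch of the string approach, in a variant that matches the whitespace together with both surrounding angle brackets and puts the brackets back, so that no ">" or "<" is lost. The class and method names are made up for this illustration. It assumes the serialized XML has no CDATA sections or comments containing "> <", which a plain regex cannot tell apart from markup, and String.replaceAll needs Java 1.4 or later.

```java
class TagWhitespaceRegex {
    // Removes whitespace-only runs that sit directly between a closing '>'
    // and the following '<'. Text with real content is left untouched.
    static String collapse(String xml) {
        return xml.replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        System.out.println(collapse("<a>\n  <b>x y</b>\n</a>"));
    }
}
```

Both documents would have to be serialized the same way before the resulting strings are compared, so this is more of a quick check than a replacement for walking the DOM.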
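import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Parses an XML file and returns a DOM document.
// If validating is true, the contents are validated against the DTD
// specified in the file. Returns null if the file cannot be parsed.
public static Document parseXmlFile(String filename, boolean validating) {
    try {
        // Create a builder factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        // Create the builder and parse the file
        Document doc = factory.newDocumentBuilder().parse(new File(filename));
        return doc;
    } catch (SAXException e) {
        // A parsing error occurred; the xml input is not valid
    } catch (ParserConfigurationException e) {
        // The parser could not be configured as requested
    } catch (IOException e) {
        // The file could not be read
    }
    return null;
}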
public void remove()
{
    Document doc = parseXmlFile("infilename.xml", false);
    // Remove all <junk> elements
    removeAll(doc, Node.ELEMENT_NODE, "junk");
    // Remove all comment nodes
    removeAll(doc, Node.COMMENT_NODE, null);
    // Normalize the DOM tree to combine all adjacent text nodes
    doc.normalize();
}
// This method walks the document and removes all nodes
// of the specified type and specified name.
// If name is null, then the node is removed if the type matches.
public static void removeAll(Node node, short nodeType, String name) {
    if (node.getNodeType() == nodeType &&
            (name == null || node.getNodeName().equals(name))) {
        node.getParentNode().removeChild(node);
    } else {
        // Visit the children from last to first: the NodeList is live,
        // so removing a child shifts the items after it down by one.
        NodeList list = node.getChildNodes();
        for (int i = list.getLength() - 1; i >= 0; i--) {
            removeAll(list.item(i), nodeType, name);
        }
    }
}