How to remove whitespace text-nodes from XML DOM

Hello,

I have two DOM objects, and to make them comparable I need to remove every text node that contains only whitespace characters.

What is the simplest way to remove these nodes from my DOM?

I think there has to be a routine out there for this.

I'm using Java 1.3 and Apache Xerces 2.5.

Thanks
mos
 
sudhakar_koundinya commented:


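// Imports needed by the snippets in this post
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Parses an XML file and returns a DOM document.
// If validating is true, the content is validated against the DTD
// specified in the file. Returns null if parsing fails.
public static Document parseXmlFile(String filename, boolean validating) {
    try {
        // Create a builder factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);

        // Create the builder and parse the file
        Document doc = factory.newDocumentBuilder().parse(new File(filename));
        return doc;
    } catch (SAXException e) {
        // A parsing error occurred; the XML input is not valid
    } catch (ParserConfigurationException e) {
        // The parser could not be configured as requested
    } catch (IOException e) {
        // The file could not be read
    }
    return null;
}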




public void remove() {
    Document doc = parseXmlFile("infilename.xml", false);

    // Remove all <junk> elements
    removeAll(doc, Node.ELEMENT_NODE, "junk");

    // Remove all comment nodes
    removeAll(doc, Node.COMMENT_NODE, null);

    // Normalize the DOM tree to combine all adjacent text nodes
    doc.normalize();
}

// This method walks the document and removes all nodes
// of the specified type and specified name.
// If name is null, then the node is removed if the type matches.
public static void removeAll(Node node, short nodeType, String name) {
    if (node.getNodeType() == nodeType &&
            (name == null || node.getNodeName().equals(name))) {
        node.getParentNode().removeChild(node);
    } else {
        // Visit the children
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            removeAll(list.item(i), nodeType, name);
        }
    }
}

sudhakar_koundinya commented:
The code above may help a little, but you still need to pick out the text nodes that contain only whitespace.

sudhakar_koundinya commented:
public void remove() {
    Document doc = parseXmlFile("infilename.xml", false);

    // Remove all text nodes that are empty after trimming,
    // i.e. nodes that contain only whitespace
    removeAll(doc, Node.TEXT_NODE, "");

    // Normalize the DOM tree to combine all adjacent text nodes
    doc.normalize();
}

// This method walks the document and removes all nodes of the
// specified type whose trimmed value equals the given text.
// If name is null, then the node is removed if the type matches.
// Intended for text nodes; other node types may have a null value.
public static void removeAll(Node node, short nodeType, String name) {
    if (node.getNodeType() == nodeType &&
            (name == null || node.getNodeValue().trim().equals(name))) {
        node.getParentNode().removeChild(node);
    } else {
        // Visit the children
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            removeAll(list.item(i), nodeType, name);
        }
    }
}
 
mos (author) commented:
CEHJ: That doesn't help, because I have a ready-state DOM object and can't use the DocumentBuilderFactory on it anymore, right?

Sudhakar: Thanks for the code. This looks like the manual way, which could be slow and cause problems for large XML documents. Isn't there an API call for this? I would think a lot of people need this functionality...
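
One shorter route for an already-built DOM, sketched here under assumptions: the javax.xml.xpath API can select every whitespace-only text node in one query. That API arrived with Java 5, so it is not available on the Java 1.3 setup described in the question. The class and method names below are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathWhitespaceStripper {

    // Selects all text nodes that are empty after whitespace normalization
    // and detaches each one from its parent.
    static void strip(Document doc) throws Exception {
        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        for (int i = 0; i < blanks.getLength(); i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }
    }

    // Hypothetical demo helper: parse a string, strip it, count the root's children.
    static int childCountAfterStripping(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        strip(doc);
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <item>a</item>\n  <item> b </item>\n</root>";
        System.out.println(childCountAfterStripping(xml));
    }
}
```

The query still visits every node internally, so this is shorter to write rather than necessarily faster on large documents.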

sudhakar_koundinya commented:
AFAIK, that is the solution. I'll post another one if I find it.

Regards
Sudha

sudhakar_koundinya commented:
setIgnoringElementContentWhitespace() simply does not work if no DTD is specified!
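
To illustrate the point, here is a small sketch that parses the same content twice, once with an internal DTD and validation switched on, once with neither. The sample XML, the DTD and the class name are assumptions made up for this demo. With the DTD the parser knows that root has element-only content and can drop the whitespace; without it, the whitespace text nodes stay in the tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IgnoreWhitespaceDemo {

    // Parses xml with setIgnoringElementContentWhitespace(true) and
    // returns the number of child nodes of the root element.
    static int rootChildCount(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        factory.setIgnoringElementContentWhitespace(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String body = "<root>\n  <item>a</item>\n  <item>b</item>\n</root>";
        // Internal DTD subset declaring the content model (made up for the demo)
        String dtd = "<!DOCTYPE root [<!ELEMENT root (item*)><!ELEMENT item (#PCDATA)>]>";

        // With DTD and validation: only the two item elements remain
        System.out.println(rootChildCount(dtd + body, true));
        // Without DTD: the whitespace text nodes are kept
        System.out.println(rootChildCount(body, false));
    }
}
```

This only helps at parse time, so it does not apply to a DOM that has already been built.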

CEHJ commented:
>>That doesn't help, because I have a ready state DOM-Object

I see. Then you'll have to visit the nodes as sudhakar has mentioned

mos (author) commented:
Hi sudhakar,

I tried your code, but it doesn't work.

Reason:

You hold the child nodes with NodeList list = node.getChildNodes();

Then you remove nodes with node.getParentNode().removeChild(node);

When control returns to the loop over the NodeList, one element has already been removed from the (live) list,
so the index points at the wrong node. :(
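
A minimal sketch of one way around this, assuming it is acceptable to walk the tree by sibling pointers instead of by index: the next sibling is remembered before the current child is possibly removed, so the removal cannot shift the position of the walk. The class and helper names are hypothetical, and the string-based parsing is only there to make the demo self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceNodeRemover {

    // Recursively removes text nodes that contain only whitespace.
    static void removeWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the next sibling before child is possibly detached
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().length() == 0) {
                node.removeChild(child);
            } else {
                removeWhitespaceNodes(child);
            }
            child = next;
        }
    }

    // Hypothetical demo helper: parse a string, strip it, count the root's children.
    static int childCountAfterStripping(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        removeWhitespaceNodes(doc);
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <item>a</item>\n  <item> b </item>\n</root>";
        System.out.println(childCountAfterStripping(xml)); // prints 2
    }
}
```

Text such as " b " is left untouched because it is not whitespace-only. Iterating the NodeList backwards (from getLength() - 1 down to 0) would be another way to avoid the shifting index.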

CEHJ commented:
You could also try to turn them both into Strings and do something like

s = s.replaceAll("(?<=>)\\s+|\\s+(?=<)", "");
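
A small sketch of the string idea, using a stricter variant that is an assumption on my part: it only collapses whitespace that sits entirely between a closing > and the next <, so text like " b " inside an element is kept. String.replaceAll needs Java 1.4 or later, and a regex like this also touches whitespace inside comments and CDATA sections, so treat it as a rough comparison aid rather than a real XML transformation.

```java
class TagWhitespaceRegexDemo {

    // Removes runs of whitespace that lie between two tags,
    // keeping the angle brackets themselves.
    static String stripBetweenTags(String xml) {
        return xml.replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <item>a</item>\n  <item> b </item>\n</root>";
        System.out.println(stripBetweenTags(xml));
    }
}
```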