Solved

How to remove whitespace text-nodes from XML DOM

Posted on 2004-09-22
Last Modified: 2013-11-23
Hello,

I have two DOM objects, and to make them comparable I need to remove every text node that contains only whitespace characters.

What is the simplest way to remove these nodes from my DOM?

I think there has to be a routine out there for this.

I'm using Java 1.3 and Apache Xerces 2.5.

Thanks
mos
Question by:mos
10 Comments
 
LVL 14

Expert Comment

by:sudhakar_koundinya


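import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Parses an XML file and returns a DOM document.
// If validating is true, the content is validated against the DTD
// specified in the file. Returns null if the file cannot be parsed.
public static Document parseXmlFile(String filename, boolean validating) {
    try {
        // Create a builder factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);

        // Create the builder and parse the file
        Document doc = factory.newDocumentBuilder().parse(new File(filename));
        return doc;
    } catch (SAXException e) {
        // A parsing error occurred; the XML input is not valid
    } catch (ParserConfigurationException e) {
        // The parser could not be configured as requested
    } catch (IOException e) {
        // The file could not be read
    }
    return null;
}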




public void remove() {
    Document doc = parseXmlFile("infilename.xml", false);

    // Remove all <junk> elements
    removeAll(doc, Node.ELEMENT_NODE, "junk");

    // Remove all comment nodes
    removeAll(doc, Node.COMMENT_NODE, null);

    // Normalize the DOM tree to combine all adjacent text nodes
    doc.normalize();
}

// This method walks the document and removes all nodes
// of the specified type and specified name.
// If name is null, then the node is removed if the type matches.
public static void removeAll(Node node, short nodeType, String name) {
    if (node.getNodeType() == nodeType &&
            (name == null || node.getNodeName().equals(name))) {
        node.getParentNode().removeChild(node);
    } else {
        // Visit the children.
        // Caution: getChildNodes() returns a live list, so removing a child
        // inside this forward loop shifts the remaining items by one.
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            removeAll(list.item(i), nodeType, name);
        }
    }
}
 
LVL 14

Expert Comment

by:sudhakar_koundinya
The code above may help you a little, but you still have to find the whitespace-only text nodes yourself.
 
LVL 14

Accepted Solution

by:
sudhakar_koundinya earned 400 total points
public void remove() {
    Document doc = parseXmlFile("infilename.xml", false);

    // Remove all text nodes that are empty after trimming,
    // i.e. that contain only whitespace
    removeAll(doc, Node.TEXT_NODE, "");

    // Normalize the DOM tree to combine all adjacent text nodes
    doc.normalize();
}

// This method walks the document and removes nodes of the specified type.
// If value is null, every node of that type is removed; otherwise a node
// is removed only if its trimmed node value equals the given value.
public static void removeAll(Node node, short nodeType, String value) {
    if (node.getNodeType() == nodeType &&
            (value == null || node.getNodeValue().trim().equals(value))) {
        node.getParentNode().removeChild(node);
    } else {
        // Visit the children.
        // Caution: getChildNodes() returns a live list, so removing a child
        // inside this forward loop shifts the remaining items by one.
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            removeAll(list.item(i), nodeType, value);
        }
    }
}
 
LVL 86

Expert Comment

by:CEHJ
 

Author Comment

by:mos
CEHJ: That doesn't help, because I already have a finished DOM object and can't use the DocumentBuilderFactory anymore, right?

Sudhakar: Thanks for the code. This looks like the manual way, which could be a little slow and cause problems for large XML documents. Isn't there an API call for this? I think a lot of people need this functionality...
 
LVL 14

Expert Comment

by:sudhakar_koundinya
AFAIK, that is the solution. Anyhow, I will post another solution if I find one.

Regards
Sudha
 
LVL 14

Expert Comment

by:sudhakar_koundinya
setIgnoringElementContentWhitespace() simply does not work if no DTD is specified!
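
To illustrate the point, here is a minimal sketch that parses the same markup twice with setIgnoringElementContentWhitespace(true): once without a DTD, and once with an internal DTD in validating mode. The class name, the helper rootChildCount and the sample markup are made up for this example; only the JAXP calls are real. It only applies at parse time, so it does not help with a DOM that already exists.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class IgnorableWhitespaceDemo {

    // Hypothetical helper: parses the XML string with
    // setIgnoringElementContentWhitespace(true) and returns how many
    // child nodes the root element ends up with.
    public static int rootChildCount(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            factory.setIgnoringElementContentWhitespace(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<a>\n  <b/>\n</a>";
        String dtd = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>";

        // No DTD: the parser cannot know the whitespace is ignorable,
        // so both whitespace text nodes around <b/> are kept.
        System.out.println(rootChildCount(body, false));      // 3

        // With a DTD declaring element-only content for <a> and
        // validation on, the whitespace text nodes are dropped.
        System.out.println(rootChildCount(dtd + body, true)); // 1
    }
}
```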
 
LVL 86

Expert Comment

by:CEHJ
>>That doesn't help, because I have a ready state DOM-Object

I see. Then you'll have to visit the nodes as sudhakar has mentioned
 

Author Comment

by:mos
Hi sudhakar,

I tried your code, but it doesn't work.

Reason:

You hold the child nodes with NodeList list = node.getChildNodes();

Then you remove nodes with node.getParentNode().removeChild(node);

The NodeList is live, so the removed element disappears from it. When the loop continues, the index points at the wrong node and the following sibling is skipped. :(
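
One way around the live-list problem is to avoid indexing altogether: walk the children through getFirstChild() and getNextSibling(), and remember the next sibling before removing anything. The following is a sketch of that idea, not code posted in this thread; the class name WhitespaceStripper and the parse helper are invented for the demo. It uses only DOM Level 1 calls, so it should also fit older JAXP setups.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class WhitespaceStripper {

    // Recursively removes every text node that contains only whitespace.
    public static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the next sibling before a possible removal,
            // so the traversal does not depend on list indexes.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().length() == 0) {
                node.removeChild(child);
            } else {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    // Hypothetical helper for the demo: parses a string into a DOM.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b> text </b>\n  <c/>\n</a>");
        stripWhitespaceNodes(doc);
        Node a = doc.getDocumentElement();
        // Only <b> and <c> remain as children of <a>.
        System.out.println(a.getChildNodes().getLength()); // 2
        // Text that is not whitespace-only is left untouched.
        System.out.println("[" + a.getFirstChild().getFirstChild().getNodeValue() + "]"); // [ text ]
    }
}
```

Iterating the NodeList backwards (from getLength() - 1 down to 0) would be another option, since removing item i only shifts the items after it.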
 
LVL 86

Expert Comment

by:CEHJ
You could also try turning them both into Strings and doing something like

s = s.replaceAll(">\\s+<", "><");
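
A small sketch of the string-based idea, under a few assumptions: the class and method names are invented, and the pattern only collapses whitespace that sits directly between a closing `>` and an opening `<`, putting both brackets back in the replacement. Note that String.replaceAll was added in Java 1.4, so it is not available on the Java 1.3 mentioned in the question, and that a regex cannot tell markup from comments, CDATA sections or mixed content.

```java
public class TagWhitespace {

    // Removes whitespace runs that sit directly between two tags.
    // The replacement re-inserts "><" so the angle brackets survive.
    public static String stripBetweenTags(String xml) {
        return xml.replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        String xml = "<a>\n  <b> x </b>\n</a>";
        // Whitespace inside <b> is kept because it touches real text.
        System.out.println(stripBetweenTags(xml)); // <a><b> x </b></a>
    }
}
```

Comparing the two resulting strings is stricter than comparing DOMs, since attribute order and quoting style would also have to match.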