Solved

How to remove whitespace text-nodes from XML DOM

Posted on 2004-09-22
Last Modified: 2013-11-23
Hello,

I have two DOM objects that I want to compare, so I need to remove every text node that contains only whitespace characters.

What is the simplest way to remove these nodes from my DOM?

I think there has to be a routine out there for this.

I'm using Java 1.3 and Apache Xerces 2.5.

Thanks
mos
0
Question by:mos
10 Comments
 
LVL 14

Expert Comment

by:sudhakar_koundinya
ID: 12126342


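import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Parses an XML file and returns a DOM document.
// If validating is true, the content is validated against the DTD
// specified in the file.
// Returns null if the file cannot be read or parsed.
public static Document parseXmlFile(String filename, boolean validating) {
    try {
        // Create a builder factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);

        // Create the builder and parse the file
        Document doc = factory.newDocumentBuilder().parse(new File(filename));
        return doc;
    } catch (SAXException e) {
        // A parsing error occurred; the XML input is not well-formed
    } catch (ParserConfigurationException e) {
        // The requested parser configuration is not supported
    } catch (IOException e) {
        // The file could not be read
    }
    return null;
}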




public void remove()
{
    Document doc = parseXmlFile("infilename.xml", false);

    // Remove all <junk> elements
    removeAll(doc, Node.ELEMENT_NODE, "junk");

    // Remove all comment nodes
    removeAll(doc, Node.COMMENT_NODE, null);

    // Normalize the DOM tree to combine all adjacent text nodes
    doc.normalize();
}

    // This method walks the document and removes all nodes
    // of the specified type and specified name.
    // If name is null, then the node is removed if the type matches.
    public static void removeAll(Node node, short nodeType, String name) {
        if (node.getNodeType() == nodeType &&
                (name == null || node.getNodeName().equals(name))) {
            node.getParentNode().removeChild(node);
        } else {
            // Visit the children
            NodeList list = node.getChildNodes();
            for (int i=0; i<list.getLength(); i++) {
                removeAll(list.item(i), nodeType, name);
            }
        }
    }
0
 
LVL 14

Expert Comment

by:sudhakar_koundinya
ID: 12126375
The code above may help a little, but you still need to identify the whitespace-only text nodes.
0
 
LVL 14

Accepted Solution

by:
sudhakar_koundinya earned 400 total points
ID: 12126459
public void remove()
{
    Document doc = parseXmlFile("infilename.xml", false);

    // Remove text nodes (with name == null, every text node matches)
    removeAll(doc, Node.TEXT_NODE, null);

    // Normalize the DOM tree to combine all adjacent text nodes
    doc.normalize();
}

// This method walks the document and removes all nodes of the specified type.
// If name is null, the node is removed whenever the type matches; otherwise
// it is removed when its trimmed value differs from name.
public static void removeAll(Node node, short nodeType, String name) {
    if (node.getNodeType() == nodeType &&
            (name == null || !node.getNodeValue().trim().equals(name))) {
        node.getParentNode().removeChild(node);
    } else {
        // Visit the children
        NodeList list = node.getChildNodes();
        for (int i=0; i<list.getLength(); i++) {
            removeAll(list.item(i), nodeType, name);
        }
    }
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12126495
0
 

Author Comment

by:mos
ID: 12131023
CEHJ: That doesn't help, because I already have a finished DOM object and can't use the DocumentBuilderFactory anymore, right?

Sudhakar: Thanks for the code. This looks like the manual way, which could be a little slow and cause problems for large XML documents. Isn't there an API call for this? I think a lot of people need this functionality...
0
 
LVL 14

Expert Comment

by:sudhakar_koundinya
ID: 12131373
AFAIK, that is the solution. Anyway, I'll post another solution if I find one.

Regards
Sudha
0
 
LVL 14

Expert Comment

by:sudhakar_koundinya
ID: 12131406
setIgnoringElementContentWhitespace() simply does not work if no DTD is specified!
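
For reference, a minimal sketch of the case where the flag does take effect: the parser is validating and the document carries a DTD, so the parser knows which elements have element-only content. The class name, method name, and sample document are made up for illustration, and this only applies while parsing, not to a DOM that has already been built.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IgnoreWhitespaceDemo {

    // Sample document (illustrative): the internal DTD declares that
    // <root> contains only <item> elements, never text.
    static final String XML =
          "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<root>\n  <item>a</item>\n  <item>b</item>\n</root>";

    // Parses XML with a validating parser and returns the number of
    // child nodes of the root element.
    static int countRootChildren(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML: " + e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flag off: " + countRootChildren(false));
        System.out.println("flag on:  " + countRootChildren(true));
    }
}
```

With the flag on, the whitespace between the item elements should not appear as text nodes under root.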
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12131430
>>That doesn't help, because I have a ready state DOM-Object

I see. Then you'll have to visit the nodes as sudhakar has mentioned
0
 

Author Comment

by:mos
ID: 12131479
Hi sudhakar,

I tried your code, but it doesn't work.

Reason:

You hold the child nodes with NodeList list = node.getChildNodes();

Then you remove nodes with node.getParentNode().removeChild(node);

When control returns to the loop over the NodeList, one element has already been removed from the list, so the index now points at the wrong node. :(
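
One way around the index problem, as a sketch rather than a definitive routine: let the parent remove its own children and walk the child list from the end, so a removal only shifts indices that were already visited. The class and method names are illustrative. "Whitespace-only" is taken here to mean that trim() leaves an empty string, which is slightly broader than the XML definition of whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceStripper {

    // Removes every text node under 'node' whose value is empty after trim().
    static void removeWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards: the NodeList is live, so removing item i only
        // shifts the indices of items after i, which were already visited.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().length() == 0) {
                node.removeChild(child);
            } else {
                removeWhitespaceNodes(child);
            }
        }
    }

    // Hypothetical helper for the demo: parses an XML string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML: " + e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b>x</b>\n  <c> y </c>\n</a>");
        removeWhitespaceNodes(doc);
        // Only <b> and <c> should remain as children of <a>.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Text nodes with real content, such as " y " inside c, are left untouched, including their surrounding spaces. Calling normalize() on both documents first should make the comparison more predictable, since adjacent text nodes are merged before the test runs.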
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12131585
You could also try to turn them both into Strings and do something like

s = s.replaceAll("(?<=>)\\s+|\\s+(?=<)", "");
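
A small sketch of the string approach, assuming both documents have already been serialized to Strings (the class and method names are made up). A pattern that matches the '>' or '<' itself, combined with an empty replacement, would delete the bracket as well; lookarounds avoid that by matching only the whitespace. String.replaceAll was added in Java 1.4, so on 1.3 a separate regex library would be needed.

```java
class TagWhitespace {

    // Strips whitespace that directly follows '>' or directly precedes '<'.
    // The lookbehind/lookahead keep the angle brackets out of the match.
    static String strip(String xml) {
        return xml.replaceAll("(?<=>)\\s+|\\s+(?=<)", "");
    }

    public static void main(String[] args) {
        // prints <a><b>x</b></a>
        System.out.println(strip("<a>\n  <b> x </b>\n</a>"));
    }
}
```

This works on raw text rather than on the tree, so it also trims leading and trailing whitespace inside real text content and does not know about comments, CDATA sections, or attribute values that contain angle brackets. It is probably only safe when those cases cannot occur in the documents being compared.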
0
