
How to remove whitespace text-nodes from XML DOM

Posted on 2004-09-22
Last Modified: 2013-11-23
Hello,

I have two DOM objects that I want to compare, so I need to remove every text node that contains only whitespace characters.

What is the simplest way to remove these nodes from my DOM?

I think there has to be a routine out there for this.

I'm using Java 1.3 and Apache Xerces 2.5.

Thanks
mos
Question by:mos
LVL 14

Expert Comment

by:sudhakar_koundinya
ID: 12126342


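import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Parses an XML file and returns a DOM document.
// If validating is true, the content is validated against the DTD
// specified in the file. Returns null if the file cannot be parsed.
public static Document parseXmlFile(String filename, boolean validating) {
    try {
        // Create a builder factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);

        // Create the builder and parse the file
        Document doc = factory.newDocumentBuilder().parse(new File(filename));
        return doc;
    } catch (SAXException e) {
        // A parsing error occurred; the XML input is not valid
    } catch (ParserConfigurationException e) {
        // The parser could not be created with the requested settings
    } catch (IOException e) {
        // The file could not be read
    }
    return null;
}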




public void remove() {
    Document doc = parseXmlFile("infilename.xml", false);

    // Remove all <junk> elements
    removeAll(doc, Node.ELEMENT_NODE, "junk");

    // Remove all comment nodes
    removeAll(doc, Node.COMMENT_NODE, null);

    // Normalize the DOM tree to combine all adjacent text nodes
    doc.normalize();
}

// This method walks the document and removes all nodes
// of the specified type and specified name.
// If name is null, then the node is removed if the type matches.
public static void removeAll(Node node, short nodeType, String name) {
    if (node.getNodeType() == nodeType &&
            (name == null || node.getNodeName().equals(name))) {
        node.getParentNode().removeChild(node);
    } else {
        // Visit the children.
        // Caution: getChildNodes() returns a live list, so removing a child
        // inside this forward loop shifts the indices of the remaining children.
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            removeAll(list.item(i), nodeType, name);
        }
    }
}
0
 
LVL 14

Expert Comment

by:sudhakar_koundinya
ID: 12126375
The code above may help a little, but you still have to identify the whitespace-only text nodes yourself.
0
 
LVL 14

Accepted Solution

by:
sudhakar_koundinya earned 800 total points
ID: 12126459
public void remove() {
    Document doc = parseXmlFile("infilename.xml", false);

    // Remove text nodes.
    // Note: with name == null, every text node matches, not only the
    // whitespace-only ones.
    removeAll(doc, Node.TEXT_NODE, null);

    // Normalize the DOM tree to combine all adjacent text nodes
    doc.normalize();
}

// This method walks the document and removes all nodes of the specified type.
// If name is null, the node is removed whenever the type matches; otherwise
// it is removed when its trimmed value differs from name.
public static void removeAll(Node node, short nodeType, String name) {
    if (node.getNodeType() == nodeType &&
            (name == null || node.getNodeValue().trim().equals(name) == false)) {
        node.getParentNode().removeChild(node);
    } else {
        // Visit the children.
        // Caution: getChildNodes() returns a live list, so removing a child
        // inside this forward loop shifts the indices of the remaining children.
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            removeAll(list.item(i), nodeType, name);
        }
    }
}
0

Author Comment

by:mos
ID: 12131023
CEHJ: That doesn't help, because I already have a fully built DOM object, so I can't use the DocumentBuilderFactory anymore, right?

Sudhakar: Thanks for the code. This looks like the manual way to me; it could be a little slow and cause problems for large XML documents. Isn't there an API call for this? I think a lot of people need this functionality...
0
 
LVL 14

Expert Comment

by:sudhakar_koundinya
ID: 12131373
AFAIK, that is the solution. Anyhow, I will post another solution if I find one.

Regards
Sudha
0
 
LVL 14

Expert Comment

by:sudhakar_koundinya
ID: 12131406
setIgnoringElementContentWhitespace() simply does not work if no DTD is specified!
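
For readers who can still control the parse step, here is a minimal sketch of what that remark means in practice. It assumes a JAXP parser such as Xerces; the class name, the helper method, and the two sample documents are made up for illustration. The parser can only tell that whitespace is ignorable when a DTD declares element-only content, so the same flag has no effect on the document without a DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
class IgnorableWhitespaceDemo {

    // Sample document whose internal DTD declares element-only content for <root>.
    static final String WITH_DTD =
            "<!DOCTYPE root [<!ELEMENT root (item*)><!ELEMENT item (#PCDATA)>]>"
            + "<root>\n  <item>a</item>\n  <item>b</item>\n</root>";

    // The same document without any DTD.
    static final String WITHOUT_DTD =
            "<root>\n  <item>a</item>\n  <item>b</item>\n</root>";

    // Parses the XML with setIgnoringElementContentWhitespace(true) and
    // returns the number of child nodes of the root element.
    static int rootChildCount(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            factory.setIgnoringElementContentWhitespace(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling rootChildCount(WITH_DTD, true) should count only the two <item> elements, while rootChildCount(WITHOUT_DTD, false) also counts the whitespace text nodes around them.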
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12131430
>>That doesn't help, because I have a ready state DOM-Object

I see. Then you'll have to visit the nodes as sudhakar has mentioned
0
 

Author Comment

by:mos
ID: 12131479
Hi sudhakar,

I tried your code, but it doesn't work.

Reason:

You hold the child nodes with NodeList list = node.getChildNodes();

Then you remove nodes with node.getParentNode().removeChild(node);

When you come back to iterate the NodeList, one element of the list has been removed and
the index points at the wrong node. :(
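
A minimal sketch of one way around the index problem described above, assuming the goal is to drop only text nodes that consist entirely of whitespace. It walks the siblings with getFirstChild() and getNextSibling() instead of an indexed NodeList, and it remembers the next sibling before removing anything. The class and method names are hypothetical; stripWhitespaceNodes itself uses only DOM Level 1 calls, so it should also work on an already built DOM under Java 1.3.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Xerces or JAXP.
class WhitespaceStripper {

    // Removes every text node below 'node' that contains only whitespace.
    static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Fetch the next sibling first: after removeChild(child),
            // child.getNextSibling() would return null.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().length() == 0) {
                node.removeChild(child);
            } else {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    // Demo helper: parses the XML, strips whitespace-only text nodes and
    // returns how many children the root element has left.
    static int childCountAfterStrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripWhitespaceNodes(doc);
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Text nodes with real content, such as " y " inside an element, are left untouched, including their leading and trailing spaces. Each node is visited once, so the cost grows linearly with the size of the document.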
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12131585
You could also try to turn them both into Strings and do something like

s = s.replaceAll(">\\s+", ">").replaceAll("\\s+<", "<");
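
A small sketch of the string approach, with a few assumptions stated up front: String.replaceAll only exists from Java 1.4 on, so it is not available on Java 1.3, and the class and method names below are invented for the example. The replacement text has to put the angle bracket back, otherwise the match swallows it and the markup is damaged. Two variants are shown because they differ in what they do to text that has real content.

```java
// Hypothetical helper class for illustration only.
class MarkupWhitespace {

    // Trims whitespace directly after '>' and directly before '<'.
    // This also trims the edges of text that has real content.
    static String trimAroundTags(String xml) {
        return xml.replaceAll(">\\s+", ">").replaceAll("\\s+<", "<");
    }

    // Removes only runs of whitespace that sit entirely between two tags,
    // which is closer to "whitespace-only text nodes".
    static String stripBetweenTags(String xml) {
        return xml.replaceAll(">\\s+<", "><");
    }
}
```

For the input "<a>\n  <b> y </b>\n</a>", trimAroundTags gives "<a><b>y</b></a>" and stripBetweenTags gives "<a><b> y </b></a>". Both work on raw text, so they also touch whitespace inside comments and CDATA sections; walking the DOM is the safer route when that matters.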
0
