Solved

quick way to remove elements and text from xml

Posted on 2007-11-24
Last Modified: 2013-11-18
Hi,

I have a large xml document, which contains some elements and text that I need to remove. For example,

<element attribute="new">text placed here</element>

The element and attribute are repeated often throughout the xml document, but with different text. Is there a quick way to remove all of these elements and the associated text and then output or produce the original xml document minus these elements and text?

Suffice to say, doing this by hand would take weeks.

Any help appreciated.

Thanks
Question by:nhay59
 
LVL 7

Expert Comment

by:jax79sg
You may consider getting an XML editor to help you out with this.
A very good one would be XMLSpy. If this is a one-time job, you can consider downloading the trial version to see if it fits your purpose.

Another method would be to write a small program on top of an XML parser, and code that program to remove all the tags you need to remove.
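
As a rough illustration of the "small program plus XML parser" route, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The function name remove_matching is made up for this example, and the sample data simply reuses the element/attribute pair from the question. It assumes the elements to drop can be identified by tag name plus one attribute value, and it keeps the text that follows a removed element (its "tail"), which ElementTree would otherwise discard together with the element.

```python
import xml.etree.ElementTree as ET


def remove_matching(parent, tag, attr, value):
    """Remove every <tag attr="value"> element (with its content) below parent.

    Text that follows a removed element (its "tail") is preserved by moving
    it onto the previous sibling, or onto the parent's text if there is none.
    The element passed in as `parent` is never removed itself.
    Returns the number of elements removed.
    """
    removed = 0
    previous = None  # last sibling that was kept
    # Iterate over a copy of the child list, because we mutate the parent.
    for child in list(parent):
        if child.tag == tag and child.get(attr) == value:
            if child.tail:
                if previous is None:
                    parent.text = (parent.text or "") + child.tail
                else:
                    previous.tail = (previous.tail or "") + child.tail
            parent.remove(child)
            removed += 1
        else:
            # Only descend into elements that are kept.
            removed += remove_matching(child, tag, attr, value)
            previous = child
    return removed


if __name__ == "__main__":
    # Hypothetical sample document, not the asker's real data.
    sample = (
        '<doc><p>Keep <element attribute="new">text placed here</element>'
        " this.</p><element>untouched</element></doc>"
    )
    root = ET.fromstring(sample)
    count = remove_matching(root, "element", "attribute", "new")
    print(count, ET.tostring(root, encoding="unicode"))
```

For a real file, something along the lines of tree = ET.parse("input.xml"), then remove_matching(tree.getroot(), ...), then tree.write("output.xml", encoding="utf-8", xml_declaration=True) should do (the file names are placeholders). Note that ElementTree's default parser drops comments, so treat this as a sketch rather than an exact round trip.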
 
LVL 60

Expert Comment

by:Geert Bormans
Unless you need to manually evaluate each and every element before deleting it,
it would be ridiculous to try to do this by hand.
... and by the way ... XMLSpy is about the worst commercial XML editor out there, for many valid reasons.

If there is a certain logic in the elements that you want to remove,
there is no doubt that XSLT is the way to go.
XSLT is meant for transforming an XML document into another XML document.
XSLT is easy, and you can quickly describe the logic for removal.

You are not very explicit in your question, so I can't be very explicit in my code.
But your stylesheet will contain a template that makes a deep copy of each element
and another template that describes the rule for deleting certain elements.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Identity copy: copy each node and its attributes, then process its children -->
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Empty template: matching elements and their content produce no output -->
    <xsl:template match="*[@attribute = 'new']"></xsl:template>

</xsl:stylesheet>

In this example,
the template starting with <xsl:template match="node()">
is known as the basis of an identity copy. If the stylesheet only had this template, your result would be exactly the same as your source.

The second template,
 <xsl:template match="*[@attribute = 'new']"></xsl:template>
is empty, so it produces no output for any element that has an attribute with name 'attribute' and value 'new'.
Because the children of such an element are never processed, the text inside it disappears too
(practically removing the whole element from the tree).

If you can describe the reasons for deleting an element, I can translate that into code

For executing the XSLT,
- you can automate that inside Java, .NET, Python, ... (any language with an XSLT processor will do)
- execute it from the command line (e.g. by using Saxon, google for "download saxon xslt")
- or use a decent IDE (integrated development environment) for XML
     + www.oxygenxml.com
     + www.stylusstudio.com
Both are cheaper and a lot better than XMLSpy; Spy is an expensive toy.
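
To make the "automate it from a language" option a bit more concrete, here is a minimal sketch in Python. It assumes the third-party lxml package is installed (it is not part of the standard library and is not mentioned in the thread); the function name apply_stylesheet and the sample document are made up for illustration. The stylesheet is the one shown above, passed as bytes because lxml rejects Unicode strings that carry an encoding declaration.

```python
# The stylesheet from the post above, as bytes.
STYLESHEET = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*[@attribute = 'new']"></xsl:template>
</xsl:stylesheet>
"""


def apply_stylesheet(xml_bytes, xslt_bytes=STYLESHEET):
    """Run an XSLT 1.0 stylesheet over an XML document and return bytes."""
    # Assumption: lxml is installed (pip install lxml). It is imported here
    # so that this module still loads on machines without it.
    from lxml import etree

    transform = etree.XSLT(etree.fromstring(xslt_bytes))
    result = transform(etree.fromstring(xml_bytes))
    return etree.tostring(result)


if __name__ == "__main__":
    # Hypothetical sample input, not the asker's real data.
    sample = b'<doc><element attribute="new">text placed here</element><keep>y</keep></doc>'
    print(apply_stylesheet(sample).decode("utf-8"))
```

For a large file on disk, etree.parse("input.xml") can be passed to the transform instead of a parsed string; the file name is a placeholder.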

happy to help more

Geert
 

Author Comment

by:nhay59
Hi,

Thanks for the reply. I basically just need to remove all occurrences of a particular element and the text associated with that element, so I have the same XML document minus that element and its text.

For example, I need to remove all occurrences of the following element,

<word colour="blue">text for the element.....</word>

where the text for the element will, of course, be different for each occurrence of the element and attribute. Therefore, I know the element and attribute I want to remove, but don't know the text for that element.

Thanks for all the help.
 

Author Comment

by:nhay59
Hi,

Also, I use Oxygen for creating the XML and can remove all occurrences of the element and attribute from the document. However, I have no idea how to remove the text from within the element as well without specifying the text manually per element, which again would take a very long time. For example, I can remove all occurrences of the following element and attribute,

<word colour="blue"></word>

but the text 'text for the element' and any other text in other occurrences of this element and attribute remain in the XML. I need to remove the element, attribute and text for the element.

Any help appreciated.

Thanks
 
LVL 60

Accepted Solution

by:
Geert Bormans earned 500 total points
So here is the XSLT you would need

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
   
    <!-- Empty template: every <word colour="blue"> element and its text is dropped -->
    <xsl:template match="word[@colour = 'blue']"></xsl:template>

</xsl:stylesheet>

Now, in Oxygen, open a new XSLT document and paste the stylesheet into it.
Push the button that has XSL on it.
Make sure that your XML document is open and select it using the left drop-down.
Your XSLT can be Untitled1.xsl*; select that in the second drop-down.
Push the blue arrow, which starts the transform.
In the third (right-hand) pane, right-click and choose "Save Results".
This new file is what you need.

cheers

Geert
 

Author Closing Comment

by:nhay59
Hi,

Thank you so much, works perfectly. Have a great weekend.

Thanks
 

Author Comment

by:nhay59
Hi,

Thanks for all the help. The solution works exactly as required, and was very well explained.

Thanks.
 
LVL 60

Expert Comment

by:Geert Bormans
welcome