Solved

quick way to remove elements and text from xml

Posted on 2007-11-24
Last Modified: 2013-11-18
Hi,

I have a large XML document that contains some elements and text I need to remove. For example,

<element attribute="new">text placed here</element>

The element and attribute are repeated often throughout the XML document, but with different text. Is there a quick way to remove all of these elements and their associated text, and then output the original XML document minus these elements and text?

If I had to do this by hand, it would, suffice to say, take weeks.

Any help appreciated.

Thanks
Question by:nhay59
LVL 7

Expert Comment

by:jax79sg
ID: 20344659
You may consider getting an XML editor to help you out with this.
A very good one would be XMLSpy. If this is a one-time job, you can consider downloading the trial version to see if it fits your purpose.

Another method would be to write a small program on top of an XML parser, and code that program to remove all the tags you need to remove.
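For what it's worth, the "small program plus XML parser" route could look roughly like the sketch below. It is only an illustration, assuming Python's standard-library ElementTree parser, a document without namespaces, and the `word` / `colour="blue"` names from later in this thread; the helper name `remove_matching` and the sample document are made up. Note that ElementTree's default parser discards comments and processing instructions, so this is not a byte-for-byte round trip.

```python
import xml.etree.ElementTree as ET


def remove_matching(root, tag, attr, value):
    """Remove every descendant <tag attr="value"> together with its text."""
    removed = 0
    # Snapshot all elements first so the tree is not mutated while iterating.
    for parent in list(root.iter()):
        # Walk children from last to first so earlier indices stay valid.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == tag and child.get(attr) == value:
                # Text that follows the element (its "tail") is not part of
                # the element itself, so hand it to the previous sibling or
                # to the parent before dropping the element.
                if child.tail:
                    if index > 0:
                        prev = parent[index - 1]
                        prev.tail = (prev.tail or "") + child.tail
                    else:
                        parent.text = (parent.text or "") + child.tail
                parent.remove(child)
                removed += 1
    return removed


# Hypothetical sample document, for illustration only.
sample = (
    '<doc><p>Keep this <word colour="blue">drop this</word> and this.</p>'
    '<word colour="red">stays</word></doc>'
)
root = ET.fromstring(sample)
count = remove_matching(root, "word", "colour", "blue")
result = ET.tostring(root, encoding="unicode")
print(count, result)
```

For a real file you would presumably load it with `ET.parse(...)` and save with `tree.write(...)` instead of using the inline string.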
LVL 60

Expert Comment

by:Geert Bormans
ID: 20345075
Unless you need to evaluate each and every element manually before deleting it,
it would be ridiculous to try to do this by hand.
... and by the way ... XMLSpy is about the worst commercial XML editor out there, for many valid reasons.

If there is a certain logic to the elements that you want to remove,
XSLT is no doubt the way to go.
XSLT is meant for transforming an XML document into another XML document.
XSLT is easy, and you can quickly describe the logic for removal.

You are not very explicit in your question, so I can't be very explicit in my code.
But your stylesheet will contain one template that copies each node, recursing into its children,
and another template that describes the rule for deleting certain elements:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
   
    <xsl:template match="*[@attribute = 'new']"></xsl:template>

</xsl:stylesheet>

In this example,
the template starting with <xsl:template match="node()">
is the basis of what is known as an identity copy. If the stylesheet had only this template, your result would be exactly the same as your source.

The second template,
 <xsl:template match="*[@attribute = 'new']"></xsl:template>
is empty, so it produces no output for any element that has an attribute named 'attribute' with the value 'new'.
Its pattern is more specific than node(), so it takes precedence for those elements
(in practice removing the element, and everything inside it, from the result tree).
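If it helps to see the idea outside XSLT, the two rules can be mimicked in a few lines of Python. This is only an analogue under my own assumptions (standard-library ElementTree, no namespaces, made-up function names and sample data), not how an XSLT processor is actually implemented: one function plays the role of the identity template, and a predicate plays the role of the empty template.

```python
import xml.etree.ElementTree as ET


def is_dropped(elem):
    # Plays the role of the empty template match="*[@attribute = 'new']".
    return elem.get("attribute") == "new"


def identity_copy(elem):
    # Plays the role of the match="node()" template: copy the element and
    # its attributes, then process the children one by one.
    new = ET.Element(elem.tag, dict(elem.attrib))
    new.text = elem.text
    last = None
    for child in elem:
        if is_dropped(child):
            # The empty rule emits nothing for the element or its content.
            # Text after the element is a separate node in XSLT terms, so
            # it is still copied.
            tail = child.tail or ""
            if last is None:
                new.text = (new.text or "") + tail
            else:
                last.tail = (last.tail or "") + tail
            continue
        copied = identity_copy(child)
        copied.tail = child.tail
        new.append(copied)
        last = copied
    return new


# Hypothetical sample document, for illustration only.
sample = '<root><a attribute="new">gone</a>kept<b attribute="old">stay</b></root>'
out = ET.tostring(identity_copy(ET.fromstring(sample)), encoding="unicode")
print(out)
```

Without the `is_dropped` check the function simply reproduces its input, which is the "identity" part; the check is the only place where the removal logic lives.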

If you can describe the reasons for deleting an element, I can translate that into code

For executing the XSLT, you can
- automate it from Java, .NET, Python, ... (any language will do)
- execute it from the command line (e.g. using Saxon; google for "download saxon xslt")
- or use a decent IDE (integrated development environment) for XML
     + www.oxygenxml.com
     + www.stylusstudio.com
Both are cheaper and a lot better than XMLSpy; Spy is an expensive toy.

happy to help more

Geert

Author Comment

by:nhay59
ID: 20345302
Hi,

Thanks for the reply. I basically just need to remove all occurrences of a particular element and the text associated with that element, so I have the same XML document minus that element and its text.

For example, I need to remove all occurrences of the following element,

<word colour="blue">text for the element.....</word>

where the text for the element will, of course, be different for each occurrence of the element and attribute. Therefore, I know the element and attribute I want to remove, but don't know the text for that element.

Thanks for all the help.

Author Comment

by:nhay59
ID: 20345313
Hi,

Also, I use Oxygen for creating the XML and can remove all occurrences of the element and attribute from the document. However, I have no idea how to remove the text within the element as well without specifying the text manually for each element, which again would take a very long time. For example, I can remove all occurrences of the following element and attribute,

<word colour="blue"></word>

but the text 'text for the element' and any other text in other occurrences of this element and attribute remain in the XML. I need to remove the element, attribute and text for the element.

Any help appreciated.

Thanks
LVL 60

Accepted Solution

by:Geert Bormans (earned 500 total points)
ID: 20345395
So here is the XSLT you would need

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
   
    <xsl:template match="word[@colour = 'blue']"></xsl:template>

</xsl:stylesheet>

Now, in Oxygen, open a new XSLT document and put the stylesheet above in it.
Push the button that has XSL over it.
Make sure that your XML document is open, and select it using the left dropdown.
Your XSLT can be Untitled1.xsl*; select that in the second dropdown.
Push the blue arrow, which starts the transform.
In the third (right-hand) pane, right-click and choose "Save Results".
This new file is what you need.

cheers

Geert

Author Closing Comment

by:nhay59
ID: 31410819
Hi,

Thank you so much, works perfectly. Have a great weekend.

Thanks

Author Comment

by:nhay59
ID: 20345438
Hi,

Thanks for all the help. The solution works exactly as required, and was very well explained.

Thanks.
LVL 60

Expert Comment

by:Geert Bormans
ID: 20345469
welcome