Solved

quick way to remove elements and text from xml

Posted on 2007-11-24
Last Modified: 2013-11-18
Hi,

I have a large xml document, which contains some elements and text that I need to remove. For example,

<element attribute="new">text placed here</element>

The element and attribute are repeated often throughout the xml document, but with different text. Is there a quick way to remove all of these elements and the associated text and then output or produce the original xml document minus these elements and text?

Suffice to say, doing this by hand would take weeks.

Any help appreciated.

Thanks
Question by:nhay59
8 Comments
 
LVL 7

Expert Comment

by:jax79sg
ID: 20344659
You may consider getting an XML editor to help you out with this.
A very good one would be XMLSpy. If this is a one-off job, you can download the trial version and see if it fits your purpose.

Another method would be to write a small program around an XML parser and code it to remove all the tags you need to remove.
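
As a rough illustration of the "small program plus XML parser" route, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. The function name remove_elements and the sample document are made up for this example; the element and attribute names are the ones mentioned later in the thread. It assumes the document fits in memory and that the root element itself is not one of the elements to remove.

```python
import xml.etree.ElementTree as ET


def remove_elements(xml_text, tag, attr, value):
    """Return xml_text with every <tag attr="value"> element (and its text) removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree can be modified while looping.
    for parent in list(root.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            if child.tag == tag and child.get(attr) == value:
                # ElementTree keeps the text that FOLLOWS an element in child.tail.
                # Hand it to the previous sibling (or the parent) so it is not lost.
                if child.tail:
                    if previous is not None:
                        previous.tail = (previous.tail or "") + child.tail
                    else:
                        parent.text = (parent.text or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input, only for demonstration.
sample = (
    '<doc><p>Keep<word colour="blue">drop me</word> this '
    '<word colour="red">stay</word>.</p><word colour="blue">gone</word></doc>'
)
print(remove_elements(sample, "word", "colour", "blue"))
```

The tail handling is the part that is easy to get wrong: without it, the text that comes after a removed element would disappear together with the element.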
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 20345075
Unless you need to manually evaluate each and every element before deleting it,
it would be ridiculous to try to do this by hand
... and by the way ... XMLSpy is about the worst commercial XML editor out there, for many valid reasons

If there is a certain logic in the elements that you want to remove,
no doubt that XSLT is the way to go.
XSLT is meant for transforming an XML document into another XML document
XSLT is easy and you can quickly describe the logic for removal

You are not very explicit in your question, so I can't be very explicit in my code
But your Stylesheet will contain a template that makes a deep copy of each element
and another template that describes the rule for deleting certain elements

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
   
    <xsl:template match="*[@attribute = 'new']"></xsl:template>

</xsl:stylesheet>

In this example, the template starting with <xsl:template match="node()">
is known as the basis of an identity copy. If the stylesheet contained only this template, your result would be exactly the same as your source.

The second template,
 <xsl:template match="*[@attribute = 'new']"></xsl:template>
is empty, so it produces no output for any element that has an attribute named 'attribute' with the value 'new'
(in practice removing that element, and everything inside it, from the tree).

If you can describe the reasons for deleting an element, I can translate that into code

To execute the XSLT, you can
- automate it from Java, .NET, Python, ... (any language will do)
- run it from the command line (e.g. using Saxon; google for "download saxon xslt")
- or use a decent IDE (integrated development environment) for XML
     + www.oxygenxml.com
     + www.stylusstudio.com
Both are cheaper and a lot better than XMLSpy; Spy is an expensive toy.
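
For the "automate it from a programming language" option, a minimal Python sketch could look like the following. It assumes the third-party lxml package is installed (it is not part of the standard library); the function name strip_elements and the sample input are made up for illustration, and the stylesheet is the generic one shown above.

```python
from lxml import etree  # third-party package, assumed installed: pip install lxml

# The generic stylesheet from this thread, as bytes because it carries an
# encoding declaration (lxml rejects str input that declares an encoding).
STYLESHEET = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*[@attribute = 'new']"></xsl:template>
</xsl:stylesheet>
"""


def strip_elements(xml_bytes):
    """Apply the stylesheet to xml_bytes and return the transformed XML as bytes."""
    transform = etree.XSLT(etree.fromstring(STYLESHEET))
    result = transform(etree.fromstring(xml_bytes))
    return etree.tostring(result)


# Hypothetical sample input, only for demonstration.
source = b'<doc>a<element attribute="new">x</element>b<element attribute="old">y</element></doc>'
print(strip_elements(source).decode())
```

For a real file you would read the bytes from disk and write the result to a new file, leaving the original untouched.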

happy to help more

Geert
 

Author Comment

by:nhay59
ID: 20345302
Hi,

Thanks for the reply. I basically just need to remove all occurrences of a particular element and the text associated with that element, so I have the same XML document minus that element and its text.

For example, I need to remove all occurrences of the following element,

<word colour="blue">text for the element.....</word>

where the text for the element will, of course, be different for each occurrence of the element and attribute. Therefore, I know the element and attribute I want to remove, but don't know the text for that element.

Thanks for all the help.

Author Comment

by:nhay59
ID: 20345313
Hi,

Also, I use Oxygen for creating the XML, and with it I can remove all occurrences of the element's tags and attribute from the document. However, I have no idea how to remove the text inside the element as well without specifying the text manually for each element, which again would take a very long time. For example, I can remove all occurrences of the following element and attribute,

<word colour="blue"></word>

but the text 'text for the element' and any other text in other occurrences of this element and attribute remain in the XML. I need to remove the element, attribute and text for the element.

Any help appreciated.

Thanks
 
LVL 60

Accepted Solution

by:
Geert Bormans earned 2000 total points
ID: 20345395
So here is the XSLT you would need

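<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Identity copy: copy every node, its attributes and its children -->
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Empty template: output nothing for word elements with colour="blue", dropping the element and its text -->
    <xsl:template match="word[@colour = 'blue']"></xsl:template>

</xsl:stylesheet>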

Now, in Oxygen, open a new XSLT document and paste in the stylesheet above.
Push the button that has XSL over it.
Make sure that your XML document is open and select it using the left drop-down.
Your XSLT can be Untitled1.xsl*; select that in the second drop-down.
Push the blue arrow, which starts the transform.
In the third (right-hand) pane, right-click and choose "Save Results".
This new file is what you need.

cheers

Geert
 

Author Closing Comment

by:nhay59
ID: 31410819
Hi,

Thank you so much, works perfectly. Have a great weekend.

Thanks
 

Author Comment

by:nhay59
ID: 20345438
Hi,

Thanks for all the help. The solution works exactly as required, and was very well explained.

Thanks.
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 20345469
welcome