Solved

quick way to remove elements and text from xml

Posted on 2007-11-24
Last Modified: 2013-11-18
Hi,

I have a large xml document, which contains some elements and text that I need to remove. For example,

<element attribute="new">text placed here</element>

The element and attribute are repeated often throughout the xml document, but with different text. Is there a quick way to remove all of these elements and the associated text and then output or produce the original xml document minus these elements and text?

If I had to do this by hand, suffice it to say it would take weeks.

Any help appreciated.

Thanks
Question by:nhay59
LVL 7

Expert Comment

by:jax79sg
ID: 20344659
You may consider getting an xml editor to help you out with this.
A very good one would be XMLSpy. If this is a one-time job, you could download the trial version and see whether it fits your purpose.

Another method would be to write a small program that uses an XML parser and removes all the tags you need to remove.
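
As a rough illustration of the small-program route, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The function name strip_matching and the sample document are made up for this example; the element and attribute names come from the question. It assumes the document fits in memory and that the root element itself is not one of the elements to remove.

```python
import xml.etree.ElementTree as ET


def strip_matching(parent, tag, attr, value):
    """Remove descendants of `parent` named `tag` whose `attr` equals `value`.

    Hypothetical helper for illustration. Returns the number of elements removed.
    """
    removed = 0
    previous = None  # last sibling that was kept
    for child in list(parent):  # copy the list: we remove while looping
        if child.tag == tag and child.get(attr) == value:
            # ElementTree keeps the text that FOLLOWS an element in child.tail.
            # Move it to the previous sibling (or the parent) so it is not lost.
            if child.tail:
                if previous is None:
                    parent.text = (parent.text or "") + child.tail
                else:
                    previous.tail = (previous.tail or "") + child.tail
            parent.remove(child)
            removed += 1
        else:
            removed += strip_matching(child, tag, attr, value)
            previous = child
    return removed


if __name__ == "__main__":
    sample = (
        '<doc><p>Keep <element attribute="new">drop me</element> this '
        '<element attribute="old">stay</element></p></doc>'
    )
    root = ET.fromstring(sample)
    count = strip_matching(root, "element", "attribute", "new")
    print(count, "element(s) removed")
    print(ET.tostring(root, encoding="unicode"))
```

For a real file you would use ET.parse(...) and tree.write(...) instead of the in-memory string. Note that ElementTree discards comments by default when parsing, so the output is not guaranteed to be a byte-for-byte copy of everything else in the source.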
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 20345075
Unless you need to manually evaluate each and every element before deleting it,
it would be ridiculous to try to do this by hand
... and by the way ... XMLSpy is about the worst commercial XML editor out there, for many valid reasons

If there is a certain logic in the elements that you want to remove,
no doubt that XSLT is the way to go.
XSLT is meant for transforming an XML document into another XML document
XSLT is easy and you can quickly describe the logic for removal

You are not very explicit in your question, so I can't be very explicit in my code
But your Stylesheet will contain a template that makes a deep copy of each element
and another template that describes the rule for deleting certain elements

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
   
    <xsl:template match="*[@attribute = 'new']"></xsl:template>

</xsl:stylesheet>

In this example,
the template starting with <xsl:template match="node()">
is known as the basis of an identity copy. If the stylesheet only had this template, your result would be exactly the same as your source.

The second template,
 <xsl:template match="*[@attribute = 'new']"></xsl:template>
is empty, so it produces no output for any element that has an attribute named 'attribute' with the value 'new'
(effectively removing that element, and everything inside it, from the result tree). It wins over the first template for those elements because its match pattern is more specific.

If you can describe the reasons for deleting an element, I can translate that into code

For executing the XSLT,
- you can automate that inside Java, .Net, Python,...  (any language will do)
- execute it from the command line (e.g. by using Saxon, google for "download saxon xslt")
- or use a decent IDE (integrated development environment) for XML
     + www.oxygenxml.com
     + www.stylusstudio.com
both are cheaper and a lot better than XML SPY, Spy is an expensive toy
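
To give an idea of the "automate it in a language" option, here is a small Python sketch. It assumes the third-party lxml package is installed (it is not part of the standard library), and the function name apply_stylesheet and the sample input are invented for the example. The stylesheet is the one shown above, embedded as a byte string.

```python
from lxml import etree  # assumption: third-party package, e.g. "pip install lxml"

# The stylesheet from this answer: identity copy plus one empty "delete" rule.
STYLESHEET = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*[@attribute = 'new']"></xsl:template>
</xsl:stylesheet>
"""


def apply_stylesheet(xml_bytes, xslt_bytes=STYLESHEET):
    """Run an XSLT 1.0 stylesheet over an XML document and return the result as bytes."""
    transform = etree.XSLT(etree.fromstring(xslt_bytes))
    result = transform(etree.fromstring(xml_bytes))
    return etree.tostring(result)


if __name__ == "__main__":
    source = b'<root><keep>a</keep><element attribute="new">text placed here</element></root>'
    print(apply_stylesheet(source).decode("utf-8"))
```

For a file on disk you would read the source with etree.parse(...) and write the returned bytes to a new file, leaving the original untouched.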

happy to help more

Geert
 

Author Comment

by:nhay59
ID: 20345302
Hi,

Thanks for the reply. I basically just need to remove all occurrences of a particular element and the text associated with that element, so I have the same XML document minus that element and its text.

For example, I need to remove all occurrences of the following element,

<word colour="blue">text for the element.....</word>

where the text for the element will, of course, be different for each occurrence of the element and attribute. Therefore, I know the element and attribute I want to remove, but don't know the text for that element.

Thanks for all the help.

Author Comment

by:nhay59
ID: 20345313
Hi,

Also, I use Oxygen for creating the XML and can remove all occurrences of the element and attribute from the document. However, I have no idea how to remove the text from within the element as well without specifying the text manually per element, which again would take a very long time. eg: I can remove all occurrences of the following element and attribute,

<word colour="blue"></word>

but the text 'text for the element' and any other text in other occurrences of this element and attribute remain in the XML. I need to remove the element, attribute and text for the element.

Any help appreciated.

Thanks
 
LVL 60

Accepted Solution

by:
Geert Bormans earned 500 total points
ID: 20345395
So here is the XSLT you would need

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
   
    <xsl:template match="word[@colour = 'blue']"></xsl:template>

</xsl:stylesheet>

Now, in Oxygen, open a new XSLT document and paste the stylesheet into it
push the button that has XSL over it
Make sure that your XML document is open and select it using the left drop down
your xslt can be Untitled1.xsl*, select that in the second dropdown
push the blue arrow, which starts the transform
in the third (right hand) pane, right-click and choose "Save Results"
This new file is what you need

cheers

Geert
 

Author Closing Comment

by:nhay59
ID: 31410819
Hi,

Thank you so much, works perfectly. Have a great weekend.

Thanks
 

Author Comment

by:nhay59
ID: 20345438
Hi,

Thanks for all the help. The solution works exactly as required, and was very well explained.

Thanks.
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 20345469
welcome