Remove Cdata from xslt output

Posted on 2007-10-21
Last Modified: 2013-11-19
In this file:
http://rss.wunderground.com/auto/rss_full/CO/Aurora.xml?units=both
you will see a CDATA section under <description>.
I would like to use all the <description> elements except that one. Can I omit CDATA sections?
This is what I have so far:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="channel">
    <xsl:apply-templates select="item" />
  </xsl:template>
  <xsl:template match="item">
    <xsl:apply-templates select="item" />
    <font face="verdana" color="#CCCCCC" size="2" />
    <b><xsl:value-of select="title" /></b><br />
    <xsl:value-of select="description" />
    <br /><br />
  </xsl:template>
</xsl:stylesheet>

Question by:chekmate111
 
LVL 60

Accepted Solution

by:
Geert Bormans earned 576 total points
ID: 20118552
> Can I omit Cdata sections?

No. The XML parser processes the document before it is handed to the XSLT processor,
so the CDATA section appears to the XSLT as an ordinary text node.
There is therefore no way to differentiate between a description node that used to be a CDATA section and one that was simple text.
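
To make the point above concrete, here is a minimal sketch in Python (standard library only, not part of the original answer). The two sample documents are made up for illustration: one wraps the markup in a CDATA section, the other escapes it. After parsing, the text content is identical, which is the same view an XSLT processor gets.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: same character content, written two different ways.
cdata_doc = '<item><description><![CDATA[<img src="a.gif" /> Sunny]]></description></item>'
escaped_doc = '<item><description>&lt;img src="a.gif" /&gt; Sunny</description></item>'

# Once parsed, both descriptions are plain text; the CDATA delimiters are gone.
cdata_text = ET.fromstring(cdata_doc).findtext("description")
escaped_text = ET.fromstring(escaped_doc).findtext("description")

print(cdata_text == escaped_text)
```

Whatever test you write in XPath sees only that resulting string, not how it was spelled in the source file.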

You need to find other indicators in the text to decide which description to ignore.
Maybe you could use the existence of " | " in the data.
If you can't do that, you could preprocess the XML:
with a regular expression on the text stream, you could replace "<![CDATA[" with "<![CDATA[###",
and in the XSLT you could then skip the text nodes that start with ###.
Just a thought.
If you want to act on CDATA sections, it has to happen outside the XML parser environment,
because to a parser, text nodes and CDATA are more or less the same.
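
The preprocessing idea could be sketched roughly like this in Python. This is only an illustration under assumptions: the sample feed, the function names mark_cdata and plain_descriptions, and the use of a plain string replace instead of a regular expression are all mine. It also assumes the CDATA section starts right at the beginning of the description; a CDATA section in the middle of the text would put the marker mid-string and would not be caught by a starts-with test.

```python
import xml.etree.ElementTree as ET

MARKER = "###"  # the marker suggested above; any string absent from the feed works


def mark_cdata(raw_xml: str) -> str:
    """Tag every CDATA section in the raw text, before any XML parsing happens."""
    return raw_xml.replace("<![CDATA[", "<![CDATA[" + MARKER)


def plain_descriptions(raw_xml: str) -> list:
    """Return the text of <description> elements that did not start with CDATA."""
    root = ET.fromstring(mark_cdata(raw_xml))
    return [
        d.text or ""
        for d in root.iter("description")
        if not (d.text or "").startswith(MARKER)
    ]


# Hypothetical two-item feed, loosely modelled on the RSS structure in the question.
sample = (
    '<channel>'
    '<item><title>Current</title>'
    '<description><![CDATA[<img src="a.gif" /> 50F | Clear]]></description></item>'
    '<item><title>Tonight</title>'
    '<description>Mostly clear. Low 35F.</description></item>'
    '</channel>'
)

print(plain_descriptions(sample))
```

If the filtering stays in the stylesheet, only mark_cdata is needed as the preprocessing step, and the XSLT side can test with the XPath 1.0 function starts-with(description, '###').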

I hope this helps

Geert
 
LVL 12

Assisted Solution

by:jkmyoung
jkmyoung earned 572 total points
ID: 20127069
It looks like there is only a CDATA section if there is a child node, e.g. an image, inside the description node.
You could do a test like so:
<xsl:if test="description[not(*)]">
... Output the description...
</xsl:if>
 

Author Comment

by:chekmate111
ID: 20127090
Could you write me an example using the
http://rss.wunderground.com/auto/rss_full/CO/Aurora.xml?units=both
as your file? The reason I ask is because I am not sure I understand the syntax of [not(*)].
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 20127250
Joseph,
I don't think not(*) will work, because the <img> you are referring to
is inside the CDATA section and will not be accessed as a child node, but as part of a text node: "&lt;img ...."

If the existence of the img pseudo tag is a possible trigger,
you could use
description[contains(., '&lt;img')]
which is similar to checking for "|".

Bottom line:
- if you can find a trigger inside the text that tells you not to use that description (e.g. the img pseudo tag, a |, or whatever),
you can use contains() in the test
- if not, you will have to preprocess the XML before the parser gets to it
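
A small Python check (standard library, made-up sample item) shows why not(*) cannot see the image while a substring test can. It is a Python analogue of the XPath tests, not the XSLT itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical item whose description wraps an <img> tag in a CDATA section.
item = ET.fromstring(
    '<item><title>Current Conditions</title>'
    '<description><![CDATA[<img src="a.gif" alt="" /> 50F | Clear]]></description>'
    '</item>'
)
desc = item.find("description")

# len() of an Element counts its child elements; the <img> is text, so this is 0,
# which is why description[not(*)] would still match this description.
child_elements = len(desc)

# The substring test is the analogue of contains(description, '<img') in XPath.
has_img_text = "<img" in (desc.text or "")

print(child_elements, has_img_text)
```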

cheers

Geert
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 20127262
in your code that would mean

replace
     <xsl:value-of select="description" />

with
<xsl:if test="not(contains(description, '&lt;img'))">
     <xsl:value-of select="description" />
</xsl:if>

cheers

Geert
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 20127276
If, for one reason or another, you want the content of description but not in its escaped form,
you can do this:
        <xsl:value-of select="description" disable-output-escaping="yes"/>
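
To see what "escaped form" means here, this Python sketch (standard library, made-up sample) copies the text of a CDATA description into a new element and serializes it. By default the serializer escapes the markup characters again, which is the behavior disable-output-escaping="yes" asks an XSLT processor to skip. Note that XSLT 1.0 processors are not required to support that attribute, so check yours.

```python
import xml.etree.ElementTree as ET

# Hypothetical description whose CDATA section carries HTML markup.
src = ET.fromstring('<description><![CDATA[<b>50F</b> Clear]]></description>')

# Copy the text into an output element, as xsl:value-of would.
out = ET.Element("p")
out.text = src.text

# Serialization escapes "<" and ">" in text, so the markup arrives as visible text.
serialized = ET.tostring(out, encoding="unicode")
print(serialized)
```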
 

Author Comment

by:chekmate111
ID: 20127277
Okay, I think I understand now.
I would love to split the points between you two if possible. Would that be okay with you guys?
 
LVL 39

Assisted Solution

by:abel
abel earned 572 total points
ID: 20127342
If you have access to XSLT 2.0, there is actually a non-trivial (and not really recommendable) workaround for finding out whether a node has a CDATA section, using unparsed-text() and regular expressions in conjunction with normal tests. The problem is getting it right: you are actually going to parse the document twice, once as nodes and once as text, and you have to synchronize the two.

In practice it is far easier either to use an extension function (one that has access to the loaded DOM, which in turn has information on the CDATA-ness of a node) or, even easier, to go with the pre-parse or text-recognition solution proposed above.

(This was a point along the lines of "it can be done", not that it *should* be done.)
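
As an illustration of the DOM remark above: some DOM implementations do keep CDATA sections as their own node type. The sketch below uses Python's xml.dom.minidom, not an XSLT extension function; the helper name has_cdata_child and the sample feed are made up. An extension function would need similar access to the DOM that its own processor loaded.

```python
from xml.dom import Node, minidom


def has_cdata_child(element) -> bool:
    """True if any direct child of the element is a CDATA section node."""
    return any(
        child.nodeType == Node.CDATA_SECTION_NODE for child in element.childNodes
    )


# Hypothetical feed: one CDATA description, one plain-text description.
doc = minidom.parseString(
    '<channel>'
    '<item><description><![CDATA[<img src="a.gif" /> 50F]]></description></item>'
    '<item><description>Mostly clear.</description></item>'
    '</channel>'
)

flags = [has_cdata_child(d) for d in doc.getElementsByTagName("description")]
print(flags)
```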

-- Abel --
 
LVL 39

Expert Comment

by:abel
ID: 20127351
PS: if you don't need to do it in one pass, then you can use XSLT 2.0 both for preprocessing (regex) and processing (xslt), like Geert proposed.
 

Author Comment

by:chekmate111
ID: 21346509
You could use xpath to work around it.
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 21347813
I see that the asker has added a suggestion and asks for it to be accepted as the solution.
What the asker suggests is NOT a solution to the original question.
   His solution requires parsing the XML so that XPath can access it; the CDATA sections will be long gone by then, as has been mentioned before.
The question has been answered correctly and extensively before.
So I DO object to this measure.
 
LVL 1

Expert Comment

by:modus_operandi
ID: 21419478
Force accepted.
modus_operandi
EE Moderator