Wrap text of script tags in CDATA

Hello all

I'm trying to wrap the inner text of script tags in a CDATA section.
I already have the following code (copied from another site), but it only partially works:
Regex regScriptCDATA = new Regex("(<script[^<>]*>)([^<>]+)(<\\/script>)");
MatchCollection m = regScriptCDATA.Matches(strHtml);
String strScriptCDATA = "$1" + "//<![CDATA[\n" + "$2" + "//]]>\n" + "$3";
strHtml = regScriptCDATA.Replace(strHtml, strScriptCDATA);


My problem is that if the text contains < or > it won't wrap.

The problem is that I have a website which (still) needs to be XHTML compatible. It's an ASP.NET application with hundreds of user controls that have script inside their markup, only sometimes wrapped in CDATA.

Instead of modifying each usercontrol and putting CDATA into these scripts I was thinking of modifying the HTML by replacing the script tags which don't have CDATA before rendering the site.

Using regular expressions would be preferable.
Albert Van Halen (Analyst developer) asked:
Terry Woods (IT Guru) commented:
Try this pattern. It applies a negative lookahead at each character between the script tags, so a script that already contains CDATA is not matched:

(<script[^<>]*>)((?:(?!CDATA|<\/script>)[\w\W])+)(<\/script>)


Also, rather than using [^<>], which stops matching at the first < or > inside a script body, I've used the negative lookahead to make sure the match doesn't run past the closing script tag. This also means the script text can contain other tags.

It seems to work on myregextester.com

I used [\w\W] to match any single character, including newlines. If you enable single-line mode (RegexOptions.Singleline in .NET), you can just use . instead.
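
For reference, here is a minimal sketch of how the pattern above could be dropped into the code from the question. The class name ScriptCdataWrapper, the method name Wrap, and the sample HTML are made up for illustration; the replacement string is the one from the question. It assumes lowercase script tags and that no unwrapped script mentions the word CDATA anywhere in its text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original application.
public static class ScriptCdataWrapper
{
    // Group 2 consumes one character at a time. The negative lookahead stops it
    // before "CDATA" or the closing tag, so scripts that already contain CDATA
    // can never reach group 3 and are left alone.
    private static readonly Regex ScriptWithoutCdata = new Regex(
        @"(<script[^<>]*>)((?:(?!CDATA|</script>)[\w\W])+)(</script>)");

    // Same replacement as in the question: opening tag, CDATA start,
    // original script text, CDATA end, closing tag.
    private const string Replacement = "$1//<![CDATA[\n$2//]]>\n$3";

    public static string Wrap(string html)
    {
        return ScriptWithoutCdata.Replace(html, Replacement);
    }

    public static void Main()
    {
        string html =
            "<script type=\"text/javascript\">\nif (x > 0) alert('a');\n</script>\n" +
            "<script type=\"text/javascript\">\n//<![CDATA[\nif (x > 0) alert('b');\n//]]>\n</script>\n";

        // The first script gets wrapped; the second is already wrapped and stays as it is.
        Console.WriteLine(Wrap(html));
    }
}
```

Because the pattern requires at least one character between the tags, empty scripts such as external src includes are not touched. If uppercase tags can occur in the markup, adding RegexOptions.IgnoreCase should cover them.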
Terry Woods (IT Guru) commented:
Can you please give an example of the input, the output you're currently getting, and the output you want to get?
Albert Van Halen (Analyst developer, author) commented:
Hi Terry

The input I have at the moment looks like this:
<script type="text/javascript">
function test() {
var x = 0;
if(x > 0)
    alert('test');
}
</script>
<script type="text/javascript">
function test2() {
var x = 0;
if(x == 0)
    alert('test');
}
</script>
<script type="text/javascript">
//<![CDATA[
function test3() {
var x = 0;
if(x > 0)
    alert('test');
}
//]]>
</script>


The output I'm getting is this
<script type="text/javascript">
function test() {
var x = 0;
if(x > 0)
    alert('test');
}
</script>
<script type="text/javascript">
//<![CDATA[
function test2() {
var x = 0;
if(x == 0)
    alert('test');
}
//]]>
</script>
<script type="text/javascript">
//<![CDATA[
function test3() {
var x = 0;
if(x > 0)
    alert('test');
}
//]]>
</script>


Note that the second code block is wrapped in CDATA but the first is not. The third code block was already wrapped in CDATA, so that's OK.

The first code block isn't wrapped in CDATA because the inner text of the script node contains a '>' character.

Basically I want to have a regex searching for script tags which do not contain CDATA.

I hope this is clear for you.
Thanks in advance !
Albert Van Halen (Analyst developer, author) commented:
Excellent, this is exactly what I want. Thanks !!