Solved

Need to clean up this html. Regex needed... urgent

Posted on 2004-04-23
Last Modified: 2012-06-27
Hi All,

I urgently need to solve this. I know it's not difficult, but I have never worked with regex.

In the simplified HTML below, I need to get rid of all <script> blocks and all <meta> tags. I need to keep this JavaScript function, though:
<script language="javascript" type="text/javascript">
<!--
function OnLoadReport()
{
var pageHits = null;
var rep = new Report(1, 4, pageHits, false, docMapIds);
if (parent != self) parent.OnLoadReport(rep);
}
//-->
</script>

Note that the arguments passed to Report() can differ. Here is what I have so far:
      public string removeJavaCode(string oldStr) {
                // Non-greedy match of each <script ...>...</script> block.
                string pattern = @"<script[^>]*>.*?</script[^>]*>";
                string newStr  = Regex.Replace(oldStr, pattern, "");
                return newStr;
      }

If I pass in the HTML with its newlines stripped (using string.Replace("\n", "")), this removes all JavaScript. I need to keep that one function, though.
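
A note on the newline stripping: in .NET regular expressions `.` does not match `\n` unless `RegexOptions.Singleline` is set, which is why the pattern only works on a single-line string. The following is a minimal sketch of the same pattern with that option, so the input can keep its line breaks. The class name `ScriptStripper` and the sample HTML are assumptions for illustration, not part of the original post, and the method still removes every script block, including the one that should stay.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptStripper
{
    // Same pattern as in the question. Singleline lets '.' match '\n',
    // so the input does not need its newlines removed first.
    // IgnoreCase also covers upper-case <SCRIPT> tags.
    public static string RemoveAllScripts(string html)
    {
        const string pattern = @"<script[^>]*>.*?</script[^>]*>";
        return Regex.Replace(html, pattern, "",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Hypothetical sample input spanning several lines.
        string html = "<p>a</p>\n<script type=\"text/javascript\">\n<!--\nvar x = 1;\n//-->\n</script>\n<p>b</p>";
        Console.WriteLine(RemoveAllScripts(html));
    }
}
```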

Thanks for any help.

Pureo

/////////////////////////////////////////////////
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>
</title>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<META http-equiv="Content-Style-Type" content="text/css">
<META http-equiv="Content-Script-Type" content="text/javascript">


<style type="text/css">//here being a stylesheet</style>

<script language="javascript" type="text/javascript">
<!--
//-->
</script><script language="javascript" type="text/javascript" src="?rs:Command=Get&amp;rc:GetImage=8.00.743.00Report.js">

</script><script language="javascript" type="text/javascript">
<!--
function OnLoadReport()
{
var pageHits = null;
var rep = new Report(1, 4, pageHits, false, docMapIds);
if (parent != self) parent.OnLoadReport(rep);
}
//-->
</script>

<script language="javascript" type="text/javascript" src="ReportViewer.js"></script></head>
<body onload="javascript:OnLoadReport();" style="OVERFLOW: hidden; BORDER: 0px; MARGIN: 0px; PADDING: 0px">
<div id="oReportDiv" onresize="javascript:OnResizeDiv()" style="OVERFLOW: auto; WIDTH: 100%; HEIGHT: 100%">
//here being html code.
<script language="javascript" type="text/javascript">
<!--
var docMapIds = [];
//-->
</script>

</div>
</body>
</html>
///////////////////////////////////////////////////////
Question by:pureo
9 Comments
 

Author Comment

by:pureo
ID: 10908125
Hi,

I have those links too. I need a regex for my specific problem, not links on how to clean up HTML.

Thanks,
Pureo
 
LVL 10

Expert Comment

by:eternal_21
ID: 10909776
Just to clarify,

You want to remove all SCRIPT and META tags, except:

  1. SCRIPT tags that contain the specified function (OnLoadReport), or
  2. SCRIPT tags that contain any JavaScript function, or
  3. SCRIPT tags that contain functions in any language?

Author Comment

by:pureo
ID: 10909794
Hello,

Please ignore this line from my first post: <script language="javascript" type="text/javascript" src="ReportViewer.js"></script>. It is not in the source when it is passed to the function. So the HTML I need to modify is the same as in my first post, minus that line in the head section. Sorry about that.

Thanks a lot. The result should look like this:


/////////////////////////////////////////////////
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>
</title>

<style type="text/css">//here being a stylesheet</style>

<script language="javascript" type="text/javascript">
<!--
function OnLoadReport()
{
var pageHits = null;
var rep = new Report(1, 4, pageHits, false, docMapIds);
if (parent != self) parent.OnLoadReport(rep);
}
//-->
</script>

</head>
<body onload="javascript:OnLoadReport();" style="OVERFLOW: hidden; BORDER: 0px; MARGIN: 0px; PADDING: 0px">
<div id="oReportDiv" onresize="javascript:OnResizeDiv()" style="OVERFLOW: auto; WIDTH: 100%; HEIGHT: 100%">
//here being html code.

</div>
</body>
</html>
///////////////////////////////////////////////////////
 
LVL 10

Expert Comment

by:eternal_21
ID: 10909814
What about this part:

<script language="javascript" type="text/javascript">
<!--
//-->
</script>
<script language="javascript" type="text/javascript" src="?rs:Command=Get&amp;rc:GetImage=8.00.743.00Report.js">
</script>

Is that in the source code as well?
 

Author Comment

by:pureo
ID: 10909823
Yes, that part is in the source code.

Thanks.
Pureo
 
LVL 10

Accepted Solution

by:
eternal_21 earned 2000 total points
ID: 10909871
The following function:

  // Requires: using System.Text.RegularExpressions;
  public static string ParseHtml(string sourceString)
  {
    // metaPattern matches any <META ...> tag, plus an optional trailing line break.
    const string metaPattern = @"<META[^>]*>(\r)?\n?";
    Regex metaRegex = new Regex(metaPattern, RegexOptions.Singleline|RegexOptions.IgnoreCase);
    string newString = metaRegex.Replace(sourceString, "");

    // javascriptPattern matches any <SCRIPT> block that contains neither '{' nor '}',
    // so blocks that define a function body are left in place.
    const string javascriptPattern = @"<SCRIPT[^>]*>[^{}]*?</SCRIPT>(\r)?\n?";
    Regex javascriptRegex = new Regex(javascriptPattern, RegexOptions.Singleline|RegexOptions.IgnoreCase);
    newString = javascriptRegex.Replace(newString, "");

    return newString;
  }

produced the output:

### OUTPUT ###

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>
</title>

<style type="text/css">//here being a stylesheet</style>

<script language="javascript" type="text/javascript">
<!--
function OnLoadReport()
{
var pageHits = null;
var rep = new Report(1, 4, pageHits, false, docMapIds);
if (parent != self) parent.OnLoadReport(rep);
}
//-->
</script>

</head>
<body onload="javascript:OnLoadReport();" style="OVERFLOW: hidden; BORDER: 0px; MARGIN: 0px; PADDING: 0px">
<div id="oReportDiv" onresize="javascript:OnResizeDiv()" style="OVERFLOW: auto; WIDTH: 100%; HEIGHT: 100%">

</div>
</body>
</html>

###

Based on this source code:

### SOURCE CODE ###

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>
</title>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<META http-equiv="Content-Style-Type" content="text/css">
<META http-equiv="Content-Script-Type" content="text/javascript">

<style type="text/css">//here being a stylesheet</style>

<script language="javascript" type="text/javascript">
<!--
//-->
</script><script language="javascript" type="text/javascript" src="?rs:Command=Get&amp;rc:GetImage=8.00.743.00Report.js">

</script><script language="javascript" type="text/javascript">
<!--
function OnLoadReport()
{
var pageHits = null;
var rep = new Report(1, 4, pageHits, false, docMapIds);
if (parent != self) parent.OnLoadReport(rep);
}
//-->
</script>

<script language="javascript" type="text/javascript" src="ReportViewer.js"></script></head>
<body onload="javascript:OnLoadReport();" style="OVERFLOW: hidden; BORDER: 0px; MARGIN: 0px; PADDING: 0px">
<div id="oReportDiv" onresize="javascript:OnResizeDiv()" style="OVERFLOW: auto; WIDTH: 100%; HEIGHT: 100%">
<script language="javascript" type="text/javascript">
<!--
var docMapIds = [];
//-->
</script>

</div>
</body>
</html>

###
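
The function above keeps every script block that contains a brace. If only blocks defining one named function should survive (option 1 in the clarifying question earlier in the thread), one possible variant is to match each script block and decide per match with a `MatchEvaluator`. This is a rough sketch under assumptions: the names `ReportHtmlCleaner`, `Clean`, and `functionName` are hypothetical, the lambda syntax needs C# 3.0 or later, and the sample HTML is made up. Like any regex approach it is not an HTML parser: a `>` inside an attribute value or a literal `</script>` inside a JavaScript string would confuse it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReportHtmlCleaner
{
    // Matches one whole <script>...</script> block plus an optional trailing line break.
    private static readonly Regex ScriptBlock = new Regex(
        @"<script[^>]*>.*?</script\s*>(\r?\n)?",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Matches any <meta ...> tag plus an optional trailing line break.
    private static readonly Regex MetaTag = new Regex(
        @"<meta[^>]*>(\r?\n)?",
        RegexOptions.IgnoreCase);

    // Removes META tags and every SCRIPT block except those that define functionName.
    public static string Clean(string html, string functionName)
    {
        Regex definesFunction = new Regex(
            @"\bfunction\s+" + Regex.Escape(functionName) + @"\s*\(");

        string withoutMeta = MetaTag.Replace(html, "");

        // Keep the matched block unchanged if it defines the function, otherwise drop it.
        return ScriptBlock.Replace(withoutMeta,
            m => definesFunction.IsMatch(m.Value) ? m.Value : "");
    }

    public static void Main()
    {
        // Hypothetical sample input: one META tag, one external script, one block to keep.
        string html =
            "<META http-equiv=\"Content-Type\" content=\"text/html\">\n" +
            "<script type=\"text/javascript\" src=\"ReportViewer.js\"></script>\n" +
            "<script type=\"text/javascript\">\nfunction OnLoadReport()\n{\n}\n</script>\n";
        Console.WriteLine(Clean(html, "OnLoadReport"));
    }
}
```

Matching on the function name rather than on braces means a script block that merely contains an object literal or another function would also be removed, which may or may not be what is wanted.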
 
LVL 10

Expert Comment

by:eternal_21
ID: 10909872
Is that what you are looking for?
 

Author Comment

by:pureo
ID: 10909897
Nice, thanks a lot!

Pureo
Question has a verified solution.