Posted on 2004-03-31
Can anyone help with this regular expression, which I found while searching through the PAQs?
Regex.Replace(HtmlData, @"\s+|\s*((<script>((?!</script>).)+</script>|<([^>]|""[^""]*""|'[^']*')*>)\s*)+", " ").Trim();
When I try to use it, I receive the following error:
Description: An error occurred during the compilation of a resource required to service this request. Please review the following specific error details and modify your source code appropriately.
Compiler Error Message: CS1010: Newline in constant
Line 29: Session["InsertCatID"] = Request.Form["Category"];
Line 30: Session["InsertCountryID"] = Request.Form["Country"];
Line 31: Session["InsertCityID"] = Request.Form["City"];
Line 32: Session["PenaltyID"] = Request.Form["County"];
Line 33: Session["InsertTitle"] = strTitle;
Source File: C:\**.aspx Line: 31
Line 31 has nothing to do with this bit of code.
I'm trying to use the code below:
StringBuilder strTextBuilder=new StringBuilder();
foreach (Match match in Regex.Replace(HtmlData, @"\s+|\s*((<script>((?!</script>).)+</script>|<([^>]|""[^""]*""|'[^']*')*>)\s*)+", " ").Trim(); RegexOptions.IgnoreCase|RegexOptions.Singleline))
strTextBuilder.Append(match.Value); // use match.Groups["content"].Value to get rid of the tag
What I need to achieve is to remove all text between < and >, as well as everything between <script> and </script> (or similar blocks), and then remove excess whitespace so that the text is presented neatly.
It seems like a big job and I have no idea how to do it, even after reading Mastering Regular Expressions.
Any help would be appreciated.
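For reference, the three steps described above (drop script blocks, drop remaining tags, collapse whitespace) could be sketched roughly as follows. This is only a sketch under assumptions: the class name HtmlTextSketch and the method name StripTags are hypothetical, the patterns are simplified rewrites rather than the original PAQ expression, and regular expressions are not a full HTML parser (comments or unbalanced quotes can trip them up). Two things it relies on: Regex.Replace returns a string, so its result cannot be iterated with foreach as a set of Match objects, and the patterns avoid spelling out a literal closing script tag, since in an .aspx page with inline server code such a literal may be read as the end of the code block, which is one possible (unconfirmed) cause of the CS1010 error.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative and not from the original post.
public static class HtmlTextSketch
{
    public static string StripTags(string htmlData)
    {
        if (string.IsNullOrEmpty(htmlData))
            return string.Empty;

        // Singleline lets '.' span newlines; IgnoreCase covers SCRIPT, Script, etc.
        const RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        // Step 1: remove script and style blocks together with their content.
        // \1 refers back to whichever element name opened the block.
        string text = Regex.Replace(
            htmlData,
            @"<(script|style)\b[^>]*>.*?<\s*/\s*\1\s*>",
            " ",
            options);

        // Step 2: remove any remaining tag. Quoted attribute values are matched
        // as a unit so that a '>' inside quotes does not end the tag early.
        text = Regex.Replace(
            text,
            @"<(?:""[^""]*""|'[^']*'|[^'"">])*>",
            " ",
            options);

        // Step 3: collapse runs of whitespace into one space and trim the ends.
        text = Regex.Replace(text, @"\s+", " ");
        return text.Trim();
    }
}
```

Called as HtmlTextSketch.StripTags(HtmlData), this returns the cleaned string directly, so no StringBuilder or foreach loop is needed. Each tag is replaced with a space rather than an empty string, so words separated only by tags do not run together; the trade-off is that a tag in the middle of a word will split it.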