Link Conversion in C# (500 Pts)

I have thousands of HTML files that are going to be tied into a new system.  The new system uses IFrames to display the content in these HTML files.  The problem is that most of these files have hundreds of links within each file, and the TARGET attribute varies from "_SELF" to "_TOP", etc.

The HTML is read into a string using C#, and then output within a header/footer scenario inside the IFrames.  Ideally, I would like to take that string, use some sort of REGEX to detect the link tags, correct them to have a TARGET="_PARENT" attribute, and then output the result to the IFrame.

All I need from you experts is a way to detect if a tag is a link, detect if it already has a TARGET attribute, and make sure the TARGET attribute is set to "_PARENT" before moving on to the next link in the string.

Looking forward to your answers!

Thanks in advance

Fernando Soto (Retired) commented:
Hi mmarksbury;

I am no expert in HTML so I want to understand the request first.
1 - Detect if a tag is a link.
2 - Check to see if it has a TARGET
3 - If it has a TARGET make sure it is set to "_PARENT"

1 - Are links HREF only?
2 - Do any tags other than links have a TARGET attribute?
3 - Could I search for the TARGET attribute and check whether it is set to "_PARENT" without looking for links?
mmarksbury (Author) commented:
You have the understanding correct.

Most likely, TARGET attributes will only apply to links, so it should be fine to only look for the TARGET attribute.
Fernando Soto (Retired) commented:
Hi mmarksbury;

This should do it for you. The code reads the entire HTML file into a string at once, which is more efficient than reading it one line at a time.

            // Requires:
            //   using System.IO;
            //   using System.Text.RegularExpressions;

            private void bttnReplaceTarget_Click(object sender, System.EventArgs e)
            {
                  // Sets up callback for regex
                  MatchEvaluator TargetEvaluator =
                        new MatchEvaluator(TARGETCheck);
                  // Sets up new regex with a pattern to match
                  Regex re = new Regex(@"TARGET\s*?=\s*?""(?<Attrib>.+?)""",
                        RegexOptions.IgnoreCase | RegexOptions.Singleline);
                  StreamReader sr = new StreamReader("ReplaceTargets.HTML");
                  StreamWriter sw = new StreamWriter("ModifiedTargets.HTML");
                  string htmlFile;
                  string modifiedHtmlFile;
                  // Read the entire HTML file into a string variable
                  // This is more efficient than reading one line at a time
                  htmlFile = sr.ReadToEnd();
                  // Call the regex class to replace the patterns in the file
                  modifiedHtmlFile = re.Replace(htmlFile, TargetEvaluator);
                  // Write the modified HTML file to disk
                  sw.Write(modifiedHtmlFile);
                  sr.Close();
                  sw.Close();
            }

            private string TARGETCheck(Match m)
            {
                  // This function is called by regex to do the
                  // actual replacement of the pattern
                  return m.Value.Replace(m.Groups["Attrib"].Value,
                        "_PARENT");
            }

mmarksbury (Author) commented:
Works great, except that if a link does not have a TARGET attribute, this code does not add one.  Any suggestions on how to do this?  I suspect you will have to match each link tag, then check for the attribute and add it or change it depending on what is needed.
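
The approach suggested here (match each link tag, then change the TARGET attribute or add it if it is missing) could be sketched roughly as follows. This is only a minimal sketch, not the code from this thread: the names LinkTargetFixer and ForceParentTarget are hypothetical, and it assumes attribute values are double-quoted, that no attribute value contains a ">" character, and that anchor tags are not self-closing.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original thread.
public static class LinkTargetFixer
{
    // Matches an opening <a ...> tag and captures everything between
    // "<a" and ">" (assumes no '>' appears inside an attribute value).
    private static readonly Regex AnchorTag = new Regex(
        @"<a\b(?<attrs>[^>]*)>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches a double-quoted TARGET attribute; the lookbehind keeps it
    // from matching the tail of a longer name such as "data-target".
    private static readonly Regex TargetAttr = new Regex(
        @"(?<![\w-])target\s*=\s*""[^""]*""",
        RegexOptions.IgnoreCase);

    public static string ForceParentTarget(string html)
    {
        return AnchorTag.Replace(html, tag =>
        {
            string attrs = tag.Groups["attrs"].Value;
            if (TargetAttr.IsMatch(attrs))
            {
                // The link already has a TARGET: overwrite its value.
                attrs = TargetAttr.Replace(attrs, "target=\"_parent\"");
            }
            else
            {
                // No TARGET yet: append one to the attribute list.
                attrs += " target=\"_parent\"";
            }
            // Keep the original "<a" / "<A" casing of the tag name.
            return tag.Value.Substring(0, 2) + attrs + ">";
        });
    }
}
```

Calling LinkTargetFixer.ForceParentTarget(html) on the string before it is written to the IFrame would handle both cases in a single pass. Closing tags such as </a> are left alone because the pattern only matches "<" followed directly by the tag name.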
mmarksbury (Author) commented:
With your help, I made it the rest of the way . . . Following is the code used.

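string stringToMatch = "Some Html, <a href=\"somepage.htm\" target=\"self\"><br /><b>Test</b>";
// Start from the original HTML so that unmatched text is preserved
string NewString = stringToMatch;

// Match each opening link tag; the link's inner content is not needed
Regex LinksRegex = new Regex(@"<a\s+[^>]+>", RegexOptions.IgnoreCase);
Regex TargetRegex = new Regex(@"TARGET\s*?=\s*?""(?<Attrib>.+?)""", RegexOptions.IgnoreCase | RegexOptions.Singleline);
foreach (Match M in LinksRegex.Matches(stringToMatch))
{
     Match TargetMatch = TargetRegex.Match(M.Value);
     if (TargetMatch.Success)
     {
          // Rewrite the TARGET value inside this link tag only
          string FixedTag = M.Value.Replace(TargetMatch.Value, TargetMatch.Value.Replace(TargetMatch.Groups["Attrib"].Value, "_top"));
          NewString = NewString.Replace(M.Value, FixedTag);
     }
}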

Thanks.  Points awarded.