
Change Tags inside HTML document programmatically

Posted on 2010-11-22
531 Views
Last Modified: 2013-12-17
Hi,

I need a way to change some tags in HTML documents.
E.g.:
Load document
Scan for <a>
- if an anchor is found, read its href and get the URL, e.g. http://urltomyside.com/
- change http://urltomyside.com/ to http://urltoanyotherside.com
Save document

Is there a solution for this?

Thanks

Andre
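The steps listed above (load, scan for <a>, read the href, swap the URL, save) can be sketched in C# roughly as follows. This is a minimal sketch, not the accepted answer's code: the class and method names (AnchorRewriter, RewriteHrefs, RewriteFile) are hypothetical, it assumes href values are quoted with " or ', and it uses a regular expression with a MatchEvaluator rather than a real HTML parser, so unusual markup (unquoted values, "href=" text inside other attribute values) may not be handled.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the steps from the question.
public static class AnchorRewriter
{
    // Matches an <a ...> start tag up to its href value:
    //   group 1 = "<a ... href=", group 2 = the opening quote, group 3 = the URL.
    // Assumption: the href value is quoted with " or ' (unquoted values are skipped).
    private static readonly Regex AnchorHref = new Regex(
        @"(<a\s(?:[^>]*?\s)?href\s*=\s*)([""'])(.*?)\2",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Scan for <a>, read each href, and swap oldPrefix for newPrefix where it matches.
    public static string RewriteHrefs(string html, string oldPrefix, string newPrefix)
    {
        return AnchorHref.Replace(html, m =>
        {
            string url = m.Groups[3].Value;
            if (!url.StartsWith(oldPrefix, StringComparison.OrdinalIgnoreCase))
                return m.Value; // leave links to other sites untouched

            string quote = m.Groups[2].Value;
            return m.Groups[1].Value + quote + newPrefix + url.Substring(oldPrefix.Length) + quote;
        });
    }

    // Load document, rewrite the anchors, save document.
    public static void RewriteFile(string path, string oldPrefix, string newPrefix)
    {
        string html = File.ReadAllText(path);
        File.WriteAllText(path, RewriteHrefs(html, oldPrefix, newPrefix));
    }
}
```

For example, calling RewriteHrefs(html, "http://urltomyside.com/", "http://urltoanyotherside.com/") turns href="http://urltomyside.com/page" into href="http://urltoanyotherside.com/page" and leaves links to other sites as they are. Checking the prefix inside the evaluator, instead of in the pattern, keeps the "read href, then decide" step from the question explicit.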
Question by:andre72
3 Comments
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34188543
Where are you doing the changing? Within code (i.e. during runtime) or within the editor (i.e. during design time)?

Author Comment

by:andre72
ID: 34188670
I need to change it at runtime so the whole HTML content will be inside a string or stream ...
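Since the content is already in memory at runtime, the stream case can be reduced to the string case: read the stream into a string, run the replacement, and write the result out. A minimal sketch, assuming UTF-8 text; the class name StreamRewriter is hypothetical, and the pattern is the one from the accepted solution, with a trailing slash kept on the new URL so the rest of the path is preserved.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: stream in, rewritten HTML out.
public static class StreamRewriter
{
    public static void Rewrite(Stream input, Stream output)
    {
        // Read everything into a string (assumes UTF-8; BOM detection is on).
        string source;
        using (var reader = new StreamReader(input, Encoding.UTF8, true, 4096, leaveOpen: true))
            source = reader.ReadToEnd();

        // Same pattern as the accepted solution; $1 puts "<a ... href=" back in front.
        source = Regex.Replace(source,
            @"(<a\s+[^>]*href=[""']?)http://urltomyside\.com/",
            "$1http://urltoanyotherside.com/",
            RegexOptions.IgnoreCase);

        // Write the result as UTF-8 without a BOM; disposing the writer flushes it.
        using (var writer = new StreamWriter(output, new UTF8Encoding(false), 4096, leaveOpen: true))
            writer.Write(source);
    }
}
```

Passing leaveOpen: true means the caller still owns both streams, which matters if the output is, say, a MemoryStream that is read afterwards.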
LVL 75

Accepted Solution

by: käµfm³d 👽 earned 500 total points
ID: 34190162
I would think there is some fancy class for working with an HTML DOM, but I don't know what it is off the top of my head, so I'll fall back on regular expressions. Here is a pattern that should do the job, and an explanation of what it means:
source = System.Text.RegularExpressions.Regex.Replace(source, @"(<a\s+[^>]*href=[""']?)http://urltomyside\.com/", "$1http://urltoanyotherside.com/", System.Text.RegularExpressions.RegexOptions.IgnoreCase);


// ( ... )                   -  capturing parentheses; the captured text is referenced as $1 in the replacement
// <a                        -  find a literal "<a"
// \s+                       -  find one or more ( + ) whitespace ( \s ) characters
// [^>]*                     -  find zero or more ( * ) of any character NOT ( [^ ...] ) a closing angle bracket ( > )
// href=                     -  find a literal "href="
// [""']?                    -  find zero or one ( ? ) of either a double or single quote ( ["'] ); the double quote is doubled because it must be escaped inside a C# verbatim ( @"..." ) string
// http://urltomyside\.com/  -  find the URL; note that the dot ( . ) has to be escaped ( \. ) because it is a special character in regex
// RegexOptions.IgnoreCase   -  also match upper-case tags such as <A HREF=...>


//  In the replacement, you write the new URL as normal
//  (i.e. no special characters); however, we include
//  $1 at the beginning of it so that the text captured
//  by the parentheses described above is put back in front
//  of the new URL. If we don't include the $1, everything
//  from "<a" up to the old URL will be erased.
//  The trailing slash is kept on the new URL so that
//  http://urltomyside.com/page.html becomes
//  http://urltoanyotherside.com/page.html rather than
//  http://urltoanyotherside.compage.html.

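As for the DOM approach mentioned at the start of the answer, one option that ships with .NET is LINQ to XML (System.Xml.Linq), but only if the pages are well-formed XHTML: ordinary HTML that is not valid XML will make XDocument.Parse throw an XmlException. A minimal sketch under that assumption; the class and method names (XhtmlRewriter, RewriteHrefs) are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; works only on well-formed XML such as XHTML.
public static class XhtmlRewriter
{
    public static string RewriteHrefs(string xhtml, string oldPrefix, string newPrefix)
    {
        XDocument doc = XDocument.Parse(xhtml, LoadOptions.PreserveWhitespace);

        // Compare LocalName so <a> is found with or without the XHTML namespace.
        var anchors = doc.Descendants().Where(e => e.Name.LocalName == "a").ToList();
        foreach (XElement anchor in anchors)
        {
            XAttribute href = anchor.Attribute("href");
            if (href != null && href.Value.StartsWith(oldPrefix, StringComparison.OrdinalIgnoreCase))
                href.Value = newPrefix + href.Value.Substring(oldPrefix.Length);
        }

        // Serialize without re-indenting the markup.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Working on parsed attributes avoids the quoting and whitespace corner cases a regular expression has to anticipate; the trade-off is that the input must be valid XML.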
