Regex question

Hello experts,

I have an HTML file. I want to:

1) Search this file for IMG tags that have only a src attribute, e.g.
<IMG src="C:\Some Folder\1.jpg"> should be selected, but <IMG alt="alt img text" src="C:\Some Other Folder\11.jpg"> shouldn't be selected.
2) Obtain the value of the src attribute from every tag selected in (1), e.g. "C:\Some Folder\1.jpg".
3) For each tag selected in (1), copy the image file from the src location to another location, and replace the src value with the relative path of that new location.

I expect regex to be the faster way to do the above tasks; otherwise I could have accomplished this with linear text processing.

Jesse Houwing (Scrum Trainer | Microsoft MVP | ALM Ranger | Consultant) Commented:
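In that case you need to use a MatchEvaluator:

            // requires: using System.Text.RegularExpressions;
            public string test(string input)
            {
                  // Matches <img> tags whose only attribute is src; the path is captured in the "filename" group.
                  Regex rx = new Regex("<img\\s+src=\"(?<filename>[^\"]+)\"\\s*>", RegexOptions.IgnoreCase);
                  // Replace returns a new string; the input itself is not modified.
                  return rx.Replace(input, new MatchEvaluator(ReplacePath));
            }

            private string ReplacePath(Match m)
            {
                  string path = m.Groups["filename"].Value;
                  // do stuff with obtained path
                  path = "new path here";
                  // Return literal HTML here, not the regex pattern text.
                  return "<img src=\"" + path + "\">";
            }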

This will call the ReplacePath function once for every match that is found, so you can do anything you want inside that function. At the end, the MatchEvaluator function must return a string; this return value replaces the original text of the match.
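To tie this back to step 3 of the question, here is a rough sketch of a MatchEvaluator (written as a lambda) that copies each image and returns the rewritten tag. The class name ImgSrcRewriter and the parameters targetDir and relativePrefix are made up for this example, and leaving a tag untouched when its file is missing is just one possible choice. It also assumes the file names are unique, since a second file with the same name would overwrite the first.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class ImgSrcRewriter
{
    // Same pattern as above: an <img> tag whose only attribute is src.
    private static readonly Regex ImgWithOnlySrc = new Regex(
        "<img\\s+src=\"(?<filename>[^\"]+)\"\\s*>", RegexOptions.IgnoreCase);

    // targetDir and relativePrefix are hypothetical parameters for this sketch.
    public static string Rewrite(string html, string targetDir, string relativePrefix)
    {
        Directory.CreateDirectory(targetDir);
        return ImgWithOnlySrc.Replace(html, m =>
        {
            string source = m.Groups["filename"].Value;
            if (!File.Exists(source))
            {
                // Assumption: leave the tag unchanged when the image is missing.
                return m.Value;
            }
            string name = Path.GetFileName(source);
            // Copy the image, overwriting any file of the same name.
            File.Copy(source, Path.Combine(targetDir, name), true);
            // The returned string replaces the whole matched tag.
            return "<img src=\"" + relativePrefix + "/" + name + "\">";
        });
    }
}
```

A call might look like ImgSrcRewriter.Rewrite(html, @"C:\Site\images", "images"), which would turn a matched tag for 1.jpg into <img src="images/1.jpg">. Tags with extra attributes such as alt are not matched and stay as they are.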
Bob Learned Commented:
(<IMG src=)(?<file>)(.+)>

The file name can be found in the match.Groups("file").ToString() value.  You'll have to remove the quotes from the name.


Jesse Houwing (Scrum Trainer | Microsoft MVP | ALM Ranger | Consultant) Commented:

should work better as it always returns the correct filename.

You can use:

Match m = Regex.Match(text, pattern, options);
while (m.Success)
{
      string url = m.Groups["filename"].Value;
      FileStream file = File.Open(url, mode); // File is a static class; File.Open returns a FileStream
      m = m.NextMatch(); // move on to the next match
}
(Off the top of my head, leaving some blanks; my PC just broke down, so I'm currently reinstalling everything and can't check.)

If this is not enough to make things work, I'll try to whip up something in VS later tonight (when everything is up and running).
RoninTheAuthor Commented:
Hey Bob, I'm afraid (<IMG src=)(?<file>)(.+)> doesn't seem to work. I changed it to (<IMG\ssrc=)(?<file>)(.+)> and then it gave me matches, but no named-group matches.

ToAoM, your regex does work, but how would I replace the matched value with another one (#3 in the question)?

ps: one correction :-)
m.NextMatch(); should be m=m.NextMatch();
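For reference, a small sketch that pulls both points together, as I understand them: the named group has to wrap the part of the pattern you want captured (an empty (?<file>) has nothing inside it to capture), and the result of NextMatch() has to be assigned back. The class and method names (SrcCollector, CollectSrc) are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SrcCollector
{
    public static List<string> CollectSrc(string html)
    {
        // The named group wraps the characters between the quotes,
        // so the captured value needs no quote stripping afterwards.
        Regex rx = new Regex("<IMG\\s+src=\"(?<file>[^\"]+)\"\\s*>", RegexOptions.IgnoreCase);
        var result = new List<string>();
        Match m = rx.Match(html);
        while (m.Success)
        {
            result.Add(m.Groups["file"].Value);
            // NextMatch returns a new Match object; it does not advance m in place.
            m = m.NextMatch();
        }
        return result;
    }
}
```

With the two example tags from the question, this should collect only C:\Some Folder\1.jpg, because the tag with the alt attribute does not match the pattern.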
Question has a verified solution.
