Solved

C#: find multiple occurrences of an HTML tag in a string, then edit the tag

Posted on 2011-03-02
580 Views
Last Modified: 2012-05-11
Hi all, I have a string like this:

string EditorText = "test blah blakasjklmasd klasdkl <img src='/images/doggy.jpg' /> kaslasmnkldklvmnklvml;d more rubbish text <img src='/images/kitty.jpg' />";
hopefully you get the gist

What I need is a function that will search the string for <img /> tags, then modify the src of every occurrence,

so the above string would become:
string EditorText = "test blah blakasjklmasd klasdkl <img src='edited1' /> kaslasmnkldklvmnklvml;d more rubbish text <img src='edited2' />";

where the names would be incremented for each one

So 10 images, all with different names, would be replaced with edited1, edited2, edited3, edited4 and so on.
But I also need to extract the original sources for use elsewhere.

So the whole string needs to be returned as-is, but with the img srcs replaced, and I need the original src paths in variables or strings.

Can this be done, and how?

Thanks
Question by:awilderbeast
44 Comments
 
LVL 42

Expert Comment

by:sedgwick
ID: 35017480
here:
public static void Main()
        {
            string src = "test blah blakasjklmasd klasdkl <img src='/images/doggy.jpg' /> kaslasmnkldklvmnklvml;d more rubbish text <img src='/images/kitty.jpg' />";
            List<string> listImageSource = new List<string>();
            src = ExtractImages(src, ref listImageSource);
        }

        private static string ExtractImages(string src, ref List<string> listImageSource)
        {
            int counter = 1;
            int toIndex = 0, fromIndex = 0;

            do
            {
                fromIndex = src.IndexOf("<img", toIndex);
                if (fromIndex < 0) break;                   // no more <img tags
                toIndex = src.IndexOf("/>", fromIndex);
                if (toIndex < 0) break;                     // unterminated <img tag
                string part = src.Substring(fromIndex, toIndex - fromIndex + 2);
                string[] tokens = part.Split(new string[] { "'" }, System.StringSplitOptions.RemoveEmptyEntries);
                listImageSource.Add(tokens[1]);
                src = src.Replace(part, string.Format("<img src='edited{0}' />", counter++));
                toIndex = src.IndexOf("/>", fromIndex) + 2; // resume searching after the rewritten tag
            } while (true);
            return src;
        }

 
LVL 15

Expert Comment

by:angus_young_acdc
ID: 35017586
Do you specifically need them to be titled "edited1", "edited2", etc.? If not, here is a fast and easy way to do it:
string updatedText = Regex.Replace(EditorText, "<img[^>]*>", "<img src='edited' />");
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35017659
I think I'd use regular expressions here... this seemed to test okay for me; maybe sedgwick can have a look and make sure I didn't screw up the regex:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
	class Program
	{
		static void Main(string[] args)
		{
			

			string testString = "blah blakasjklmasd klasdkl <img src='/images/doggy.jpg' /> kaslasmnkldklvmnklvml;d more rubbish text <img src='/images/kitty.jpg' />";

			testString = ReplaceImgTags(testString);

			Console.WriteLine(testString);
			Console.ReadKey();
		}

		static string ReplaceImgTags(string input)
		{
			Regex regex = new Regex(@"<img\s+src\s*=\s*('|"")(.*?)('|"")\s*/>", RegexOptions.IgnoreCase);

			int counter = 0;
			return regex.Replace(input, delegate(Match m)
			{
				counter++;
				return String.Format("<img src='edited{0}' />", counter);
			});
		}
	}
}

 
LVL 42

Expert Comment

by:sedgwick
ID: 35017680
@tgerbert
yes, it works like a charm with Regex, well done :)
 
LVL 1

Author Comment

by:awilderbeast
ID: 35017776
thanks, that first bit works for me :)

next bit: how do I get the old src out of it?
another function or the same one?

like, I need to know

edited1 = /images/doggy.jpg
edited2 = /images/kitty.jpg

you follow?

thanks a lot for your help thus far
 
LVL 33

Assisted Solution

by:Todd Gerbert
Todd Gerbert earned 250 total points
ID: 35017806
I think sedgwick's example already did that... here's the regex version, modified:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
	class Program
	{
		static void Main(string[] args)
		{


			string testString = "blah blakasjklmasd klasdkl <img src='/images/doggy.jpg' /> kaslasmnkldklvmnklvml;d more rubbish text <img src='/images/kitty.jpg' />";
			List<string> originalImageSources = new List<string>();
			testString = ReplaceImgTags(testString, originalImageSources);

			Console.WriteLine("Modified string: " + testString);
			Console.WriteLine("Original image sources:");
			foreach (string img in originalImageSources)
				Console.WriteLine("\t" + img);
			Console.ReadKey();
		}

		static string ReplaceImgTags(string input, List<string> oldImageSources)
		{
			Regex regex = new Regex(@"<img\s+src\s*=\s*('|"")(.*?)('|"")\s*/>", RegexOptions.IgnoreCase);

			int counter = 0;
			return regex.Replace(input, delegate(Match m)
			{
				counter++;
				oldImageSources.Add(m.Groups[2].Value);
				return String.Format("<img src='edited{0}' />", counter);
			});
		}
	}
}

 
LVL 42

Expert Comment

by:sedgwick
ID: 35017857
@awilderbeast

I did that in my post:
the function populates a list of strings with the original image source paths.
 
LVL 1

Author Comment

by:awilderbeast
ID: 35017897
how do I call an external string in place of testString?

also, how do I return each separate value?
two lists?

I know I've mucked it up, but I'm trying lol

thanks for your patience


public static void Main(string[] args, string StrSource)
    {

        List<string> originalImageSources = new List<string>();
        List<string> EditedImages = new List<string>();

        StrSource = ReplaceImgTags(StrSource, originalImageSources);

        foreach (string orginal in originalImageSources)
        {
            originalImageSources.Add(orginal);
        }
        foreach (string edited in EditedImages)
        {
            EditedImages.Add(edited);
        }
        return originalImageSources;
        return EditedImages;
    }
    static string ReplaceImgTags(string input, List<string> oldImageSources)
    {
        Regex regex = new Regex(@"<img\s+src\s*=\s*('|"")(.*?)('|"")\s*/>", RegexOptions.IgnoreCase);

        int counter = 0;
        return regex.Replace(input, delegate(Match m)
        {
            counter++;
            return String.Format("<img src='cid:image{0}' />", counter);
        });
    }

 
LVL 42

Expert Comment

by:sedgwick
ID: 35017990
if you need both the edited and the original image sources, use the following code.
the dictionary now consists of key/value pairs: for each item in the dictionary, the key is the original image source and the value is the edited one.
so, for example, after running the code, the dictionary will contain what is shown in the screenshot.

public static void Main()
        {
            string src = "test blah blakasjklmasd klasdkl <img src='/images/doggy.jpg' /> kaslasmnkldklvmnklvml;d more rubbish text <img src='/images/kitty.jpg' />";
            Dictionary<string, string> imageSources = new Dictionary<string, string>();
            string output = ExtractImages(src, ref imageSources);
        }

        private static string ExtractImages(string src, ref Dictionary<string, string> imageSources)
        {
            int counter = 1;
            int toIndex = 0, fromIndex = 0;

            do
            {
                fromIndex = src.IndexOf("<img", toIndex);
                if (fromIndex < 0) break;                   // no more <img tags
                toIndex = src.IndexOf("/>", fromIndex);
                if (toIndex < 0) break;                     // unterminated <img tag
                string part = src.Substring(fromIndex, toIndex - fromIndex + 2);
                string[] tokens = part.Split(new string[] { "'" }, System.StringSplitOptions.RemoveEmptyEntries);
                imageSources.Add(tokens[1], string.Format("edited{0}.jpg", counter));
                src = src.Replace(part, string.Format("<img src='edited{0}.jpg' />", counter++));
                toIndex = src.IndexOf("/>", fromIndex) + 2; // resume searching after the rewritten tag
            } while (true);
            return src;
        }


[screenshot: untitled.JPG]
 
LVL 1

Author Comment

by:awilderbeast
ID: 35018119
thanks!! :)

one more... how do I use it all?

hopefully my abysmal code below will explain what I'd like to do with it lol




Source = functions.Main("test blah blakasjklmasd klasdkl <img src='/images/doggy.jpg' /> kaslasmnkldklvmnklvml;d mo";
foreach (source as result){

output1.text += result.key;
output2.text += result.value;

}

editor.text = source.?the edited source with the replaced images?

 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35018121
>> how do I call an external string in place of testString?

I just used that for testing; you can put the ReplaceImgTags() method in any appropriate class, and as written you can pass any string to it. So, for example, if you wanted to read each line of c:\test.txt, replace the <img/> tags, then write that line to c:\modified.txt, your code might look something like this (this isn't good code, just for the sake of example):
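// using blocks close (and flush) both files when done; requires using System.IO;
using (StreamReader sr = new StreamReader("C:\\test.txt"))
using (StreamWriter sw = new StreamWriter("C:\\modified.txt"))
{
	while (!sr.EndOfStream)
		sw.WriteLine(ReplaceImgTags(sr.ReadLine()));
}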




>> also how do I return each separate value? two lists?
Note that you have a minor omission in your latest snippet: oldImageSources.Add(m.Groups[2].Value); should appear between counter++; and return String.Format("<img src='cid:image{0}' />", counter);. You pass some string and a List<string> to the ReplaceImgTags method; ReplaceImgTags returns the string you passed in, but with the <img /> tags modified, and populates the List<string> you passed in with the original image sources (in the order they're found in the string). I didn't bother saving the replacement values because they're predictable: the first image source in the list was replaced by "cid:image1", the second by "cid:image2", and so on.
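Because the numbering is predictable, the replacement for each entry can be rebuilt from its position in the list. A minimal sketch, assuming the list is filled in match order and that numbering starts at `firstNumber`; the `ReplacementMap.Describe` helper and its `format`/`firstNumber` parameters are hypothetical, since the method above hard-codes the format:

```csharp
using System.Collections.Generic;

// Hypothetical helper: pairs each original src with the name it was given.
public static class ReplacementMap
{
    // Relies on originalSources being in the same order the tags were numbered.
    public static List<string> Describe(List<string> originalSources, string format, int firstNumber)
    {
        var lines = new List<string>();
        for (int i = 0; i < originalSources.Count; i++)
        {
            // The i-th source (0-based) received number firstNumber + i.
            string replacement = string.Format(format, firstNumber + i);
            lines.Add(originalSources[i] + " => " + replacement);
        }
        return lines;
    }
}
```

With the sources `/images/doggy.jpg` and `/images/kitty.jpg`, the format `cid:image{0}` and a first number of 1, this yields `/images/doggy.jpg => cid:image1` and `/images/kitty.jpg => cid:image2`.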
 
LVL 42

Expert Comment

by:sedgwick
ID: 35018151
I created a new static class for you called HtmlImageExtractor.
Use it like this:

string src = "test blah blakasjklmasd klasdkl <img src='/images/doggy.jpg' /> kaslasmnkldklvmnklvml;d more rubbish text <img src='/images/kitty.jpg' />";
Dictionary<string, string> imageSources = new Dictionary<string, string>();
string output = HtmlImageExtractor.ExtractImages(src, ref imageSources);

static class HtmlImageExtractor
    {
        public static string ExtractImages(string src, ref Dictionary<string, string> imageSources)
        {
            int counter = 1;
            int toIndex = 0, fromIndex = 0;

            do
            {
                fromIndex = src.IndexOf("<img", toIndex);
                if (fromIndex < 0) break;                   // no more <img tags
                toIndex = src.IndexOf("/>", fromIndex);
                if (toIndex < 0) break;                     // unterminated <img tag
                string part = src.Substring(fromIndex, toIndex - fromIndex + 2);
                string[] tokens = part.Split(new string[] { "'" }, System.StringSplitOptions.RemoveEmptyEntries);
                imageSources.Add(tokens[1], string.Format("edited{0}.jpg", counter));
                src = src.Replace(part, string.Format("<img src='edited{0}.jpg' />", counter++));
                toIndex = src.IndexOf("/>", fromIndex) + 2; // resume searching after the rewritten tag
            } while (true);
            return src;
        }
    }

 
LVL 42

Expert Comment

by:sedgwick
ID: 35018181
then to dump the results to a file:

string src = "test blah blakasjklmasd klasdkl <img src='/images/doggy.jpg' /> kaslasmnkldklvmnklvml;d more rubbish text <img src='/images/kitty.jpg' />";
            Dictionary<string, string> imageSources = new Dictionary<string, string>();
            string output = HtmlImageExtractor.ExtractImages(src, ref imageSources);
            File.WriteAllLines(@"c:\temp\origin_images.txt", imageSources.Keys.ToArray());
            File.WriteAllLines(@"c:\temp\edited_images.txt", imageSources.Values.ToArray());
            File.WriteAllText(@"c:\temp\edited.html", output);

 
LVL 1

Author Comment

by:awilderbeast
ID: 35018422
sorry guys, but now I'm really confused :S

let me try; seriously, I have no idea what I'm doing here :S

the source will be a textbox on a webpage, and the destination will be (for now) another textbox and another function

so I have SrcTextbox = image srcs get replaced with edit{0}

then I have a dictionary
key = originalfilepath
value = edit{0}

these functions go in my functions file in my App_Code folder
then they are called from any page that needs them

I'm confusing myself trying to get round all these! lol

public static Dictionary<string, string> ImgReplacerDict(string StrSource)
    {
        Dictionary<string, string> imageSources = new Dictionary<string, string>();
        string output = ExtractImages(StrSource, ref imageSources);
        return imageSources;
        
    }
    public static string ImgReplacer(string StrSource)
    {
        string output = ExtractImages(StrSource);
        return output;
    }

    private static string ExtractImages(string src, ref Dictionary<string, string> imageSources)
    {
        int counter = 1;
        int toIndex = 0, fromIndex = 0;

        do
        {
            if (toIndex > src.Length) break;
            fromIndex = src.IndexOf("<img", toIndex);
            toIndex = src.IndexOf("/>", fromIndex);
            string part = src.Substring(fromIndex, toIndex - fromIndex + 2);
            string[] tokens = part.Split(new string[] { "'" }, System.StringSplitOptions.RemoveEmptyEntries);
            imageSources.Add(tokens[1], string.Format("edited{0}.jpg", counter));
            src = src.Replace(part, string.Format("<img src='edited{0}.jpg' />", counter++));
        } while (true);
        return src;
    }

 
LVL 1

Author Comment

by:awilderbeast
ID: 35018919
ok, trying to follow your examples, I put it in the same page and put it on a click event

the code below comes back with

Index was outside the bounds of the array.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.IndexOutOfRangeException: Index was outside the bounds of the array.

Source Error:


Line 376:            string part = src.Substring(fromIndex, toIndex - fromIndex + 2);
Line 377:            string[] tokens = part.Split(new string[] { "'" }, System.StringSplitOptions.RemoveEmptyEntries);
Line 378:            imageSources.Add(tokens[1], string.Format("edited{0}.jpg", counter));
Line 379:            src = src.Replace(part, string.Format("<img src='edited{0}.jpg' />", counter++));
Line 380:        } while (true);

protected void Test(object sender, EventArgs e)
    {
        string src = Editor1.XHTML;
        Dictionary<string, string> imageSources = new Dictionary<string, string>();
        string output = ExtractImages(src, ref imageSources);

        foreach (var item in imageSources)
        {
            OutPut2.Text += "<br />" + item.Key;
            OutPut3.Text += "<br />" + item.Value;
            Editor1.Text = output;
        }
        EditorUpdatePanel.Update();
    }

    

    private static string ExtractImages(string src, ref Dictionary<string, string> imageSources)
    {
        int counter = 1;
        int toIndex = 0, fromIndex = 0;

        do
        {
            if (toIndex > src.Length) break;
            fromIndex = src.IndexOf("<img", toIndex);
            toIndex = src.IndexOf("/>", fromIndex);
            string part = src.Substring(fromIndex, toIndex - fromIndex + 2);
            string[] tokens = part.Split(new string[] { "'" }, System.StringSplitOptions.RemoveEmptyEntries);
            imageSources.Add(tokens[1], string.Format("edited{0}.jpg", counter));
            src = src.Replace(part, string.Format("<img src='edited{0}.jpg' />", counter++));
        } while (true);
        return src;
    }
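The `IndexOutOfRangeException` on `tokens[1]` is consistent with the tag text containing no single quote at all: splitting on `'` then produces a single token, so index 1 does not exist. HTML editors often write attributes with double quotes, e.g. `src="/uploads/12.jpg"`. The sketch below (names are hypothetical) reproduces that tokenising step and shows a guarded way to read the token:

```csharp
using System;

// Hypothetical demo: the single-quote split used by ExtractImages.
public static class QuoteSplitDemo
{
    // Same tokenising step as ExtractImages: split the tag text on single quotes.
    public static string[] SplitOnSingleQuote(string imgTag)
    {
        return imgTag.Split(new string[] { "'" }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Returns the src token only if the split produced one, instead of indexing blindly.
    public static string TrySrc(string imgTag)
    {
        string[] tokens = SplitOnSingleQuote(imgTag);
        return tokens.Length > 1 ? tokens[1] : null; // null: no single-quoted value found
    }
}
```

`<img src='/images/doggy.jpg' />` splits into three tokens with the path at index 1, while `<img src="/uploads/12.jpg" border="0" />` splits into one token. Guarding the index avoids the crash but still skips double-quoted images; a pattern that accepts either quote handles both.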

 
LVL 33

Assisted Solution

by:Todd Gerbert
Todd Gerbert earned 250 total points
ID: 35019040
Okay, now we're getting into more of a general C# concept kind of question, and away from the specifics of replacing text in a string. Whether you use sedgwick's approach or the regex method is largely a matter of personal preference; select whichever answer you're most comfortable with (for the most part it's six of one, half a dozen of the other).

Whichever way you go the general concept I'm trying to explain now is still applicable, but NOT necessarily the exact code I'm about to post. I expect you to be able to glean some basic conceptual knowledge and apply it to your particular situation, modifying the code examples in this post as necessary to fit your project.

I have a test web page with three text boxes (SourceTextBox, DestinationTextBox, and ImagePathsMapTextBox) and one button. After clicking the button, the text boxes' respective contents are:
SourceTextBox:
test blah blakasjklmasd klasdkl <img src='/images/doggy.jpg' /> kaslasmnkldklvmnklvml;d more rubbish text <img src='/images/kitty.jpg' />


DestinationTextBox:
test blah blakasjklmasd klasdkl cid:image1 kaslasmnkldklvmnklvml;d more rubbish text cid:image2


ImagePathsMapTextBox:
/images/doggy.jpg => cid:image1
/images/kitty.jpg => cid:image2




To achieve this I added a class file, ImgTagReplacer.cs, to my App_Code folder. The contents of this file are below; note that I modified it slightly from my posts above to allow a little more flexibility.
using System;
using System.Collections.Generic;
using System.Web;
using System.Text.RegularExpressions;

public class ImgTagReplacer
{
	private static Regex regex = new Regex(@"<img\s+src\s*=\s*('|"")(.*?)('|"")\s*/>", RegexOptions.Compiled | RegexOptions.IgnoreCase);

	public static string ReplaceImgTags(string SourceHtml, string ReplacementFormat, int FirstNumber, List<string> OriginalImageSources)
	{
		try
		{
			String.Format(ReplacementFormat, FirstNumber);
		}
		catch (ArgumentNullException)
		{
			throw new ArgumentException(
				"ReplacementFormat should contain the string to serve as the replacement for the <img> source; it MUST contain at least \"{0}\", to indicate where the number occurs in the replacement.",
				"ReplacementFormat");
		}
		catch (FormatException)
		{
			throw new ArgumentException(
				"ReplacementFormat should contain the string to serve as the replacement for the <img> source; it MUST contain at least \"{0}\", to indicate where the number occurs in the replacement.",
				"ReplacementFormat");
		}

		int counter = FirstNumber - 1;
		return regex.Replace(SourceHtml, delegate(Match m)
		{
			counter++;
			if (OriginalImageSources != null)
				OriginalImageSources.Add(m.Groups[2].Value);
			return String.Format(ReplacementFormat, counter);
		});
	}
}



In the button click event on my test page, I call ImgTagReplacer.ReplaceImgTags() to replace the source of each <img /> with cid:image1, cid:image2, etc., and to get a list of the original img sources before replacement. This is my test web page's code:
using System;
using System.Collections.Generic;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Text;

public partial class _Default : System.Web.UI.Page
{
	protected void ProcessSourceButton_Click(object sender, EventArgs e)
	{
		// Create a List<string> to hold the original image sources
		List<string> origImgSources = new List<string>();

		// Start with cid:image1 and go up 1 for each <img> tag
		int firstNumber = 1;

		// What to replace the src with - the {0} part represents where the number should go - so you could use image{0}.jpg to get image1.jpg, image2.jpg, image3.jpg, etc
		string replacement = "cid:image{0}";

		// Run the SourceTextBox contents through the <img> tag replacer, and assign
		// the results to the DestinationTextBox
		DestinationTextBox.Text = ImgTagReplacer.ReplaceImgTags(
			SourceTextBox.Text, // The HTML source to have <img> tags replaced
			replacement, // What to replace the src with - the {0} part represents where the number should go - so you could use image{0}.jpg to get image1.jpg, image2.jpg, image3.jpg, etc
			firstNumber, // Start with cid:image1 and go up 1 for each <img> tag
			origImgSources // The list of strings that will contain the original sources
		);

		// Now, write the original images and what they were replaced with
		// to a temporary StringBuilder
		StringBuilder replacementMap = new StringBuilder();
		foreach (string originalSource in origImgSources)
		{
			replacementMap.AppendFormat("{0} => {1}\r\n", originalSource, String.Format(replacement, firstNumber));
			firstNumber++;
		}
		// Then write that StringBuilder to the ImagePathsMapTextBox
		ImagePathsMapTextBox.Text = replacementMap.ToString();
	}
}



Note that if I wanted to replace the image sources with "Hello57World", "Hello58World" and so on, I could call ImgTagReplacer.ReplaceImgTags(sourceString, "Hello{0}World", 57, someStringList); also, if you don't care about the original sources, you can pass null as the last parameter instead of a List<string>.
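The up-front `String.Format` call in `ReplaceImgTags` only catches formats that `String.Format` itself rejects: a null format, unbalanced braces, or a placeholder index greater than `{0}`. A format with no placeholder at all (for example `"image"`) is accepted, so every tag would get the same name despite what the error message says. A small sketch of that check (the `FormatCheckDemo.IsUsableFormat` name is hypothetical):

```csharp
using System;

// Hypothetical helper mirroring the validation idea in ReplaceImgTags.
public static class FormatCheckDemo
{
    // Tries the format once with a sample number and reports whether String.Format accepts it.
    public static bool IsUsableFormat(string format, int sampleNumber)
    {
        try
        {
            string.Format(format, sampleNumber);
            return true;
        }
        catch (ArgumentNullException) { return false; } // format was null
        catch (FormatException) { return false; }       // e.g. "image{" or "image{1}"
    }
}
```

`string.Format("Hello{0}World", 57)` gives `Hello57World`, and `IsUsableFormat("no placeholder", 1)` returns `true`, which is why a separate check for `{0}` would be needed if distinct names must be enforced.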
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35019140
Sorry, I made a typo: line 35 of my ImgTagReplacer.cs example (the return statement inside the delegate) should be:
return String.Format("<img src='{0}' />", String.Format(ReplacementFormat, counter));
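Putting that correction back into the method gives roughly the following. This is a trimmed sketch: the argument validation from the full class is omitted, and the class name `ImgTagReplacerFixed` is only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative copy of ImgTagReplacer with the corrected return line.
public static class ImgTagReplacerFixed
{
    // Same pattern as the original: <img src='...' /> or <img src="..." />, no other attributes.
    private static readonly Regex regex = new Regex(
        @"<img\s+src\s*=\s*('|"")(.*?)('|"")\s*/>", RegexOptions.IgnoreCase);

    public static string ReplaceImgTags(string sourceHtml, string replacementFormat,
                                        int firstNumber, List<string> originalImageSources)
    {
        int counter = firstNumber - 1;
        return regex.Replace(sourceHtml, m =>
        {
            counter++;
            if (originalImageSources != null)
                originalImageSources.Add(m.Groups[2].Value); // group 2 = the original path
            // Corrected line: wrap the numbered name back inside an <img> tag.
            return String.Format("<img src='{0}' />", String.Format(replacementFormat, counter));
        });
    }
}
```

With the format `cid:image{0}`, `a <img src='/images/doggy.jpg' /> b <img src="/images/kitty.jpg" />` becomes `a <img src='cid:image1' /> b <img src='cid:image2' />`, and the list receives both original paths in order.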
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35019149
...which is one very good reason to take away the general idea from a code example, rather than copy & paste it verbatim. ;)
 
LVL 1

Author Comment

by:awilderbeast
ID: 35019524
thanks a lot! progress. I wish I understood it all; thanks for commenting it, one day I'll understand what it all means!

a few questions: is it possible to put the results in a dictionary so I can use them for anything?
I was going to attempt it myself, but I'm not familiar with replacementMap.AppendFormat; confused again haha

also, the output is coming out as
/uploads/12.jpg" border="0" alt="" width="100" height="100 => cid:image1 /uploads/Cindy_Crawford.jpg" border="0" alt="" width="124" height="100 => cid:image2

I've attempted to make a dictionary; I think I might have done it, if you think it's OK.
I got no errors, but it's not quite right, as you will already know

it returns
Key[/uploads/10n2.jpg" border="0" alt="" width="135" height="85, /uploads/10n2.jpg" border="0" alt="" width="135" height="85]
value[/uploads/10n2.jpg" border="0" alt="" width="135" height="85, /uploads/10n2.jpg" border="0" alt="" width="135" height="85]
cid:image1
Key[/uploads/10n2.jpg" border="0" alt="" width="135" height="85, /uploads/10n2.jpg" border="0" alt="" width="135" height="85]
cid:image1
[/uploads/12.jpg" border="0" alt="" width="100" height="100, /uploads/12.jpg" border="0" alt="" width="100" height="100]
value[/uploads/10n2.jpg" border="0" alt="" width="135" height="85, /uploads/10n2.jpg" border="0" alt="" width="135" height="85]
cid:image1
[/uploads/12.jpg" border="0" alt="" width="100" height="100, /uploads/12.jpg" border="0" alt="" width="100" height="100]
cid:image2

thanks for your help, i really appreciate it
############################# PAGE ALTERATIONS ###############################
Dictionary<string, string> origImgSources = new Dictionary<string, string>();
        StringBuilder replacementMap = new StringBuilder();
        foreach (KeyValuePair<string, string> originalSource in origImgSources)
        {
            Editor1.Text += "Key" + replacementMap.AppendFormat("{0}\r\n", originalSource, String.Format(replacement, firstNumber));
            Editor1.Text += "value" + replacementMap.AppendFormat("{1}\r\n", originalSource, String.Format(replacement, firstNumber));
            firstNumber++;
        }

################### APPCODE ALTERATIONS
public static string ReplaceImgTags(string SourceHtml, string ReplacementFormat, int FirstNumber, Dictionary<string, string> OriginalImageSources)
                OriginalImageSources.Add(m.Groups[2].Value, m.Groups[2].Value);

 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35019621
Expect to walk away with samples & ideas, not a finished project - see my comment above: http:#35019140.

Have a look at the documentation for String.Format(): http://msdn.microsoft.com/en-us/library/fht0f5be.aspx, and give it a try in simple console applications for testing/experimenting's sake.  You can also place a breakpoint in your implementation of the ReplaceImgTags() method and step through it a line at a time to get an idea of what's going on.

It looks like you're on the right track with your change to a dictionary, but if you assume m.Groups[2].Value is equal to "bubba", then aren't you adding "bubba=bubba" to your OriginalImageSources dictionary?
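To key the dictionary on the original path and store the numbered name as the value, the two strings have to come from different places: the key from `m.Groups[2].Value`, the value from `String.Format`. A sketch under those assumptions (`SourceMapDemo` is a hypothetical name); note that `Dictionary.Add` throws `ArgumentException` if the same src appears twice in the input:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo: original src as key, numbered replacement as value.
public static class SourceMapDemo
{
    private static readonly Regex regex = new Regex(
        @"<img\s+src\s*=\s*('|"")(.*?)('|"")\s*/>", RegexOptions.IgnoreCase);

    public static string Replace(string html, string format, int firstNumber,
                                 Dictionary<string, string> map)
    {
        int counter = firstNumber - 1;
        return regex.Replace(html, m =>
        {
            counter++;
            string original = m.Groups[2].Value;                 // e.g. "/images/doggy.jpg"
            string replacement = string.Format(format, counter); // e.g. "cid:image1"
            map.Add(original, replacement); // throws ArgumentException on a repeated src
            return "<img src='" + replacement + "' />";
        });
    }
}
```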
 
LVL 1

Author Comment

by:awilderbeast
ID: 35019782
the string format: that's to do with the img returning all this  border="0" alt="" width="135" height="85, yes?
just so I don't get confused anymore

for about 5 minutes I was looking into string.format for the key/value thing, then I realised string.format has nothing to do with that


so m.Groups: I need to find out how to split that into a key and value, as at the moment I'm, like you say, bubba bubba'ing

my head hurts!!!! lol

so I should create a Windows application to test things out, so I can see what's happening in my tests?
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35019890
My RegEx was off a little, try this one: "<img\s+[^>]*src\s*=\s*('|"")(.*?)('|"")[^>]*/>"
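A variant that also keeps the other attributes intact rewrites only the quoted `src` value instead of rebuilding the whole tag. This sketch (class name hypothetical) uses the backreference `\k<quote>` so the value ends at the same kind of quote that opened it, and `\ssrc` so an attribute such as `data-src` is not mistaken for `src`. It is still a regex rather than an HTML parser, so unusual markup (for example `src=` text inside another attribute's value) can fool it:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites only the src value of each <img> tag.
public static class ImgSrcRewriter
{
    // prefix: "<img" plus any attributes before src, up to and including "src="
    // quote:  the opening quote character (' or ")
    // src:    the path, ending at the SAME quote character (\k<quote>)
    private static readonly Regex ImgSrc = new Regex(
        @"(?<prefix><img\b[^>]*?\ssrc\s*=\s*)(?<quote>['""])(?<src>.*?)\k<quote>",
        RegexOptions.IgnoreCase);

    // Replaces each src with string.Format(format, n), leaves every other attribute
    // untouched, and records original src -> replacement in 'map'.
    // A src that appears more than once reuses the name it was given the first time.
    public static string Rewrite(string html, string format, int firstNumber,
                                 Dictionary<string, string> map)
    {
        int next = firstNumber;
        return ImgSrc.Replace(html, m =>
        {
            string original = m.Groups["src"].Value;
            string replacement;
            if (!map.TryGetValue(original, out replacement))
            {
                replacement = string.Format(format, next++);
                map.Add(original, replacement);
            }
            string quote = m.Groups["quote"].Value;
            return m.Groups["prefix"].Value + quote + replacement + quote;
        });
    }
}
```

On `<img src="/uploads/12.jpg" border="0" alt="" width="100" height="100" />` with the format `image{0}`, only the path changes, giving `<img src="image1" border="0" alt="" width="100" height="100" />`.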

 
LVL 42

Expert Comment

by:sedgwick
ID: 35024664
on line 13 i think u meant:
m.Groups[2].Key, m.Groups[2].Value
 
LVL 1

Author Comment

by:awilderbeast
ID: 35025023
yeah, I thought key > value, but it says

System.Text.RegularExpressions.Group has no definition for Key! :S
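`Match.Groups[n]` returns a `Group`, which exposes `Value`, `Index` and `Length` but no `Key`. With Todd's pattern, group 1 is the opening quote, group 2 the path, and group 3 the closing quote (group 0 is the whole match). A small sketch that lists what each numbered group captured (the `GroupDemo.Captures` helper is hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text captured by each numbered group of the first match.
public static class GroupDemo
{
    public static string[] Captures(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
            return new string[0];

        // Group 0 is the whole match; groups 1..n follow the opening parentheses left to right.
        var values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
            values[i] = m.Groups[i].Value; // a Group has Value, Index and Length; there is no Key
        return values;
    }
}
```

For `<img src='/images/doggy.jpg' />` this returns the whole tag, `'`, `/images/doggy.jpg` and `'`, so a dictionary entry would take group 2 as its key and a separately formatted string as its value.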
 
LVL 42

Expert Comment

by:sedgwick
ID: 35025053
I lost you with the regex.
which code are you using?
 
LVL 1

Author Comment

by:awilderbeast
ID: 35025125
tgerbert's code in post 35019040

you said:
"on line 13 i think u meant:
m.Groups[2].Key, m.Groups[2].Value"

there is no Key attribute; see the image below for what comes up

also, my colleague who was off yesterday, who's making this app with me (and who also knows a lot more than me when it comes to programming), has just said that the HtmlAgilityPack can do what we spent all day yesterday doing, in less code

have you heard of it?
[screenshot: Untitled-1.png]
 
LVL 42

Expert Comment

by:sedgwick
ID: 35025377
it's a 3rd-party library, so it's a different game.
do you want your app to depend on an external library you don't maintain, with no guarantee that support is available?
there are a lot of disadvantages to using 3rd-party utilities,
especially for relatively easy HTML parsing.

let me walk you through it so we can close this thread.
you said you're using @tgerbert's code.

@tgerbert:

can you extend your ImgTagReplacer to populate both the source and the edited image paths?
public static string ReplaceImgTags(string SourceHtml, string ReplacementFormat, int FirstNumber, List<string> OriginalImageSources, List<string> EditedImageSources)
      {
          // ...
          // populate EditedImageSources list...
          // ...
      }
 
LVL 42

Expert Comment

by:sedgwick
ID: 35025394
ok, here it is; change the ImgTagReplacer class to this one:

public class ImgTagReplacer
    {
        private static Regex regex = new Regex(@"<img\s+src\s*=\s*('|"")(.*?)('|"")\s*/>", RegexOptions.Compiled | RegexOptions.IgnoreCase);

        public static string ReplaceImgTags(string SourceHtml, string ReplacementFormat, int FirstNumber,
            Dictionary<string, string> imageSources)
        {
            try
            {
                String.Format(ReplacementFormat, FirstNumber);
            }
            catch (ArgumentNullException)
            {
                throw new ArgumentException(
                    "ReplacementFormat should contain the string to serve as the replacement for the <img> source; it MUST contain at least \"{0}\", to indicate where the number occurs in the replacement.",
                    "ReplacementFormat");
            }
            catch (FormatException)
            {
                throw new ArgumentException(
                    "ReplacementFormat should contain the string to serve as the replacement for the <img> source; it MUST contain at least \"{0}\", to indicate where the number occurs in the replacement.",
                    "ReplacementFormat");
            }

            int counter = FirstNumber - 1;
            return regex.Replace(SourceHtml, delegate(Match m)
            {
                counter++;
                string srcImg = m.Groups[2].Value;
                string editedImg = String.Format(ReplacementFormat, counter);
                imageSources.Add(srcImg, editedImg);
                return srcImg;
            });
        }
    }

 
LVL 42

Expert Comment

by:sedgwick
ID: 35025401
in ProcessSourceButton_Click, change the code as follows:

// Create a Dictionary<string, string> to hold the original image sources
            Dictionary<string, string> imgSources = new Dictionary<string, string>();

            // Start with cid:image1 and go up 1 for each <img> tag
            int firstNumber = 1;

            // What to replace the src with - the {0} part represents where the number should go - so you could use image{0}.jpg to get image1.jpg, image2.jpg, image3.jpg, etc
            string replacement = "cid:image{0}";

            // Run the SourceTextBox contents through the <img> tag replacer, and assign
            // the results to the DestinationTextBox
            string output = ImgTagReplacer.ReplaceImgTags(
                EditorText, // The HTML source to have <img> tags replaced
                replacement, // What to replace the src with - the {0} part represents where the number should go - so you could use image{0}.jpg to get image1.jpg, image2.jpg, image3.jpg, etc
                firstNumber, // Start with cid:image1 and go up 1 for each <img> tag
                imgSources // The dictionary of strings that will contain the original and edited sources
            );

 
LVL 42

Expert Comment

by:sedgwick
ID: 35025407
now, imgSources contains the original and edited image paths.
imgSources.Keys --> original image paths
imgSources.Values --> edited image paths
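Reading the dictionary back out is a matter of iterating its `KeyValuePair<string, string>` entries, or using `Keys` and `Values` directly. A sketch with a hypothetical helper name; note that `Dictionary` does not formally guarantee enumeration order, so if the order of the images matters, a list is the safer container:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: one "original => edited" line per dictionary entry.
public static class MapPrinter
{
    public static string Format(Dictionary<string, string> imgSources)
    {
        var sb = new StringBuilder();
        foreach (KeyValuePair<string, string> pair in imgSources)
            sb.AppendFormat("{0} => {1}\r\n", pair.Key, pair.Value); // Key = original, Value = edited
        return sb.ToString();
    }
}
```

The result can be assigned straight to a text box, e.g. `ImagePathsMapTextBox.Text = MapPrinter.Format(imgSources);`.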
 
LVL 1

Author Comment

by:awilderbeast
ID: 35025582
I can get the key and value now :)

just some last bugs in it...
the output is coming out as
"demo this is a test /uploads/12.jpg" border="0" alt="" width="100" height="100 testing some more /uploads/Cindy_Crawford.jpg" border="0" alt="" width="124" height="100 go go go !"
when it should output
"demo this is a test <img src="image1" border="0" alt="" width="100" height="100" /> testing some more <img src="image2" border="0" alt="" width="124" height="100" /> go go go !"

and the key is returning
/uploads/12.jpg" border="0" alt="" width="100" height="100
instead of
/uploads/12.jpg

thanks
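Both symptoms can be traced to the code in use at this point. With the pattern `<img\s+src\s*=\s*('|")(.*?)('|")\s*/>`, the lazy `(.*?)` still has to stretch until a quote is directly followed by `/>`, so on a tag with more attributes group 2 runs up to the last quote before `/>`. Separately, the delegate in the dictionary version of `ImgTagReplacer` returns `srcImg`, so each whole tag is swapped for that captured text rather than for a rebuilt `<img>` tag. A sketch reproducing the first effect (helper name hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: what group 2 of the original pattern captures on a tag with attributes.
public static class LazyGroupDemo
{
    // The pattern used by ImgTagReplacer at this point in the thread.
    private const string OriginalPattern = @"<img\s+src\s*=\s*('|"")(.*?)('|"")\s*/>";

    // Returns the text captured as "the src" (group 2) for the given tag.
    public static string CapturedSrc(string tag)
    {
        return Regex.Match(tag, OriginalPattern, RegexOptions.IgnoreCase).Groups[2].Value;
    }
}
```

For `<img src="/uploads/12.jpg" border="0" alt="" width="100" height="100" />` this returns `/uploads/12.jpg" border="0" alt="" width="100" height="100`, which matches the key reported above.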
 
LVL 42

Assisted Solution

by:sedgwick
sedgwick earned 250 total points
ID: 35025666
the regex will not work if you have other attributes within the img tag.
use the function posted below instead.

to use it, instantiate the dictionary and pass the relevant arguments:
   
// Create a Dictionary to hold the original/edited image sources
Dictionary<string, string> imgSources = new Dictionary<string, string>();

// Start with cid:image1 and go up 1 for each <img> tag
int firstNumber = 1;

// What to replace the src with: {0} marks where the number goes,
// so image{0}.jpg gives image1.jpg, image2.jpg, image3.jpg, etc.
string replacement = "cid:image{0}";

// Run the SourceTextBox contents through the <img> tag replacer and assign
// the result to the DestinationTextBox
DestinationTextBox.Text = ImgTagReplacer.ExtractImages(
    SourceTextBox.Text, // The HTML source whose <img> tags are replaced
    replacement,        // The src pattern, e.g. cid:image{0}
    firstNumber,        // The number used for the first <img> tag
    ref imgSources      // Receives the original src -> edited src pairs (the parameter is declared ref)
);


public static string ExtractImages(string src, string replacement, int counter,
    ref Dictionary<string, string> imageSources)
{
    int toIndex = 0, fromIndex = 0;

    do
    {
        if (toIndex > src.Length) break;
        fromIndex = src.IndexOf("<img", toIndex);
        toIndex = src.IndexOf("/>", fromIndex);
        string part = src.Substring(fromIndex, toIndex - fromIndex + 2);
        string[] tokens = part.Split(new string[] { "'" }, System.StringSplitOptions.RemoveEmptyEntries);
        imageSources.Add(tokens[1], string.Format(replacement, counter));
        src = src.Replace(tokens[1], string.Format(replacement, counter++));
    } while (true);
    return src;
}

LVL 1

Author Comment

by:awilderbeast
ID: 35025788
Does this new version work with other attributes in the img tag? I just tested it, and it returned the error below:


Server Error in '/' Application.

Index was outside the bounds of the array.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code. 

Exception Details: System.IndexOutOfRangeException: Index was outside the bounds of the array.

Source Error: 


Line 115:            string part = src.Substring(fromIndex, toIndex - fromIndex + 2);
Line 116:            string[] tokens = part.Split(new string[] { "'" }, System.StringSplitOptions.RemoveEmptyEntries);
Line 117:            imageSources.Add(tokens[1], string.Format(replacement, counter));
Line 118:            src = src.Replace(tokens[1], string.Format(replacement, counter++));
Line 119:        } while (true);

LVL 42

Expert Comment

by:sedgwick
ID: 35026127
which line throws the exception?
LVL 1

Author Comment

by:awilderbeast
ID: 35026140
this one
imageSources.Add(tokens[1], string.Format(replacement, counter));
LVL 42

Expert Comment

by:sedgwick
ID: 35026253
When the exception is thrown, what is the value of "part" (line 115)?
LVL 1

Author Comment

by:awilderbeast
ID: 35026269
I just hit some keys and stuck in two sample images

see below
asdasd as das as<br />
<img src="/uploads/12.jpg" border="0" alt="" width="100" height="100" /><br />
as d<br />
a sdas&nbsp;<br />
a sd &nbsp;&nbsp;<img src="/uploads/17.jpg" border="0" alt="" width="124" height="100" />

LVL 42

Accepted Solution

by:sedgwick
sedgwick earned 250 total points
ID: 35026303
change line 116 to:
string[] tokens = part.Split(new string[] { "\"" }, System.StringSplitOptions.RemoveEmptyEntries);
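Splitting on a double quote fixes this input, but the same IndexOutOfRangeException would come back for single-quoted markup like the first example in the question. A minimal sketch that reads whichever quote character follows src= instead; SrcReader and GetSrcValue are hypothetical names, and it assumes src= is written with no spaces around the equals sign:

```csharp
using System;

public static class SrcReader
{
    // Hypothetical helper: returns the src value of a single <img ...> tag,
    // or null when the tag has no quoted src attribute.
    // Assumes "src=" has no surrounding whitespace; an attribute such as
    // data-src= that appears before src= would be matched first.
    public static string GetSrcValue(string imgTag)
    {
        int srcIndex = imgTag.IndexOf("src=", StringComparison.OrdinalIgnoreCase);
        if (srcIndex < 0) return null;

        int quoteIndex = srcIndex + "src=".Length;
        if (quoteIndex >= imgTag.Length) return null;

        // Use whichever quote character opens the value: ' or "
        char quote = imgTag[quoteIndex];
        if (quote != '\'' && quote != '"') return null;

        // The value ends at the next matching quote, so later attributes are ignored.
        int closeIndex = imgTag.IndexOf(quote, quoteIndex + 1);
        if (closeIndex < 0) return null;

        return imgTag.Substring(quoteIndex + 1, closeIndex - quoteIndex - 1);
    }
}
```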
LVL 1

Author Closing Comment

by:awilderbeast
ID: 35026346
thanks guys!!!

really appreciated it!
LVL 1

Author Comment

by:awilderbeast
ID: 35026503
hi again guys, was wondering if you could take a look at this for me
http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_26860863.html

It's using your code, but there seems to be something wrong at the end (not with your code, mine! haha)
LVL 1

Author Comment

by:awilderbeast
ID: 35027102
Hi guys, I fixed that other problem, but sorry to be a pain... there seem to be a couple more bugs in it.

See below.

BUG 1 happened when there were lots of images in the string; I can post the content for bug 1 if you wish.
BUG 2 happens when there are no images in the string to replace.

I'm more than happy to open another question if you wish; you deserve it for helping me this much.

Cheers
#################### BUG 1 #################################
System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: startIndex at System.String.IndexOf(String value, Int32 startIndex, Int32 count, StringComparison comparisonType) at System.String.IndexOf(String value, Int32 startIndex, StringComparison comparisonType) at Functions.ExtractImages(String src, String replacement, Int32 counter, Dictionary`2& imageSources) in e:\netfolder\App_Code\functions.cs:line 114 at Documents.SendEmail(Object sender, EventArgs e) in e:\netfolder\documents\documents.aspx.cs:line 333

Line 114 is: toIndex = src.IndexOf("/>", fromIndex);
Line 333 is the call to the function:
            htmlBody = Functions.ExtractImages(
                Editor1.XHTML, // The HTML source to have <img> tags replaced
                replacement, // What to replace the src with - the {0} part represents where the number should go - so you could use image{0}.jpg to get image1.jpg, image2.jpg, image3.jpg, etc
                firstNumber, // Start with cid:image1 and go up 1 for each <img> tag
                ref imgSources  // The dictionary that will contain the original and edited sources
            );

##################  BUG 2 ###############################
System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: startIndex at System.String.IndexOf(String value, Int32 startIndex, Int32 count, StringComparison comparisonType) at System.String.IndexOf(String value, Int32 startIndex, StringComparison comparisonType) at Functions.ExtractImages(String src, String replacement, Int32 counter, Dictionary`2& imageSources) in e:\netfolder\App_Code\functions.cs:line 114 at Documents.SendEmail(Object sender, EventArgs e) in e:\netfolder\documents\documents.aspx.cs:line 333
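Both traces fail inside String.IndexOf(String, Int32) on line 114 with startIndex out of range. That is consistent with what the posted loop does once no further "<img" is found: IndexOf returns -1, and that -1 is passed straight into src.IndexOf("/>", fromIndex), which throws ArgumentOutOfRangeException for a negative startIndex. With no images at all, it happens on the first pass. A minimal sketch of the loop with explicit -1 checks follows. It keeps the posted signature (including ref), but reads whichever quote follows src= instead of splitting on one quote character, ends each tag at the next > rather than />, and builds the result with a StringBuilder instead of string.Replace. The class name ImgTagFixer, and reusing one edited name when the same src appears twice, are assumptions rather than part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ImgTagFixer
{
    // Sketch of ExtractImages with explicit "not found" (-1) checks.
    // Returns src with every <img> src replaced by string.Format(replacement, n).
    public static string ExtractImages(string src, string replacement, int counter,
        ref Dictionary<string, string> imageSources)
    {
        var result = new StringBuilder();
        int position = 0;

        while (position < src.Length)
        {
            int fromIndex = src.IndexOf("<img", position, StringComparison.OrdinalIgnoreCase);
            if (fromIndex < 0) break;                 // no more <img> tags

            int toIndex = src.IndexOf(">", fromIndex);
            if (toIndex < 0) break;                   // unterminated tag: leave it as-is

            string tag = src.Substring(fromIndex, toIndex - fromIndex + 1);
            string newTag = tag;

            // Locate src= and the quote character that opens its value.
            int srcIndex = tag.IndexOf("src=", StringComparison.OrdinalIgnoreCase);
            int quoteIndex = srcIndex + 4;
            if (srcIndex >= 0 && quoteIndex < tag.Length &&
                (tag[quoteIndex] == '"' || tag[quoteIndex] == '\''))
            {
                int closeIndex = tag.IndexOf(tag[quoteIndex], quoteIndex + 1);
                if (closeIndex >= 0)
                {
                    string original = tag.Substring(quoteIndex + 1, closeIndex - quoteIndex - 1);

                    // Reuse the edited name if this src was already seen,
                    // so Add never throws on a duplicate key.
                    string edited;
                    if (!imageSources.TryGetValue(original, out edited))
                    {
                        edited = string.Format(replacement, counter++);
                        imageSources.Add(original, edited);
                    }

                    newTag = tag.Substring(0, quoteIndex + 1) + edited + tag.Substring(closeIndex);
                }
            }

            // Copy the text before the tag, then the (possibly edited) tag.
            result.Append(src, position, fromIndex - position);
            result.Append(newTag);
            position = toIndex + 1;
        }

        // Copy whatever follows the last tag (the whole string if there were no tags).
        result.Append(src, position, src.Length - position);
        return result.ToString();
    }
}
```

It is called the same way as the posted version, e.g. htmlBody = ImgTagFixer.ExtractImages(Editor1.XHTML, replacement, firstNumber, ref imgSources);. Copying into a StringBuilder means only the src attribute inside each tag changes, whereas string.Replace also rewrites the same path anywhere else in the text and shifts the indices the loop relies on.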

LVL 42

Expert Comment

by:sedgwick
ID: 35027152
Post another question containing the above code and error description, along with the HTML string.
LVL 1

Author Comment

by:awilderbeast
ID: 35027286
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35028393
Geez, I hope you guys are in Europe or something, and not just up at 3:00AM discussing string parsing.