  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2010

Advanced C# string parse to pull specific string and substrings out of a larger string.

I have a C# string that represents HTML code.  I need to parse the string and find each instance of the img element.  Once the img element is taken out, I need to then extract the src attribute from the img element.  

For example, given the C# string, I need to find <img src="http://www.someurl.com/image1.gif" /> and assign it to a string.  Then I need to parse that new string for the src value, http://www.someurl.com/image1.gif.  I am somewhat familiar with C# string functions, but I am not sure how to begin working on this.

If you need more clarification please let me know.  Thank you for your help.
Asked: shanemay
1 Solution
 
gdupadhyay Commented:
You can do this easily with a regular expression.
Add these using directives:

using System.IO;
using System.Text;
using System.Text.RegularExpressions;

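Then, in your function, write the following:
// Path to the file that holds the HTML.
string filename = "C:\\test\\Test.txt";

// Lazy match (.*?) so that each img tag is matched separately,
// even when several tags share one line.
string pattern = @"<img .*? />";

string strFileText;
// The using block closes the file once it has been read.
using (StreamReader sr = new StreamReader(filename))
{
    strFileText = sr.ReadToEnd();
}
Regex re = new Regex(pattern);
MatchCollection mc = re.Matches(strFileText);

string strTemp2 = "";
foreach (Match m in mc)
{
    // strTemp holds the whole tag, e.g. <img src="http://www.someurl.com/image1.gif" />
    string strTemp = m.ToString();
    // Strip the tag text around the URL, then the surrounding quotes.
    string strTemp1 = strTemp.Replace("<img src=", "");
    strTemp2 = strTemp1.Replace(" />", "").Trim('"');
}

For the example tag, the final string strTemp2 is http://www.someurl.com/image1.gif. If the file contains more than one img tag, strTemp2 ends up holding the src of the last one.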
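The two steps described in the question (find each img tag, then pull the src out of that tag) can also be sketched as two small helpers. This is only a minimal sketch: the names ImgTagParser, FindImgTags and ExtractSrc are made up for illustration, and it assumes the src value is always wrapped in single or double quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class ImgTagParser
{
    // Step 1: return every complete <img ...> tag found in the HTML.
    public static List<string> FindImgTags(string html)
    {
        var tags = new List<string>();
        foreach (Match m in Regex.Matches(html, @"<img\b[^>]*>", RegexOptions.IgnoreCase))
        {
            tags.Add(m.Value);
        }
        return tags;
    }

    // Step 2: return the quoted src value of one tag, or null when none is found.
    public static string ExtractSrc(string imgTag)
    {
        Match m = Regex.Match(imgTag, @"\ssrc\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Calling FindImgTags on the HTML string and then ExtractSrc on each returned tag gives one src per img element. An unquoted src value is not handled by this sketch.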

I have tested this code and it works fine.

Please let me know if you have any questions.

Good Luck

 
 
shanemay Author Commented:
Thank you so much for the quick response.  I really appreciate the crystal clear code example.  This is exactly what I needed; I did not think I would have this working today.  Again, thank you so much.
 
mkosbie Commented:
Regular expressions are definitely the way to go, but the code provided is pretty cumbersome.  It won't match any img tag with more than a src attribute (e.g. <img src="img.jpg" id="img1">), and it does a lot of extra processing. You can extract everything you need in one pass with a function like this (it returns the sources in an ArrayList, so it also needs using System.Collections;):
    private ArrayList getImageSources(String HTML)
    {
        Regex re = new Regex("<img\\s[^>]*?src=[\"']([^\"']+)[\"'][^>]*>", RegexOptions.IgnoreCase);
        MatchCollection matches = re.Matches(HTML);
 
        ArrayList sources = new ArrayList();
        foreach (Match m in matches) {
            // Groups[1] is the captured src value; .Value stores it as a string.
            sources.Add(m.Groups[1].Value);
        }
 
        return sources;
    }
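
As a rough usage sketch, the same pattern as getImageSources above can be wrapped in a static method and run against a string that mixes quote styles, extra attributes and upper-case tags. The class name ImageSourceDemo and the switch to List<string> are assumptions made for this example, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from the answer above.
public static class ImageSourceDemo
{
    // One pass over the HTML; returns the src value of every img tag as a string.
    public static List<string> GetImageSources(string html)
    {
        var re = new Regex("<img\\s[^>]*?src=[\"']([^\"']+)[\"'][^>]*>", RegexOptions.IgnoreCase);
        var sources = new List<string>();
        foreach (Match m in re.Matches(html))
        {
            // Groups[1] is the text captured between the quotes after src=.
            sources.Add(m.Groups[1].Value);
        }
        return sources;
    }
}
```

Passing it a string such as <IMG id='a' src='one.png'> followed by <img src="http://www.someurl.com/image1.gif" /> should yield both sources in document order, because the pattern ignores case and skips any attributes before or after src.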
