
C#: HtmlAgilityPack Get the ID of selected node

I'm using HtmlAgilityPack to traverse some HTML.
I've put the code in a dotnetfiddle.

How do I get the number from the ID of the currently selected div?
I'm extracting text from red and navy fonts. Is this the best way to do it? And how do I find out what the exception is?

I haven't got this far yet, but given an image URL and a file path, what is the best way to save an image?
2 Solutions
Ioannis Paraskevopoulos Commented:

Check my forked fiddle.

Instead of calling DescendantsAndSelf directly, I first got a node with SelectSingleNode and then enumerated its DescendantsAndSelf().
That allowed me to get the id of the divs. After that it was just a matter of manipulating the value.

In this very simple example I just used a substring, but in a real environment I think you should do more checks, because a plain substring can throw exceptions. For instance, what happens if the id does not follow this exact pattern?

That said, I guess this is covered anyway, because you only select the divs whose id matches the pattern, so you should be OK with the substring as well.
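
As a rough sketch of the "more checks" idea, the substring can be guarded with a prefix test and a digits-only parse. The helper name TryGetPostNumber and the sample id are made up for illustration and are not part of the fiddle.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not from the fiddle): returns true and the numeric
// part when id looks like "post_message_<digits>", otherwise returns false.
static bool TryGetPostNumber(string id, out long number)
{
    const string prefix = "post_message_";
    number = 0;

    // Reject a missing id or one that does not start with the expected prefix.
    if (id == null || !id.StartsWith(prefix, StringComparison.Ordinal))
        return false;

    // NumberStyles.None accepts digits only (no sign, no whitespace),
    // so an empty or non-numeric suffix makes this return false.
    return long.TryParse(id.Substring(prefix.Length), NumberStyles.None,
                         CultureInfo.InvariantCulture, out number);
}

// Sample id, assumed for illustration.
if (TryGetPostNumber("post_message_12345", out long n))
    Console.WriteLine("Post number = " + n);
```

This way an unexpected id is simply skipped instead of raising an exception.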

trevor1940 (Author) Commented:
Thank you

Can you tell me what the difference is between

    var DivHTML = htmlDoc.DocumentNode.DescendantsAndSelf("//div[starts-with(@id,'post_message_')]");
    foreach (var divNodes in DivHTML)

and your

    var DivHTML = htmlDoc.DocumentNode.SelectSingleNode("//div[starts-with(@id,'post_message_')]");
    foreach (var divNodes in DivHTML.DescendantsAndSelf())

I thought SelectSingleNode would be used for a tag like "body".
trevor1940 (Author) Commented:
I was trying to get the number from the div id using a regular expression:

    Regex RegEXID = new Regex(@"post_message_(\d+)");
    Match matchID = RegEXID.Match(div.Attributes["id"].Value);
    if (matchID.Success)
    {
        var myID = matchID.Value;
        Console.WriteLine("MyID = " + myID);
    }

I thought (\d+) matches just the number, but I'm getting the whole ID.
käµfm³d 👽 Commented:
"I thought (\d+) matches just the number but I'm getting the whole ID"
It does, but you're confusing a match with a capture group. You put the numeric portion in a capture group, but in your C# you're not referring to that capture group; matchID.Value is the text of the entire match. Instead:

var myID = matchID.Groups[1].Value;
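
To make the difference concrete, here is a minimal sketch using the same pattern. The sample ID string is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var idPattern = new Regex(@"post_message_(\d+)");

// Sample input, assumed for illustration.
Match match = idPattern.Match("post_message_12345");

if (match.Success)
{
    // Value (and Groups[0]) hold everything the pattern matched.
    Console.WriteLine(match.Value);            // post_message_12345
    Console.WriteLine(match.Groups[0].Value);  // post_message_12345

    // Groups[1] holds only what the first pair of parentheses captured.
    Console.WriteLine(match.Groups[1].Value);  // 12345
}
```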


trevor1940 (Author) Commented:
Thanks for your help.

I'd still like to know the difference between the two selection methods for future reference.

I didn't know about match groups.
Question has a verified solution.
