• Status: Solved

C#: HtmlAgilityPack Get the ID of selected node

I'm using HtmlAgilityPack to traverse some HTML.
The code is in a dotnetfiddle I've created.

How do I get the number from the ID of the currently selected div?
I'm extracting text from the red and navy fonts. Is this the best way to do it, and how do I find out what the exception is?

I haven't got this far yet, but given an image URL and a file path, what is the best way to save an image?
Asked by: trevor1940

Ioannis Paraskevopoulos Commented:
Hi,

Check my forked fiddle.

Instead of calling DescendantsAndSelf directly, I first got a node with SelectSingleNode and then enumerated its DescendantsAndSelf.
That let me read the id of each div. After that it was just a matter of manipulating the value.

In this very simple example I just used a substring, but in a real environment I think you should add more checks, because a plain substring can throw exceptions. For instance, what happens if the id does not follow this exact pattern?

That said, I guess this is already covered, because you only select the divs whose id matches the pattern, so you should be OK with the substring as well.
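
As a rough illustration of the extra checks mentioned above, here is a minimal sketch that verifies the prefix before taking the substring and accepts digits only for the remainder. The helper name TryGetPostNumber and the sample ids are made up for this example; they are not taken from the fiddle.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: returns false instead of throwing when the id
// does not look like "post_message_<digits>".
bool TryGetPostNumber(string id, out long number)
{
    const string prefix = "post_message_";
    number = 0;
    if (id == null || !id.StartsWith(prefix, StringComparison.Ordinal))
        return false;
    // NumberStyles.None accepts decimal digits only (no sign, no whitespace).
    return long.TryParse(id.Substring(prefix.Length), NumberStyles.None,
                         CultureInfo.InvariantCulture, out number);
}

// Sample ids are assumptions for illustration.
foreach (var id in new[] { "post_message_12345", "post_message_", "sidebar" })
{
    if (TryGetPostNumber(id, out long n))
        Console.WriteLine($"{id} -> {n}");
    else
        Console.WriteLine($"{id} -> no number found");
}
```

The Try-pattern keeps the calling loop free of try/catch blocks, which may be convenient when some divs on a page do not follow the expected id format.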

Giannis
 
trevor1940 (Author) Commented:
Thank you

Can you tell me what the difference is between

    var DivHTML = htmlDoc.DocumentNode.DescendantsAndSelf("//div[starts-with(@id,'post_message_')]");
    foreach (var divNodes in DivHTML)

and yours:

    var DivHTML = htmlDoc.DocumentNode.SelectSingleNode("//div[starts-with(@id,'post_message_')]");
    foreach (var divNodes in DivHTML.DescendantsAndSelf())

I thought SelectSingleNode would be used for a tag like "body".
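
For reference, a hedged sketch of how the two calls behave, as far as I understand the library: DescendantsAndSelf(string) expects an element name such as "div" rather than an XPath expression, while SelectSingleNode evaluates XPath and returns only the first match (or null). SelectNodes evaluates the same XPath and returns every match (or null when there are none). The HTML below is invented sample markup, not the page from the fiddle, and the code assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sample markup for illustration only.
const string html = @"<html><body>
<div id='post_message_101'>first</div>
<div id='post_message_102'>second</div>
<div id='sidebar'>other</div>
</body></html>";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var root = htmlDoc.DocumentNode;
const string xpath = "//div[starts-with(@id,'post_message_')]";

// The string argument is compared with tag names, so passing an XPath
// expression does not select any div element.
int divsByName = root.DescendantsAndSelf(xpath).Count(n => n.Name == "div");

// XPath query: first matching node only (null if nothing matches).
var first = root.SelectSingleNode(xpath);

// XPath query: all matching nodes (null if nothing matches).
var all = root.SelectNodes(xpath);

Console.WriteLine("divs found via DescendantsAndSelf(xpath): " + divsByName);
Console.WriteLine("SelectSingleNode id: " + first?.GetAttributeValue("id", ""));
Console.WriteLine("SelectNodes count: " + (all?.Count ?? 0));
```

If the goal is to visit every post div rather than the first one and its children, looping over the SelectNodes result (after a null check) is probably the closer fit.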
 
trevor1940 (Author) Commented:
I was trying to get the number from the div id using a regular expression:

    Regex RegEXID = new Regex(@"post_message_(\d+)");
    Match matchID = RegEXID.Match(div.Attributes["id"].Value);
    if (matchID.Success)
    {
        var myID = matchID.Value;
        Console.WriteLine("MyID = " + myID);
    }

I thought (\d+) matches just the number, but I'm getting the whole ID.
 
käµfm³d 👽 Commented:
"I thought (\d+) matches just the number but I'm getting the whole ID"

It does, but you're confusing a match with a capture group. You put the numeric portion in a capture group, but in your C# you're not referring to said capture group. Instead, use:

var myID = matchID.Groups[1].Value;
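
To make the distinction concrete, here is a small self-contained sketch using the same pattern. The input id is a made-up sample value.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"post_message_(\d+)");
Match match = regex.Match("post_message_12345"); // hypothetical id value

// Match.Value (same as Groups[0].Value) is everything the pattern matched.
string whole = match.Value;            // "post_message_12345"

// Groups[1] is the first parenthesised group, here the digits only.
string number = match.Groups[1].Value; // "12345"

Console.WriteLine("Whole match = " + whole);
Console.WriteLine("Group 1     = " + number);
```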

 
trevor1940 (Author) Commented:
Thanks for your help

I'd like to know the difference between the two selection methods for future reference

I didn't know about match groups
Question has a verified solution.
