Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

HtmlAgilityPack question!

Posted on 2014-02-17
7
Medium Priority
?
467 Views
Last Modified: 2014-03-09
Hi!

I'm parsing some webpages and found HtmlAgilityPack. the Documentation is not the best i've seen so i'll ask here!

i'm trying to get the data from  html looking like this

...
<h2> Some header 1 </h2>
<p> some text...1.....  </p>
...
<h2> Some header 2 </h2>
<p> some text...2.....  </p>

the page have lots of these.
i need to get the text from every <p> that comes after the <h2> tags

trying these snippets but i guess i'm missing something
i was hoping the Nextsibling would have fixed it but it doesnt.

For Each laiskuri In testdoc.DocumentNode.Descendants("h2")
            Dim test1 = laiskuri.InnerHtml
            Dim tets3 = laiskuri.NextSibling.InnerHtml
            Dim test4 = laiskuri.NextSibling.OuterHtml
            Dim test5 = laiskuri.ParentNode
            Dim test6 = laiskuri.PreviousSibling
            Dim test7 = laiskuri.PreviousSibling.InnerHtml
            Dim test8 = laiskuri.HasChildNodes
            Dim test9 = laiskuri.FirstChild.InnerHtml
            Dim test10 = laiskuri.FirstChild.InnerText
            Dim test11 = laiskuri.FirstChild.OuterHtml
        Next
0
Comment
Question by:jamppi
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 3
7 Comments
 
LVL 19

Expert Comment

by:Ken Butters
ID: 39865537
In your case P would not be a descendant of h2.

a Descendant (or child) is embedded ... and would look like this:

<h2>
      <p> some text</p>
</h2>

you need following-sibling...

if you think of it as an sort of outline... your <h2> and your <p> are at the same outline level... so they would be considered "siblings"... you want the sibling of <p> that follows <h2>

Which you would define like this: following-sibling::p

here is an example:

http://stackoverflow.com/questions/14929921/htmlagilitypack-xml-capturing-following-sibling
where "p" is the "sibling that follows h2.

Note: the example is xpath... but xpath can be used in selectNodes which is part of the HTML Agility pack.

so something like this in your example:

testdoc.DocumentNode.selectNodes("./h2/following-sibling::p")
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 39866394
Can you provide an example of the exact HTML structure you are working with? I tried your code in a new project, and it seems to work fine for me.
0
 

Author Comment

by:jamppi
ID: 39867261
Ken:

tried  testdoc.DocumentNode.selectNodes("./h2/following-sibling::p") but i get
{"Object reference not set to an instance of an object."}

testdoc is not 'nothing'. so that is not the reason to the error.
0
NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

 

Author Comment

by:jamppi
ID: 39867283
kaufmed:  

<div class="cl"></div>
 
<h2> Some header 1 </h2>
<p> some text...1.....  </p>

<a name="xxxx"></a>

<div class="cl"></div>

<h2> Some header 2 </h2>
<p> some text...2.....  </p>

<a name="xxxx"></a>
0
 
LVL 19

Expert Comment

by:Ken Butters
ID: 39867455
Couple of things...

First I'm assuming that this is just a snippet of your html doc right?   so at some point you have everything enclosed in a single tag... something like this:
<html>
  <div class="cl"></div>
  <h2> Some header 1 </h2>
  <p> some text...1.....  </p>
  <a name="xxxx"></a>
  <div class="cl"></div>
  <h2> Some header 2 </h2>
  <p> some text...2.....  </p>
  <a name="xxxx"></a>
</html>

Open in new window


second...I'm assuming that you have defined "testdoc" and done a "load" on it of your html page?  (if you stop in trace does testDoc contain the html of the page?)

Try making this slight change:

Instead of : testdoc.DocumentNode.selectNodes("./h2/following-sibling::p")
Try using : testdoc.DocumentNode.selectNodes("//h2/following-sibling::p")

(changing the leading dot-slash to slash-slash)
0
 

Author Comment

by:jamppi
ID: 39870017
That workt just fine!  Can i get the <h2> innerhtml at the same time ?
0
 
LVL 19

Accepted Solution

by:
Ken Butters earned 1200 total points
ID: 39870320
When you say "at the same time" do you mean you want them returned in the same array?

You can do that.... but seems to be like that would be cumbersome, because as you loop through the result, you have to check to see if you are working with an H2 item or a P item.

However... if that is what you are looking for...  you'd use the "|" operator.

testdoc.DocumentNode.selectNodes("//h2|//h2/following-sibling::p")
0

Featured Post

Cloud Training Guides

FREE GUIDES: In-depth and hand-crafted Linux, AWS, OpenStack, DevOps, Azure, and Cloud training guides created by Linux Academy instructors and the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In my previous article (http://www.experts-exchange.com/Programming/Languages/.NET/.NET_Framework_3.x/A_4362-Serialization-in-NET-1.html) we saw the basics of serialization and how types/objects can be serialized to Binary format. In this blog we wi…
This document covers how to connect to SQL Server and browse its contents.  It is meant for those new to Visual Studio and/or working with Microsoft SQL Server.  It is not a guide to building SQL Server database connections in your code.  This is mo…
Monitoring a network: how to monitor network services and why? Michael Kulchisky, MCSE, MCSA, MCP, VTSP, VSP, CCSP outlines the philosophy behind service monitoring and why a handshake validation is critical in network monitoring. Software utilized …
In response to a need for security and privacy, and to continue fostering an environment members can turn to for support, solutions, and education, Experts Exchange has created anonymous question capabilities. This new feature is available to our Pr…

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question