mmalik15
 asked on

how to extract all the hyperlinks on this webpage

On this web page, http://www.scie-socialcareonline.org.uk/topics.asp?guid=64f07a36-85f2-4aac-a862-61b9116190ad, if we click "expand all" in the list of browse topics, how can we extract all the hyperlinks with titles like "adoption", "access to birth records", etc.?
ASP.NET, C#, .NET Programming


8/22/2022 - Mon
ASKER CERTIFIED SOLUTION
kaufmed

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
mmalik15

ASKER
Many thanks again, kaufmed.

How can I exclude the RSS link in the XPath? Apart from that, it's working fine.

Also, could you kindly point me to an XPath tool for extracting information from the HTML DOM, or tell me the best approach to writing XPath for the HTML DOM?
kaufmed

Oh, sorry. I meant to exclude that as well:

doc.DocumentNode.SelectNodes("//span[@class='branch']//a[not(starts-with(@href, 'javascript:')) and not(starts-with(@href, 'rss/'))]")

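For anyone reading along without access to the accepted solution, here is a minimal sketch of how the expression above might be used with HTML Agility Pack. It is not the original answer's code. The class name `TopicLinks`, the method `Extract`, and the sample markup are all hypothetical; the real page's structure may differ from the `span class='branch'` layout assumed here.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class TopicLinks
{
    // XPath taken from the comment above: anchors under span.branch,
    // skipping javascript: toggles and rss/ feed links.
    private const string LinkXPath =
        "//span[@class='branch']//a[not(starts-with(@href, 'javascript:')) and not(starts-with(@href, 'rss/'))]";

    // Returns (title, href) pairs for every matching anchor in the HTML string.
    public static List<(string Title, string Href)> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Title, string Href)>();
        var nodes = doc.DocumentNode.SelectNodes(LinkXPath);
        if (nodes == null)
        {
            // HAP returns null, not an empty collection, when nothing matches.
            return results;
        }

        foreach (var a in nodes)
        {
            var title = HtmlEntity.DeEntitize(a.InnerText).Trim();
            var href = a.GetAttributeValue("href", "");
            results.Add((title, href));
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical markup standing in for the expanded topic tree.
        const string sample = @"
            <span class='branch'>
              <a href='javascript:toggle(1)'>+</a>
              <a href='rss/topic.asp?id=1'>RSS</a>
              <a href='topic.asp?id=1'>Adoption</a>
              <a href='topic.asp?id=2'>Access to birth records</a>
            </span>";

        foreach (var (title, href) in Extract(sample))
        {
            Console.WriteLine($"{title} -> {href}");
        }
    }
}
```

To run it against the live page instead of a string, `new HtmlWeb().Load(url)` returns an `HtmlDocument` that can be queried the same way. Note that content the page builds with JavaScript after load would not be present in the downloaded HTML.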

mmalik15

ASKER
Brilliant, kaufmed. It's working perfectly.

I use Altova to test XPath on XML documents, but I wonder if there is a similar tool for testing XPath against the HTML DOM.
Your help has saved me hundreds of hours of internet surfing.
kaufmed

I don't know of any. Most of the frameworks people use today to build HTML emit markup that is well-formed in the XML sense, so you should be able to use Altova on any well-formed HTML. Strictly speaking, HTML is not a subset of XML (both descend from SGML, and only XHTML is a true XML application), but a page whose tags are all closed and properly nested will generally load in an XML tool. Unless you are dealing with someone who hand-codes their web page, you should be OK using Altova.
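As a rough illustration of the point about well-formed markup, the sketch below runs the same XPath through the framework's own `System.Xml` classes, with no third-party tool or library. The class `XPathProbe`, the method `Hrefs`, and the sample markup are hypothetical. It assumes the markup has no default XHTML namespace and no HTML-only entities such as `&nbsp;`, either of which would need extra handling.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathProbe
{
    // Loads markup as XML and returns the href of every node the XPath selects.
    // LoadXml throws XmlException if the markup is not well-formed.
    public static List<string> Hrefs(string wellFormedMarkup, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(wellFormedMarkup);

        var hrefs = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            // An <a> without an href also satisfies the not(starts-with(...)) tests,
            // so skip nodes that have no href attribute.
            var href = node.Attributes?["href"]?.Value;
            if (href != null)
            {
                hrefs.Add(href);
            }
        }
        return hrefs;
    }

    public static void Main()
    {
        // Hypothetical, well-formed stand-in for the topic tree.
        const string sample =
            "<div><span class='branch'>" +
            "<a href='javascript:toggle(1)'>+</a>" +
            "<a href='rss/topic.asp?id=1'>RSS</a>" +
            "<a href='topic.asp?id=1'>Adoption</a>" +
            "</span></div>";

        const string xpath =
            "//span[@class='branch']//a[not(starts-with(@href, 'javascript:')) and not(starts-with(@href, 'rss/'))]";

        foreach (var href in Hrefs(sample, xpath))
        {
            Console.WriteLine(href);
        }
    }
}
```

If `LoadXml` throws on a real page, that is a sign the markup is not well-formed and a tolerant parser is the safer route.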
kaufmed

P.S.

One of the reasons HTML Agility Pack is so popular is that the team sought to make a library that could handle (as best as one can) malformed HTML. HAP takes some liberties in making the source HTML well-formed so that you can use XPath against the loaded document.
mmalik15

ASKER
Thanks, kaufmed. It's worth having an EE membership because of the presence of people like you!