mmalik15

asked on

Extracting hyperlinks using HtmlAgilityPack

How can I use HtmlAgilityPack to get all the title hyperlinks inside the paging table on this web page (http://www.scie-socialcareonline.org.uk/topic.asp?guid=3aca5bbd-bc85-11d4-ba18-009027f63525)? For example, the first title hyperlink is "http://www.scie-socialcareonline.org.uk/profile.asp?guid=81199db9-4835-4df3-be46-603e44fc20b9".
ASKER CERTIFIED SOLUTION
kaufmed
mmalik15

ASKER

Many thanks for the comment, kaufmed.

The only issue I'm facing now is getting the total number of pages. I'm trying to use this regex:

(?s)(?i)Page 1.*([0-9]{3}).*

and it returns 103, which is correct.

How can we tweak this regex so that it also picks up numbers with three or more digits?
What you have is close. I would ditch the dot-stars:

Page\s+1\s+of\s+([0-9]{3})
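
To illustrate why dropping the dot-stars matters, here is a minimal Python sketch (the thread itself does not say which regex engine is in use, and `sample_text` is a made-up string, not the real page). With `(?s)`, a greedy `.*` runs to the end of the input and backtracks, so the capture group lands on the last three digits in the text rather than the ones right after "Page 1 of".

```python
import re

# Hypothetical page text: the pager followed by an unrelated number.
sample_text = "Page 1 of 103 | Results 1 to 10 of 1025"

# Original pattern: the greedy .* swallows everything, then backtracks
# just far enough for [0-9]{3} to match the final three digits.
greedy = re.search(r"(?s)(?i)Page 1.*([0-9]{3}).*", sample_text)

# Pattern without dot-stars: the digits must directly follow "Page 1 of".
anchored = re.search(r"Page\s+1\s+of\s+([0-9]{3})", sample_text)

greedy_total = greedy.group(1)      # "025", taken from the end of "1025"
anchored_total = anchored.group(1)  # "103", the actual page count

print(greedy_total, anchored_total)
```

The original pattern may still return the right value on a page where the page count happens to be the last three-digit run, which could explain why it appeared to work.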

Thanks for the comment again, but what I'm asking is what happens if, in future, we have values like these:

Page 1 of 1037 or

Page 1 of 19

What should our regex be then, given that Page\s+1\s+of\s+([0-9]{3}) will only ever capture exactly three digits?
Ah, sorry. Use:

Page\s+1\s+of\s+([0-9]+)
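
As a quick check of the `+` quantifier, the sketch below runs the pattern in Python against the values mentioned in this thread. The helper name `total_pages` is made up for illustration, and the inputs are bare strings rather than the real page HTML.

```python
import re

# [0-9]+ captures one or more digits, so the page count can be any length.
PAGE_COUNT = re.compile(r"Page\s+1\s+of\s+([0-9]+)")
# The earlier fixed-width pattern, kept for comparison.
THREE_DIGITS = re.compile(r"Page\s+1\s+of\s+([0-9]{3})")

def total_pages(text):
    """Return the page count as an int, or None if no pager text is found."""
    match = PAGE_COUNT.search(text)
    return int(match.group(1)) if match else None

samples = ("Page 1 of 19", "Page 1 of 103", "Page 1 of 1037")
results = [total_pages(s) for s in samples]

# With {3}, "1037" is cut to "103" and "19" does not match at all.
truncated = THREE_DIGITS.search("Page 1 of 1037").group(1)
too_short = THREE_DIGITS.search("Page 1 of 19")

print(results, truncated, too_short)
```

If the count should be at least three digits, `[0-9]{3,}` would do that instead, but it would reject "Page 1 of 19".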

thanks