mmalik15
asked on
how to write a crawler to extract all links from a website
I want to write an application that accepts a start URL for a website, collects all the links present on that site, and writes them to a text file.
Any ideas ?
thanks
ASKER
Many thanks guys.
NP. Glad to help = )
One thing to note is that my code is not a full solution. For example, you may end up crawling the whole web, because my code just collects every link on a page--it doesn't check whether a link points back to the current site. You should be able to account for this easily; just know that that "flaw" is there.
http://www.dotnetperls.com/scraping-html
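For anyone landing here later, the idea above can be sketched with nothing but the standard library. This is a minimal illustration, not a production crawler: the start URL, the `links.txt` filename, and the `max_pages` limit are all arbitrary choices, and it only follows links whose host matches the start URL's host, which addresses the "crawling the whole web" flaw mentioned above.

```python
# Minimal sketch of a same-site link crawler (Python 3, stdlib only).
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html, base_url):
    """Return absolute URLs for every <a href> found in the page."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]

def crawl(start_url, outfile, max_pages=50):
    """Breadth-first crawl restricted to the start URL's host;
    writes every link discovered along the way to a text file."""
    host = urlparse(start_url).netloc
    seen, queue, found = set(), [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url).read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip pages that fail to load
        for link in extract_links(html, url):
            found.add(link)
            # Only follow links that stay on the original site.
            if urlparse(link).netloc == host and link not in seen:
                queue.append(link)
    with open(outfile, "w") as f:
        f.write("\n".join(sorted(found)))

if __name__ == "__main__":
    crawl("http://www.example.com/", "links.txt")
```

`urljoin` resolves relative hrefs against the page they appeared on, and the `netloc` comparison is the same-site check the comment above says is missing.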