Parsing HTML to get all link addresses on a page.
Posted on 1998-12-28
I am looking for a way to parse HTML and extract all the links it contains. I am trying to write a Perl script that will retrieve all the files in a directory on a remote WWW server. I was told I need to open a socket, request the directory listing the way a web browser would (so it comes back as HTML), and then parse that HTML to get the links.
I would also like to know whether this approach will work for both HTTP and FTP servers.
If you have any suggestions on how to get a directory listing from a remote server, please let me know. It would be much appreciated.
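
To make the parsing step concrete, here is a minimal sketch of the kind of link extraction described above. It assumes the CPAN module HTML::LinkExtor (part of the HTML-Parser distribution) is installed. The helper name extract_links, the sample listing, and the www.example.com address are all made up for illustration. It only covers parsing. Fetching the page is a separate step.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;   # CPAN module, part of the HTML-Parser distribution

# Hypothetical helper: return the target of every <a href="..."> found in
# $html, resolved against $base_url so relative links become absolute.
sub extract_links {
    my ($html, $base_url) = @_;
    my @links;

    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            # The callback also fires for <img>, <link>, etc.; keep anchors only.
            return unless $tag eq 'a' && defined $attr{href};
            push @links, "$attr{href}";   # stringify the URI object
        },
        $base_url,
    );
    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Made-up directory listing standing in for what a server might send back.
my $listing = <<'HTML';
<html><body><h1>Index of /files</h1>
<a href="readme.txt">readme.txt</a>
<a href="images/">images/</a>
<img src="icon.gif">
</body></html>
HTML

print "$_\n" for extract_links($listing, 'http://www.example.com/files/');
```

In a real script the HTML would come from the server rather than a string. LWP::Simple's get($url) is one way to fetch it without writing the socket code by hand.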