

extract links from html page


I'd like to extract all the links (e.g. <a href='/XXXXXXXXX'>More Information...</a>) from an HTML page and import them into a table, so I can later follow each link and pull more data from the page it points to.

1 Solution
ramromconsultant Commented:
One way to do this is with Python (www.python.org). It is free, and it has a urllib2 module for fetching a web page; there is also a free third-party module, BeautifulSoup, for locating and extracting tags. You would then write the results to a tab-delimited file for import into Access.

If you want to consider this approach, let me know.
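A rough sketch of that approach, in case it helps. To keep it dependency-free it uses the standard library's html.parser instead of BeautifulSoup, so treat it as a stand-in rather than the exact method described above. The names LinkCollector, extract_links and write_tsv are made up for illustration, and the base URL is a placeholder. Fetching the page is left out; in Python 3, urllib.request.urlopen takes over the role of urllib2.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, link text) for each <a href=...>."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as '/XXXXXXXXX' against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url=""):
    """Return a list of (absolute href, link text) pairs found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def write_tsv(links, out):
    """Write the links as tab-delimited rows with a header line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["href", "text"])
    writer.writerows(links)


# Sample input; the base URL is an assumption for the demo only.
sample = "<p><a href='/XXXXXXXXX'>More Information...</a> <a name='top'>no href</a></p>"
links = extract_links(sample, base_url="http://example.com/list.html")
buffer = io.StringIO()
write_tsv(links, buffer)
```

To produce a file for Access, you would pass an open file (opened with newline="") to write_tsv instead of the StringIO buffer. Anchors without an href are skipped.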

You would need a browser, such as M$'s WebBrowser ActiveX control. You can then navigate to the page. See examples at {http:/Q_21597349.html} and {http:/Q_21824078.html}.

Once the page is open, the browser's document object will have:

    .links.length   -- number of links on the page
    .links(0).href   -- href of first link
    .links(0).innerText   -- displayed name of the link
    .links[0].text   -- javascript version of the above.

Incidentally, you can get the list of links from any page (including this one) by pasting the following into your address bar:

javascript:(function(){var d=document,t='<html><body><h2>'+d.title+'</h2><h3>'+d.location.href+'</h3><hr><ol>',i,l;for(i=0;i<d.links.length;i++){l=d.links[i];t+='<li><b>'+l.text+'</b><br>'+l.href+'</li>';}t+='</ol></body></html>';d.write(t);d.close();})();

If you like it, make a bookmark out of it...

Hope this helps,
ishadowmeAuthor Commented:
thank you
