Solved

How to extract specific headings or links from a webpage using Excel/VBA

Posted on 2015-01-11
14
311 Views
Last Modified: 2015-01-29
How can I extract specific data from websites into an Excel spreadsheet?  

For example, consider the website http://www.asx.com.au/asx/research/company.do#!/BHP

I would like to:

1) Copy to a cell in a spreadsheet (or otherwise extract) the link under the heading "Contact Details" on that webpage (i.e.  http://www.bhpbilliton.com/)

2) Copy to a cell in a spreadsheet (or otherwise extract) the link for the file titled "Annual Report 2014" (displayed under the heading "BHP announcements" after "Financial & Periodic Reports" is selected from the dropdown menu) and then save the linked file to a specified location on my computer.

What is the easiest way to do this using VBA?
0
Comment
Question by:Let's Go
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 3
  • 2
  • +1
14 Comments
 
LVL 33

Expert Comment

by:Rob Henson
ID: 40544204
If the data on the web page is stored as a table within the webpage, you can create a web query in Excel to extract the table of data.

I just tried but unfortunately my security settings or version of browser won't let me see the web-page within the Web Query window in the same way as I can just see the page in my ordinary browser.

To create the web query, open a new sheet and on the Data tab select the "From Web" button on the left hand end of the ribbon. This opens up a browser like window within Excel. Enter the web address at the top of the window and the web-page will be displayed. Anything that is a data table will have a yellow arrow next to it, click the arrow and it will go Green.

For the table of Annual Report links, hopefully the URL for the PDF file will come down with the data table so clicking on the link will then open the PDF.

Thanks,
Rob H
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40547899
Thanks for your response.

Unfortunately the data I need does not have any yellow arrows identifying data tables next to the information I need (although the advertisements do!)

Any other ideas?
0
 
LVL 33

Expert Comment

by:Rob Henson
ID: 40548496
Sorry, don't have sufficient knowledge in other methods to advise.

Thanks
Rob
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 24

Expert Comment

by:Phillip Burton
ID: 40557263
You should be able to import it in its entirety, and use formulas to locate that information.
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40559123
Do you mean opening the webpage from File/Open?  Unfortunately, that does not help with my question.
0
 
LVL 24

Expert Comment

by:Phillip Burton
ID: 40559370
No, I mean opening it from Data - Get External Data - New Web Query.
0
 
LVL 44

Expert Comment

by:Rainer Jeschor
ID: 40560246
Hi,
attached an Excel file. The first part is already working but I have to adjust the second one.
In general I use the XMLHttp object and directly request the Ajax data which are requested by the web page itself. Then I use a nifty library to parse the JSON string (https://code.google.com/p/vba-json/).

I will work on the second part but this might take another day.

HTH
Rainer
json.xls
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40561138
Thanks for your help, it looks like your work is going to be really helpful.
0
 
LVL 44

Accepted Solution

by:
Rainer Jeschor earned 500 total points
ID: 40562046
Hi,
and here the working final version.

HTH
Rainer
json.xls
0
 
LVL 6

Author Closing Comment

by:Let's Go
ID: 40563332
That's fantastic.  Thanks for such a quick response.
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40576795
Hi, Rainer

I have another question.  How do you find which Ajax data are requested by the web page itself?

In other words, cell D5 of your spreadsheet lists the Request URL of http://data.asx.com.au/data/1/company/BHP/announcements?callback=angular.callbacks._8&count=10&market_sensitive=false&type=03.  How do you derive that URL from the webpage mentioned in my original post?
0
 
LVL 44

Expert Comment

by:Rainer Jeschor
ID: 40576814
Hi,
that looks like a new question with new points :-)
Just kidding.
You can use any browser which has an extension to track the web page requests. Normally I prefer chrome but IE 11 has also decent developer tool bar.
In Chrome open the built-in developer extensions, and go to the tab network. Then simply reload/open the page and you can see all requests (images, script files and ajax calls)
HTH
Rainer
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40577186
Thanks.

Here are some extra points....................... ;-)
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article will guide you to convert a grid from a picture into Excel format using Microsoft OneNote and no other 3rd party application.
Access developers frequently have requirements to interact with Excel (import from or output to) in their applications.  You might be able to accomplish this with the TransferSpreadsheet and OutputTo methods, but in this series of articles I will di…
This Micro Tutorial demonstrates how to create Excel charts: column, area, line, bar, and scatter charts. Formatting tips are provided as well.
This Micro Tutorial will demonstrate in Google Sheets how to use the HYPERLINK function to create live links inside your spreadsheet.

761 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question