Solved

How to extract specific headings or links from a webpage using Excel/VBA

Posted on 2015-01-11
14
273 Views
Last Modified: 2015-01-29
How can I extract specific data from websites into an Excel spreadsheet?  

For example, consider the website http://www.asx.com.au/asx/research/company.do#!/BHP

I would like to:

1) Copy to a cell in a spreadsheet (or otherwise extract) the link under the heading "Contact Details" on that webpage (i.e.  http://www.bhpbilliton.com/)

2) Copy to a cell in a spreadsheet (or otherwise extract) the link for the file titled "Annual Report 2014" (displayed under the heading "BHP announcements" after "Financial & Periodic Reports" is selected from the dropdown menu) and then save the linked file to a specified location on my computer.

What is the easiest way to do this using VBA?
0
Comment
Question by:Let's Go
  • 6
  • 3
  • 2
  • +1
14 Comments
 
LVL 32

Expert Comment

by:Rob Henson
ID: 40544204
If the data on the web page is stored as a table within the webpage, you can create a web query in Excel to extract the table of data.

I just tried but unfortunately my security settings or version of browser won't let me see the web-page within the Web Query window in the same way as I can just see the page in my ordinary browser.

To create the web query, open a new sheet and on the Data tab select the "From Web" button on the left hand end of the ribbon. This opens up a browser like window within Excel. Enter the web address at the top of the window and the web-page will be displayed. Anything that is a data table will have a yellow arrow next to it, click the arrow and it will go Green.

For the table of Annual Report links, hopefully the URL for the PDF file will come down with the data table so clicking on the link will then open the PDF.

Thanks,
Rob H
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40547899
Thanks for your response.

Unfortunately the data I need does not have any yellow arrows identifying data tables next to the information I need (although the advertisements do!)

Any other ideas?
0
 
LVL 32

Expert Comment

by:Rob Henson
ID: 40548496
Sorry, don't have sufficient knowledge in other methods to advise.

Thanks
Rob
0
Are your AD admin tools letting you down?

Managing Active Directory can get complicated.  Often, the native tools for managing AD are just not up to the task.  The largest Active Directory installations in the world have relied on one tool to manage their day-to-day administration tasks: Hyena. Start your trial today.

 
LVL 24

Expert Comment

by:Phillip Burton
ID: 40557263
You should be able to import it in its entirety, and use formulas to locate that information.
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40559123
Do you mean opening the webpage from File/Open?  Unfortunately, that does not help with my question.
0
 
LVL 24

Expert Comment

by:Phillip Burton
ID: 40559370
No, I mean opening it from Data - Get External Data - New Web Query.
0
 
LVL 44

Expert Comment

by:Rainer Jeschor
ID: 40560246
Hi,
attached an Excel file. The first part is already working but I have to adjust the second one.
In general I use the XMLHttp object and directly request the Ajax data which are requested by the web page itself. Then I use a nifty library to parse the JSON string (https://code.google.com/p/vba-json/).

I will work on the second part but this might take another day.

HTH
Rainer
json.xls
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40561138
Thanks for your help, it looks like your work is going to be really helpful.
0
 
LVL 44

Accepted Solution

by:
Rainer Jeschor earned 500 total points
ID: 40562046
Hi,
and here the working final version.

HTH
Rainer
json.xls
0
 
LVL 6

Author Closing Comment

by:Let's Go
ID: 40563332
That's fantastic.  Thanks for such a quick response.
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40576795
Hi, Rainer

I have another question.  How do you find which Ajax data are requested by the web page itself?

In other words, cell D5 of your spreadsheet lists the Request URL of http://data.asx.com.au/data/1/company/BHP/announcements?callback=angular.callbacks._8&count=10&market_sensitive=false&type=03.  How do you derive that URL from the webpage mentioned in my original post?
0
 
LVL 44

Expert Comment

by:Rainer Jeschor
ID: 40576814
Hi,
that looks like a new question with new points :-)
Just kidding.
You can use any browser which has an extension to track the web page requests. Normally I prefer chrome but IE 11 has also decent developer tool bar.
In Chrome open the built-in developer extensions, and go to the tab network. Then simply reload/open the page and you can see all requests (images, script files and ajax calls)
HTH
Rainer
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40577186
Thanks.

Here are some extra points....................... ;-)
0

Featured Post

DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Preface This is the third article about the EE Collaborative Login Project. A Better Website Login System (http://www.experts-exchange.com/A_2902.html) introduces the Login System and shows how to implement a login page. The EE Collaborative Logiā€¦
Freeze panes is an option within all variants of Excel to enable parts of a sheet to remain stationary when the cursor is in another part of the sheet. This is a very useful feature which is overlooked or under used.
The viewer will learn the basics of jQuery, including how to invoke it on a web page. Reference your jQuery libraries: (CODE) Include your new external js/jQuery file: (CODE) Write your first lines of code to setup your site for jQuery.: (CODE)
This Micro Tutorial will demonstrate how to use longer labels with horizontal bar charts instead of the vertical column chart.

832 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question