Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

How to extract specific headings or links from a webpage using Excel/VBA

Posted on 2015-01-11
14
Medium Priority
?
454 Views
Last Modified: 2015-01-29
How can I extract specific data from websites into an Excel spreadsheet?  

For example, consider the website http://www.asx.com.au/asx/research/company.do#!/BHP

I would like to:

1) Copy to a cell in a spreadsheet (or otherwise extract) the link under the heading "Contact Details" on that webpage (i.e.  http://www.bhpbilliton.com/)

2) Copy to a cell in a spreadsheet (or otherwise extract) the link for the file titled "Annual Report 2014" (displayed under the heading "BHP announcements" after "Financial & Periodic Reports" is selected from the dropdown menu) and then save the linked file to a specified location on my computer.

What is the easiest way to do this using VBA?
0
Comment
Question by:Let's Go
  • 6
  • 3
  • 2
  • +1
14 Comments
 
LVL 34

Expert Comment

by:Rob Henson
ID: 40544204
If the data on the web page is stored as a table within the webpage, you can create a web query in Excel to extract the table of data.

I just tried but unfortunately my security settings or version of browser won't let me see the web-page within the Web Query window in the same way as I can just see the page in my ordinary browser.

To create the web query, open a new sheet and on the Data tab select the "From Web" button on the left hand end of the ribbon. This opens up a browser like window within Excel. Enter the web address at the top of the window and the web-page will be displayed. Anything that is a data table will have a yellow arrow next to it, click the arrow and it will go Green.

For the table of Annual Report links, hopefully the URL for the PDF file will come down with the data table so clicking on the link will then open the PDF.

Thanks,
Rob H
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40547899
Thanks for your response.

Unfortunately the data I need does not have any yellow arrows identifying data tables next to the information I need (although the advertisements do!)

Any other ideas?
0
 
LVL 34

Expert Comment

by:Rob Henson
ID: 40548496
Sorry, don't have sufficient knowledge in other methods to advise.

Thanks
Rob
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
LVL 24

Expert Comment

by:Phillip Burton
ID: 40557263
You should be able to import it in its entirety, and use formulas to locate that information.
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40559123
Do you mean opening the webpage from File/Open?  Unfortunately, that does not help with my question.
0
 
LVL 24

Expert Comment

by:Phillip Burton
ID: 40559370
No, I mean opening it from Data - Get External Data - New Web Query.
0
 
LVL 44

Expert Comment

by:Rainer Jeschor
ID: 40560246
Hi,
attached an Excel file. The first part is already working but I have to adjust the second one.
In general I use the XMLHttp object and directly request the Ajax data which are requested by the web page itself. Then I use a nifty library to parse the JSON string (https://code.google.com/p/vba-json/).

I will work on the second part but this might take another day.

HTH
Rainer
json.xls
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40561138
Thanks for your help, it looks like your work is going to be really helpful.
0
 
LVL 44

Accepted Solution

by:
Rainer Jeschor earned 2000 total points
ID: 40562046
Hi,
and here the working final version.

HTH
Rainer
json.xls
0
 
LVL 6

Author Closing Comment

by:Let's Go
ID: 40563332
That's fantastic.  Thanks for such a quick response.
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40576795
Hi, Rainer

I have another question.  How do you find which Ajax data are requested by the web page itself?

In other words, cell D5 of your spreadsheet lists the Request URL of http://data.asx.com.au/data/1/company/BHP/announcements?callback=angular.callbacks._8&count=10&market_sensitive=false&type=03.  How do you derive that URL from the webpage mentioned in my original post?
0
 
LVL 44

Expert Comment

by:Rainer Jeschor
ID: 40576814
Hi,
that looks like a new question with new points :-)
Just kidding.
You can use any browser which has an extension to track the web page requests. Normally I prefer chrome but IE 11 has also decent developer tool bar.
In Chrome open the built-in developer extensions, and go to the tab network. Then simply reload/open the page and you can see all requests (images, script files and ajax calls)
HTH
Rainer
0
 
LVL 6

Author Comment

by:Let's Go
ID: 40577186
Thanks.

Here are some extra points....................... ;-)
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

With User Account Control (UAC) enabled in Windows 7, one needs to open an elevated Command Prompt in order to run scripts under administrative privileges. Although the elevated Command Prompt accomplishes the task, the question How to run as script…
Windows Explorer lets you open cabinet (cab) files like any other folder. In VBA you can easily handle normal files and folders, but opening and indeed creating cabinet files takes a lot more - and that's you'll find here.
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial will demonstrate in Google Sheets how to use the HYPERLINK function to create live links inside your spreadsheet.

916 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question