Marc Salant

asked on

VBA access to DOM and elements on web page

I am trying to access a web page using VBA. When I request the page with WinHttpRequest or MSXML2.XMLHTTP60, I get the page source, but that source has not yet been processed by the page's JavaScript, which is what produces the elements I need. I need the results that appear in the DOM when I inspect elements in Chrome or IE, and I can't figure out how to get at that processed result. I have also tried using the IE controls in VBA and passing the request through IE, but to no avail.
Mark Edwards

Sounds like you want to analyze a rendered web page. What are you using to run your VBA? MS Office has a web browser control that you can do just about anything with, including analyzing rendered web pages.
I do it all the time with a web browser control in MS Access as an application.
Marc Salant

ASKER

Yes, that is correct. I want to read/parse the rendered web page.

I am hoping to do this using VBA in Excel, but am stuck. In the past I have used the WinHttpRequest library, but it returns only the raw server response. I also tried an IE object and XMLHTTP, with no luck.

Thanks
While I've seen others use the web browser control to great effect in Excel, I've only used it in Access.  Hopefully someone who is familiar with using it in Excel will join in.

In the meantime, I'll render as much assistance as I can.
Start with this:

Public WithEvents wb As WebBrowser



where WebBrowser is the web browser control's class from the Microsoft Internet Controls library (SHDocVw).
Add these references from the VBA editor's Tools > References... menu:
Microsoft Internet Controls
Microsoft HTML Object Library
Microsoft Scripting Runtime

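After you have added the wb declaration to a class module (WithEvents declarations are only allowed in object modules such as class, UserForm, sheet, or ThisWorkbook modules, not in standard modules), just type "wb" in a procedure, type a dot (.) after it, and watch IntelliSense bring up all the properties and methods you can use.
Of particular importance is the Document property, which returns the page's live, rendered DOM.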

I'm sure you can take it from there...
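A minimal sketch of that setup in Excel, assuming the Microsoft Internet Controls and Microsoft HTML Object Library references are set. Because a WithEvents variable can only live in an object module, it goes in a class module; the class name `clsBrowser` is hypothetical, and it drives a stand-alone `InternetExplorer` object (which raises the same events as the WebBrowser control) instead of a control embedded on a form.

```vba
' ===== Class module: clsBrowser (hypothetical name) =====
Option Explicit

Public WithEvents ie As SHDocVw.InternetExplorer

Private Sub Class_Initialize()
    Set ie = New SHDocVw.InternetExplorer
    ie.Visible = True
End Sub

' Fires when a document finishes loading (may also fire once per frame)
Private Sub ie_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Dim doc As MSHTML.HTMLDocument
    Set doc = ie.Document              ' the live DOM, not the raw server response
    Debug.Print "Loaded: " & URL & " | title: " & doc.Title
End Sub

Private Sub Class_Terminate()
    If Not ie Is Nothing Then ie.Quit
End Sub

' ===== Standard module =====
Option Explicit

Private browser As clsBrowser   ' module-level so the object and its events stay alive

Public Sub OpenPage()
    Set browser = New clsBrowser
    browser.ie.Navigate "https://example.com/"
End Sub
```

Keeping `browser` at module level matters: if it were a local variable, the object would be destroyed when `OpenPage` ends and no events would arrive.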
Also, what's really cool about the web browser control's BeforeNavigate2 event:
Private Sub wb_BeforeNavigate2(ByVal pDisp As Object, ByRef URL As Variant, ByRef Flags As Variant, ByRef TargetFrameName As Variant, ByRef PostData As Variant, ByRef Headers As Variant, ByRef Cancel As Boolean)
End Sub


is that you can use it to inspect each navigation (the top-level page and any frames) before it is sent, and cancel the ones you don't want, such as navigations to advertising sites, by setting Cancel = True. Note that it fires for navigations, not for every image, script, or XMLHttpRequest the page loads.
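A sketch of a handler that uses the Cancel argument to block navigations, assuming it sits in the same class module as `Public WithEvents wb As WebBrowser` and that `wb` has been set to a web browser control. The host names in the block list are hypothetical placeholders.

```vba
' In the class module that declares: Public WithEvents wb As WebBrowser
Private Sub wb_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, Flags As Variant, _
        TargetFrameName As Variant, PostData As Variant, Headers As Variant, Cancel As Boolean)
    Dim blocked As Variant

    ' Hypothetical block list: substrings of URLs that should never be loaded
    For Each blocked In Array("ads.example.com", "tracker.example.net")
        If InStr(1, CStr(URL), blocked, vbTextCompare) > 0 Then
            Debug.Print "Blocked: " & URL
            Cancel = True          ' stop this navigation before the request goes out
            Exit Sub
        End If
    Next blocked

    Debug.Print "Allowed: " & URL
End Sub
```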
P.S. The .NET WebBrowser control does not expose the BeforeNavigate2 event directly, so you can't do this kind of thing as easily in .NET as with the Office web browser control!

MS Office/Access/VBA/Webbrowser.... YOU ROCK!
I am looking into it... thanks for the direction.
Sadly, there is a known issue with the web browser object in Excel 2013: I need to edit the registry. Ha.

Once I load the web page, how do I get the rendered source (the live DOM) rather than the original page source?

Thanks
By the way, I can probably work in Access; it really doesn't matter, since I am only using VBA, not the front end.
Take a look at this and see if it doesn't point you in the right direction:

https://docs.microsoft.com/en-us/dotnet/framework/winforms/controls/webbrowser-control-overview
In the WebBrowser control you always access the rendered page through the Document property, which contains the rendered DOM. You might have to "click" buttons or do whatever else is required to trigger the JavaScript that produces the page you want.
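A minimal sketch of reading the rendered DOM rather than the server response, assuming the Microsoft Internet Controls and Microsoft HTML Object Library references are set and that Internet Explorer automation is available on the machine. `GetRenderedHtml` and its `extraWaitSeconds` parameter are hypothetical names, and the fixed delay is a crude assumption: pages that load data asynchronously may need longer, so waiting for a specific element to appear is usually more reliable.

```vba
' Standard module
Option Explicit

Public Function GetRenderedHtml(ByVal url As String, _
                                Optional ByVal extraWaitSeconds As Double = 3) As String
    Dim ie As SHDocVw.InternetExplorer
    Dim doc As MSHTML.HTMLDocument
    Dim started As Double

    Set ie = New SHDocVw.InternetExplorer
    ie.Visible = False
    ie.Navigate url

    ' Wait until the initial document has finished loading
    Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    ' Give page scripts extra time to modify the DOM (fixed delay; Timer resets at midnight)
    started = Timer
    Do While Timer - started < extraWaitSeconds
        DoEvents
    Loop

    Set doc = ie.Document
    ' outerHTML of the root element serializes the current, script-modified DOM,
    ' unlike the raw response text from WinHttpRequest or XMLHTTP
    GetRenderedHtml = doc.documentElement.outerHTML

    ie.Quit
End Function
```

For example, `Debug.Print Left$(GetRenderedHtml("https://example.com/"), 500)` prints the first 500 characters of the serialized DOM to the Immediate window.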
I continued to look for a solution over the weekend. It does not seem easy to get at the rendered page, and definitely not with the web browser objects. Several posts suggest using Selenium, but when I tried it I didn't have much success...

I want to access that rendered DOM. My selections for the JavaScript controls are all set via the URL, so what I am trying to do should be possible; the problem is that the page source returned is not the final rendered page I see on screen.

https://shop.ford.com/showroom/?gnav=header-shop&linktype=inventory#/

Trying to look at cars at the dealerships.
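Since the selections are encoded in the URL, one option is to navigate straight to it and poll until the elements the JavaScript builds actually exist. This is a sketch under several assumptions: the CSS selector `.vehicle-card` is a placeholder (inspect the rendered page in the developer tools to find the real one), `WaitForElements` and `ListResults` are hypothetical names, the page must render in an IE standards document mode where `querySelectorAll` is available, and it is not confirmed that this particular site works in Internet Explorer at all.

```vba
' Standard module. References: Microsoft Internet Controls, Microsoft HTML Object Library
Option Explicit

' Polls until at least one element matches cssSelector or the timeout expires.
' Returns the matching collection, or Nothing on timeout.
Public Function WaitForElements(ByVal ie As SHDocVw.InternetExplorer, _
                                ByVal cssSelector As String, _
                                ByVal timeoutSeconds As Double) As Object
    Dim started As Double
    Dim found As Object

    started = Timer
    Do
        DoEvents
        If Not ie.Busy And ie.ReadyState = READYSTATE_COMPLETE Then
            Set found = ie.Document.querySelectorAll(cssSelector)
            If found.Length > 0 Then
                Set WaitForElements = found
                Exit Function
            End If
        End If
    Loop While Timer - started < timeoutSeconds
End Function

Public Sub ListResults()
    Dim ie As SHDocVw.InternetExplorer
    Dim items As Object
    Dim i As Long

    Set ie = New SHDocVw.InternetExplorer
    ie.Visible = True
    ' URL from the question; the part after "#" is read by the page's JavaScript
    ie.Navigate "https://shop.ford.com/showroom/?gnav=header-shop&linktype=inventory#/"

    Set items = WaitForElements(ie, ".vehicle-card", 30)   ' placeholder selector
    If items Is Nothing Then
        Debug.Print "No matching elements before the timeout"
    Else
        For i = 0 To items.Length - 1
            Debug.Print items.Item(i).innerText
        Next i
    End If

    ie.Quit
End Sub
```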