Solved

HTML source access

Posted on 2002-07-23
18
211 Views
Last Modified: 2010-05-02
Hi,

I am trying to programatically access the HTML source code for a given URL (equivalent of performing a 'view source' operation on any browser). I have tried using the 'Microsoft Internet Explorer Library' :

browser.Navigate2("[some url]")
source=browser.document.documentElement.outerHTML

however without success, as the information obtained does not come out correctly on a consistent basis (i.e. it does not work incase of certain url's). All help on how to consistently obtain the source for a given url would be appreciated.

Aseem
0
Comment
Question by:aseem_dayal
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 5
  • 3
  • +4
18 Comments
 
LVL 2

Expert Comment

by:priya_pbk
ID: 7171459
Give a reference to Microsoft Internet controls by going to Tools->Components-> and click Microsoft Internet controls

I tried this way(2 command buttons on the form)
Private Sub Command1_Click()
WebBrowser1.Navigate "http://www.experts-exchange.com"
End Sub

Private Sub Command2_Click()
MsgBox WebBrowser1.Document.documentElement.outerHtml
End Sub

-priya
0
 

Author Comment

by:aseem_dayal
ID: 7171498
Hi Priya,

Thanks for the input, but as I have already mentioned in my question, the 'browser.Document.documentElement.outerHTML' does not consistently give the correct URL source.
0
 
LVL 2

Expert Comment

by:priya_pbk
ID: 7171509
but weren't you referencing it to 'Microsoft Internet Explorer Library' ..is that the same as the component reference to "Microsoft Internet controls"(which i had mentioned), whereby you have to put the web browser manually on your form.

and
>>does not consistently give the correct URL source.
why? what happens, i have used it lot many times. what does it show you?

-priya
0
PeopleSoft Has Never Been Easier

PeopleSoft Adoption Made Smooth & Simple!

On-The-Job Training Is made Intuitive & Easy With WalkMe's On-Screen Guidance Tool.  Claim Your Free WalkMe Account Now

 
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7171927
that's couldn't be:
MsgBox WebBrowser1.Document.documentElement.innerHtml
instead?
i used that without problem.
You have to use it in documentcomplete event of webbrowser.
0
 
LVL 75

Accepted Solution

by:
Anthony Perkins earned 100 total points
ID: 7171969
Try this:
Make a reference to MSXML (v2.6, v3 or v4)

Private Function GetHTML() As String
Dim httpObj As MSXML2.XMLHTTP

Set httpObj = New MSXML2.XMLHTTP
With httpObj
  .open "GET", "http://www.msn.com", False
  .send
  GetHTML = .responseText
End With
Set httpObj = Nothing

End Function

Note:  If you are using a prior version to v2.6, than change the code as follows:
Dim httpObj As MSXML.XMLHTTPRequest

Set httpObj = New MSXML.XMLHTTPRequest

Anthony
0
 
LVL 28

Expert Comment

by:AzraSound
ID: 7172090
I'm with Richie, ensure the page is FULLY loaded before attempting to get its source.  If it hasnt loaded yet, there is a chance its source hasnt been completely downloaded yet.
0
 
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7172094
well, if you will not use webbrowser control, you could use inet control instead

function GrabHtml(url as string) as string
dim s as string
s= inet1.openurl(url,icString)
GrabHtml=s
end sub
0
 
LVL 4

Expert Comment

by:RichW
ID: 7172449
Here's how you can make sure the source is fully loaded.

WebBrowser1.Navigate "http://www.webpage.com"
Do While WebBrowser1.ReadyState < 4 '= READYSTATE_COMPLETE
   DoEvents
Loop
strText = WebBrowser1.Document.body.innertext
strHTML = WebBrowser1.Document.body.innerhtml

RichW
0
 
LVL 3

Expert Comment

by:Hornet241
ID: 7172472
I have had the same problem on trying to get at a logged in internet bankking page that displays my account info.

I think that maybe this is similiar.
0
 
LVL 3

Expert Comment

by:Hornet241
ID: 7172474
Sorry, I was trying this way

strHTML = WebBrowser1.Document.body.outerhtml
0
 
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7172527
To state my comment more clear:
' wb1 is a WebBrowser control

Private Sub Form_Load()
WB1.Navigate "wwww.somedomain.com/some/index.html"

End Sub

Private Sub WB1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
If (pDisp Is WB1.Object) Then
     debug.print wb1.document.documentelement.innerhtml
   
End If
End Sub
0
 
LVL 3

Expert Comment

by:Hornet241
ID: 7172599
I just got it like this

after the page has opened I needed to get at the frames that the document was filled with


Set parentObj = WebBrowser1.Document.parentWindow

For a = 0 To jlobj.frames.length - 1
 Debug.Print jlobj.frames(a).Document.body.outerhtml
Next a

0
 
LVL 3

Expert Comment

by:Hornet241
ID: 7172616
Watch the object names - should have been

Set parentObj = WebBrowser1.Document.parentWindow

For a = 0 To parentObj.frames.length - 1
    Debug.Print parentObj.frames(a).Document.body.outerhtml
Next a
0
 
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7172664
but we  weren't talking about frames, or i missed something?
0
 
LVL 3

Expert Comment

by:Hornet241
ID: 7172892
Frames are about the only reason that I can think of that would result in inconsistent operation.
0
 
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7172960
Sorry, not to me.
If page has frames, docummentelement.innerhtml would shows HTML contents of main document (those "frameset" bunch of things) only.
0
 

Author Comment

by:aseem_dayal
ID: 7173476
Priya :

1. Yes the 'Microsoft Internet Explorer Library' works  same as the Web-Browser control.

2. The inconsistency that I encountered was when trying to obtain source HTML from pages generated from an exchange OWA server, in certain instances, incase you have access to OWA : the page generated in response to a mail reply does not produce the correct HTML.

Richie Simonetti/AzraSound/RichW :

I have ensured that I access the HTML source only after the 'navigation completed' event occurs.

acperkins :

Will try your suggestion and get back.


Aseem

0
 

Author Comment

by:aseem_dayal
ID: 7173634
acperkins solution works like a charm !

Not only does provide the information faster than any other methods, it works consistently across all URLS.

To everyone involved in this discussion, I would recommend that they use 'MSXML.XMLHTTP' as a defacto standard for obtaining source URL's.

Thanks for the contributions.

Aseem

0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When trying to find the cause of a problem in VBA or VB6 it's often valuable to know what procedures were executed prior to the error. You can use the Call Stack for that but it is often inadequate because it may show procedures you aren't intereste…
Since upgrading to Office 2013 or higher installing the Smart Indenter addin will fail. This article will explain how to install it so it will work regardless of the Office version installed.
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…

729 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question