• Status: Solved
  • Priority: Medium
  • Security: Public

HTML source access

Hi,

I am trying to programmatically access the HTML source code for a given URL (the equivalent of performing a 'view source' operation in any browser). I have tried using the 'Microsoft Internet Explorer Library':

browser.Navigate2("[some url]")
source=browser.document.documentElement.outerHTML

This has not been successful, however: the HTML returned is not consistently correct (i.e. it does not work for certain URLs). Any help on how to reliably obtain the source for a given URL would be appreciated.

Aseem
aseem_dayal
Asked:
1 Solution
 
priya_pbkCommented:
Add a reference to Microsoft Internet Controls by going to Project->Components and ticking Microsoft Internet Controls.

I tried it this way (two command buttons on the form):
Private Sub Command1_Click()
WebBrowser1.Navigate "http://www.experts-exchange.com"
End Sub

Private Sub Command2_Click()
MsgBox WebBrowser1.Document.documentElement.outerHtml
End Sub

-priya
 
aseem_dayalAuthor Commented:
Hi Priya,

Thanks for the input, but as I already mentioned in my question, 'browser.Document.documentElement.outerHTML' does not consistently give the correct source for the URL.
 
priya_pbkCommented:
But weren't you referencing the 'Microsoft Internet Explorer Library'? Is that the same as the component reference to "Microsoft Internet Controls" (which I mentioned), where you have to place the WebBrowser control on your form manually?

And
>>does not consistently give the correct URL source.
Why? What happens? I have used it many times. What does it show you?

-priya
 
Richie_SimonettiIT OperationsCommented:
Couldn't that be:
MsgBox WebBrowser1.Document.documentElement.innerHtml
instead?
I have used that without problems.
You have to call it in the DocumentComplete event of the WebBrowser control.
 
Anthony PerkinsCommented:
Try this:
Make a reference to MSXML (v2.6, v3 or v4)

Private Function GetHTML() As String
Dim httpObj As MSXML2.XMLHTTP

Set httpObj = New MSXML2.XMLHTTP
With httpObj
  .open "GET", "http://www.msn.com", False
  .send
  GetHTML = .responseText
End With
Set httpObj = Nothing

End Function

Note: If you are using a version prior to v2.6, then change the code as follows:
Dim httpObj As MSXML.XMLHTTPRequest

Set httpObj = New MSXML.XMLHTTPRequest

Anthony
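
A slightly more defensive variant of the XMLHTTP approach above might look like the sketch below. It is an illustration only: the name FetchHtml, the error number, and the decision to treat anything other than HTTP 200 as a failure are assumptions, not part of the original answer. It uses late binding (CreateObject) so that no project reference to a particular MSXML version is required.

```vb
' Sketch only: FetchHtml is a hypothetical helper name, not an MSXML API.
Private Function FetchHtml(ByVal url As String) As String
    Dim http As Object

    ' Late binding: avoids a project reference to a specific MSXML version
    Set http = CreateObject("MSXML2.XMLHTTP")

    http.open "GET", url, False      ' False = synchronous; send blocks until done
    http.send

    If http.Status = 200 Then
        ' responseText is the body as the server sent it
        FetchHtml = http.responseText
    Else
        ' Assumption: any status other than 200 is reported as an error
        Err.Raise vbObjectError + 513, "FetchHtml", _
                  "HTTP " & http.Status & " " & http.statusText & " for " & url
    End If

    Set http = Nothing
End Function
```

Calling `Debug.Print FetchHtml("http://www.msn.com")` would print the page source to the Immediate window. Note that this returns the response exactly as the server sent it, whereas outerHTML is the browser's serialization of the document it parsed, so the two can differ.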
 
AzraSoundCommented:
I'm with Richie: ensure the page is FULLY loaded before attempting to get its source. If it hasn't loaded yet, there is a chance its source hasn't been completely downloaded.
 
Richie_SimonettiIT OperationsCommented:
Well, if you don't want to use the WebBrowser control, you could use the Inet control (Microsoft Internet Transfer Control) instead:

Function GrabHtml(url As String) As String
    ' Inet1 is an Inet control placed on the form
    GrabHtml = Inet1.OpenURL(url, icString)
End Function
 
RichWCommented:
Here's how you can make sure the source is fully loaded.

WebBrowser1.Navigate "http://www.webpage.com"
Do While WebBrowser1.ReadyState < 4 ' 4 = READYSTATE_COMPLETE
   DoEvents
Loop
strText = WebBrowser1.Document.body.innertext
strHTML = WebBrowser1.Document.body.innerhtml

RichW
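
One caveat with a bare wait loop like the one above is that it never exits if the page fails to finish loading. A possible variant with a timeout is sketched below. The helper name WaitForPage, the PAGE_COMPLETE constant and the 30-second limit in the usage comment are assumptions made for illustration; the browser is passed As Object so the sketch does not depend on how the control is referenced.

```vb
' Sketch only: WaitForPage is a hypothetical helper.
' Returns True if the page finished loading, False if the timeout elapsed first.
Private Function WaitForPage(ByVal wb As Object, ByVal timeoutSeconds As Single) As Boolean
    Const PAGE_COMPLETE As Long = 4      ' same value as READYSTATE_COMPLETE
    Dim started As Single
    Dim elapsed As Single

    started = Timer                      ' seconds since midnight

    Do While wb.ReadyState <> PAGE_COMPLETE
        DoEvents                         ' let the control process its messages

        elapsed = Timer - started
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer wrapped at midnight
        If elapsed > timeoutSeconds Then Exit Function  ' gives up, returns False
    Loop

    WaitForPage = True
End Function

' Possible usage:
'   WebBrowser1.Navigate "http://www.webpage.com"
'   If WaitForPage(WebBrowser1, 30) Then
'       strHTML = WebBrowser1.Document.body.innerHTML
'   End If
```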
 
Hornet241Commented:
I have had the same problem when trying to get at a logged-in internet banking page that displays my account info.

I think this may be similar.
 
Hornet241Commented:
Sorry, I was trying it this way:

strHTML = WebBrowser1.Document.body.outerhtml
 
Richie_SimonettiIT OperationsCommented:
To state my comment more clearly:
' WB1 is a WebBrowser control

Private Sub Form_Load()
    WB1.Navigate "www.somedomain.com/some/index.html"
End Sub

Private Sub WB1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    ' Fires once per frame; only act when the top-level document is complete
    If (pDisp Is WB1.Object) Then
        Debug.Print WB1.Document.documentElement.innerHTML
    End If
End Sub
 
Hornet241Commented:
I just got it working like this.

After the page had opened, I needed to get at the frames that the document was filled with:


Set parentObj = WebBrowser1.Document.parentWindow

For a = 0 To jlobj.frames.length - 1
 Debug.Print jlobj.frames(a).Document.body.outerhtml
Next a

 
Hornet241Commented:
Watch the object names; it should have been:

Set parentObj = WebBrowser1.Document.parentWindow

For a = 0 To parentObj.frames.length - 1
    Debug.Print parentObj.frames(a).Document.body.outerhtml
Next a
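
If some of the frames come from a different domain than the parent page, reading their Document can raise an "access denied" style run-time error. A hedged sketch of the same loop with per-frame error handling follows. The name DumpFrames and the message text are made up for illustration, and the loop counter is left as a Variant to mirror the untyped counter used above.

```vb
' Sketch only: DumpFrames is a hypothetical helper.
' Prints the HTML of every frame in an already loaded WebBrowser control.
Private Sub DumpFrames(ByVal wb As Object)
    Dim win As Object
    Dim i As Variant
    Dim html As String

    Set win = wb.Document.parentWindow

    For i = 0 To win.frames.length - 1
        html = ""
        On Error Resume Next             ' a frame may refuse access to its document
        html = win.frames(i).Document.body.outerHTML
        If Err.Number <> 0 Then
            Debug.Print "Frame " & i & " not accessible: " & Err.Description
            Err.Clear
        Else
            Debug.Print html
        End If
        On Error GoTo 0                  ' restore normal error handling
    Next i
End Sub
```

It would be called as `DumpFrames WebBrowser1` once the page has finished loading.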
 
Richie_SimonettiIT OperationsCommented:
But we weren't talking about frames, or did I miss something?
 
Hornet241Commented:
Frames are about the only reason that I can think of that would result in inconsistent operation.
 
Richie_SimonettiIT OperationsCommented:
Sorry, that's not how it looks to me.
If the page has frames, documentElement.innerHTML would show only the HTML contents of the main document (that "frameset" bunch of things).
 
aseem_dayalAuthor Commented:
Priya :

1. Yes, the 'Microsoft Internet Explorer Library' works the same as the WebBrowser control.

2. The inconsistency I encountered was when trying to obtain the source HTML from pages generated by an Exchange OWA server. In certain instances the correct HTML is not returned; in case you have access to OWA, one example is the page generated in response to a mail reply.

Richie Simonetti/AzraSound/RichW :

I have ensured that I access the HTML source only after the 'navigation completed' event occurs.

acperkins :

Will try your suggestion and get back.


Aseem

 
aseem_dayalAuthor Commented:
acperkins' solution works like a charm!

Not only does it provide the information faster than any of the other methods, it also works consistently across all URLs.

To everyone involved in this discussion: I would recommend using 'MSXML.XMLHTTP' as the de facto standard for obtaining the source of a URL.

Thanks for the contributions.

Aseem

Question has a verified solution.