aseem_dayal
asked on
HTML source access
Hi,
I am trying to programmatically access the HTML source code for a given URL (the equivalent of performing a 'view source' operation in any browser). I have tried using the 'Microsoft Internet Explorer Library':
browser.Navigate2("[some url]")
source=browser.document.documentElement.outerHTML
however without success, as the information obtained does not come out correctly on a consistent basis (i.e. it does not work in the case of certain URLs). All help on how to consistently obtain the source for a given URL would be appreciated.
Aseem
ASKER
Hi Priya,
Thanks for the input, but as I have already mentioned in my question, the 'browser.Document.documentElement.outerHTML' does not consistently give the correct URL source.
But weren't you referencing the 'Microsoft Internet Explorer Library'? Is that the same as the component reference to "Microsoft Internet Controls" (which I had mentioned), whereby you have to put the web browser manually on your form?
and
>>does not consistently give the correct URL source.
Why? What happens? I have used it many times. What does it show you?
-priya
Couldn't that be:
MsgBox WebBrowser1.Document.documentElement.innerHTML
instead?
I used that without problems.
You have to use it in the DocumentComplete event of the WebBrowser.
I'm with Richie: ensure the page is FULLY loaded before attempting to get its source. If it hasn't loaded yet, there is a chance its source hasn't been completely downloaded.
Well, if you will not use the WebBrowser control, you could use the Inet control instead:
Function GrabHtml(url As String) As String
Dim s As String
' icString asks OpenURL to return the response as text
s = Inet1.OpenURL(url, icString)
GrabHtml = s
End Function
Here's how you can make sure the source is fully loaded.
WebBrowser1.Navigate "http://www.webpage.com"
Do While WebBrowser1.ReadyState < 4 '= READYSTATE_COMPLETE
DoEvents
Loop
strText = WebBrowser1.Document.body.innerText
strHTML = WebBrowser1.Document.body.innerHTML
RichW
I have had the same problem trying to get at a logged-in internet banking page that displays my account info.
I think that maybe this is similar.
Sorry, I was trying this way
strHTML = WebBrowser1.Document.body.outerHTML
To state my comment more clearly:
' WB1 is a WebBrowser control
Private Sub Form_Load()
WB1.Navigate "www.somedomain.com/some/index.html"
End Sub
Private Sub WB1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
' Fires once per frame; only act when the top-level document has finished
If (pDisp Is WB1.Object) Then
Debug.Print WB1.Document.documentElement.innerHTML
End If
End Sub
I just got it like this
After the page has opened I needed to get at the frames that the document was filled with:
Set parentObj = WebBrowser1.Document.parentWindow
For a = 0 To jlobj.frames.length - 1
Debug.Print jlobj.frames(a).Document.body.outerHTML
Next a
Watch the object names - should have been:
Set parentObj = WebBrowser1.Document.parentWindow
For a = 0 To parentObj.frames.length - 1
Debug.Print parentObj.frames(a).Document.body.outerHTML
Next a
But we weren't talking about frames, or did I miss something?
Frames are about the only reason that I can think of that would result in inconsistent operation.
Sorry, not to me.
If the page has frames, documentElement.innerHTML would show only the HTML contents of the main document (that "frameset" bunch of things).
ASKER
Priya :
1. Yes, the 'Microsoft Internet Explorer Library' works the same as the WebBrowser control.
2. The inconsistency that I encountered was when trying to obtain source HTML from pages generated by an Exchange OWA server. In certain instances, in case you have access to OWA, the page generated in response to a mail reply does not produce the correct HTML.
Richie Simonetti/AzraSound/RichW :
I have ensured that I access the HTML source only after the 'navigation completed' event occurs.
acperkins :
Will try your suggestion and get back.
Aseem
ASKER
acperkins' solution works like a charm!
Not only does it provide the information faster than any other method, it works consistently across all URLs.
To everyone involved in this discussion, I would recommend using 'MSXML.XMLHTTP' as a de facto standard for obtaining the source of URLs.
Thanks for the contributions.
Aseem
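For readers who cannot see the accepted answer, here is a minimal sketch of how an XMLHTTP request is commonly made from VB6. It is not acperkins' actual code. The function name GetPageSource and the ProgID "MSXML2.XMLHTTP" are assumptions (the post above only says 'MSXML.XMLHTTP'), and the sketch assumes MSXML is installed on the machine.

```vb
' Sketch only: GetPageSource is a hypothetical helper name.
' Late binding is used, so no project reference to MSXML is required.
Public Function GetPageSource(ByVal url As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")

    ' Third argument False = synchronous: Send blocks until the response arrives.
    http.Open "GET", url, False
    http.Send

    ' responseText holds the response body as text, as sent by the server.
    ' Any status other than 200 leaves the result as an empty string.
    If http.Status = 200 Then
        GetPageSource = http.responseText
    End If

    Set http = Nothing
End Function
```

It could be called as Debug.Print GetPageSource("http://www.webpage.com"). Because the request never goes through the browser's document object model, the result is the markup the server returned rather than the browser's re-serialized outerHTML, which is closer to what 'view source' shows.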
I tried it this way (2 command buttons on the form):
Private Sub Command1_Click()
WebBrowser1.Navigate "https://www.experts-exchange.com"
End Sub
Private Sub Command2_Click()
MsgBox WebBrowser1.Document.docum
End Sub
-priya