Solved

HTML source access

Posted on 2002-07-23
Last Modified: 2010-05-02
Hi,

I am trying to programmatically access the HTML source code for a given URL (the equivalent of performing a 'view source' operation in any browser). I have tried using the 'Microsoft Internet Explorer Library':

browser.Navigate2("[some url]")
source=browser.document.documentElement.outerHTML

However, this has not been successful: the information obtained is not consistently correct (i.e. it does not work for certain URLs). Any help on how to consistently obtain the source for a given URL would be appreciated.

Aseem
Question by:aseem_dayal
LVL 2

Expert Comment

by:priya_pbk
ID: 7171459
Add a reference to Microsoft Internet Controls by going to Project->Components and ticking Microsoft Internet Controls.

I tried it this way (two command buttons on the form):
Private Sub Command1_Click()
WebBrowser1.Navigate "http://www.experts-exchange.com"
End Sub

Private Sub Command2_Click()
MsgBox WebBrowser1.Document.documentElement.outerHtml
End Sub

-priya

Author Comment

by:aseem_dayal
ID: 7171498
Hi Priya,

Thanks for the input, but as I have already mentioned in my question, the 'browser.Document.documentElement.outerHTML' does not consistently give the correct URL source.
LVL 2

Expert Comment

by:priya_pbk
ID: 7171509
But weren't you referencing the 'Microsoft Internet Explorer Library'? Is that the same as the component reference to "Microsoft Internet Controls" (which I mentioned), where you have to place the WebBrowser control on your form manually?

And
>>does not consistently give the correct URL source.
Why? What happens? I have used it many times. What does it show you?

-priya
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7171927
Couldn't that be:
MsgBox WebBrowser1.Document.documentElement.innerHtml
instead?
I have used that without problems.
You have to use it in the DocumentComplete event of the WebBrowser control.
LVL 75

Accepted Solution

by:
Anthony Perkins earned 100 total points
ID: 7171969
Try this:
Make a reference to MSXML (v2.6, v3 or v4)

Private Function GetHTML() As String
Dim httpObj As MSXML2.XMLHTTP

Set httpObj = New MSXML2.XMLHTTP
With httpObj
  .open "GET", "http://www.msn.com", False
  .send
  GetHTML = .responseText
End With
Set httpObj = Nothing

End Function

Note: If you are using a version prior to v2.6, then change the code as follows:
Dim httpObj As MSXML.XMLHTTPRequest

Set httpObj = New MSXML.XMLHTTPRequest
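
As a hedged variation on the function above, the sketch below takes the URL as a parameter and checks the HTTP status before returning the text. The name FetchHtml, its url parameter, and the error number are illustrative assumptions rather than part of the original answer; it assumes the same MSXML reference (v2.6 or later).

```vb
' Sketch only: FetchHtml and its url parameter are illustrative names.
' Requires a project reference to Microsoft XML (v2.6 or later).
Private Function FetchHtml(ByVal url As String) As String
    Dim httpObj As MSXML2.XMLHTTP

    Set httpObj = New MSXML2.XMLHTTP
    With httpObj
        .open "GET", url, False   ' False = synchronous; .send blocks until done
        .send
        If .Status = 200 Then
            FetchHtml = .responseText
        Else
            ' Surface the HTTP failure instead of returning an error page
            Err.Raise vbObjectError + 513, "FetchHtml", _
                "HTTP " & .Status & " " & .statusText & " for " & url
        End If
    End With
    Set httpObj = Nothing
End Function
```

It could be called as Debug.Print FetchHtml("http://www.msn.com"). Raising an error on a non-200 status is one design choice; returning an empty string would work just as well if the caller prefers to test the result.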

Anthony
LVL 28

Expert Comment

by:AzraSound
ID: 7172090
I'm with Richie: ensure the page is FULLY loaded before attempting to get its source. If it hasn't loaded yet, there is a chance its source hasn't been completely downloaded.
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7172094
Well, if you won't use the WebBrowser control, you could use the Inet control instead:

Function GrabHtml(url As String) As String
Dim s As String
s = Inet1.OpenURL(url, icString)
GrabHtml = s
End Function
LVL 4

Expert Comment

by:RichW
ID: 7172449
Here's how you can make sure the source is fully loaded.

WebBrowser1.Navigate "http://www.webpage.com"
Do While WebBrowser1.ReadyState < 4 '= READYSTATE_COMPLETE
   DoEvents
Loop
strText = WebBrowser1.Document.body.innertext
strHTML = WebBrowser1.Document.body.innerhtml
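
The loop above waits indefinitely if the page never finishes loading. A minimal sketch of the same wait with a timeout follows; WaitForPage and timeoutSeconds are hypothetical names, and the function is assumed to live in the form that hosts WebBrowser1.

```vb
' Sketch only: WaitForPage and timeoutSeconds are illustrative names.
' Returns True if WebBrowser1 finished loading before the timeout elapsed.
Private Function WaitForPage(ByVal timeoutSeconds As Single) As Boolean
    Dim startTime As Single

    startTime = Timer   ' seconds since midnight
    Do While WebBrowser1.ReadyState <> READYSTATE_COMPLETE
        DoEvents
        ' Note: Timer resets at midnight; a wait spanning midnight is not handled here
        If Timer - startTime > timeoutSeconds Then Exit Function
    Loop
    WaitForPage = True
End Function
```

After calling Navigate, something like If WaitForPage(30) Then strHTML = WebBrowser1.Document.body.innerHTML would read the source only once the page has loaded.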

RichW
LVL 3

Expert Comment

by:Hornet241
ID: 7172472
I have had the same problem when trying to get at a logged-in internet banking page that displays my account info.

I think this may be similar.
LVL 3

Expert Comment

by:Hornet241
ID: 7172474
Sorry, I was trying this way

strHTML = WebBrowser1.Document.body.outerhtml
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7172527
To state my comment more clearly:
' wb1 is a WebBrowser control

Private Sub Form_Load()
WB1.Navigate "www.somedomain.com/some/index.html"

End Sub

Private Sub WB1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
If (pDisp Is WB1.Object) Then
     debug.print wb1.document.documentelement.innerhtml
   
End If
End Sub
LVL 3

Expert Comment

by:Hornet241
ID: 7172599
I just got it working like this.

After the page had opened, I needed to get at the frames that the document was filled with:


Set parentObj = WebBrowser1.Document.parentWindow

For a = 0 To jlobj.frames.length - 1
 Debug.Print jlobj.frames(a).Document.body.outerhtml
Next a

LVL 3

Expert Comment

by:Hornet241
ID: 7172616
Watch the object names - should have been

Set parentObj = WebBrowser1.Document.parentWindow

For a = 0 To parentObj.frames.length - 1
    Debug.Print parentObj.frames(a).Document.body.outerhtml
Next a
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7172664
But we weren't talking about frames, or did I miss something?
LVL 3

Expert Comment

by:Hornet241
ID: 7172892
Frames are about the only reason that I can think of that would result in inconsistent operation.
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7172960
Sorry, not to me.
If the page has frames, documentElement.innerHTML would show only the HTML contents of the main document (the "frameset" markup).

Author Comment

by:aseem_dayal
ID: 7173476
Priya :

1. Yes, the 'Microsoft Internet Explorer Library' works the same as the WebBrowser control.

2. The inconsistency I encountered was when trying to obtain the source HTML of pages generated by an Exchange OWA server. In certain instances (in case you have access to OWA), the page generated in response to a mail reply does not produce the correct HTML.

Richie Simonetti/AzraSound/RichW :

I have ensured that I access the HTML source only after the 'navigation completed' event occurs.

acperkins :

Will try your suggestion and get back.


Aseem


Author Comment

by:aseem_dayal
ID: 7173634
acperkins' solution works like a charm!

Not only does it provide the information faster than any of the other methods, it also works consistently across all URLs.

To everyone involved in this discussion, I would recommend using 'MSXML.XMLHTTP' as the de facto standard for obtaining the source of a URL.

Thanks for the contributions.

Aseem
