Solved

Retrieving HTML source from a URL

Posted on 2002-07-17
Last Modified: 2008-03-10
I've tried the INet control, the WebBrowser control, and the MSHTML library without success.  I simply want to pass in a URL to something and get back the entire HTML source which I can then parse through for my own purposes.  Surely there must be an easy way to do this!
0
Question by:rmayer
 
LVL 4

Expert Comment

by:RichW
ID: 7159986
I use the WebBrowser control.

WebBrowser1.Navigate "http://www.webpage.com"
Do While WebBrowser1.ReadyState < 4 '= READYSTATE_COMPLETE
    DoEvents
Loop
strText = WebBrowser1.Document.body.innertext
strHTML = WebBrowser1.Document.body.innerhtml

One for the text and one for the HTML.

RichW
0
 
LVL 4

Expert Comment

by:Glowman
ID: 7160037
Rmayer,

The Internet Transfer control can return the entire HTML source. Just call its OpenURL method and assign the result to a string variable, like so:
Songs = frmMain.Inet.OpenURL("http://" & IpAddy & ":1214")
where Inet is the Internet Transfer control.  Hope it helps.

G
0
 

Author Comment

by:rmayer
ID: 7160287
To RichW:

Your suggestion suffers the same problem as that of the previous person who suggested using the WebBrowser control.  If I set the Visible property to False, it doesn't work.
0
 

Author Comment

by:rmayer
ID: 7160315
To GlowMan:

Your use of the INet control only returns a small portion of the HTML source, not the entire thing.
0
 

 
LVL 4

Expert Comment

by:Glowman
ID: 7160331
Rmayer,

It has always worked for me.  What is the URL of the site you're looking at? Maybe I can try it here.

G
0
 
LVL 4

Expert Comment

by:RichW
ID: 7160333
Then make the WebBrowser control very small on the form, and cover it with a label object that has the same background color as the form, or at least place it behind another object already on the form, so it's not seen.

0
 
LVL 5

Expert Comment

by:jayeshshah
ID: 7160506
Could you tell me how you are using the INet control? We are using the same control and are getting the HTML source.
0
 

Expert Comment

by:jsm11482
ID: 7160553
If you are trying to get the source of a page that is server-driven (does the URL end in .asp, .jsp, etc.?) then the INet control will usually NOT work! The WebBrowser control will load the page just as Internet Explorer would, and you can get the source from a WebBrowser control as follows:

strSource=WebBrowser.Document.documentElement.outerhtml

I am working on a project that sounds similar to yours (getting source and parsing it).  Is it possible that you could post the URL that you are trying to get the source of?

Hope this helps!
-Josh
0
 

Expert Comment

by:jsm11482
ID: 7160577
Also, if you are using the INet control, make sure the variable that receives the data can hold all of the source code. See this example:

Private Sub getSourceCode(strURL As String)
  ' With icString, OpenURL returns a String, so declare a String, not an array
  Dim strData As String

  strData = Inet.OpenURL(strURL, icString)
End Sub

Inet is an Internet Transfer (WinInet-based) control.  This way, the strData string will grow to meet your needs.  I see no reason why this wouldn't work!
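
If OpenURL still appears to return only part of a page, one workaround that is often suggested is to issue the request with Execute and then read the response with GetChunk until nothing is left. This is a minimal sketch, not a tested fix for the page in question. It assumes a form with an Internet Transfer control named Inet; the function name GetSourceInChunks and the 1024-character chunk size are illustrative choices, not something from this thread:

```vb
' Hypothetical helper: fetches a page through the Internet Transfer control
' and assembles the response chunk by chunk.
Private Function GetSourceInChunks(ByVal strURL As String) As String
    Dim strChunk As String
    Dim strAll As String

    ' Start the request and wait until the control has finished executing
    Inet.Execute strURL, "GET"
    Do While Inet.StillExecuting
        DoEvents
    Loop

    ' Read the buffered response until GetChunk returns an empty string
    Do
        strChunk = Inet.GetChunk(1024, icString)
        strAll = strAll & strChunk
    Loop While Len(strChunk) > 0

    GetSourceInChunks = strAll
End Function
```

Called as strHTML = GetSourceInChunks("http://www.webpage.com"), this would leave the assembled source in strHTML.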

-Josh
0
 
LVL 4

Expert Comment

by:RichW
ID: 7160661
As I already said WebBrowser1.Document.body.innerhtml works fine for me with the WebBrowser control.  I'm able to get all the text and parse it out.  I hide the control on my form, because I don't want anyone seeing it either.

I believe the WebBrowser control would be best for what you want to do.

RichW


0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 7160849
Make a reference to MSXML (v2.6, v3 or v4)
Try this code (it seems to work for me):

Private Sub cmdOK_Click()
Dim httpObj As MSXML2.XMLHTTP

Set httpObj = New MSXML2.XMLHTTP
With httpObj
  .open "GET", "http://www.msn.com", False
  .send
  txtOutput.Text = .responseText
End With
Set httpObj = Nothing

End Sub

Note:  If you are using a version prior to v2.6, then change the code as follows:
Dim httpObj As MSXML.XMLHTTPRequest

Set httpObj = New MSXML.XMLHTTPRequest
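
The same request can also be wrapped in a function that takes a URL and hands back the source as a string, which is closer to what the question asks for. This is only a sketch, assuming a reference to MSXML v2.6 or later. The name GetHtmlSource and the choice to return an empty string for any status other than 200 are assumptions made for illustration:

```vb
' Hypothetical wrapper: returns the HTML source for strURL,
' or an empty string if the server does not answer with HTTP 200.
Private Function GetHtmlSource(ByVal strURL As String) As String
    Dim httpObj As MSXML2.XMLHTTP

    Set httpObj = New MSXML2.XMLHTTP
    httpObj.open "GET", strURL, False   ' False = synchronous request
    httpObj.send

    If httpObj.Status = 200 Then
        GetHtmlSource = httpObj.responseText
    End If

    Set httpObj = Nothing
End Function
```

With this in place, txtOutput.Text = GetHtmlSource("http://www.msn.com") would behave much like the click handler above.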

Anthony
0
 

Accepted Solution

by:
compwarm earned 100 total points
ID: 7164374
Try this

Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
Public Function DownloadFile(URL As String, LocalFilename As String) As Boolean
    Dim lngRetVal As Long
    lngRetVal = URLDownloadToFile(0, URL, LocalFilename, 0, 0)
    If lngRetVal = 0 Then DownloadFile = True
End Function
Private Sub Form_Load()
    DownloadFile "http://www.allapi.net", "c:\allapi.htm"
End Sub
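
URLDownloadToFile saves the page to disk, so getting the source into a string for parsing takes one more step: reading the file back. A minimal sketch of that step follows. ReadFileToString is a hypothetical helper, not part of the code above, and it assumes the saved page is ANSI text small enough to hold in memory:

```vb
' Hypothetical helper: loads an entire file into a string.
Public Function ReadFileToString(ByVal FileName As String) As String
    Dim intFile As Integer
    Dim strData As String

    intFile = FreeFile
    Open FileName For Binary Access Read As #intFile
    ' Size the buffer to the file length, then read it in one pass
    strData = Space$(LOF(intFile))
    Get #intFile, , strData
    Close #intFile

    ReadFileToString = strData
End Function
```

For example, after DownloadFile returns True, strHTML = ReadFileToString("c:\allapi.htm") would load the saved page into strHTML.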
0
 

Author Comment

by:rmayer
ID: 7166254
I've found this suggestion to be the best so far.  While I had some success with the WebBrowser control, its drawbacks are as follows:  1) the HTML source I captured sometimes differed greatly from what IE's View Source would show me, 2) it has an unnecessary visual aspect that requires some hokey workaround, like making it very tiny and/or hiding it behind other controls, and 3) it requires distribution of an ActiveX control not likely to be found on any given workstation.  This solution uses a function from what I believe is a native Windows DLL and lets me perfectly reproduce what IE's View Source gives me with a very small amount of code.
0
 

Expert Comment

by:jsm11482
ID: 7166727
You could just set the WebBrowser's visible property to false ya know!
-Josh
0
 

Author Comment

by:rmayer
ID: 7166903
Josh,

I tried that, of course, but the control quits working when you do that.  Try the code first offered by RichW (see above), but first set the Visible property to False and you will see what I mean.
0
 
LVL 4

Expert Comment

by:RichW
ID: 7167174
Yeah, but you could easily hide it on your form.  Using an API call isn't always the best solution, especially if you have to worry about different and older versions of Windows.

Whatever's best for you, I guess.

Cheers,

RichW
0
