
Urgent: IHTMLDocument2 with Inet control

diptiman asked:
Hi,

I am making an application that will download all the tables and their contents from a site.

I am using INET to download the page to my disk, and then I will try to get all the tables using the DOM.

The problem is that I can do that with the WEBBROWSER control (no problem with that one), but because the page may contain graphics that are of no use to me, I just want to download the HTML code. That's why I need INET. But now that the file is on my disk, how can I get the DOM pointer, i.e. an IHTMLDocument2, for the saved HTM file without using the BROWSER control (repeat: without the BROWSER control)?

Thanks in advance for the answer; a code snippet would make it more valuable.

Thanks and regards
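For the step the question asks about (getting an IHTMLDocument2 for an HTML file already on disk, with no WebBrowser control), one option worth trying is MSHTML's createDocumentFromUrl method (from the IHTMLDocument4 interface), which asks MSHTML to load and parse a URL into a new document object. The sketch below assumes a project reference to the Microsoft HTML Object Library; the helper name LoadHtmlFile and the file:/// path are hypothetical. It is untested against the site in question, and whether MSHTML fetches linked resources in this mode is not confirmed here.

```vb
' Sketch only: assumes a project reference to "Microsoft HTML Object Library".
' LoadHtmlFile is a hypothetical helper name, not part of MSHTML.
Option Explicit

Public Function LoadHtmlFile(ByVal strUrl As String) As IHTMLDocument2
    Dim docSeed As HTMLDocument      ' empty document used only as a factory
    Dim docLoaded As IHTMLDocument2  ' document parsed from strUrl

    Set docSeed = New HTMLDocument
    ' createDocumentFromUrl (IHTMLDocument4) loads and parses asynchronously
    Set docLoaded = docSeed.createDocumentFromUrl(strUrl, vbNullString)

    ' Let MSHTML process messages until the document has finished loading
    Do While docLoaded.readyState <> "complete"
        DoEvents
    Loop

    Set LoadHtmlFile = docLoaded
End Function
```

A call such as `Set doc = LoadHtmlFile("file:///c:/yourchoice.htm")` would then give access to `doc.body`, `doc.all.tags("TABLE")` and so on. Real code would also want a timeout on the wait loop in case the load never completes.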

Richie_Simonetti (IT Operations, Certified Expert) commented:
Unfortunately, there is a problem with that, since you cannot create a new HTMLDocument and set its innerHTML to the contents of the file Inet downloaded.
Maybe we could do it another way?
What's the problem with downloading pages with the WebBrowser control?
You can tell it not to download images.
Richie_Simonetti commented:
I know one way you can do something similar to what you want:

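' set a reference to Microsoft HTML object library

Option Explicit
Dim IEDoc As HTMLDocument

Private Sub Form_Load()
' create a document object without any WebBrowser control
Set IEDoc = New HTMLDocument
With IEDoc
    ' show the document's markup before the change
    MsgBox .documentElement.innerHTML
    ' replace only what sits inside the <body> tag
    .body.innerHTML = "mytext"
    MsgBox .documentElement.innerHTML
End With

End Sub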

As you can see, we can write inside the Body tag but not into the complete document (starting at the "<HTML>" tag...).

Richie_Simonetti commented:
While EE was down, I downloaded all my PAQs with the WebBrowser control and saved them to disk. Maybe that could help you. Let me know.
(It is a little slow, but you don't miss any content.
I downloaded only the HTML, not the images.)
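A minimal sketch of saving a page's HTML from the WebBrowser control once it finishes loading, assuming a control named WB1 (as in the later code in this thread), the same Microsoft HTML Object Library reference, and a hypothetical output path. It saves the markup as MSHTML holds it after parsing (documentElement.outerHTML), which may differ from the bytes the server sent, and it does not by itself stop the control from fetching images.

```vb
' Sketch: assumes a WebBrowser control named WB1 and a reference to
' "Microsoft HTML Object Library". The output path is hypothetical.
Private Sub WB1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Dim doc As HTMLDocument
    Dim ff As Integer

    ' Only react when the top-level document (not a frame) has finished
    If Not (pDisp Is WB1.Object) Then Exit Sub

    Set doc = WB1.Document
    ff = FreeFile
    Open "c:\saved_page.htm" For Output As #ff
    ' outerHTML of <html> is the parsed markup, not the raw server bytes
    Print #ff, doc.documentElement.outerHTML;
    Close #ff
End Sub
```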
Richie_Simonetti commented:
Another approach could be to download the pages with Inet, then navigate to each one with the WebBrowser control and use the DOM from it.
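That Inet-then-WebBrowser approach can be sketched as follows. The control names Inet1, WB1 and Command1, the URL and the local path are all placeholders, and a reference to the Microsoft HTML Object Library is assumed. Inet's OpenURL with icString returns only the page source, so the download step fetches no images; when WB1 later renders the saved copy, images referenced with absolute URLs may still be requested.

```vb
' Sketch: assumes an Internet Transfer Control named Inet1, a WebBrowser
' control named WB1, a command button Command1, and a reference to
' "Microsoft HTML Object Library". URL and path are placeholders.
Option Explicit

Private Const LOCAL_FILE As String = "c:\page.htm"

Private Sub Command1_Click()
    Dim strHTML As String
    Dim ff As Integer

    ' OpenURL with icString returns the page source as a string
    strHTML = Inet1.OpenURL("http://www.example.com/page.htm", icString)

    ff = FreeFile
    Open LOCAL_FILE For Output As #ff
    Print #ff, strHTML;
    Close #ff

    ' Let the WebBrowser parse the local copy so its DOM can be used
    WB1.Navigate LOCAL_FILE
End Sub

Private Sub WB1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Dim doc As HTMLDocument
    If Not (pDisp Is WB1.Object) Then Exit Sub
    Set doc = WB1.Document
    MsgBox doc.getElementsByTagName("table").length & " table(s) found"
End Sub
```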
Richie_Simonetti commented:
Hi, how URGENT is this question? ;)

Author commented:
Hi Richie,

It really is urgent!
The problem is that when I browse a website with the WebBrowser control, I get scripts, images, stylesheets, everything downloaded to the PC, which turns a 4 KB HTML source file into well over 20 KB. We here in India have very slow line speeds. That's why I said I would use the ITC (Internet Transfer Control): that way I avoid downloading everything, since the ITC does not download linked resources!

I have already made a program that works with the WebBrowser control, but it takes as long to fire the onload() event as a normal IE window does. That makes my program useless.

The task is that I just want to download the page, check its table contents, and then feed them into a database.

Now maybe you see my problem.

I can give you expert points if you can tell me a back door. As far as I remember, there is C++ code that builds a TLB that makes the WebBrowser control download only the source. If you can get hold of that TLB or the C++ source, I can give you another 50 points!

Thanks anyway.
Richie_Simonetti commented:
Well, since you need only the body contents, we can grab them without any problem.
Set a reference to the Microsoft HTML Object Library.
Download the files with Inet as you already do.
Add a WebBrowser control (named WB1 here) to your form.

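Option Explicit

' Returns the text between the opening <body ...> tag and </body>.
' Returns an empty string if either tag is missing.
Function GrabBodyContents(HTMLText As String) As String
Dim lStart As Long, lEnd As Long
Dim strtemp As String
lStart = InStr(1, HTMLText, "<body", vbTextCompare)
If lStart = 0 Then Exit Function
' find the end of the opening <body ...> tag
lEnd = InStr(lStart, HTMLText, ">", vbTextCompare)
If lEnd = 0 Then Exit Function
strtemp = Mid$(HTMLText, lEnd + 1)
' strtemp starts right after <body ...>, so search it from position 1
lEnd = InStr(1, strtemp, "</body>", vbTextCompare)
If lEnd = 0 Then Exit Function
GrabBodyContents = Left$(strtemp, lEnd - 1)
End Function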


Private Sub Form_Load()
' load an empty page so that WB1.Document and its body exist
WB1.Navigate "about:blank"

End Sub

Private Sub WB1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
If (pDisp Is WB1.Object) Then
    Dim iedoc As HTMLDocument
    Set iedoc = WB1.Document
    Dim ff As Integer
    ff = FreeFile
    ' you could replace hard-coded path and filename with a list of files or
    ' a commondialog...
    Open "c:\yourchoice.htm" For Input As #ff
         'MsgBox GrabBodyContents(Input(LOF(ff), 1))
         iedoc.body.innerHTML = GrabBodyContents(Input(LOF(ff), 1))
    Close #ff
'    Dim tbl As IHTMLTable
'    .........
   
End If
End Sub


This is a rough outline, but we can start from here...
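One way to continue from the iedoc obtained in WB1_DocumentComplete toward "check the table contents" is to walk every table, row and cell through the MSHTML table interfaces. DumpTables is a hypothetical helper, the Microsoft HTML Object Library reference is assumed, and Debug.Print stands in for whatever database insert the application would use.

```vb
' Sketch: walks every table in a loaded document and prints each cell.
' Assumes a reference to "Microsoft HTML Object Library"; DumpTables is a
' hypothetical helper, and Debug.Print stands in for the database insert.
Private Sub DumpTables(ByVal doc As HTMLDocument)
    Dim colTables As IHTMLElementCollection
    Dim tbl As IHTMLTable
    Dim tblRow As IHTMLTableRow
    Dim tblCell As IHTMLElement
    Dim t As Long, r As Long, c As Long

    Set colTables = doc.getElementsByTagName("table")
    For t = 0 To colTables.length - 1
        Set tbl = colTables.Item(t)
        For r = 0 To tbl.rows.length - 1
            Set tblRow = tbl.rows.Item(r)
            For c = 0 To tblRow.cells.length - 1
                Set tblCell = tblRow.cells.Item(c)
                ' table index, row index, column index, cell text
                Debug.Print t, r, c, tblCell.innerText
            Next c
        Next r
    Next t
End Sub
```

It would be called as `DumpTables iedoc` after the body has been filled in. Nested tables show up as separate entries in the collection, and their text also appears in the enclosing cell's innerText.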

Author commented:
I wanted something more graceful. That code did not use the WebBrowser control alone (remember); you included Inet.

But you worked for it, and you get points for that.

I am telling you again that there is an MSF that makes the WebBrowser control download only the HTML source.

I have the C++ source for the TLB; if you know C++, I can give you the code. It reports an exception error and never compiles. If you can fix that problem, I can give you more points.

What do you say?
Richie_Simonetti commented:
Thanks for the "C" grade!
